Outlook Calendar Entries for a Day with Python

For me it is not the mobile. It is the email, which is the most distraction application in office.

The variable reward system has become a habit. I tried closing it but then I keep missing meetings etc.

So had to keep it open just so the reminders for meetings show up.

Only reason I need the outlook is to have a day view why not print it. Bad option, see the already printed stack on my desk. Why not get all the events out for a day and use a text file on the desktop.

Therefore, that is what I did.

Here’s the python function to get the appointments of the day from outlook calendar.

import win32com.client, datetime

def getCalendarEntries(days=1):
    """
    Returns calender entries for days default is 1
    """
    Outlook = win32com.client.Dispatch("Outlook.Application")
    ns = Outlook.GetNamespace("MAPI")
    appointments = ns.GetDefaultFolder(9).Items
    appointments.Sort("[Start]")
    appointments.IncludeRecurrences = "True"
    today = datetime.datetime.today()
    begin = today.date().strftime("%m/%d/%Y")
    tomorrow= datetime.timedelta(days=days)+today
    end = tomorrow.date().strftime("%m/%d/%Y")
    appointments = appointments.Restrict("[Start] >= '" +begin+ "' AND [END] <= '" +end+ "'")
    events={'Start':[],'Subject':[],'Duration':[]}
    for a in appointments:
        adate=datetime.datetime.fromtimestamp(int(a.Start))
        #print a.Start, a.Subject,a.Duration
        events['Start'].append(adate)
        events['Subject'].append(a.Subject)
        events['Duration'].append(a.Duration)
    return events

The functions spits out a dict and can be easily converted used in any way one wants.

Sweet! Let's see if am able to kill outlook now.

Advertisements

Mosaic Plot in Python

I am sure everyone one of us has seen charts like this.

Any management training you attend, a version of this chart is bound to sneak up in the presentation, often in lecture notes or hands on activity.

These charts are a good representation of categorical entries. A mosaic plot allows visualizing multivariate categorical data in a rigorous and informative way. Click here to read A Brief History of the Mosaic Display [PDF]

Here’s a quick example how to plot mosaic in python

from statsmodels.graphics.mosaicplot import mosaic
import matplotlib.pyplot as plt
import pandas

gender = ['male', 'male', 'male', 'female', 'female', 'female']
pet = ['cat', 'dog', 'dog', 'cat', 'dog', 'cat']
data = pandas.DataFrame({'gender': gender, 'pet': pet})
mosaic(data, ['pet', 'gender'])
plt.show()

 

Here’s another example from the tips dataset.


from statsmodels.graphics.mosaicplot import mosaic
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset('tips')
mosaic(tips, ['sex','smoker','time'])
plt.show()

 

Timeline in Python with Matplotlib

Was documenting a series of events and after the list was complete, thought of showing them on a timeline.

After an hour two, fiddling with python standard libraries this is what I had.

Format of the data


from theflile import GenerateTimeLine
import pandas as pd
data = pd.read_csv(r'events.txt', parse_dates=True, index_col=0)
ax = GenerateTimeLine(data)
plt.show()

The code as always available at this GitHub

Detect Outliers with Matplotlib

Problem: You have a huge large multivariate data and want to get list of outliers?

Outlier detection is a significant statistical process and lot of theory under pining but there is a simple, quick way to do this is using the Inter-quartile (IQR) rule.

Read the linked PDF for a simple example summary

Solution:

bloxplot_stats from matplotlib.cbook

Returns list of dictionaries of statistics

Here is a quick example

From matplotlib.cbook import boxplot_stats
st = boxplot_stats(data.AMT)
outliers = st[0]["fliers"]

I like this, because it is quick and does not need any external libraries apart from matplotlib.

Q in Statsmodels

If you are coming from R, you will love the formula API of statsmodels that work in a similar way.

I love it and been using it for quite some time since last year.

It’s good to test quick regression and GLM models and works on pandas dataframe which, at least in my case, are the basic unit where data is captured.

Most real life data is like this.

Created to be read by humans, so hyphens, spaces in name is common. However, this is a major issue if you need to use them in the formulae.

My old approach was to rename the column names with no special characters and gaps. This worked but messy.

data.rename(columns={'Max crack size': 'maxcracksize'},inplace=True)

Recently I discovered Q, a helper function from the patsy library, which the formula API of statsmodels uses.

Just wrap your formula names with Q if they have gaps of something.

Here is an example usage with Q

import statsmodels.formula.api as smf
import pandas as pd
data = pd.read_csv(fname, index_col=0, usecols=[0,1,2])
model  = smf.ols('Q("Max crack size") ~ CSN',data =data).fit()

I would say this is very convenient. No need of the unnecessarily and complex pre-processing code.

Numpy Arrays to Numpy Arrays Records

Recently working with hdf5 format and the datasets in the format requires the arrays to be records.

What does that mean?

NumPy provides powerful capabilities to create arrays of structured datatype. These arrays permit one to manipulate the data by named fields.

One defines a structured array through the dtype object. Creating them from list is simple and take the below form

Example:

x = np.array([(1,2.,'Hello'), (2,3.,"World")],
             <span 				data-mce-type="bookmark" 				id="mce_SELREST_start" 				data-mce-style="overflow:hidden;line-height:0" 				style="overflow:hidden;line-height:0" 			></span>dtype=[('foo', 'i4'),('bar', 'f4'), ('baz', 'S10')])
print x['foo']
array([1, 2])
print x['faz']
array(['Hello', 'World'],dtype='|S10')

But the problem with my code was that I already had arrays. So how to convert numpy arrays to numpy records arrays.

Numpy.core.records.fromarray to the rescue.

Example:

data = np.random.randn(15).reshape(5,3)
rec_data  = np.core.records.fromarrays(data.T, names=['a','b','c'])

Notice the transpose. That is required.

Cross Validation Score with Statsmodels

Recently was working with a limited set of data. Using statsmodels , employed a regression model on the data.

To test the confidence in the model needed to do cross validation. The solution that immediately sprang to mind was the cross_val_score function from sci-kit learn library.

However, that function is not applicable on statsmodels object.

Solution: wrap sklearn base estimators on statsmodels objects and then use the model.

Here is the code for wrapping a sklearn baseestimators over statsmodels objects.

from sklearn.base import BaseEstimator, RegressorMixin
import statsmodels.formula.api as smf
import statsmodels.api as sm

class statsmodel(BaseEstimator, RegressorMixin):
    def __init__(self, sm_class, formula):
        self.sm_class = sm_class
        self.formula = formula
        self.model = None
        self.result = None

    def fit(self,data,dummy):
        self.model = self.sm_class(self.formula,data)
        self.result = self.model.fit()

    def predict(self,X):
        return self.result.predict(X)

Notice the dummy in the fit function, the regression api for sklearn needs X and y and for this implementation I was relying on the formula API of statsmodels, so had to add a dummy variable.

Here’s a quick example on how to use it.


from sklearn import linear_model
from sklearn.model_selection import cross_val_score

# Get data
ccard = sm.datasets.ccard.load_pandas()
print (ccard.data.head())

# create a model
clf = statsmodel(smf.ols, "AVGEXP ~ AGE + INCOME")

# Print cross val score on this model
print (cross_val_score(clf, ccard.data, ccard.data['AVGEXP']))

# Same thing on sklearn's linear regression model
lm = linear_model.LinearRegression()

print (cross_val_score(lm , ccard.data.iloc[:,1:3].values, ccard.data.iloc[:,0].values))

I love sklearn. Convenient, efficient and consistent API can make things so much easier.