Parsing date – a simple example

Problem: Let’s say we have a data like the below image, it can be anything, from IOT sensors data to something like the below expense log.

If the data has day, month and year as separate columns, as shown below, how to read this kind of data log with pandas and create date as the index.

Solution:

import pandas as pd
data = pd.read_csv('EXP2018.csv',parse_dates={'date':['YY','MM','DD']}, index_col='date')

Yes use pd.todatetime functionally after the read fro this and other non-standard parsing, but this one is elegant.

Date parsing in pandas is super useful. Here are some other options that can be considered for parsing dates with pandas.

parse_dates : boolean or list of ints or names or list of lists or dict, default False

* boolean. If True -> try parsing the index.

* list of ints or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3

each as a separate date column.

* list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as

a single date column.

* dict, e.g. {'foo' : [1, 3]} -> parse columns 1, 3 as date and call result

'foo'


If a column or index contains an unparseable date, the entire column or

index will be returned unaltered as an object data type. For non-standard

datetime parsing, use ``pd.to_datetime`` after ``pd.read_csv``

Note: A fast-path exists for iso8601-formatted dates.

infer_datetime_format : boolean, default False



If True and parse_dates is enabled, pandas will attempt to infer the format

of the datetime strings in the columns, and if it can be inferred, switch

to a faster method of parsing them. In some cases this can increase the

parsing speed by 5-10x.



keep_date_col : boolean, default False

If True and parse_dates specifies combining multiple columns then

keep the original columns.



date_parser : function, default None

Function to use for converting a sequence of string columns to an array of

datetime instances. The default uses ``dateutil.parser.parser`` to do the

conversion. Pandas will try to call date_parser in three different ways,

advancing to the next if an exception occurs: 1) Pass one or more arrays

(as defined by parse_dates) as arguments; 2) concatenate (row-wise) the

string values from the columns defined by parse_dates into a single array

and pass that; and 3) call date_parser once for each row using one or more

strings (corresponding to the columns defined by parse_dates) as arguments.



dayfirst : boolean, default False

DD/MM format dates, international and European format

 

 

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s