Pandas XLS File Read Error

I recently hit a problem reading an .xls file with pandas. Pandas complained that the Excel file was corrupt, although opening the same file in Excel showed no issue.

data = pd.read_excel("stock_cons.xls")
_locate_stream(Workbook): seen
    0  5 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 
   20  4 4 4 4 4 4 4 4 4 4 4 3 2 
....

    430             self.seen[s] = seen_id
    431             tot_found += 1

CompDocError: Workbook corruption: seen[3] == 4

Solution:

Check your xlrd version. You need xlrd 2.0.0 or above, which added the ignore_workbook_corruption option. Then open the workbook with xlrd and pass it to pandas:

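import pandas as pd
import xlrd

# Open the file with xlrd directly so the corruption error can be ignored
workbook = xlrd.open_workbook_xls("stock_cons.xls", ignore_workbook_corruption=True)
data = pd.read_excel(workbook)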
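
If you are not sure which xlrd you have, the version requirement can be checked from Python before relying on the option. This is only a rough sketch: the helper names supports_ignore_corruption and installed_xlrd_supports_it are made up for this post, and the check simply compares the major and minor version numbers against 2.0.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def supports_ignore_corruption(version_string):
    """Return True if an xlrd version string is 2.0 or newer (hypothetical helper)."""
    # Only the leading "major.minor" part matters for this check.
    match = re.match(r"(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError(f"unrecognised version string: {version_string!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (2, 0)


def installed_xlrd_supports_it():
    """Check the installed xlrd; returns False when xlrd is not installed."""
    try:
        return supports_ignore_corruption(version("xlrd"))
    except PackageNotFoundError:
        return False
```

If installed_xlrd_supports_it() comes back False, upgrading with pip install "xlrd>=2.0.0" should be enough.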

Explanation:

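The pandas read_excel function accepts an already-open xlrd workbook as input, not just a file path. So open the workbook with xlrd yourself, with ignore_workbook_corruption set to True. xlrd then tolerates the inconsistency instead of raising CompDocError, and pandas reads the data from that workbook as usual.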
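
If you read many files and only some of them trigger the error, the same idea can be wrapped in a small fallback. This is a sketch under a few assumptions, not a definitive implementation: read_xls_tolerant is a name made up for this post, it uses xlrd.open_workbook (which takes the same ignore_workbook_corruption keyword in xlrd 2.0.0 and above), and it assumes the error reaches you as xlrd's CompDocError, as in the traceback above.

```python
def read_xls_tolerant(path):
    """Read an .xls file, retrying with xlrd's corruption error ignored.

    Hypothetical helper: tries the normal pandas route first and only
    falls back when xlrd reports workbook corruption.
    """
    # Imported lazily so that importing this helper does not itself
    # require pandas or xlrd to be installed.
    import pandas as pd
    import xlrd
    from xlrd.compdoc import CompDocError

    try:
        # Normal route: let pandas open the file itself.
        return pd.read_excel(path)
    except CompDocError:
        # Fallback: open with xlrd, ignoring the corruption error,
        # then hand the open workbook to pandas.
        workbook = xlrd.open_workbook(path, ignore_workbook_corruption=True)
        return pd.read_excel(workbook)
```

Usage would then be data = read_xls_tolerant("stock_cons.xls"). Trying the strict read first means files that are fine never go through the relaxed path, so a file that really is damaged is less likely to slip through unnoticed.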
