Little Experiments with BeautifulSoup

Look at any conference related to Python, and web scraping is one of the topics that is always there.

This prompted me to try BeautifulSoup. I have been toying with it for a few months now, but yesterday I got a real chance to use it.

Here’s a small function that I used yesterday to scrape all the Test match and one-day match results from the ESPN website. Someone needed the data.

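import csv
import urllib2  # Python 2 only; on Python 3 use urllib.request instead

from bs4 import BeautifulSoup


def cricketScrap(url, f):
    """Scrape the results table at `url` and write one row per match to the csv writer `f`."""
    page = urllib2.urlopen(url).read()
    soup = BeautifulSoup(page, 'html.parser')
    table = soup.find('table', attrs={'class': 'engineTable'})
    rows = table.findAll('tr')
    if len(rows) == 1:
        # Only the header row is present, so there is nothing to write
        # (assumption: returning early is what this check was for).
        return
    for row in rows[1:]:  # skip the header row
        col = row.findAll('td')
        team1 = col[0].getText()
        team2 = col[1].getText()
        winner = col[2].getText()
        margin = col[3].getText()
        ground = col[4].getText()
        match_date = col[5].getText()
        match = (team1, team2, winner, margin, ground, match_date)
        f.writerow(match)  # assumption: each match is written as one CSV row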
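To try the table-parsing step without hitting the network, here is a minimal Python 3 sketch that runs the same idea against an inline HTML string. The sample markup and the `extract_rows` helper are made up for illustration; only the `engineTable` class name comes from the function above.

```python
from bs4 import BeautifulSoup

# Made-up sample markup; only the 'engineTable' class name comes from the post.
SAMPLE_HTML = """
<table class="engineTable">
  <tr><th>Team 1</th><th>Team 2</th><th>Winner</th><th>Margin</th><th>Ground</th><th>Match Date</th></tr>
  <tr><td>Team A</td><td>Team B</td><td>Team A</td><td>5 wickets</td><td>Ground X</td><td>Jan 1, 2000</td></tr>
  <tr><td>Team C</td><td>Team D</td><td>Team D</td><td>20 runs</td><td>Ground Y</td><td>Jan 2, 2000</td></tr>
</table>
"""


def extract_rows(html):
    """Return one 6-tuple per data row of the results table (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"class": "engineTable"})
    if table is None:
        # No results table on the page: nothing to extract.
        return []
    rows = []
    for tr in table.find_all("tr")[1:]:  # skip the header row
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) >= 6:  # ignore rows that do not have all six columns
            rows.append(tuple(cells[:6]))
    return rows


for match in extract_rows(SAMPLE_HTML):
    print(match)
```

Returning a list instead of writing straight to a CSV writer keeps the parsing easy to check on its own; the rows can be passed to `writerows` afterwards.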

Here’s how to use it.

For one-day matches:


lis = ["Team 1", "Team 2", "Winner", "Margin", "Ground", "Match Date"]
f = csv.writer(open("oneindiancricketscrape.csv", "w"))
f.writerow(lis)  # header row (assumption: this is what lis was defined for)

for year in range(1974,2015):
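A minimal Python 3 sketch of what such a per-year loop could look like. The URL template is a hypothetical placeholder (the real results URL is not given here), and `scrape_years` is an illustrative helper, not part of the original code. The scraping function is passed in as a parameter so the loop can be tried with a stand-in that makes no network requests.

```python
import csv
import io

# Hypothetical placeholder; substitute the real per-year results URL.
URL_TEMPLATE = "https://example.com/results?year={year}"


def scrape_years(scrape, writer, header, start, stop, template=URL_TEMPLATE):
    """Write the header once, then call scrape(url, writer) for each year in [start, stop)."""
    writer.writerow(header)
    for year in range(start, stop):
        scrape(template.format(year=year), writer)


def fake_scrape(url, writer):
    """Stand-in for cricketScrap: records the URL instead of fetching it."""
    writer.writerow([url])


# Usage with an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
scrape_years(fake_scrape, csv.writer(buffer), ["Team 1", "Team 2"], 1974, 1976)
print(buffer.getvalue())
```

With the real function, the call would pass `cricketScrap` and a writer opened on the output file in place of the stand-ins.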

And here’s the interesting one: scrape the results of all the matches played between 1772 and 2014.


lis = ["Team 1", "Team 2", "Winner", "Margin", "Ground", "Match Date"]
f = csv.writer(open("allCricketMatchResults1900.csv", "w"))
f.writerow(lis)  # header row (assumption: this is what lis was defined for)

for year in range(1772,2015):

And if you are only interested in the one-day and Test match results for India so far, click the following links to download the CSV files.

