<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: How To: Use Python to Collect Data from the Web &#8211; Part 1: Parsing Data</title>
	<atom:link href="http://www.drewconway.com/zia/?feed=rss2&#038;p=1037" rel="self" type="application/rss+xml" />
	<link>http://www.drewconway.com/zia/?p=1037</link>
	<description>How can the social sciences, mathematics and computer science combine to affect national security policy?</description>
	<lastBuildDate>Mon, 06 Sep 2010 14:57:18 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
	<item>
		<title>By: Drew Conway</title>
		<link>http://www.drewconway.com/zia/?p=1037&#038;cpage=1#comment-2331</link>
		<dc:creator>Drew Conway</dc:creator>
		<pubDate>Tue, 01 Dec 2009 14:32:04 +0000</pubDate>
		<guid isPermaLink="false">http://www.drewconway.com/zia/?p=1037#comment-2331</guid>
		<description>Since this post is relatively old, the likely explanation is that NFL.com has changed its page layout. The script then goes looking for data where it no longer exists, which results in an IndexError.

I suggest working with the code as is, but inspecting the page's current HTML to see how its elements are now indexed.</description>
		<content:encoded><![CDATA[<p>Since this post is relatively old, the likely explanation is that NFL.com has changed its page layout. The script then goes looking for data where it no longer exists, which results in an IndexError.</p>
<p>I suggest working with the code as is, but inspecting the page's current HTML to see how its elements are now indexed.</p>
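<p>A minimal Python 3 sketch of the failure mode, assuming the script splits a scraped player name on whitespace before building the search URL. The helper name <code>build_search_url</code> is hypothetical, and the URL pattern is copied from the traceback in satyam mukherjee's comment; NFL.com's current URLs may differ. Catching the IndexError lets the script skip a name that no longer parses instead of crashing.</p>

```python
from urllib.parse import urlencode


def build_search_url(player_name):
    """Return an NFL.com player-search URL, or None if the name cannot be split.

    Hypothetical helper: the URL pattern mirrors the one in the traceback;
    the site's current URL scheme may differ.
    """
    names = player_name.split()
    try:
        # Like the original script, use only the first two words.
        first, last = names[0], names[1]
    except IndexError:
        # Fewer than two words were scraped (e.g. an empty string after a
        # page-layout change): this is the error from the traceback.
        return None
    params = urlencode({
        "category": "name",
        "filter": first + " " + last,  # urlencode encodes the space as "+"
        "playerType": "current",
        "team": "3410",
    })
    return "http://www.nfl.com/players/search?" + params


for name in ["John Smith", ""]:
    print(repr(name), "->", build_search_url(name))
```

<p>With this guard, a name that comes back empty or as a single word yields None, so the caller can log it and move on while you inspect the new HTML.</p>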
]]></content:encoded>
	</item>
	<item>
		<title>By: satyam mukherjee</title>
		<link>http://www.drewconway.com/zia/?p=1037&#038;cpage=1#comment-2330</link>
		<dc:creator>satyam mukherjee</dc:creator>
		<pubDate>Tue, 01 Dec 2009 13:15:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.drewconway.com/zia/?p=1037#comment-2330</guid>
		<description>Hi, nice information. I am new to Python. I tried to run the code myself but ended up with the following error:

Traceback (most recent call last):
  File &quot;players-trial.py&quot;, line 53, in ?
    player_urls=get_player_profiles(players)
  File &quot;players-trial.py&quot;, line 37, in get_player_profiles
    search_url=&quot;http://www.nfl.com/players/search?category=name&amp;filter=&quot;+names[0]+&quot;+&quot;+names[1]+&quot;&amp;playerType=current&amp;team=3410&quot;
IndexError: list index out of range

Can anyone help me with this? :-)
Thanks in advance.</description>
		<content:encoded><![CDATA[<p>Hi, nice information. I am new to Python. I tried to run the code myself but ended up with the following error:</p>
<p>Traceback (most recent call last):<br />
  File "players-trial.py", line 53, in ?<br />
    player_urls=get_player_profiles(players)<br />
  File "players-trial.py", line 37, in get_player_profiles<br />
    search_url="http://www.nfl.com/players/search?category=name&amp;filter="+names[0]+"+"+names[1]+"&amp;playerType=current&amp;team=3410"<br />
IndexError: list index out of range</p>
<p>Can anyone help me with this? <img src='http://www.drewconway.com/zia/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /><br />
Thanks in advance.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: How Python can Turn the Internet into your Dataset: Part 1 &#124; Computational Legal Studies</title>
		<link>http://www.drewconway.com/zia/?p=1037&#038;cpage=1#comment-1107</link>
		<dc:creator>How Python can Turn the Internet into your Dataset: Part 1 &#124; Computational Legal Studies</dc:creator>
		<pubDate>Wed, 01 Jul 2009 23:05:55 +0000</pubDate>
		<guid isPermaLink="false">http://www.drewconway.com/zia/?p=1037#comment-1107</guid>
		<description>[...] over at Zero Intelligence Agents has gotten off to a great start with his first two tutorials on collecting and managing web data with Python.  However, critics of such automated collection might argue that [...]</description>
		<content:encoded><![CDATA[<p>[...] over at Zero Intelligence Agents has gotten off to a great start with his first two tutorials on collecting and managing web data with Python.  However, critics of such automated collection might argue that [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: How to Use Python to Collect Data from the Web [From Drew Conway] &#124; Computational Legal Studies</title>
		<link>http://www.drewconway.com/zia/?p=1037&#038;cpage=1#comment-1090</link>
		<dc:creator>How to Use Python to Collect Data from the Web [From Drew Conway] &#124; Computational Legal Studies</dc:creator>
		<pubDate>Wed, 01 Jul 2009 04:24:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.drewconway.com/zia/?p=1037#comment-1090</guid>
		<description>[...] wanted to highlight a couple of very interesting posts by Drew Conway of Zero Intelligence Agents. While not simple, the programming language python offers significant returns upon investment. [...]</description>
		<content:encoded><![CDATA[<p>[...] wanted to highlight a couple of very interesting posts by Drew Conway of Zero Intelligence Agents. While not simple, the programming language python offers significant returns upon investment. [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Drew Conway</title>
		<link>http://www.drewconway.com/zia/?p=1037&#038;cpage=1#comment-1080</link>
		<dc:creator>Drew Conway</dc:creator>
		<pubDate>Tue, 30 Jun 2009 21:35:48 +0000</pubDate>
		<guid isPermaLink="false">http://www.drewconway.com/zia/?p=1037#comment-1080</guid>
		<description>Clem, chalk that up to tutorial-esque redundancy. Thanks, though, for being the police.</description>
		<content:encoded><![CDATA[<p>Clem, chalk that up to tutorial-esque redundancy. Thanks, though, for being the police.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Clem Kadiddlehopper</title>
		<link>http://www.drewconway.com/zia/?p=1037&#038;cpage=1#comment-1077</link>
		<dc:creator>Clem Kadiddlehopper</dc:creator>
		<pubDate>Tue, 30 Jun 2009 19:41:34 +0000</pubDate>
		<guid isPermaLink="false">http://www.drewconway.com/zia/?p=1037#comment-1077</guid>
		<description>Was &quot;comman-delimited spreadsheet&quot; supposed to be a joke, or do you not know what csv stands for?</description>
		<content:encoded><![CDATA[<p>Was &#8220;comman-delimited spreadsheet&#8221; supposed to be a joke, or do you not know what csv stands for?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Drew Conway</title>
		<link>http://www.drewconway.com/zia/?p=1037&#038;cpage=1#comment-1073</link>
		<dc:creator>Drew Conway</dc:creator>
		<pubDate>Tue, 30 Jun 2009 13:32:43 +0000</pubDate>
		<guid isPermaLink="false">http://www.drewconway.com/zia/?p=1037#comment-1073</guid>
		<description>Paul,

Thanks, I had never seen YQL before, but it does seem like a very convenient way to parse HTML (as long as you have some familiarity with SQL commands).

There are so many ways to approach this problem, and it is fun to think about them all!</description>
		<content:encoded><![CDATA[<p>Paul,</p>
<p>Thanks, I had never seen YQL before, but it does seem like a very convenient way to parse HTML (as long as you have some familiarity with SQL commands).</p>
<p>There are so many ways to approach this problem, and it is fun to think about them all!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Paul Tarjan</title>
		<link>http://www.drewconway.com/zia/?p=1037&#038;cpage=1#comment-1065</link>
		<dc:creator>Paul Tarjan</dc:creator>
		<pubDate>Tue, 30 Jun 2009 05:16:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.drewconway.com/zia/?p=1037#comment-1065</guid>
		<description>I usually use YQL (which in turn uses HTML Tidy) to parse web pages. It gives an E4X interface over the website, which I find really nice for data scraping.

For example:

http://yourock.paulisageek.com/yql/trueachievements.xml

use &#039;http://yourock.paulisageek.com/yql/trueachievements.xml&#039;; SELECT * FROM trueachievements WHERE url = &#039;trueachievements.com/ptarjan.html&#039;;

http://query.yahooapis.com/v1/public/yql?q=use%20&#039;http%3A%2F%2Fyourock.paulisageek.com%2Fyql%2Ftrueachievements.xml&#039;%3B%20SELECT%20*%20FROM%20trueachievements%20WHERE%20url%20%3D%20&#039;trueachievements.com%2Fptarjan.html&#039;%3B&amp;format=xml</description>
		<content:encoded><![CDATA[<p>I usually use YQL (which in turn uses HTML Tidy) to parse web pages. It gives an E4X interface over the website, which I find really nice for data scraping.</p>
<p>For example:</p>
<p><a href="http://yourock.paulisageek.com/yql/trueachievements.xml" rel="nofollow">http://yourock.paulisageek.com/yql/trueachievements.xml</a></p>
<p>use 'http://yourock.paulisageek.com/yql/trueachievements.xml'; SELECT * FROM trueachievements WHERE url = 'trueachievements.com/ptarjan.html';</p>
<p><a href="http://query.yahooapis.com/v1/public/yql?q=use%20'http%3A%2F%2Fyourock.paulisageek.com%2Fyql%2Ftrueachievements.xml'%3B%20SELECT%20*%20FROM%20trueachievements%20WHERE%20url%20%3D%20'trueachievements.com%2Fptarjan.html'%3B&amp;format=xml" rel="nofollow">http://query.yahooapis.com/v1/public/yql?q=use%20'http%3A%2F%2Fyourock.paulisageek.com%2Fyql%2Ftrueachievements.xml'%3B%20SELECT%20*%20FROM%20trueachievements%20WHERE%20url%20%3D%20'trueachievements.com%2Fptarjan.html'%3B&amp;format=xml</a></p>
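<p>To reproduce the encoded query URL from Python, here is a short Python 3 sketch that percent-encodes the YQL statement with the standard library's <code>urllib.parse</code>. The helper name <code>build_yql_url</code> is illustrative, and the code only builds the string; it does not call the service, which may no longer be available.</p>

```python
from urllib.parse import quote, urlencode

# Endpoint copied from the comment; this sketch only builds the URL and
# never sends a request, since the service may no longer respond.
YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"


def build_yql_url(query, fmt="xml"):
    """Percent-encode a YQL statement into a GET URL (illustrative helper)."""
    # quote() leaves letters, digits and "_.-~" alone; safe="'*" also keeps
    # the single quotes and the SELECT * readable, as in the URL above.
    params = urlencode({"q": query, "format": fmt}, quote_via=quote, safe="'*")
    return YQL_ENDPOINT + "?" + params


query = ("use 'http://yourock.paulisageek.com/yql/trueachievements.xml'; "
         "SELECT * FROM trueachievements "
         "WHERE url = 'trueachievements.com/ptarjan.html';")
print(build_yql_url(query))
```

<p>Passing <code>quote_via=quote</code> with <code>safe="'*"</code> reproduces the encoding used in this comment's URL; with urlencode's defaults, spaces would become + and the single quotes %27.</p>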
]]></content:encoded>
	</item>
	<item>
		<title>By: Drew Conway</title>
		<link>http://www.drewconway.com/zia/?p=1037&#038;cpage=1#comment-1064</link>
		<dc:creator>Drew Conway</dc:creator>
		<pubDate>Tue, 30 Jun 2009 01:35:23 +0000</pubDate>
		<guid isPermaLink="false">http://www.drewconway.com/zia/?p=1037#comment-1064</guid>
		<description>andrix,

I considered doing this tutorial with Scrapy, and I agree that it is immensely useful, but I thought it was not the right tool for this particular problem.

For tasks that require persistent scraping, like downloading stock price data, it would be perfect.  For these &quot;one-shot&quot; downloads, this combination of tools is better.  Maybe I will do a Scrapy tutorial in the future.

Thanks for the suggestion!</description>
		<content:encoded><![CDATA[<p>andrix,</p>
<p>I considered doing this tutorial with Scrapy, and I agree that it is immensely useful, but I thought it was not the right tool for this particular problem.</p>
<p>For tasks that require persistent scraping, like downloading stock price data, it would be perfect.  For these &#8220;one-shot&#8221; downloads, this combination of tools is better.  Maybe I will do a Scrapy tutorial in the future.</p>
<p>Thanks for the suggestion!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: andrix</title>
		<link>http://www.drewconway.com/zia/?p=1037&#038;cpage=1#comment-1063</link>
		<dc:creator>andrix</dc:creator>
		<pubDate>Tue, 30 Jun 2009 01:21:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.drewconway.com/zia/?p=1037#comment-1063</guid>
		<description>I&#039;d recommend using &lt;a href=&quot;http://scrapy.org&quot; title=&quot;Scrapy&quot; rel=&quot;nofollow&quot;&gt;Scrapy&lt;/a&gt;. Scrapy is an excellent framework for screen scraping. As scrapy.org says:
 
&lt;cite&gt;
Scrapy is a high level scraping and web crawling framework for writing spiders to crawl and parse web pages for all kinds of purposes, from information retrieval to monitoring or testing web sites.
&lt;/cite&gt;

Try it; you will not regret it!</description>
		<content:encoded><![CDATA[<p>I&#8217;d recommend using <a href="http://scrapy.org" title="Scrapy" rel="nofollow">Scrapy</a>. Scrapy is an excellent framework for screen scraping. As scrapy.org says:</p>
<p><cite><br />
Scrapy is a high level scraping and web crawling framework for writing spiders to crawl and parse web pages for all kinds of purposes, from information retrieval to monitoring or testing web sites.<br />
</cite></p>
<p>Try it; you will not regret it!</p>
]]></content:encoded>
	</item>
</channel>
</rss>
