<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>blogging and scraping &#187; firebug</title>
	<atom:link href="http://www.tsnpc.com/tag/firebug/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.tsnpc.com</link>
	<description></description>
	<lastBuildDate>Mon, 12 Jul 2010 22:54:43 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>firebug + hpricot</title>
		<link>http://www.tsnpc.com/firebughpricot/</link>
		<comments>http://www.tsnpc.com/firebughpricot/#comments</comments>
		<pubDate>Wed, 01 Oct 2008 12:24:29 +0000</pubDate>
		<dc:creator></dc:creator>
				<category><![CDATA[ruby]]></category>
		<category><![CDATA[firebug]]></category>
		<category><![CDATA[hpricot]]></category>

		<guid isPermaLink="false">http://www.tsnpc.com/?p=48</guid>
		<description><![CDATA[The following code tries to scrape content from a web page that the scraping tool I used could not pick up: require 'rubygems' require 'hpricot' require 'open-uri' url = "http://homemsg.focus.cn/msgview/607/50129006.html" doc = Hpricot(open(url)) td_contents = (doc/"/html/body/table[8]/tbody/tr/td[4]/table[2]/tbody/tr/td").inner_html puts td_contents It did not work, so something must be wrong. With Firebug, I can copy the [...]]]></description>
			<content:encoded><![CDATA[<p>The following code tries to scrape content from a web page that the scraping tool I used could not pick up:<br />
require 'rubygems'<br />
require 'hpricot'<br />
require 'open-uri'</p>
<p>url = "http://homemsg.focus.cn/msgview/607/50129006.html"<br />
doc = Hpricot(open(url))<br />
td_contents = (doc/"/html/body/table[8]/tbody/tr/td[4]/table[2]/tbody/tr/td").inner_html<br />
puts td_contents</p>
<p>It did not work, so something must be wrong.</p>
<p>With Firebug, I can copy the element's XPath (/html/body/table[8]/tbody/tr/td[4]/table[2]/tbody/tr/td) and its innerHTML.</p>
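<p>One possible cause, offered as an assumption rather than a confirmed diagnosis for this page: Firebug builds the XPath from the browser's DOM, and browsers add tbody elements to tables even when the raw HTML omits them. Hpricot parses the raw HTML, so a path containing tbody steps may match nothing. The sketch below is plain Ruby and uses a hypothetical helper, strip_tbody, to drop those steps from a copied path:</p>

```ruby
# Hypothetical helper (not part of Hpricot): removes every "tbody" step,
# with or without a positional index, from an XPath copied out of Firebug.
def strip_tbody(xpath)
  xpath.split('/').reject { |step| step.match?(/\Atbody(\[\d+\])?\z/) }.join('/')
end

# The path Firebug reported for the target cell.
firebug_xpath = '/html/body/table[8]/tbody/tr/td[4]/table[2]/tbody/tr/td'

# The same path without the browser-inserted tbody steps.
raw_html_xpath = strip_tbody(firebug_xpath)
puts raw_html_xpath
```

<p>The result could then be tried in place of the original path, for example (doc/raw_html_xpath).inner_html. The positional indexes such as table[8] may still differ from the raw HTML if the browser repaired other markup, so this is a starting point, not a guaranteed fix.</p>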
]]></content:encoded>
			<wfw:commentRss>http://www.tsnpc.com/firebughpricot/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
