firebug + hpricot

October 1st, 2008

The following code tries to scrape content from a web page that the scraping tool I was using could not pick up:
require 'rubygems'
require 'hpricot'
require 'open-uri'

# Fetch the page and parse it with Hpricot
url = "http://homemsg.focus.cn/msgview/607/50129006.html"
doc = Hpricot(open(url))
# XPath copied from Firebug
td_contents = (doc/"/html/body/table[8]/tbody/tr/td[4]/table[2]/tbody/tr/td").inner_html
puts td_contents

It did not work, so something must be wrong.

With Firebug, I can copy the element's XPath (/html/body/table[8]/tbody/tr/td[4]/table[2]/tbody/tr/td) and its innerHTML.
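
One possible explanation, which is an assumption and has not been checked against this page's source: Firebug reports the XPath from the browser's DOM, and the browser inserts tbody elements into tables even when the raw HTML has none. Hpricot parses the raw HTML, so a path containing tbody may match nothing. A minimal sketch of a workaround is to drop the tbody steps before using the path. The helper name strip_tbody is hypothetical and not part of Hpricot.

```ruby
# Hypothetical helper (not part of Hpricot): remove the tbody steps
# that appear in a browser-derived XPath, on the assumption that the
# raw HTML source has no tbody elements.
def strip_tbody(xpath)
  # Match "/tbody" or "/tbody[n]" only when it is a whole path step.
  xpath.gsub(%r{/tbody(?:\[\d+\])?(?=/|\z)}, '')
end

firebug_xpath = '/html/body/table[8]/tbody/tr/td[4]/table[2]/tbody/tr/td'
source_xpath = strip_tbody(firebug_xpath)
puts source_xpath
# prints: /html/body/table[8]/tr/td[4]/table[2]/tr/td
```

The cleaned path could then be tried in place of the original one in the Hpricot search. If the page source really does contain tbody tags, this changes nothing useful and the cause lies elsewhere.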

Categories: ruby