Posts Tagged ‘hpricot’

Gems for web scraping

October 23rd, 2009

I'm just listing the gem names here for future reference:
hpricot
nokogiri
mechanize
webrat
scrubyt

Categories: watir

firebug + hpricot

October 1st, 2008

The following code tries to scrape content from a web page that the scraping tool I had been using could not pick up:
require 'rubygems'
require 'hpricot'
require 'open-uri'

url = "http://homemsg.focus.cn/msgview/607/50129006.html"
doc = Hpricot(open(url))
# XPath copied from Firebug
td_contents = (doc/"/html/body/table[8]/tbody/tr/td[4]/table[2]/tbody/tr/td").inner_html
puts td_contents

It did not work, so there must be something wrong.

With Firebug, I can copy the element's XPath (/html/body/table[8]/tbody/tr/td[4]/table[2]/tbody/tr/td) and also copy its innerHTML.
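
One possible cause, offered as a guess rather than a confirmed diagnosis: browsers insert tbody elements into the DOM for tables even when the page source has none, so an XPath copied from Firebug may not match the raw HTML that Hpricot parses. The sketch below assumes that is the problem here and removes the tbody steps from the path. The helper name strip_tbody is hypothetical and is not part of Hpricot.

```ruby
# Hypothetical helper (not an Hpricot API): remove every "tbody" step,
# with or without a positional index such as tbody[1], from an XPath string.
# Assumption: the raw page source contains no <tbody> tags, so the steps
# that Firebug shows come from the browser's DOM, not from the file itself.
def strip_tbody(xpath)
  xpath.gsub(%r{/tbody(?:\[\d+\])?(?=/|\z)}, '')
end

firebug_xpath = "/html/body/table[8]/tbody/tr/td[4]/table[2]/tbody/tr/td"
raw_xpath = strip_tbody(firebug_xpath)

puts raw_xpath
# prints /html/body/table[8]/tr/td[4]/table[2]/tr/td
```

The shortened path can then be passed to the same search as before, i.e. (doc/raw_xpath).inner_html. If the result is still empty, the tbody guess is probably wrong for this page: the table positions in the source may differ from those in the browser's DOM, or the content may be written by JavaScript after the page loads.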

Categories: ruby