Gems for web scraping
October 23rd, 2009
I just put the names of the gems here for future reference:
hpricot
nokogiri
mechanize
webrat
scrubyt
The following code tries to scrape content from a web page that the scraping tool I used could not pick up:
require 'rubygems'
require 'hpricot'
require 'open-uri'
url = "http://homemsg.focus.cn/msgview/607/50129006.html"
doc = Hpricot(open(url))
td_contents = (doc/"/html/body/table[8]/tbody/tr/td[4]/table[2]/tbody/tr/td").inner_html
puts td_contents
It did not work, so something must be wrong.
With Firebug, I can copy the element's XPath (/html/body/table[8]/tbody/tr/td[4]/table[2]/tbody/tr/td) and its innerHTML, so the element does exist in the page as the browser renders it.
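One possible cause, offered as a guess rather than a confirmed diagnosis: browsers insert an implied tbody element into every table when they build the DOM, and Firebug reports the XPath of that DOM, while the HTML the server sends often has no tbody tags at all. If that is the case here, the copied path would match nothing in the raw source. The sketch below illustrates the idea with the standard-library REXML parser on a made-up, well-formed sample; strip_tbody, text_at, and RAW_SOURCE are hypothetical names introduced only for this illustration, and the sample markup is not taken from the real page.

```ruby
require 'rexml/document'

# Hypothetical helper: drop the /tbody steps that Firebug reports but that
# may be absent from the HTML the server actually sends.
def strip_tbody(xpath)
  xpath.gsub(%r{/tbody(\[\d+\])?(?=/|\z)}, '')
end

# Hypothetical helper: return the text of the first node matching the XPath,
# or nil when nothing matches.
def text_at(doc, xpath)
  node = REXML::XPath.first(doc, xpath)
  node && node.text
end

# Made-up sample standing in for the raw page source (note: no <tbody>).
RAW_SOURCE = <<~HTML
  <html><body>
    <table><tr><td>first</td><td>target text</td></tr></table>
  </body></html>
HTML

doc = REXML::Document.new(RAW_SOURCE)

# The kind of path a browser tool would report for the second cell.
firebug_xpath = '/html/body/table/tbody/tr/td[2]'

with_tbody    = text_at(doc, firebug_xpath)
without_tbody = text_at(doc, strip_tbody(firebug_xpath))

puts "with tbody:    #{with_tbody.inspect}"
puts "without tbody: #{without_tbody.inspect}"
```

Applied to the path from this post, strip_tbody would give /html/body/table[8]/tr/td[4]/table[2]/tr/td, which could then be tried in the Hpricot search in place of the Firebug path. If that still returns nothing, the content may be filled in by JavaScript after the page loads, in which case it is not in the downloaded HTML at all.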