firebug + hpricot

October 1st, 2008

The following code tries to scrape content from a web page that the scraping tool I use could not pick up:
require 'rubygems'
require 'hpricot'
require 'open-uri'

url = "http://homemsg.focus.cn/msgview/607/50129006.html"
doc = Hpricot(open(url))
td_contents = (doc/"/html/body/table[8]/tbody/tr/td[4]/table[2]/tbody/tr/td").inner_html
puts td_contents

It did not work, so something must be wrong.

With Firebug, I can copy the element's XPath (/html/body/table[8]/tbody/tr/td[4]/table[2]/tbody/tr/td) and also copy its innerHTML.
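
One possible cause, which I have not confirmed for this page: Firefox adds tbody elements to the DOM for tables even when the HTML source does not contain them, so an XPath copied from Firebug can include /tbody steps that are missing from the raw markup Hpricot parses. A minimal sketch of removing those steps is below. strip_tbody is a hypothetical helper of my own, not part of Hpricot, and it assumes the page source really has no tbody tags.

```ruby
# Hypothetical helper (not part of Hpricot): remove every /tbody step,
# with or without an index such as /tbody[1], from an XPath string.
# Assumption: the raw HTML source contains no <tbody> tags.
def strip_tbody(xpath)
  xpath.gsub(%r{/tbody(?:\[\d+\])?(?=/|\z)}, '')
end

firebug_xpath = "/html/body/table[8]/tbody/tr/td[4]/table[2]/tbody/tr/td"
source_xpath = strip_tbody(firebug_xpath)
puts source_xpath
# prints "/html/body/table[8]/tr/td[4]/table[2]/tr/td"
```

The stripped path could then be tried in the same search, for example (doc/source_xpath).inner_html. Whether it matches still depends on the page's actual markup; if the source does contain tbody tags, the original path should be kept instead.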



Categories: ruby