snippet: grab webpage and save it for future traversing
Posted by jacqui maher on March 04, 2008 at 03:47 PM
I use this method to grab a webpage and then access it later. It’s useful if you’re trying to scrape a page, for example, and don’t want to keep hitting it over the network. Note that I use this as part of a class which initializes the variable @local_dir at instance creation.
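require "net/http"
require "uri"

# Returns the HTML for the given id. Reads the saved copy from @local_dir
# when one exists; otherwise fetches the page and saves it for next time.
def get_local_or_remote(id, webpage_uri_format)
  local_path = File.join(@local_dir, "#{id}.html")
  return File.read(local_path) if File.exist?(local_path)

  # Substitute the id into the URI template; a template without an
  # :id placeholder is used unchanged.
  webpage_uri = webpage_uri_format.gsub(":id", id.to_s)

  # insist is not defined in this snippet; presumably it is a helper that
  # retries the block, waiting :delay seconds between attempts.
  html = insist(:delay => 60) do
    Net::HTTP.get_response(URI.parse(webpage_uri)).body
  end

  # Write the body as-is so the saved copy matches what was fetched.
  File.open(local_path, "wb") { |file| file.write(html) }

  html
end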
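
The method above calls insist, which the post does not define. Below is a minimal sketch of what such a helper might look like, assuming it simply retries the block after a pause. The attempts option and the choice to rescue StandardError are my assumptions, not part of the original snippet.

```ruby
# Hypothetical stand-in for the insist helper used above. The original post
# does not show its definition, so the options here are assumptions.
def insist(delay: 60, attempts: 3)
  tries = 0
  begin
    tries += 1
    yield
  rescue StandardError
    raise if tries >= attempts # out of attempts: re-raise the last error
    sleep delay                # wait before trying the block again
    retry
  end
end
```

With this in place, insist(:delay => 60) do ... end returns the block's value on the first attempt that succeeds, and raises the last error once the attempts run out.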