Data scraping in Ruby on Rails using Nokogiri and Mechanize Gem

26 Mar 2018

What is Data scraping?

Website (data) scraping is a technique for extracting large amounts of data from websites. The extracted data can be displayed on your own site or stored in a file or database. Data scraping is mainly used when a website does not provide an API.

Some applications do not provide an API for collecting records. In such cases, the data scraping technique is used.

The data can be scraped using the Nokogiri gem.

The required steps are:

  • Add the line gem 'nokogiri', '~> 1.8', '>= 1.8.1' to the Gemfile.
  • Run bundle install.
  • Add the lines require 'nokogiri' and require 'open-uri' to the file where you will write the scraping code.

The controller for the page will look like this:

The view for the page will look like this:
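The original view is not reproduced here either. As a rough sketch, the template below shows the kind of ERB markup such a view might contain, rendered with Ruby's standard ERB library so that it can be run outside Rails. The articles data, the markup, and the view path mentioned in the comment are assumptions for illustration.

```ruby
require 'erb'

# Hypothetical data, shaped like the result of a scraping step.
articles = [
  { title: 'First post',  url: '/posts/1' },
  { title: 'Second post', url: '/posts/2' }
]

# The template body is what might live in a Rails view such as
# app/views/articles/index.html.erb (the path is an assumption).
template = ERB.new(<<~VIEW, trim_mode: '-')
  <ul>
  <%- articles.each do |article| -%>
    <li><a href="<%= article[:url] %>"><%= ERB::Util.html_escape(article[:title]) %></a></li>
  <%- end -%>
  </ul>
VIEW

html = template.result(binding)
puts html
```

Inside a Rails view the output of <%= %> is escaped automatically; the explicit html_escape call is only needed here because plain ERB does not escape.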

The result in our application will look like:

Mechanize Gem in Rails

The Mechanize library is used for automating interaction with websites. Mechanize automatically stores and sends cookies, follows redirects, and can follow links and submit forms. Form fields can be populated and submitted. Mechanize also keeps track of the sites that you have visited as a history.

For the site above, I used the Mechanize gem to scrape the data and search for records.

We have the following sample application running locally:

The steps required are:

  • Add the line gem 'mechanize', '~> 2.7', '>= 2.7.5' to the Gemfile.
  • Run bundle install.
  • Add require 'mechanize' in the controller.

The controller code to scrape the data using the Mechanize gem for a search:

The output of the above scraping, as seen on the console:

Using the Mechanize gem, we can select a radio button as shown below:

staff_data.page.forms[0].radiobutton_with(:id => "First_drop").check

Using the Mechanize gem, we can list all the input fields in the form:

staff_data.page.forms[0].fields

Using the Mechanize gem, we can also select a drop-down value on the site:

staff_data.page.forms[0].field_with(:id => "country").value = "Single"

Using the Mechanize gem, we can find a form on the site:

form = staff_data.page.form_with(:id => "search_form")

Using the Mechanize gem, we can find a button in the form:

button = form.button_with(:value => "Search")

In the Mechanize gem, the link_with method makes it simple to fetch a link on the page, such as the random article link:

link = staff_data.page.link_with(text: 'Random article')

In the Mechanize gem, the click method instructs Mechanize to follow the link:

page = link.click

Using the Mechanize gem, we can find the page title of the site:

staff_data.page.title



