Data Scraping in Ruby on Rails Using the Nokogiri and Mechanize Gems
Web (data) scraping is a technique for extracting large amounts of data from websites. The extracted data can be displayed on your own site or stored in a file or database. Scraping is mainly used when a website does not provide an API.
Some applications do not provide an API for collecting records. In those cases, data scraping is used instead.
The data can be scraped using the Nokogiri gem.
The steps required are:
The controller for the page will look like this:
The view code for the page will look like this:
The result in our application will look like this:
Mechanize Gem in Rails
The Mechanize library is used for automating interaction with websites. Mechanize automatically stores and sends cookies, follows redirects, and can follow links and submit forms. Form fields can be populated and submitted. Mechanize also keeps track of the sites that you have visited as a history.
For the above site, I have used the Mechanize gem to scrape the data and search for records.
We have the following sample application running locally:
The steps required are:
The controller code to scrape the data using the Mechanize gem for search:
The output of the above scraping, as seen on the console:
Using the Mechanize gem, we can select a radio button. Because forms returns an array, the lookup is made on a single form (here, the first one on the page):
staff_data.page.forms.first.radiobutton_with(:id => "First_drop").check
Using the Mechanize gem, we can list all the input fields in a form.
Using the Mechanize gem, we can also set the value of a drop-down on the site (again on a single form, here the first one):
staff_data.page.forms.first.field_with(:id => "country").value = "Single"
Using the Mechanize gem, we can find a form on the page by its id:
form = staff_data.page.form_with(:id => "search_form")
We can then find a button in that form by its value:
button = form.button_with(:value => "Search")
In the Mechanize gem, the link_with method of a page makes it simple to fetch a link by its text, such as the link to a random record:
link = staff_data.page.link_with(text: 'Random article')
The click method instructs Mechanize to follow the link and returns the resulting page:
page = link.click
Using the Mechanize gem, we can also read the title of the page.