
Wednesday 6 January 2016

Web Scraping Libraries for Node.js Developers

In this post we will discuss some of the Node.js scraping libraries available to developers. Web scraping is a technique for extracting information from websites. As the volume of data on the web has increased, this practice has become increasingly widespread, and a number of powerful services have emerged to simplify it.

If you want to scrape the web manually in Node.js, check Web Scraping in Node.js. Here we list some of the Node.js scraping libraries you can use to scrape the web.
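As a rough illustration of what "extracting information" from a page means, here is a minimal sketch in plain JavaScript with no dependencies. The function names `extractTitle` and `extractLinks` and the `sampleHtml` string are made up for this example and do not come from any of the libraries listed here. It relies on regular expressions, which are fine for a toy input but are not a real HTML parser.

```javascript
// Hypothetical helpers for illustration only; not part of any library.

// Return the text inside the first <title> element, or null if there is none.
function extractTitle(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

// Return the href value of every <a> tag with a quoted href, in document order.
function extractLinks(html) {
  const links = [];
  const pattern = /<a\b[^>]*\bhref\s*=\s*(["'])(.*?)\1/gi;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    links.push(match[2]);
  }
  return links;
}

// A small made-up page standing in for a downloaded document.
const sampleHtml = `
<html>
  <head><title> Example Page </title></head>
  <body>
    <a href="/about">About</a>
    <a class="ext" href='https://example.com/docs'>Docs</a>
  </body>
</html>`;

console.log(extractTitle(sampleHtml));
console.log(extractLinks(sampleHtml));
```

Real pages are messier than this (unquoted attributes, nested markup, content generated by scripts), which is where a dedicated scraping library is usually the better choice.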

1. Ineed

Web scraping and HTML-reprocessing, the easy way. ineed allows you to collect useful data from web pages using a simple and nice API. It can also be used to build HTML-reprocessing pipelines with elegance.


2. X-ray

Node.js and X-ray have made web scraping a really simple affair. To learn more about X-ray, check this video.


3. Noodle

Noodle is a Node.js server and module for querying and scraping data from web documents.


4. Phantomjs-node

It is a PhantomJS bridge for Node.js.


5. Osmosis

It is an HTML/XML parser and web scraper for Node.js.


6. Yakuza

Yakuza is a heavy-weight, highly scalable framework for scraping projects. Whether you are building small or massive scrapers, Yakuza will keep your code clean, ordered and under control.



3 comments:

  1. 7. goose-parser: https://github.com/redco/goose-parser - really simple and powerful

  2. Fantastic resource! I didn't know most of these existed until now.

  3. Nice list. When websites do not provide an API, web scraping is the only way to get the data. I do scraping myself, and here is my website: http://prowebscraping.com
