dotNetCrawler is a web robot that loads pages from web sites and analyzes their content.
The robot scans each loaded page, collects the links it contains, and then loads and scans the pages those links point to.
This process could run indefinitely, so several restrictions apply:
dotNetCrawler scans only the configured host, and only pages located in the context (web directory) set for that host.
The number of files to scan is also limited.
Pages produced by a scan remain accessible for only a few hours afterwards.
A special process then deletes them from the database.
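The crawl loop and the restrictions above can be sketched in Java roughly as follows. This is a minimal illustration, not dotNetCrawler's actual code: the host, context, page limit, and the stubbed link-fetching function are all assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;

// Minimal sketch of a scoped, bounded crawl loop; all names are illustrative.
public class CrawlSketch {
    static final String HOST = "example.com";   // assumed target host
    static final String CONTEXT = "/docs/";     // assumed context (web directory)
    static final int MAX_PAGES = 100;           // assumed limit on files to scan

    // A URL is in scope only if it points at the configured host and context.
    static boolean inScope(String url) {
        return url.startsWith("http://" + HOST + CONTEXT);
    }

    // Breadth-first crawl bounded by the page limit; page download and
    // parsing are stubbed out as a function from URL to the links it contains.
    static List<String> crawl(String seed,
            java.util.function.Function<String, List<String>> fetchLinks) {
        Set<String> visited = new HashSet<>();
        Queue<String> frontier = new ArrayDeque<>();
        List<String> order = new ArrayList<>();
        frontier.add(seed);
        while (!frontier.isEmpty() && order.size() < MAX_PAGES) {
            String url = frontier.poll();
            if (!inScope(url) || !visited.add(url)) continue;  // enforce restrictions
            order.add(url);
            frontier.addAll(fetchLinks.apply(url));  // follow links found on the page
        }
        return order;
    }

    public static void main(String[] args) {
        // A stubbed link graph standing in for real page downloads.
        List<String> order = crawl("http://example.com/docs/index.html",
            url -> url.endsWith("index.html")
                ? List.of("http://example.com/docs/a.html", "http://other.com/x.html")
                : List.of());
        System.out.println(order);
        // The out-of-scope link on other.com is discarded, so only the two
        // in-context pages are scanned.
    }
}
```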

Crawler dotNetCrawler

The program's web interface is built with JSF 2.0, RESTful web services using JSON, and AJAX with the YUI JavaScript library.
The Java code of the robot itself uses several open-source Apache components for downloading and parsing web pages, as well as the standard SAX parser.
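As one hypothetical illustration of the SAX-based parsing step, the handler below collects `href` attributes from `<a>` elements. This is only a sketch: SAX requires well-formed XML input, so real HTML would first need tidying (for example by the Apache components the robot uses for parsing), and the class and method names here are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: collecting links from a well-formed (XHTML-like) page
// with the standard SAX parser. Not dotNetCrawler's actual handler.
public class LinkHandler extends DefaultHandler {
    final List<String> links = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName,
                             String qName, Attributes attrs) {
        // Record the href of every anchor element encountered.
        if ("a".equalsIgnoreCase(qName)) {
            String href = attrs.getValue("href");
            if (href != null) links.add(href);
        }
    }

    static List<String> extractLinks(String xhtml) throws Exception {
        LinkHandler handler = new LinkHandler();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new ByteArrayInputStream(
                xhtml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.links;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><a href=\"/docs/a.html\">A</a><p>text</p></body></html>";
        System.out.println(extractLinks(page));  // prints [/docs/a.html]
    }
}
```

The extracted links would then be filtered against the host and context restrictions before being queued for scanning.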


Copyright © 2004-2014 Vadims Zemlanojs