
dotNetCrawler is a web robot that loads pages from web sites and analyzes their content.
The robot scans each loaded page, collects the links on it, and then loads and scans the pages those links point to.
Left unchecked, this process could run forever, so several restrictions apply:
dotNetCrawler scans only the specified host, and only pages located within the context (web directory) set for that host.
The number of files to scan is also limited.
The result of scanning is a database of email addresses, which is for personal use only.
Scanned pages remain accessible for only a few hours after scanning.
After that, a dedicated process deletes them from the database.
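
The restrictions described above (one host, one context directory, a cap on the number of files) can be illustrated with a short Java sketch. It is not dotNetCrawler's actual code: the class name CrawlScopeSketch, the methods inScope and crawl, and the example.com addresses are all hypothetical, and an in-memory map of links stands in for real page loading so that the example runs without a network.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names and structure are assumptions, not dotNetCrawler's code.
class CrawlScopeSketch {

    /** True if the link is on the allowed host and inside the allowed context directory. */
    static boolean inScope(URI link, String host, String contextPath) {
        String linkHost = link.getHost();
        String linkPath = link.getPath();
        if (linkHost == null || linkPath == null) {
            return false; // e.g. mailto: links have neither host nor path
        }
        return linkHost.equalsIgnoreCase(host) && linkPath.startsWith(contextPath);
    }

    /**
     * Breadth-first crawl. linksByPage maps a page URL to the raw links found on it
     * and replaces real HTTP loading in this sketch. The start page is assumed to be
     * in scope. Returns the visited pages in visiting order.
     */
    static List<String> crawl(String start, String host, String contextPath,
                              int maxPages, Map<String, List<String>> linksByPage) {
        List<String> visited = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(start);
        seen.add(start);
        // Stop when there is nothing left to scan or the file limit is reached.
        while (!queue.isEmpty() && visited.size() < maxPages) {
            String page = queue.poll();
            visited.add(page);
            for (String link : linksByPage.getOrDefault(page, List.of())) {
                // Resolve relative links against the page they were found on.
                URI resolved = URI.create(page).resolve(link);
                if (inScope(resolved, host, contextPath) && seen.add(resolved.toString())) {
                    queue.add(resolved.toString());
                }
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Map<String, List<String>> site = Map.of(
            "http://example.com/docs/index.html",
            List.of("a.html", "/other/c.html", "http://elsewhere.org/docs/d.html"),
            "http://example.com/docs/a.html",
            List.of("index.html"));
        System.out.println(crawl("http://example.com/docs/index.html",
                                 "example.com", "/docs/", 10, site));
    }
}
```

The set of already seen URLs keeps pages that link to each other from being scanned twice, and the maxPages check is what guarantees that the loop terminates on a large site.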

Crawler

The program's web interface is built with JSF 2.0, RESTful web services with JSON, and AJAX using the YUI JavaScript library.
The robot's Java code uses several open source Apache components for loading and parsing web pages, as well as a standard SAX parser.
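
As an illustration of how a standard SAX parser can collect links from a page, here is a minimal Java sketch. It makes two assumptions: the class SaxLinkCollector and its method collectLinks are hypothetical names, and the input is well-formed XHTML. Real-world HTML is often not well-formed, which is presumably where the Apache parsing components come in; how the robot actually combines them with SAX is not stated here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch; assumes well-formed XHTML input.
class SaxLinkCollector extends DefaultHandler {

    private final List<String> links = new ArrayList<>();

    /** Called by the parser for every opening tag; records href values of <a> elements. */
    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("a".equalsIgnoreCase(qName)) {
            String href = attributes.getValue("href");
            if (href != null && !href.isEmpty()) {
                links.add(href);
            }
        }
    }

    /** Parses the given markup and returns the href values in document order. */
    static List<String> collectLinks(String xhtml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            SaxLinkCollector handler = new SaxLinkCollector();
            parser.parse(new InputSource(new StringReader(xhtml)), handler);
            return handler.links;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Malformed markup ends up here; a real robot would need a fallback.
            throw new IllegalStateException("Could not parse page", e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><a href=\"a.html\">A</a>"
                    + "<a href=\"mailto:someone@example.com\">mail</a></body></html>";
        System.out.println(collectLinks(page));
    }
}
```

Because SAX is event-driven, the page is never held in memory as a tree; the handler only sees each opening tag as the parser streams past it, which suits a robot that scans many pages.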

 

Copyright© 2004-2009 Vadims Zemlanojs
e-mail:webadmin@tenplanets.net