This is the support site for the HomePage Corpus and its Annotation Tool. The corpus is manually and extensively annotated for Web Content Mining, and it is freely available for research purposes. The Annotation Tool is a Firefox extension that lets the annotator work with the pages in their original appearance. It handles the annotation hierarchy independently of the DOM tree of the web pages, and it allows annotations to overlap HTML tag boundaries. For more details, please read our article.
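To illustrate what an annotation hierarchy independent of the DOM tree can look like, here is a minimal Python sketch of stand-off annotation, in which annotations are stored as character offsets into the page text rather than as nodes of the HTML tree. It is not the Annotation Tool's actual implementation. The `TextExtractor` and `Annotation` classes, the `crossed_tags` helper, the labels, and the sample HTML are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Optional


class TextExtractor(HTMLParser):
    """Collects the page text and the text-offset span of each closed element."""

    def __init__(self):
        super().__init__()
        self.text = ""
        self.tag_spans = []  # (tag, start, end) offsets into self.text
        self._open = []      # stack of (tag, start) for elements still open

    def handle_starttag(self, tag, attrs):
        self._open.append((tag, len(self.text)))

    def handle_endtag(self, tag):
        if self._open and self._open[-1][0] == tag:
            name, start = self._open.pop()
            self.tag_spans.append((name, start, len(self.text)))

    def handle_data(self, data):
        self.text += data


@dataclass
class Annotation:
    """A stand-off annotation: a labelled offset range with an optional parent."""
    label: str
    start: int
    end: int
    parent: Optional["Annotation"] = None


def crossed_tags(annotation, tag_spans):
    """Return the tags whose span partially overlaps the annotation."""
    crossed = []
    for tag, start, end in tag_spans:
        intersects = start < annotation.end and annotation.start < end
        annotation_inside_tag = start <= annotation.start and annotation.end <= end
        tag_inside_annotation = annotation.start <= start and end <= annotation.end
        if intersects and not annotation_inside_tag and not tag_inside_annotation:
            crossed.append(tag)
    return crossed


# Hypothetical sample page: the bold element covers only part of the name.
html = "<p><b>Dr. Jane</b> Doe, <i>Professor</i></p>"
parser = TextExtractor()
parser.feed(html)
parser.close()

# The hierarchy lives in the parent links, not in the DOM tree.
person = Annotation("person", 0, len(parser.text))
name = Annotation("name", 4, 12, parent=person)
position = Annotation("position", 14, 23, parent=person)

name_text = parser.text[name.start:name.end]
name_crosses = crossed_tags(name, parser.tag_spans)
print(name_text, name_crosses)
```

In this sketch the `name` annotation starts inside the `<b>` element and ends outside it, which a representation tied to DOM nodes could not express without splitting the annotation.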
You can download the corpus here.
Statistics 26/06/2008
|Annotation Tool as a Firefox extension||download|
|Firefox Portable with Annotation Tool||download|