VEGA 1-1060-04

Document Classification and Annotation for the Semantic Web

Project summary:

The project focuses on the design and evaluation of methods for annotating text documents with metadata that describe, in a machine-processable way, what the documents are about. Particular attention is paid to exploiting domain theories represented as ontologies, whose components can be used to annotate documents. In this context, the project aims at:

Document classification using machine learning methods
Using natural language processing methods for document annotation
Annotation based on lexical databases
Abstract generation
Document re-annotation following a change in the domain theory
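The summary does not say which machine learning methods are used for document classification. As a minimal sketch of the general idea only, the block below trains a multinomial naive Bayes classifier (a simple Bayesian model, chosen here as a familiar baseline) with add-one smoothing. The function names, the whitespace tokenizer, and the toy training data are hypothetical and not taken from the project.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical preprocessing: lowercase and split on whitespace only.
    return text.lower().split()


def train_naive_bayes(docs):
    """docs: list of (text, label) pairs. Returns (class counts, word counts per class, vocabulary)."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_docs[label] += 1
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def classify(model, text):
    """Return the label with the highest log posterior for the given text."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        score = math.log(n_docs / total_docs)  # log prior P(c)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # words never seen in training are ignored
            # Laplace (add-one) smoothed log likelihood log P(w | c)
            score += math.log((word_counts[label][token] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = [
    ("ontology reasoning owl semantic", "semweb"),
    ("rdf triples semantic annotation", "semweb"),
    ("neural network training gradient", "ml"),
    ("classifier training data learning", "ml"),
]
model = train_naive_bayes(training)
print(classify(model, "semantic annotation with ontology"))  # prints: semweb
```

Naive Bayes treats words as independent given the class; a richer Bayesian network would model dependencies between terms at the cost of more parameters.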

The project also addresses the creation of ontologies suitable for document annotation. In this area, the focus is on:

Automatic generation of ontological models based on text document collections
Ontology modification using text mining methods
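One possible way to derive a hierarchical concept model from a document collection is formal concept analysis (FCA), which this page also lists among the project results. The sketch below is an assumption-laden illustration, not the project's algorithm: it uses crisp (not fuzzy) FCA on a hypothetical document-term relation and enumerates concepts naively by intersecting document term sets, which is exponential in the worst case. Each concept pairs a set of documents (extent) with the terms they all share (intent); ordering concepts by extent inclusion gives a hierarchy that could serve as a starting point for an ontology. The names `formal_concepts` and `docs` are hypothetical.

```python
def formal_concepts(context):
    """context: dict mapping a document id to its set of terms.
    Returns all formal concepts as (extent, intent) pairs of frozensets,
    most general (largest extent) first."""
    all_terms = frozenset().union(*context.values())
    # The intent of the empty set of documents is the set of all terms.
    intents = {all_terms}
    for terms in context.values():
        # Intents are closed under intersection, so intersecting each document's
        # terms with every intent found so far yields all intersections.
        intents |= {frozenset(terms) & intent for intent in intents}
    concepts = []
    for intent in intents:
        # Extent: every document that contains all terms of the intent.
        extent = frozenset(doc for doc, terms in context.items() if intent <= terms)
        concepts.append((extent, intent))
    concepts.sort(key=lambda c: (-len(c[0]), sorted(c[1])))
    return concepts


docs = {
    "d1": {"ontology", "owl", "reasoning"},
    "d2": {"ontology", "annotation"},
    "d3": {"annotation", "classification"},
}
for extent, intent in formal_concepts(docs):
    print(sorted(extent), "->", sorted(intent))
```

A fuzzy variant would replace the binary document-term relation with graded membership (for example, normalized term weights) and compute concepts over those degrees.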

Key words:

Document classification and annotation, domain knowledge modelling, ontology creation, natural language processing, machine learning, text mining

Project participants:

Marian Mach - project leader
Tomas Sabol
Jan Paralic - deputy project leader
Robert Kende
Jan Hreno
Kristina Machova
Slavomir Hudak
Peter Bednar
Ivan Kostial
Martin Sarnovsky
Miroslav Mraz
Frantisek Babic
Peter Smatana
Viliam Rockai

Summary of project results:

Design of methods for increasing the efficiency of text document classification (using Bayesian networks and reducing the number of documents) and text document clustering (controlled initialisation, attribute-oriented induction).
Extraction of key terms, relations among terms, and phrases from documents, and identification of synonyms, using statistical methods and the theory of associative concept learning.
Creation of hierarchical concept models based on clustering and fuzzy formal concept analysis, and their use for annotating document content.
Transformation of unstructured documents into structured ones using regular and linguistic analysis.
A Java library for developing text mining applications, providing facilities for text analysis and for building, evaluating, and applying various supervised and unsupervised learning methods.
Implementation of a service for text document classification in the grid environment provided by GridMiner.
A method for creating dedicated text collections from web sources that suggests alternative documents based on user stereotypes.
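The results mention extracting key terms with statistical methods without naming a particular statistic. Purely as a familiar example of such a method, the sketch below ranks each document's terms by TF-IDF weight (term frequency times the logarithm of inverse document frequency). The function `top_key_terms`, the parameter `k`, and the toy corpus are hypothetical; this is not presented as the project's extraction method.

```python
import math
from collections import Counter


def top_key_terms(docs, k=3):
    """docs: list of token lists. Returns, for each document, its k terms
    with the highest TF-IDF weight."""
    n = len(docs)
    # Document frequency: the number of documents containing each term.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    result = []
    for tokens in docs:
        tf = Counter(tokens)
        weights = {
            term: (count / len(tokens)) * math.log(n / df[term])
            for term, count in tf.items()
        }
        # Highest weight first; ties are broken alphabetically for deterministic output.
        ranked = sorted(weights, key=lambda t: (-weights[t], t))
        result.append(ranked[:k])
    return result


corpus = [
    "ontology based annotation of text documents".split(),
    "text classification with bayesian networks".split(),
    "text clustering of documents".split(),
]
print(top_key_terms(corpus))
# prints: [['annotation', 'based', 'ontology'], ['bayesian', 'classification', 'networks'], ['clustering', 'documents', 'of']]
```

A term that occurs in every document (here "text") gets weight zero, which is the intended effect of the IDF factor. Stop words such as "of" can still rank highly in short documents, so a practical pipeline would normally filter them before weighting.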


Copyright © MM	Last updated 17.8.2009

Document Classification and Annotation for the Semantic Web

Project summary:

Key words:

Project participants:

Annotation of project resuls:

Copyright © MM

Last updated 17.8.2009