Progress so far (March 2004)
User and functional requirements
Selection and assessment of contextualisation data
Development of the VICODI Ontology
Development of Contextualisation Mechanisms
1st Prototype of VICODI system
Promotion and Awareness
By March 2004, most of the VICODI system development activities had been carried out. On the basis of the system specifications (delivered in March 2003), the 1st prototype of the Knowledge portal was developed. It includes the first prototype of the Contextualisation Engine, which supports the contextualisation of text resources and the generation of test context estimates and test LATCH results. It also includes the Java Context Engine (JCE) server infrastructure, which will provide additional context functionalities in future, the CCE C++ context engine server for potential services, and annotation and context tool sub-components. The first iteration of the Transformation Engine was also created.
The final iteration of the VICODI ontology was nearly completed. We also finished the first prototype of the MSKS (Management System of the Knowledge Space), which provides an API to the open-source KAON framework for working with the VICODI ontology stored in PostgreSQL. The MSKS ontology, resource management and search modules were partially implemented, to allow storage and retrieval of the textual VICODI resources and ontology instances. The main ontology development activities were carried out with the ontology editor, a stand-alone Java GUI application that may be started using Java Web Start or installed locally on client machines.
The first iteration of MT (Machine Translation) for other language pairs, based on the historians' mini-worlds (thematic sample content collections for ontology and MT development), was implemented. The first prototype of the English>Latvian substitution engine was also created.
User and functional requirements of VICODI system
Identifying user requirements was the first major task of the VICODI team. An initial set of requirements based on user interface mock-ups was already discussed at the kick-off meeting.
Also at the kick-off workshop, the technical group established an initial set of technical platform requirements, including the choice of operating systems and implementation languages. During the first four months of the project, partners ranked the importance of each requirement on the scale "must", "should", "desirable", "not relevant".
After identifying the user requirements, partners drew up the functional requirements of the visual contextualisation system. During this phase partners carried out analysis and produced UML use cases, initial ontology requirements relating to the contextualisation engine, template
After the review meeting (December 2003), the user requirements were changed from static to dynamic.
Selection and assessment of contextualisation data
Historical websites were systematically identified and assessed for inclusion in VICODI on the basis of the D8.1 criteria.
The final historical data (2000+ documents) have been identified, assessed and ontologised.
Development of the VICODI Ontology
The main purpose of the history ontology for the VICODI project is to help machine algorithms in the automatic contextualisation task by storing relevant historical knowledge in machine-processable form. To achieve this goal, an ontology with well-defined formal semantics is needed. The task of devising an ontology of history is daunting. On the one hand, it is always challenging to build an ontology covering a broad and very complex area of knowledge. On the other hand, history has several unique features which are problematic from an ontological point of view.
Existing thesauri and glossaries were examined and rejected as potential sources of relevant information for the VICODI ontology. The scope and nature of European history were defined and agreed in D8.1. For the purposes of ontology development, competency questions were listed. The creation of three initial mini-worlds was set out as an objective, and the content partners implemented it within 2003.
We agreed to use the highlighted keywords and keyword relations from the mini-worlds to design the initial basic ontology framework. The ontology design was developed by experimental processing of various examples of instances, concepts/sub-concepts and property relations from the mini-worlds. Partners held in-depth discussions to resolve problematic issues of time and location dependencies.
The ontology editor has been used to build the ontology. It has been customised for VICODI, and further customisations are planned. It is based on the general OIModeler of the KAON ontology management framework. The historians use the production ontology, while the developers use copies of the production ontology for testing.
The complexity of history is immense and requires an almost unlimited number of instances and property relations. To complicate matters, historians do not focus only on "what" questions but also on "when", "where", "who", "how" and, most importantly, "why" questions.
Historical time is uncertain and often debated. It includes many unknown dates, imprecise intervals ("ca.", "approximately from X to Y", etc.) and overlapping time (historical periods and events extending into each other without clear start and end dates). Moreover, many ontology relations are time-dependent.
Historical Sources: there are no comprehensive, large-scale thesauri of history. While evaluating related work in the area, we realised that existing approaches to historical ontologies were not suitable for our purposes. Some (such as Hassett or the UNESCO thesaurus) used non-formal, "intuitive" taxonomies which mixed semantically different hierarchical relationships ("is-a", "part-of", "member-of"), making them unsuitable for machine processing. Others covered only a tiny area of history (like the Getty location names), which was too limited for our goals. Finally, the CIDOC CRM ontology standard has a formal conceptual hierarchy; however, it is too complex and inflexible for our domain experts to fill with the necessary domain knowledge (instances), which it does not presently contain.
Complexity: We use a shallow concept hierarchy starting from only six basic concepts (called flavours) which are meaningful for domain/history experts: person, artefact, group, event, abstract notion and location. The hierarchy below these concepts is shallow (2-3 levels). It stops at an abstraction level which is already meaningful for historians but still general enough to make the place of new instances in the ontology easy to find, which speeds up the population of the ontology with new historical knowledge. The complexity of history is represented by connecting instances of these flavours by various property relations.
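To make the idea concrete, the following minimal Python sketch models the six flavours, a shallow concept hierarchy and property relations between instances. The class names, the depth limit of three levels and the property name "memberOf" are illustrative assumptions, not the actual VICODI ontology schema (which is managed through KAON).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Flavour(Enum):
    """The six top-level concepts ("flavours") listed in the report."""
    PERSON = "person"
    ARTEFACT = "artefact"
    GROUP = "group"
    EVENT = "event"
    ABSTRACT_NOTION = "abstract notion"
    LOCATION = "location"


@dataclass(frozen=True)
class Concept:
    """A node in the shallow hierarchy; a flavour root has no parent."""
    name: str
    flavour: Flavour
    parent: Optional["Concept"] = None

    def __post_init__(self):
        # A sub-concept must stay inside its parent's flavour.
        if self.parent is not None and self.parent.flavour != self.flavour:
            raise ValueError(f"{self.name} must share its parent's flavour")

    def depth(self) -> int:
        # Levels below the flavour root (the root itself has depth 0).
        return 0 if self.parent is None else 1 + self.parent.depth()


@dataclass(frozen=True)
class Instance:
    label: str
    concept: Concept


@dataclass
class Ontology:
    max_depth: int = 3  # assumed limit reflecting the "2-3 levels" guideline
    triples: list = field(default_factory=list)

    def _check(self, concept: Concept) -> None:
        if concept.depth() > self.max_depth:
            raise ValueError(f"{concept.name} is too deep for the shallow hierarchy")

    def relate(self, subject: Instance, prop: str, obj: Instance) -> None:
        # Historical complexity is carried by relations between instances,
        # not by a deep concept taxonomy.
        self._check(subject.concept)
        self._check(obj.concept)
        self.triples.append((subject.label, prop, obj.label))


# Usage with hypothetical concept and property names:
person = Concept("Person", Flavour.PERSON)
monarch = Concept("Monarch", Flavour.PERSON, person)
group = Concept("Group", Flavour.GROUP)
dynasty = Concept("Dynasty", Flavour.GROUP, group)

onto = Ontology()
onto.relate(Instance("Henry VIII", monarch), "memberOf", Instance("House of Tudor", dynasty))
```

Keeping the taxonomy shallow means a historian only has to pick one of a handful of familiar concepts for a new instance; the detail goes into the relations instead.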
Historical Time: To deal with the complexity of time, we use time intervals and an event-centric ontology. This means that instances with a time-dependent relation are connected through an event whose existence time represents the validity of that connection. For the VICODI prototype the intervals are precisely defined, although a novel fuzzy temporal model has now been devised (reference) and its use is being explored for future follow-on projects.
Historical sources: To get round the lack of general repositories, we decided to build our own ontology of history based upon our empirical deductive analysis of a 2000-document corpus.
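A minimal Python sketch of the event-centric approach follows, assuming crisp year intervals as in the prototype. The relation name "ruled" and the helper functions are hypothetical; only the idea of attaching an existence time to a time-dependent connection comes from the design described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval of years; the prototype uses precisely defined bounds."""
    start: int
    end: int

    def contains(self, year: int) -> bool:
        return self.start <= year <= self.end

    def overlaps(self, other: "Interval") -> bool:
        return self.start <= other.end and other.start <= self.end


@dataclass(frozen=True)
class Event:
    """Reifies a time-dependent relation: it is valid only during `existence`."""
    subject: str
    relation: str
    obj: str
    existence: Interval


def holds(events, subject, relation, obj, year):
    """True if some event asserts (subject, relation, obj) at the given year."""
    return any(
        (e.subject, e.relation, e.obj) == (subject, relation, obj)
        and e.existence.contains(year)
        for e in events
    )


def valid_during(events, period):
    """Relations whose existence time overlaps a (possibly longer) period."""
    return [(e.subject, e.relation, e.obj) for e in events if e.existence.overlaps(period)]


# Henry VIII reigned 1509-1547; the relation name is hypothetical.
events = [Event("Henry VIII", "ruled", "England", Interval(1509, 1547))]
```

A fuzzy temporal model would replace the crisp `contains` and `overlaps` tests with graded membership; that is not attempted here.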
The high-level architecture of the VICODI system was established. This included defining the initial and final components and producing UML component architecture and deployment diagrams. Application frameworks and application servers were investigated, which resulted in the choice of Expresso.
The functionalities of the system components were described as follows:
- (1) Contextualisation Engine and CETools (CETools are also used by many other components);
- (2) Transformation Engine;
- (3) Annotation tool;
- (4) metadata analysis;
- (5) auxiliary database table requirements: encoding, language, etc.;
- (6) programming and development guidelines;
- (7) machine translation: discussion with Systran, analysis of its usage in VICODI and preparation of the System Specification;
- (8) Web component (interface descriptions) and Expresso framework description.
Public interfaces for the Contextualisation Engine and CETools, the Transformation Engine and the Annotation tool, as well as helper interfaces for machine translation and user management, were described and specified.
Development of Contextualisation Mechanisms
As of March 2003 the following achievements were made:
Support for automatically identifying the ontology instances associated with text documents by matching the text against ontology labels. Support for annotation references, annotation objects and initial test context descriptions. Support for the Transformation Engine to match ontology instances found within the text document and ontology instances inferred from the text. Support for application workspaces that wrap the CE functionalities and APIs.
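Matching ontology labels against a text, as in the first item above, might look like the minimal Python sketch below. It assumes labels are plain strings mapped to hypothetical instance IDs and prefers the longest label at each position; the report does not say how the CE resolves overlapping or ambiguous labels.

```python
import re


def find_instances(text, labels):
    """Return (start, end, instance_id) for each ontology label found in text.

    `labels` maps a label string to an instance identifier (hypothetical IDs).
    Longer labels are tried first, so "Thirty Years War" wins over "War".
    """
    # Python's regex alternation takes the first alternative that matches at a
    # position, so sorting by length makes the longest label win.
    ordered = sorted(labels, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(label) for label in ordered) + r")\b",
        re.IGNORECASE,
    )
    lookup = {label.lower(): iid for label, iid in labels.items()}
    return [
        (m.start(), m.end(), lookup[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


# Usage with hypothetical instance IDs:
labels = {
    "Thirty Years War": "event:thirty_years_war",
    "War": "notion:war",
    "Prague": "location:prague",
}
hits = find_instances("The Defenestration of Prague began the Thirty Years War.", labels)
```

Here `hits` contains one span for "Prague" and one for "Thirty Years War"; the shorter label "War" is not reported separately because the longer match consumes it.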
Support for the context estimate infrastructure (JCE - Java Context Engine), including initial tools for context estimation, query and manipulation; context descriptor classes; and MSKS/CE common contextualisation descriptors. Context filtering includes basic, but not final, similarity measures for context descriptions; ultimately, these are used by search engines. A test response is provided for context estimates and LATCH data.
The first iteration of the transformation engine for text was implemented, producing a LATCH result (test response) and a transformation result for text for use in the 1st prototype. Ontology labels are matched within the texts. The infrastructure is in place to continue with the other LATCH result functionalities.
The VICODI mini-worlds from the historians serve as training data prior to the formal content selection of WP3. The mini-worlds help to validate the whole procedure, methodology and workflow, while the 2000 documents serve as the test data.
The next iteration of MT dictionary creation depends on receiving the 2000 documents from the historians in WP3 (content selection); the mini-worlds were therefore the first step, covering the analysis of the historians' mini-world content and the creation of initial dictionaries for the language pairs.
The historians' mini-world content has been analysed and the words not found in the existing dictionaries have been extracted. After normalising these data, corresponding customisation dictionaries were created for the VICODI language pairs (lps) en<>de and en<>fr.
The first iteration of the Latvian substitution engine was implemented; it covers only English to Latvian.
Initial knowledge gathering, authoring and annotation tool functionalities were implemented using the CE, MSKS, tools and web application user interfaces. Since the knowledge portal requires integration of all web application functionalities, this IR required integration activities, especially since different partners were involved.
This includes CE context annotation model workspaces and CE general contextualisation model workspaces (to display contextualised text) for the Annotation web application. The context application (model) workspaces are also to be refined during software integration with the web application controllers, depending on additional needs.
Web interfaces are available and are integrated with the other components and sub-components during WP7 T7.3 integration. Some test data is needed to simulate functionalities planned for the next iteration.
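The report does not name the similarity measures used for context filtering. As one illustration, assuming a context description can be represented as a mapping from ontology instance IDs to relevance weights (a hypothetical representation), cosine similarity with a threshold gives a basic filter. The function names and the 0.5 threshold are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two context descriptions.

    Each description is assumed to be a dict mapping an ontology instance ID
    to a non-negative relevance weight.
    """
    shared = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_contexts(query, candidates, threshold=0.5):
    """Keep candidates whose similarity to the query meets the threshold,
    ranked from most to least similar."""
    scored = [(cosine_similarity(query, c), name) for name, c in candidates.items()]
    return [name for score, name in sorted(scored, reverse=True) if score >= threshold]


# Usage with hypothetical instance IDs and weights:
query = {"event:reformation": 1.0, "person:luther": 1.0}
candidates = {
    "doc1": {"event:reformation": 1.0, "person:luther": 1.0},
    "doc2": {"location:rome": 1.0},
    "doc3": {"person:luther": 1.0},
}
ranked = filter_contexts(query, candidates)
```

Cosine similarity ignores the overall size of a description, so a short document sharing a few key instances can still rank highly; a search engine might well want a different trade-off.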
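The extraction of not-found words could be sketched as below. The tokenisation, the normalisation (Unicode NFC plus lower-casing) and the plain set standing in for the Systran dictionaries are all assumptions, and the function names are hypothetical.

```python
import re
import unicodedata
from collections import Counter


def normalise(token):
    # Assumed normalisation: Unicode NFC and lower-casing.
    return unicodedata.normalize("NFC", token).lower()


def not_found_words(texts, dictionary):
    """Count words in the mini-world texts that the existing dictionary lacks.

    `dictionary` is a set of already-normalised source-language entries,
    a stand-in for the real MT dictionaries.
    """
    counts = Counter()
    for text in texts:
        # Letters only (any script), allowing internal hyphens or apostrophes.
        for token in re.findall(r"[^\W\d_]+(?:['-][^\W\d_]+)*", text):
            word = normalise(token)
            if word not in dictionary:
                counts[word] += 1
    return counts


def customisation_stub(counts, pair):
    """Build an empty customisation dictionary for a language pair, listing
    the most frequent unknown words first for a translator to fill in."""
    return {"pair": pair, "entries": {w: None for w, _ in counts.most_common()}}


# Usage with a toy dictionary:
known = {"the", "met", "in", "was", "dissolved"}
counts = not_found_words(["The Landtag met in Prague.", "The Landtag was dissolved."], known)
stub = customisation_stub(counts, "en<>de")
```

Ordering entries by frequency lets the most common gaps be fixed first.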
1st Prototype of VICODI system
Concluding the first year of project activities, the 1st VICODI prototype was prepared. It was developed as a Java web application: an MVC-based application built within the Java-based Expresso Application and Architecture Framework from http://www.jcorporate.com. Expresso extends Apache Jakarta Struts.
The system runs on:
- RedHat Linux 9 server
- PostgreSQL
- JBoss server 3.0
- Tomcat web server
- Expresso Framework 5.3RC3
- KAON - ontology management framework
The Contextualisation Engine (CE) and MSKS subsystems offer their functionalities to the Transformation Engine (TE) and the web applications, and also interact with each other. For these tools, web application controllers were created to use the CE, TE and MSKS APIs.
The first VICODI prototype includes the interfaces of all components, implemented in Java.
The first prototype of the VICODI portal's Web application consists of the following controllers: IndexController, ContextController, UserUploadController, ExperAnnotationController.
The first prototype of the Contextualisation Engine was implemented as the Java Context Engine (JCE), which provides the core context functionalities and supporting APIs, including communication APIs with other system components.
The first iteration of the Transformation Engine implements the core of the text transformation. Existing hyperlinks can be preserved either as links or as hyperlinked icons. Likewise, contextualised links can be represented either as hyperlinked key terms in the text or as hyperlinked icons adjacent to the text.
The CE native service, or CCE (remote C++ Context Engine server), has been developed, and an initial implementation of the JCE-to-CCE communications protocol has been completed.
The first prototype of the MSKS provides an API to the open-source KAON framework for working with the VICODI ontology stored in PostgreSQL. The MSKS ontology, resource management and search modules are partially implemented, to allow storage and retrieval of the textual VICODI resources and ontology instances. The ontology editor is a stand-alone Java GUI application and may be started using Java Web Start.
The Machine Translation Server is hosted by Systran and accessed remotely by the system. Translation of HTML code fragments is available for English, French and German.
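The two presentation options for contextualised links can be illustrated with the minimal Python sketch below. It only handles contextual links added to plain text (not the preservation of existing hyperlinks), and the function name, icon URL and HTML markup are illustrative assumptions rather than the Transformation Engine's actual output format.

```python
import html


def render(text, matches, style="term", icon_url="icon.png"):
    """Insert contextual links into plain text and return HTML.

    `matches` are non-overlapping (start, end, href) spans sorted by start.
    style="term" hyperlinks the key term itself; style="icon" keeps the term
    as plain text and adds a hyperlinked icon right after it.
    """
    out, pos = [], 0
    for start, end, href in matches:
        out.append(html.escape(text[pos:start]))  # untouched text before the term
        term = html.escape(text[start:end])
        link = html.escape(href, quote=True)
        if style == "term":
            out.append(f'<a href="{link}">{term}</a>')
        elif style == "icon":
            icon = html.escape(icon_url, quote=True)
            out.append(f'{term}<a href="{link}"><img src="{icon}" alt="context"></a>')
        else:
            raise ValueError(f"unknown style: {style}")
        pos = end
    out.append(html.escape(text[pos:]))  # remaining text after the last term
    return "".join(out)


# Usage with a hypothetical context URL:
as_term = render("Prague, 1618", [(0, 6, "/ctx/prague")], style="term")
as_icon = render("Prague, 1618", [(0, 6, "/ctx/prague")], style="icon")
```

The icon style leaves the original wording untouched, which may be preferable when the source text already contains its own links.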
Promotion and Awareness
All the VICODI partners carried out dissemination activities aimed at promoting and raising awareness of the novel approach of visual contextualisation developed by VICODI. In month 3 (November 2002) these activities were coordinated on the basis of the Dissemination Plan produced by RIDemo.
The VICODI project web site (www.vicodi.org) was created in October 2002. Partners placed a link to the VICODI web site on their own web sites.
Active project promotion was carried out through the specific professional fields of the VICODI partners. Several partners took the opportunity to promote VICODI at conferences: for example, SRFG at "Hist2003" (Berlin, DE), RIDEMO and NKP at "Inforum 2003" (Prague, CZ), and FZI at "DOA/ODBASE" (Catania, IT).
NKP has published an article on a potential use of the VICODI results in historical librarianship: Uhlrih, Zdenek: Technické obrazy a transformance kodikologie a bibliologie [The technical images and transformation of codicology and bibliology]. Narodni knihovna. Vol. 14, 2003, nr. 2, pp. 114-127.
A booklet containing general information about VICODI was published and distributed by the partners. Much of the dissemination and promotion activity was carried out through the project website www.vicodi.org, which contains information about the progress of VICODI as well as a dedicated work area that was successfully used to manage user involvement, especially for the assessment and evaluation of various elements of the VICODI system.