VisDic XML format
XML files in VisDic consist of tags and their values. The value of a tag named TAG is enclosed between the strings <TAG> and </TAG>. Tags can be nested, so the value of one tag can contain other tags. Whitespace characters (spaces, tabs and newlines) at the start or end of each tag value are trimmed. However, the XML files parsed by VisDic differ from ordinary XML in two respects:
- An XML dictionary contains entries, and each entry is in effect a small XML file of its own. No tag encloses the whole dictionary.
- Tags have no attributes.
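Because a dictionary has no enclosing tag, a standard XML parser will reject the file as a whole. The following is a minimal sketch of one way to read such a file with Python's standard library, assuming the entries contain no XML declaration and are otherwise well-formed. The wrapper name `DICT`, the helper names and the second sample ID are hypothetical and are not part of VisDic.

```python
import xml.etree.ElementTree as ET

def parse_entries(text):
    """Parse text that holds several top-level entries and no root tag."""
    # Wrap the entries in a synthetic root so a standard parser accepts them.
    # "DICT" is a hypothetical name chosen for this sketch, not a VisDic tag.
    root = ET.fromstring("<DICT>" + text + "</DICT>")
    return list(root)

def tag_value(element):
    """Return the tag's own text, trimmed at both ends as VisDic does."""
    return (element.text or "").strip()

# Two entries, one after the other, with no enclosing tag.
# The second ID is made up for illustration.
sample = """
<SYNSET><ID> ENG21-00001740-n </ID><POS>n</POS></SYNSET>
<SYNSET><ID>EXAMPLE-00000002-n</ID><POS>n</POS></SYNSET>
"""

entries = parse_entries(sample)
ids = [tag_value(entry.find("ID")) for entry in entries]
print(ids)
```

Wrapping the text in a temporary root keeps the sketch short. For very large dictionaries, splitting the input into entries and parsing them one at a time may be preferable.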
Example of one WordNet synset:
<SYNSET>
  <ID>ENG21-00001740-n</ID>
  <POS>n</POS>
  <SYNONYM>
    <LITERAL>entity<SENSE>1</SENSE></LITERAL>
  </SYNONYM>
  <DEF>that which is perceived or known or inferred to have its own distinct existence (living or nonliving)</DEF>
  <BCS>2</BCS>
  <DOMAIN>factotum</DOMAIN>
</SYNSET>
Tags and their values:
- ID: unique synset identification
- POS: part of speech (n=noun, v=verb, a=adjective, b=adverb)
- SYNONYM: the synonyms (literals) that make up the synset
- LITERAL: one literal
- SENSE: sense number of the literal
- DEF: definition
- BCS: Common Base Concepts set number
- DOMAIN: synset domain
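As an illustration of how these tags map onto data, here is a minimal sketch that reads the example synset into a Python dictionary using only the standard library. The function name `read_synset` and the dictionary keys are assumptions made for this example. VisDic itself does not define them.

```python
import xml.etree.ElementTree as ET

SYNSET_XML = """
<SYNSET>
  <ID>ENG21-00001740-n</ID>
  <POS>n</POS>
  <SYNONYM>
    <LITERAL>entity<SENSE>1</SENSE></LITERAL>
  </SYNONYM>
  <DEF>that which is perceived or known or inferred to have its own distinct existence (living or nonliving)</DEF>
  <BCS>2</BCS>
  <DOMAIN>factotum</DOMAIN>
</SYNSET>
"""

def text_of(parent, path):
    """Return the trimmed text of the first match, or '' if it is missing."""
    return (parent.findtext(path) or "").strip()

def read_synset(xml_text):
    """Collect the tag values of one VisDic synset into a plain dict."""
    synset = ET.fromstring(xml_text.strip())
    literals = []
    for literal in synset.findall("SYNONYM/LITERAL"):
        # The literal's own text precedes the nested SENSE tag.
        word = (literal.text or "").strip()
        literals.append((word, text_of(literal, "SENSE")))
    return {
        "id": text_of(synset, "ID"),
        "pos": text_of(synset, "POS"),
        "literals": literals,
        "def": text_of(synset, "DEF"),
        "bcs": text_of(synset, "BCS"),
        "domain": text_of(synset, "DOMAIN"),
    }

record = read_synset(SYNSET_XML)
print(record["id"], record["literals"])
```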
DEBVisDic XML format
DEBVisDic XML is almost the same as VisDic XML. The only difference is that the sense number of a literal is stored as an attribute of the LITERAL tag instead of in a nested SENSE tag. For example:
<SYNONYM> <LITERAL sense="1">entity</LITERAL> </SYNONYM>
You can convert XML in VisDic format to DEBVisDic format using this XSLT template: http://deb.fi.muni.cz/vis2deb.xslt
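For readers who want to see the transformation spelled out, the sketch below performs the same kind of conversion with Python's standard library. It is an illustration of the difference described above, not a reproduction of the XSLT template, and the function name `visdic_to_debvisdic` is hypothetical. It assumes each LITERAL holds at most one SENSE tag.

```python
import xml.etree.ElementTree as ET

def visdic_to_debvisdic(xml_text):
    """Move each nested SENSE tag into a sense attribute on its LITERAL."""
    root = ET.fromstring(xml_text)
    # Take a snapshot of the literals first, since the tree is modified below.
    for literal in list(root.iter("LITERAL")):
        sense = literal.find("SENSE")
        if sense is None:
            continue
        literal.set("sense", (sense.text or "").strip())
        # Keep any text that followed the SENSE tag inside the literal.
        trailing = sense.tail or ""
        literal.remove(sense)
        literal.text = ((literal.text or "") + trailing).strip()
    return ET.tostring(root, encoding="unicode")

visdic = "<SYNONYM><LITERAL>entity<SENSE>1</SENSE></LITERAL></SYNONYM>"
converted = visdic_to_debvisdic(visdic)
print(converted)
```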