
DEBVisDic XML format

The DEBVisDic XML format is based on the format of VisDic, the earlier offline system; the VisDic XML format is described below. The main difference is that some values, such as the sense number of a literal, are stored as attributes instead of nested tags. For example:

 <SYNONYM>
  <LITERAL sense="1">entity</LITERAL>
 </SYNONYM>

The complete DEBVisDic XML format is described by the DEBVisDic XML schema. You can convert XML in VisDic format to DEBVisDic format using this XSLT template: http://deb.fi.muni.cz/vis2deb.xslt
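To make the tag-to-attribute change concrete, here is a minimal Python sketch that moves each SENSE tag into a sense attribute on its LITERAL. It is not the XSLT template linked above and covers only the one difference shown in the example; the function name sense_to_attribute is made up for illustration, and the real conversion may handle other values as well.

```python
import xml.etree.ElementTree as ET


def sense_to_attribute(fragment: str) -> str:
    """Move each <SENSE> child of a <LITERAL> into a sense="..." attribute.

    Illustrative only: this handles just the SENSE tag, which is the one
    difference the text above shows explicitly.
    """
    root = ET.fromstring(fragment)
    # Collect the literals first so the tree is not modified while iterating.
    for literal in list(root.iter("LITERAL")):
        sense = literal.find("SENSE")
        if sense is None:
            continue
        literal.set("sense", (sense.text or "").strip())
        # Preserve any text that followed the <SENSE> tag, then drop the tag.
        tail = sense.tail or ""
        literal.remove(sense)
        literal.text = ((literal.text or "") + tail).strip()
    return ET.tostring(root, encoding="unicode")


visdic = "<SYNONYM><LITERAL>entity<SENSE>1</SENSE></LITERAL></SYNONYM>"
converted = sense_to_attribute(visdic)
print(converted)
```

For the input above, the literal ends up as a LITERAL element carrying sense="1" with the text "entity", matching the DEBVisDic example.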

VisDic XML format

XML files in VisDic consist of tags and their values. The value of a tag TAG is enclosed between the strings <TAG> and </TAG>. Tags can be nested, so the value of a tag can contain other tags. Whitespace characters (spaces, tabs and newlines) at the start and end of each tag value are trimmed. However, the XML files parsed by VisDic differ from standard XML in these points:

  • An XML dictionary is a sequence of entries. Each entry is in effect a small XML document of its own; no tag encloses the whole dictionary.
  • Tags have no attributes.
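Because no tag encloses the whole dictionary, a standard XML parser will reject such a file as it stands. One possible workaround, sketched here in Python, is to wrap the text in a synthetic root before parsing. The wrapper name ENTRIES, the helper names, and the second sample ID are invented for illustration; the sketch also assumes the file has no XML declaration and that each entry is well-formed on its own.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional


def parse_entries(text: str) -> List[ET.Element]:
    """Parse a rootless sequence of entries by adding a temporary root."""
    # "ENTRIES" is a made-up wrapper tag, not part of the VisDic format.
    wrapped = "<ENTRIES>" + text + "</ENTRIES>"
    return list(ET.fromstring(wrapped))


def tag_value(entry: ET.Element, tag: str) -> Optional[str]:
    """Return the value of a direct child tag with surrounding whitespace trimmed."""
    child = entry.find(tag)
    if child is None:
        return None
    return (child.text or "").strip()


# Two entries with no enclosing tag; the second ID is a placeholder.
sample = """
<SYNSET>
 <ID>ENG21-00001740-n</ID>
 <POS>n</POS>
</SYNSET>
<SYNSET>
 <ID>  EXAMPLE-0002-n
 </ID>
 <POS>n</POS>
</SYNSET>
"""

entries = parse_entries(sample)
for entry in entries:
    print(tag_value(entry, "ID"), tag_value(entry, "POS"))
```

The strip() call in tag_value mirrors the trimming of leading and trailing whitespace described above.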

Example of one WordNet synset:

<SYNSET>
 <ID>ENG21-00001740-n</ID>
 <POS>n</POS>
 <SYNONYM>
  <LITERAL>entity<SENSE>1</SENSE></LITERAL>
 </SYNONYM>
 <DEF>that which is perceived or known or inferred to have its own distinct existence (living or nonliving)</DEF>
 <BCS>2</BCS>
 <DOMAIN>factotum</DOMAIN>
</SYNSET>

Tags and their values:

  • ID: unique synset identification
  • POS: part of speech (n=noun, v=verb, a=adjective, b=adverb)
  • SYNONYM: synonyms
  • LITERAL: one literal
  • SENSE: sense number
  • DEF: definition
  • BCS: Common Base Concepts set number
  • DOMAIN: synset domain
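As a rough illustration of how these tags fit together, the following Python sketch reads one VisDic synset into a small record. The Synset class and the helper names are hypothetical and not part of VisDic or DEBVisDic. All values are kept as strings, since this page does not state their types, and only the tags listed above are read.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Synset:
    """Hypothetical in-memory record for the tags listed above."""
    id: str
    pos: str
    literals: List[Tuple[str, str]] = field(default_factory=list)  # (literal, sense)
    definition: Optional[str] = None
    bcs: Optional[str] = None
    domain: Optional[str] = None


def child_text(parent: ET.Element, tag: str) -> Optional[str]:
    """Return the trimmed value of a direct child tag, or None if it is absent."""
    child = parent.find(tag)
    if child is None:
        return None
    return (child.text or "").strip()


def synset_from_xml(xml_text: str) -> Synset:
    """Build a Synset from one VisDic <SYNSET> entry."""
    root = ET.fromstring(xml_text)
    literals = []
    for literal in root.findall("SYNONYM/LITERAL"):
        # The literal itself is the text before the nested <SENSE> tag.
        word = (literal.text or "").strip()
        sense = child_text(literal, "SENSE") or ""
        literals.append((word, sense))
    return Synset(
        id=child_text(root, "ID") or "",
        pos=child_text(root, "POS") or "",
        literals=literals,
        definition=child_text(root, "DEF"),
        bcs=child_text(root, "BCS"),
        domain=child_text(root, "DOMAIN"),
    )


example = """<SYNSET>
 <ID>ENG21-00001740-n</ID>
 <POS>n</POS>
 <SYNONYM>
  <LITERAL>entity<SENSE>1</SENSE></LITERAL>
 </SYNONYM>
 <DEF>that which is perceived or known or inferred to have its own distinct existence (living or nonliving)</DEF>
 <BCS>2</BCS>
 <DOMAIN>factotum</DOMAIN>
</SYNSET>"""

synset = synset_from_xml(example)
print(synset.id, synset.pos, synset.literals, synset.bcs, synset.domain)
```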

The VisDic XML format is described by this VisDic DTD.

Last modified on Oct 19, 2011 11:02:35 AM