Welcome to LDA-frames

LDA-frames is an unsupervised approach to identifying semantic frames from semantically unlabelled text corpora. There are many frame formalisms, but most of them suffer from the problem that all frames must be created manually and the set of semantic roles must be predefined. The LDA-frames approach, based on Latent Dirichlet Allocation, avoids both problems by employing statistics on a syntactically tagged corpus. The only information that must be supplied is the number of semantic frames and the number of semantic roles to be identified. Even this limitation can be avoided by estimating both parameters automatically, which is what Non-parametric LDA-Frames does.

In LDA-frames, a frame is represented as a tuple of semantic roles, each connected with a grammatical relation, e.g. subject, object or modifier. Frames are related to a predicate via a probability distribution, and every semantic role is represented as a probability distribution over its realizations.
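The representation described above can be sketched as a few plain Python structures. This is only an illustration: the class name `Frame`, the helper `realization_probability`, the predicate "eat" and all probability values are made up for the example and are not taken from the LDA-frames software or from any trained model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    # One (grammatical relation, semantic role id) pair per slot.
    slot_roles: tuple


# Hypothetical semantic roles: each is a distribution over its realizations.
roles = {
    0: {"person": 0.7, "company": 0.3},
    1: {"food": 0.8, "drink": 0.2},
}

# Hypothetical frame inventory with a single two-slot frame.
frames = [Frame(slot_roles=(("subject", 0), ("object", 1)))]

# Each predicate is linked to the frames by a probability distribution.
predicate_frames = {"eat": [1.0]}


def realization_probability(predicate, frame_id, words):
    """P(frame | predicate) times the product of P(word | role) over the slots."""
    p = predicate_frames[predicate][frame_id]
    for (_, role), word in zip(frames[frame_id].slot_roles, words):
        p *= roles[role].get(word, 0.0)
    return p


print(realization_probability("eat", 0, ("person", "food")))
```

With these toy numbers the call multiplies 1.0, 0.7 and 0.8, so the result is approximately 0.56; a realization that does not fit a role, such as "food" in the subject slot, gets probability zero.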

The method of automatic identification of semantic frames is based on a probabilistic generative process. We treat each grammatical relation realization as generated from a given semantic frame according to the word distribution of the corresponding semantic role. Suppose the number of frames is given by the parameter F and the number of semantic roles by R. The realizations are then generated by the LDA-frames algorithm as follows.

For each semantic role r ∈ {1, 2, ..., R}:

  • Choose a realization distribution θ_r from Dir(β).

For each lexical unit u ∈ {1, 2, ..., U}:

  • Choose a frame distribution φ_u from Dir(α).
  • For each lexical unit realization t ∈ {1, 2, ..., T}, choose a frame f_{u,t} from Multinomial(φ_u), where f_{u,t} ∈ {1, 2, ..., F}.
  • For each slot s ∈ {1, 2, ..., S} of the frame f_{u,t}:
    • Choose a grammatical relation realization w_{u,t,s} from Multinomial(θ_{r_{f_{u,t},s}}), where r_{f_{u,t},s} is the semantic role in slot s of the frame f_{u,t}.

The graphical model for LDA-frames is shown in the figure below. The model is parametrized by the hyperparameters α and β of the prior distributions, which are usually set by hand to a value between 0.01 and 0.1.
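The generative process can be simulated in a few lines of standard-library Python. The sketch below is not the LDA-frames implementation, and it makes several assumptions: each role's word distribution θ is drawn from Dir(β), each lexical unit's frame distribution φ from Dir(α), and the role assignment of every frame slot is fixed in advance (in the real model it is latent and has to be inferred). The function names, the toy vocabulary and the toy frames are hypothetical.

```python
import random


def sample_dirichlet(concentration, size, rng):
    """Draw from a symmetric Dirichlet by normalising Gamma draws."""
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(size)]
    total = sum(draws)
    if total == 0.0:
        # Guard against underflow for very small concentrations:
        # fall back to a point mass on one random component.
        draws = [0.0] * size
        draws[rng.randrange(size)] = 1.0
        return draws
    return [d / total for d in draws]


def sample_index(probs, rng):
    """Draw one index from a categorical distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate(frames, vocab, n_roles, n_units, n_realizations, alpha, beta, seed=0):
    """Simulate realizations; frames[f][s] is the role id in slot s of frame f."""
    rng = random.Random(seed)
    # theta[r]: word distribution of semantic role r, drawn from Dir(beta).
    theta = [sample_dirichlet(beta, len(vocab), rng) for _ in range(n_roles)]
    corpus = []
    for u in range(n_units):
        # phi_u: frame distribution of lexical unit u, drawn from Dir(alpha).
        phi_u = sample_dirichlet(alpha, len(frames), rng)
        for _ in range(n_realizations):
            f = sample_index(phi_u, rng)
            # One word per slot, drawn from the role attached to that slot.
            words = tuple(vocab[sample_index(theta[r], rng)] for r in frames[f])
            corpus.append((u, f, words))
    return corpus


# Toy setting: F = 3 frames, S = 2 slots, R = 3 roles.
toy_frames = [(0, 1), (1, 2), (2, 0)]
toy_vocab = ["person", "food", "place", "tool"]
corpus = generate(toy_frames, toy_vocab, n_roles=3, n_units=2,
                  n_realizations=5, alpha=0.1, beta=0.1)
for unit, frame, words in corpus:
    print(unit, frame, words)
```

With small hyperparameter values such as 0.1, the sampled distributions tend to be sparse, so each lexical unit typically ends up using only a few frames and each role only a few words.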

If you are interested in more details or examples, see the LDA-frames paper or my Ph.D. thesis.