Lexical Knowledge Representation

EJB In the mid-1980s, Stuart Shieber had initiated his programme of rational reconstruction of syntactic theory and you were very engaged and involved with that at the time. And from that, I guess, DATR emerged. Can we talk about the intellectual processes that you went through that led you down the road to DATR?

GG Yes, I'm game for that. In 1985, it seemed to me that the interesting new descriptive questions that had been thrown up by the various unification grammars all related to the lexicon. There were some old descriptive questions like "how do you do unbounded dependencies?", "how do you do passive?", but I was bored with those. On the other hand, there were interesting technical questions about how lexicons could be organized so as to capture both irregularity and subregularity whilst minimizing redundancy, and there were concepts and terminology floating around at the time like "lexical rules" and so forth. Regrettably, generative linguists have never really written lexicons. The only exception that I can think of is the appendix to that big blue Stockwell, Schachter & Partee volume (1973), which reports a major project from several years earlier that had hired a bunch of linguists to produce a formal transformational grammar of English. The appendix is a listing of the lexicon that came out of that project. It has detailed entries for most function words plus some representative content word exemplars.

EJB So, we're thinking here of something from the sixties?

GG Yes, as far as I know, that was the last time a bit of mainstream linguistics had actually exhibited a lexicon. Linguists talked about lexicons a fair bit over the years but, for a couple of decades, you never actually caught sight of one. Building lexicons was perceived as grunt work. A lexicon was a list of words. It was boring. What excited linguists was lexical rules - but I found much of that discussion either incoherent or half-baked since it was conducted in the absence of any nontrivial theory of lexical knowledge representation.

NLP people typically build systems so they have no option but to construct lexicons. And that, in turn, forces them to adopt some theory of lexical knowledge representation. By the mid-1980s, this was a real problem. For example, think about the difference between GPSG and HPSG with respect to subcategorization information. The GPSG lexicon is pretty dull: verbs are associated with sets of integers and the integers are, in effect, the names of sets of phrase structure rules. HPSG, by contrast, dispenses with those rules and pushes all the information they contained into the individual lexical entries via the feature structures. If that is all that it does, then an HPSG lexicon becomes hugely redundant, which the GPSG lexicon is not. The other issue that had begun to fascinate me was subregularity. This occurs in syntax but it is pervasive in morphology. Generative linguists have rarely had much to say about it - they have preferred to partition linguistic phenomena into those that are subsumed by a single exceptionless rule and those that are wholly idiosyncratic. So redundancy and subregularity in the lexicon had become, for me, the key theoretical issues and I thought the best way of tackling both of them was to use semantic nets.

EJB Semantic nets in the KL-ONE tradition (Brachman 1979)?

GG That tradition, yes.

EJB Although to me, it seems like there's another parallel thread, which is the one which leads from PATR (Shieber 1986) through to DATR. In PATR, some of the rational reconstruction of what was going on in LFG, GPSG and so on was done. Also in PATR, as I think you've just hinted, there was no real or proper treatment of lexical rules, just an acknowledgement that one needed something and a step outside the entire formalism to provide it.

GG Yes, the abbreviatory macros, or "templates" as Shieber called them, gave you monotonic inheritance at no semantic cost. But, as Shieber himself points out, the extensions to PATR that permit default inheritance and lexical rules are procedural and lead to order-of-application problems.

PATR was influential on DATR, but not, I think, in a logical reconstruction sense. With regard to lexical knowledge representation, there really wasn't anything much to reconstruct in 1986. There was a need to replace PATR templates with something descriptively more powerful and better able to capture partial generalizations but there wasn't something already out there that you could rebuild more elegantly or on a better semantic basis. What we pinched from PATR, most obviously, was the concrete syntax of DATR. I had always thought that the concrete syntax of PATR was kind of cute - and there was so little of it, which added to its appeal. More important, really, was the philosophy of PATR, which was to have a minimalist language and basically be as semantically abstemious as possible. Which GPSG had not been. By the time the GPSG book was finished, there were about eight different formal mechanisms all sitting there and I didn't want to get into anything like that ever again. Roger and I just wanted to have something very austere. And so we named our new language as a humorous tribute to Stuart.

EJB So, to me, there is a sea change here. If one of the threads through your research career is formalization - there is a big change at this point because formalization stopped being the linguistic theory notion of formalization, which I suppose derives mostly from Montague, and suddenly became formalization much more in the sense of PATR. What you call austere, I would describe as implementable. Is there some truth to that too? In so far as GPSG made it into implementations, that was an after the fact thing, whilst with the work on lexical inheritance, it was done via an implemented theory from day one.

GG Oh, yes. The possibility and practicality of implementation was central, although it sometimes proved an irritation because people would take the particular implementation that we had at Sussex as being somehow definitive of the language. Roger and I have always fought to resist this construal: we have taken the view that the formal specification of the language is what we invented and are in the business of promoting. Roger's freely available Prolog implementation serves just as a demonstration that the language can be implemented, and as a bonus for people who wish to use the language but don't fancy implementing it themselves. We've always been extremely keen to see other implementations, in part so that people didn't fall into the trap of thinking that DATR was this bit of code that Roger and I had over in Sussex. It was the design of the formal language that we were engaged in and that reflected the switch in my own mindset of not being a linguist any longer but being a computational linguist.

EJB Not only was it austere in the sense that there weren't so many different mechanisms around, there's also a lot more emphasis on a proof-theoretic specification of DATR than there was before (Evans & Gazdar 1989a). The semantics of GPSG was described in model theoretic terms.

GG Well, we have always had a model theoretic semantics for DATR. But we didn't do it very well in 1989 (Evans & Gazdar 1989b). Bill Keller (1995; 1996) subsequently did it better, but it's always been there.

EJB Although you are ultimately going to justify your proof theoretic rules in terms of some model theoretic interpretation, it's often a lot more convenient for people to know what the proof theory is because the inference rules make it much easier to figure out what's going on.

GG Oh yes, absolutely. And it is much easier to implement if you have the proof theory sitting there.

EJB So, ultimately, was that kind of approach to formalization better and more useful? Were you learning lessons about how to do this? Or is it more really about being a computational linguist?

GG Well, I'm not sure what the contrast is. I just did it.

EJB The contrast really is that implementations of the GPSG and pragmatics formalizations were either very very hard or impossible.

GG But we were just engaged in a different enterprise. In defining DATR, we weren't describing anything. There was no external object. With the implicature and presupposition stuff, there was an external object which the formalization was attempting to model. Likewise with GPSG. With DATR we were building a tool - a formal language that we and others could subsequently use to characterize external objects like the morphology of Latin, or English subcategorization, or whatever.

EJB I'm not sure I completely buy that. Aren't you trying to formalize notions like suppletion or blocking or the elsewhere condition? I mean it's at one removed from "here's the data about how Latin conjugations work" or whatever. But you're still formalizing a linguistic intuition which exists at a sort of metalevel rather than at a direct empirical level.

GG Well, sure, we were defining the language for lexical knowledge representation not for arithmetic calculation or manipulation of list structures or whatever. But the way DATR should be judged is by reference to its suitability for describing those natural lexical phenomena. An analogy would be to ask whether Fortran is a good language for doing arithmetic in. DATR itself does not make any sort of truth claim, whereas the pragmatics work made a truth claim. It may have been false, but it did make a claim.

EJB And GPSG made a claim because it was a restricted formalism?

GG Yes, the overall framework made a claim. And a crucial component of that claim turned out to be false. In addition, the individual analyses that we built within the framework also made truth claims.

EJB And in DATR the framework ultimately makes less of a claim because it is Turing-equivalent. Is that the line?

GG I don't think DATR itself makes any claim, whether or not it is Turing-equivalent (my agnosticism on the Turing-equivalence issue is due to the fact that there is currently no refereed publication that gives a proof of such an equivalence). The progenitors of DATR make the claim that the language is good at doing the things it was designed to do. But it's only a design issue, not a truth issue. Here are our design goals - we think DATR meets them.

EJB Do you want to talk more about that? Because I actually find that profoundly hard to get. I don't think that the formal apparatus of GPSG and DATR are different in kind, I really don't - I'm not being disingenuous here. It seems to me that defaults in DATR make substantive claims about what the scope of the default is, and so forth. So I'm thinking of the kind of thing that I did with Ann Copestake and Alex Lascarides (Briscoe et al. 1995) about things getting unblocked for pragmatic effect, so that you can say things like "there's pig on the menu" if you have a certain attitude to pork. So it seems to me that DATR is saying that there is a certain useful notion of default which is entirely encapsulated in the lexicon, and I'm not saying that's wrong - but it seems to me that it's a claim.

GG Well, it is true that DATR is unsuitable for implementing the sort of thing you did. You used what, in an intuitive sense, was a significantly more powerful language. Okay, they are probably both Turing-equivalent but in terms of ...

EJB If you want to do what we did, then you probably want a language in which you can do quantification easily. There will be some bizarre way you can do that in DATR but you wouldn't want to.

GG No, you wouldn't want to. It was our belief, I assume, when we designed DATR that you wouldn't need to do things like that. That might still be my belief - but that's for another occasion.

EJB That's the kind of thing that makes me wonder whether you can really make such a clearcut distinction between what was going on when you were designing DATR and what you were doing when you were designing the formal apparatus of GPSG.

GG Yes, I see what you are getting at. But I don't really buy it. Let us suppose that the observations that led you to your analysis were correct and that your analysis is the appropriate way to deal with those observations. If that were true, it would follow, at the very least, that DATR was unsuitable for describing that aspect of the lexicon. If one was able to somehow find analogous cases in some other aspect of the lexicon, morphology say, then that would indicate that DATR was not a very good design for that aspect either. So the effect of that sort of observation would be to show that DATR was not a satisfactory language for lexical knowledge representation, that it was not powerful enough to allow one to say, in a natural and intuitive way, the things that needed to be said.

EJB And, to put it all in a slightly different way, notions like suppletion and blocking and so on are vague until somebody comes along and tries to formalize them in a particular way. Then you see that, in fact, people may mean rather different things by these terms, although, broadly, there is some notion of a default, a defeasible inference. I myself think there is a notion of suppletion which is pretty much restricted to morphology, but there are also more general notions of blocking and so on around in linguistics which may well go beyond the lexicon. But it's really only when you start to try to formalize these things that these distinctions become apparent.

GG Yes, I would certainly agree with that. In some ways, I regret the focus in DATR on morphology but, in others, I don't. The focus on morphology is partly an accident. When we were developing DATR, I needed something to play with. I'm very much a concrete thinker - I need things that I can tinker with. I chose Latin noun declensions because I had done Latin O level and because Peter Matthews had worked on Latin and I had his books on my shelf (Matthews 1972, 1974). It was an obvious test domain. I fiddled with analyses of Latin and that work got presented on various occasions to morphologists who then picked up on it. It is also kind of fun doing morphology in DATR and so it was very tempting to do more of it and I succumbed to the temptation on many occasions.

EJB Going back to Henry Thompson's distinction between "computation in service to linguistics" and computational linguistics as a branch of engineering, it seems that the bulk of your work has been in the former category. I know that DATR has been used in engineering systems, but the way you talked about the motivation for working on it and so on was driven entirely from your analysis of linguistic theory and what were the important questions. Mind you, I think good linguistic theory will always have consequences for engineering. But I didn't get the sense that you were thinking of DATR primarily as a tool to build large lexicons for analysing police traffic reports.

GG No, but you've omitted Henry's other relevant categories, "computational linguistics proper" and "theory of linguistic computation": that's really where I saw myself moving to. Of course, DATR has been used, and is used, as a tool for doing linguistics and I'm delighted that it is so used. But Roger and I would stake our claim in terms of what it has revealed about the nature of encoding certain sorts of lexical information. We would claim that it's pretty good at encoding morphological information, syntactic information construed in particular ways, and probably phonological information. And that this is a useful thing for NLP engineers to know. They don't have to use an existing DATR system, or even to reconstruct DATR, but they ought to be able to learn from DATR the nature of the thing that their engineering product is encoding.

EJB For some kinds of element that's true. I was talking about these second-generation systems and it seems to me that they are all very informed by DATR. The primary difference appears to be something about more integration between the syntax of the system used to encode lexical information and the syntax of the system used to encode the syntactic information. That seems to be the flavour of all the second-generation stuff. But otherwise I think the insight about defaults ...

GG The lack of that integration in our work was entirely deliberate. We wanted to develop a lexical knowledge representation language that was uncommitted as to syntactic theory.

EJB Yes, but that comes with an engineering overhead which perhaps makes it unattractive to people especially if they are committed to working with just one particular syntactic formalism.

GG The other issue relates to confusion. An advantage of having different notations for lexical organization and syntactic structure is that you can then see which bit is which, as it were.

EJB Yes. Viewed purely as a tool to explore the notion of the kinds of inferences one wants to be able to make over lexical information, it is probably better to keep the two kinds of things quite distinct, I agree.

GG One of the presentational difficulties we had with DATR was rather like one of the difficulties we had with ProGram. With ProGram, people said that the implementation wasn't very efficient. To which our response was "So what? It is just a system for grammar development - it doesn't need to be efficient". With DATR, people said "How do you do reverse inference?" Our response was "Well, you need to build a DATR interpreter that will do reverse inference". The Sussex implementation doesn't and we didn't need it to do so because, in order to debug DATR theories, you only need forward inference. Of course, if you are trying to interrogate a lexicon from the bottom up, as it were, and you throw it an inflected form and you want to find out what it is an inflected form of, then you do need to do reverse inference. But people couldn't seem to grasp that the language and the Sussex implementation of the language were not one and the same thing. They thought that the fact that the Sussex implementation did not support reverse inference was a fact about the language rather than a fact about a particular implementation. I would be patient with such people but I found it frustrating. Eventually several implementations appeared that could do reverse inference and then we could point people at those.

EJB That's because Henry's category of "theory of linguistic computation" is not one that is widely recognized. I must say, I tend to think that it is simply formal linguistics as it should be done. So, if it's true that there is a theory of linguistic computation, then it's really what formal linguistics ought to be, even if it's not.

GG Yes, I have some sympathy with that position. Even if I don't buy it completely, it is clear that there is a huge overlap. That overlap would subsume pretty much all of mathematical linguistics, for example.

EJB In terms of the analyses done in DATR, you mentioned work on morphology at Sussex, at Surrey, and there was also work in Germany.

GG A lot of work has been done in Bielefeld and Düsseldorf, yes.

EJB On both morphology and phonology, actually. Does that work have a similar flavour to the work in GPSG which was essentially restating existing analyses in a way that made them more precise? Or is it the case that a lot of that work has really broken new ground?

GG The two areas are different. Apart from some interesting work on underspecification and lexical prosody at Bielefeld, most of the phonology done in DATR has been morphophonology done as a component of an essentially morphological description. By contrast, I think there has been a good deal of original morphological work. It is much easier to do original work in morphology than in syntax because morphology has been relatively neglected, in both linguistics and computational linguistics, for decades - since the days of Hockett and Yngve.

To a greater or lesser extent, the existence of morphology has been denied by mainstream generative grammar. Think about affix-hopping in Syntactic Structures: that analysis claimed, in effect, that inflection was part of syntax. Much of the time it has been assumed that the domain of morphology can be partitioned into the bit that belongs to syntax and the bit that belongs to phonology, with nothing left in the middle. This has had the agreeable consequence that all the leading morphologists are scholars of considerable independent stature. In order to pursue the discipline at all, they have had to assert its existence - which, as I have just indicated, was not an orthodox thing to do. Almost all DATR work on morphology is realizational, that is, it is in the Carstairs-McCarthy, Matthews, Stump, van Marle, Zwicky tradition. But not much of it has simply reconstructed work by these linguists. I have sometimes just translated bits of Stump's work: paradigm functional morphology just maps over into DATR almost trivially. On the other hand, I have a very detailed and comprehensive analysis of Gikuyu noun inflection that was originally provoked by a tiny fragment of Stump's, but which turned into something rather different. And the work that Lynne Cahill and I have done on German inflection (Cahill & Gazdar 1997, 1999) was initially inspired by a paper of Zwicky's (1986) but what we were doing soon took on a life of its own. The data, of course, is very well known, in contrast to the Gikuyu. The Surrey work on Russian and other Slavic languages is completely original. Again, much of the data is well known. You can find great Russian tomes that set out the facts. But making sense of them is something else.

EJB Has any of the data presented real challenges? Did any of the Slavic data that you know resist formulation?

GG I don't think so, no. I think there are good DATR analyses and bad ones, and it's very easy to do pretty clunky analyses in DATR. I have seen many in my time and reworked many in my time, but that's always going to be the case.

EJB What about syntax?

GG The use of DATR to encode syntactic information is sort of boring. At least, if you keep subcategorization formulated simply, then it's boring. It was difficult to do something more interesting with HPSG because HPSG has its own way of dealing with the lexicon using the type system. Flickinger's (1987) dissertation had been one of the main motivations for developing DATR and my original intention had been to do an HPSG subcategorization lexicon that trod in his footsteps. But, as it turned out, the HPSG lexicon evolved differently. So, in a sense, there wasn't a syntactic lexicon around for DATR to do something interesting with in the early days. But then along came LTAG (Schabes 1990). Roger and David Weir and I got this interest in LTAG, not because we were signing up for some syntactic programme, but because LTAG is an absolute Disneyland of lexical redundancy if you don't do something sensible with the lexicon. The redundancy is potentially far worse than a worst-case HPSG lexicon. Actually, I had also had a soft spot for LTAG ever since Vijay and David (Vijay-Shanker & Weir 1994) proved that it was equivalent to linear indexed grammar (Gazdar 1988).

EJB But the critical thing, surely, is that the LTAG lexicon is a tree structured object, not a feature structured object, and you can describe trees better in DATR.

GG I don't think untyped feature structures are any harder to encode in DATR than are trees. Encoding trees gets quite hairy, as our LTAG work amply illustrates. The problem with HPSG from a DATR perspective is that the type system is already in use to eliminate some, but not all, of the lexical redundancy. Whereas the redundancy in LTAG was just ripe for the taking, as it were. Although, of course, there were competitors seeking to account for it.

EJB But those competitors were second generation implementations of a kind of approach to the lexicon which was informed by DATR. You're thinking of the work that Vijay-Shanker & Schabes (1992) did, or of something else?

GG That was one. There were a couple of others - Becker (1993) and Candito (1996).

EJB Well, Becker was also, I think, clearly second generation in that he knew about Vijay-Shanker & Schabes. So it is kind of ironic that at the time you were doing DATR, Carl Pollard and Ivan were inventing their own way of dealing with redundancy in the lexicon, but theirs involved absolute inheritance and therefore really could not do justice to the linguistic notion of suppletion.

GG Although they did kind of hum and haw on that. I think there was some kind of internal dispute in HPSG on exactly that issue - I don't know how significant it was.

EJB It is exactly analogous, isn't it, to what you were saying earlier about the tension between dealing with more data and keeping the system formally elegant. Only, in this case, it seems that Carl won. There is no doubt that the subsumption based approach is very elegant at a formal level and it keeps the whole system extremely clean and uniform. But, in the end, it doesn't do justice to a very central insight from linguistics - that's just not how languages work.

GG That would be my view, yes.

EJB Yes, mine too, and indeed history seems to be on our side because, in the most recent work, since 1997 when Ivan published his paper in the Journal of Linguistics (Sag 1997), defaults have played an increasingly important role in HPSG. At least they have in the kind of analyses that are coming out of Stanford. It is not clear what the next round of formalization of HPSG will look like. The Sag & Wasow book uses the version of default implication that we developed (Lascarides et al. 1996) but in a very informal sort of way. It does look like a rather ironic accident that they didn't pick up on the ideas in DATR.

GG Yes. I think it has a lot to do with contact. If I had still been jetting over every summer, then it might have been different. I just didn't see Ivan for a number of years. Ivan very much reacts to people in his environment. If you are in his environment and you have what he thinks is a good idea then he will pick up on it.

EJB That brings me to a question I will forget if I don't ask it now, so I will ask it. I think that is generally true of an awful lot of linguistic schools or groups or whatever, that you have got to be right there, right next to them physically to have an impact. This always appears to put linguists in the UK at something of a disadvantage since the great bulk of relevant people are on the other side of the Atlantic. You have very clearly chosen to stay in the UK throughout your career. Do you think that was the right decision? Would you recommend to others that being a British linguist is a good thing to be? Or is that just a completely outmoded way of thinking about it?

GG I think there is quite a bit to that. It must be less true now because of the greater ease of both communication and travel, but it is probably still an issue. If one's only goal in life is to either further one's career or further the set of ideas that one is associated with and wishes to promote, then moving to the US is probably still the appropriate thing to do. For a British linguist, a British computational linguist, or any other kind of British scholar, except perhaps a researcher in British history.