Lexical selection

Overview

Lexical selection combines the input semantics with the lexicon and the tree schema file. The input to lexical selection is (i) a flat input semantics and (ii) an FB-LTAG grammar. The output is a a set of elementary trees which can then passed on to the tree assembly phase.

The lexical selection process can be described as

Select from the lexicon a set of lemmas from the lexicon whose semantics subsumes the input semantics. Instantiate their semantics by unifying it with the relevant subset of the input semantics. We can call the results of this process instantiated lemmas
Combine the lemmas with the tree families they ask for. Each tree family contains a set of tree schemata. Enrich these tree schemata and unify the their semantics with the semantics of the instantiated lemmas. The result of this is a set of elementary trees

Because the input semantics is saturated, the instantiated lexical item also has a saturated semantics.

Enrichment

Enrichment is a process which adds features to either the interface, an explicitly named node or the co-anchor of a lexically selected tree.

There are three possible enrichment outcomes:

success : all nodes found; the tree is modified
soft failure (enrichment on nodes only) : some nodes not found and ignored
hard failure: unification failure

The enrichement information comes from the lexicon in the form of a path equations which specify

the location
top, bottom, or lex
the attribute (if top or bottom)
what value to associate with it

The conventions taken by GenI for path equations are:

equation	effect
`interface.foo=bar`	`foo=bar` is unified into the interface (not the tree)
`anchor.bot.foo=bar`	`foo=bar` is unified into the bottom feature of the node which is marked anchor.
`toto.top.foo=bar`	`foo=bar` is unified into the top feature of node named “toto”
`toto.lex=quux`	the lexeme for the node named “toto” is set to “quux”
`anchor.foo=bar`	same as `anchor.bot.foo=bar`
`anc.whatever...`	same as `anchor.whatever...`
`top.foo=bar`	same as `anchor.top.foo=bar`
`bot.foo=bar`	same as `anchor.bot.foo=bar`
`foo=bar`	same as `anchor.bot.foo=bar`
`toto.foo=bar`	same as `toto.top.foo=bar` (creates a warning)

Lemanchor mechanism

One problem in building reversible grammars is the treatment of co-anchors. In the French language, for example, we have some structures like C’est Jean qui regarde Marie (It is John who looks at Mary)

One might be tempted to hard code the “ce” (“it”) and the “être” (is) into the tree for regarder (look at), something like

s(ce, être, n↓, qui, v(regarder), n↓)

Indeed, this would work just fine for generation, but not for parsing. When you parse, you would encounter inflected forms for these items for example c’ for ce or sont or est for être. Hard-coding the ce into such trees would break parsing.

To work around this, we propose a mechanism to have our co-anchors and parsing too. Co-anchors that are susceptible to morphological variation should be

marked in a substitution site (this is to keep parsers happy)
have a feature top.lemanchor:foo where foo is the coanchor you want

GenI will convert these into non-substitution sites with a lexical item leaf node.