My students and I work broadly on computational approaches to human language. Our high-level agenda is here. Computer scientists should look at our new algorithms and machine learning methods. Computational linguists may be more interested in our formal and statistical models of language. Natural language engineers will be interested in the varied NLP problems that we've tackled with these methods, including educational technology. We're developing Dyna, a beautiful high-level programming language, to facilitate all of the above. Don't know what to read? Try these selected papers. Also listed here are supervised dissertations, some invited talks, talk videos, patents, edited volumes, and teaching papers.

Natural language problems often demand new algorithms. The main challenges are a combinatorially large discrete space of linguistic structures and a high-dimensional continuous space of statistical parameters. Many of our algorithmic papers give general solutions to a formal problem, and thus have multiple uses. I have occasionally proved hardness results.

Dynamic programming is extremely useful for analyzing sequence data. The papers below introduce novel dynamic programming algorithms (primarily for parsing and machine translation). Other cool papers, not listed below, show how dynamic programming algorithms can be embedded as efficient subroutines within variational inference (belief propagation), relaxation (dual decomposition, row generation), and large-neighborhood local search.
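
As a concrete illustration of the kind of dynamic program meant here, the following is a minimal textbook-style sketch of Viterbi CKY parsing for a probabilistic context-free grammar in Chomsky normal form. It is not one of the specific algorithms from these papers; the grammar format (a `lexicon` dict and a list of binary `rules` with log-probabilities) and the toy grammar are assumptions made for illustration.

```python
import math
from typing import Dict, List, Tuple

def cky_viterbi(words: List[str],
                lexicon: Dict[str, List[Tuple[str, float]]],
                rules: List[Tuple[str, str, str, float]],
                start: str = "S"):
    """Most probable parse under a PCFG in Chomsky normal form.

    lexicon[w] lists (A, logprob) for preterminal rules A -> w;
    rules lists (A, B, C, logprob) for binary rules A -> B C.
    Returns (best log-probability, tree) or (-inf, None) if no parse exists.
    """
    n = len(words)
    # chart[i][j] maps a nonterminal to (best log-prob, backpointer) for words[i:j].
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for A, lp in lexicon.get(w, []):
            if lp > chart[i][i + 1].get(A, (-math.inf,))[0]:
                chart[i][i + 1][A] = (lp, w)
    # Build wider spans from narrower ones: O(n^3 * |rules|) time.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell = chart[i][j]
            for k in range(i + 1, j):  # split point between the two children
                left, right = chart[i][k], chart[k][j]
                for A, B, C, lp in rules:
                    if B in left and C in right:
                        score = lp + left[B][0] + right[C][0]
                        if score > cell.get(A, (-math.inf,))[0]:
                            cell[A] = (score, (k, B, C))

    def build(i: int, j: int, A: str):
        _, bp = chart[i][j][A]
        if isinstance(bp, str):  # preterminal covering a single word
            return (A, bp)
        k, B, C = bp
        return (A, build(i, k, B), build(k, j, C))

    if start not in chart[0][n]:
        return -math.inf, None
    return chart[0][n][start][0], build(0, n, start)

# Hypothetical toy grammar.
lexicon = {"John": [("NP", math.log(0.5))], "Mary": [("NP", math.log(0.5))],
           "saw": [("V", 0.0)]}
rules = [("S", "NP", "VP", 0.0), ("VP", "V", "NP", 0.0)]
score, tree = cky_viterbi(["John", "saw", "Mary"], lexicon, rules)
# tree == ("S", ("NP", "John"), ("VP", ("V", "saw"), ("NP", "Mary")))
```

The chart stores only the best analysis per nonterminal and span; replacing `max` by a sum (in probability space) would instead give the inside algorithm over the same chart.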

"Deep learning" usually refers to the use of multi-layer neural networks to model functions or probability distributions. The advantage of these highly parametric models is that they are expressive enough to fit a wide range of real-world phenomena. We have been particularly interested in combining deep learning with graphical models and other approaches to structured prediction, in order to marry the flexibility of deep learning with the insight of domain-specific modeling. Deep architectures for NLP typically include parameters for vector embeddings of words, which is an important subtopic.

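To make "combining deep learning with graphical models" concrete, here is a minimal sketch of a linear-chain conditional random field whose per-position scores come from a tiny neural scorer. The one-layer `tanh` scorer, the `embed` and `W_out` weights, the transition scores, and the tag set are all hypothetical stand-ins for a trained network. The point is that the forward algorithm computes the partition function exactly over exponentially many tag sequences, whatever function produces the scores.

```python
import itertools
import math
from typing import Sequence

def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == -math.inf:
        return m
    return m + math.log(sum(math.exp(x - m) for x in xs))

def make_emitter(words, embed, W_out):
    """Hypothetical 'neural' emission scorer: one tanh layer over a word embedding."""
    def emit(i: int, tag: str) -> float:
        hidden = [math.tanh(v) for v in embed[words[i]]]
        return sum(h * w for h, w in zip(hidden, W_out[tag]))
    return emit

def crf_log_partition(n, tags, emit, trans):
    """log Z = log of the sum over all tag sequences of exp(score), via the forward algorithm."""
    alpha = {y: emit(0, y) for y in tags}
    for i in range(1, n):
        alpha = {y: emit(i, y) + logsumexp([alpha[yp] + trans(yp, y) for yp in tags])
                 for y in tags}
    return logsumexp(list(alpha.values()))

def sequence_score(ys, emit, trans):
    """Unnormalized log-score of one tag sequence: emissions plus transitions."""
    return (sum(emit(i, y) for i, y in enumerate(ys))
            + sum(trans(a, b) for a, b in zip(ys, ys[1:])))

# Toy, made-up parameters.
words = ["the", "dog", "barks"]
embed = {"the": [0.5, -1.0], "dog": [1.0, 0.3], "barks": [-0.2, 2.0]}
W_out = {"DET": [-1.0, 1.5], "NOUN": [2.0, 0.1], "VERB": [-0.5, 1.8]}
TRANS = {("DET", "NOUN"): 1.0, ("NOUN", "VERB"): 1.0}
tags = list(W_out)
emit = make_emitter(words, embed, W_out)

def trans(a: str, b: str) -> float:
    return TRANS.get((a, b), 0.0)

log_z = crf_log_partition(len(words), tags, emit, trans)
# Brute-force check over all 3**3 tag sequences; agrees with log_z up to rounding.
brute = logsumexp([sequence_score(ys, emit, trans)
                   for ys in itertools.product(tags, repeat=len(words))])
```

In a real system the scorer's weights would be trained by backpropagating through `log_z`; the brute-force line is only feasible here because the toy sentence is tiny.
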
My students and I often identify a pesky formal problem in statistical NLP or ML and try to give a general solution to it. The formal settings for our algorithms often involve finite-state machines, various kinds of grammars and synchronous grammars, and graphical models. These objects are usually equipped with real-valued weights that define a structured prediction problem (see here). One can treat a wider variety of problems by allowing the weights to be elements of an arbitrary semiring. Our work on weighted logic programming (see papers on the Dyna language) has led us to develop flexible algorithms for maintaining truth values in arithmetic circuits.
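
The point about arbitrary semirings can be illustrated with one generic dynamic program. Below is a minimal sketch, assuming an acyclic weighted graph whose nodes are numbered in topological order (a stand-in for a finite-state machine or a hypergraph of grammar derivations). Swapping the semiring turns the same code into a total-probability, best-path, reachability, or shortest-path computation. The `Semiring` class, `path_sum`, and the toy graph are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

@dataclass(frozen=True)
class Semiring:
    plus: Callable[[Any, Any], Any]   # combines alternative paths
    times: Callable[[Any, Any], Any]  # extends a path by one edge
    zero: Any                         # identity for plus (no path)
    one: Any                          # identity for times (empty path)

INSIDE = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0)         # total probability
VITERBI = Semiring(max, lambda a, b: a * b, 0.0, 1.0)                      # best-path probability
BOOLEAN = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)  # reachability
TROPICAL = Semiring(min, lambda a, b: a + b, math.inf, 0.0)                # shortest path

def path_sum(num_nodes: int, edges: List[Tuple[int, int, Any]],
             start: int, end: int, sr: Semiring):
    """Semiring sum, over all start->end paths, of the semiring product of edge weights.

    Assumes every edge (src, dst, w) has src < dst, so processing edges in
    order of src finalizes each node's value before its outgoing edges are used.
    """
    value = [sr.zero] * num_nodes
    value[start] = sr.one
    for src, dst, w in sorted(edges, key=lambda e: e[0]):
        value[dst] = sr.plus(value[dst], sr.times(value[src], w))
    return value[end]

# Toy graph with two paths: 0->1->3 (prob 0.6 * 0.5) and 0->2->3 (prob 0.4 * 1.0).
probs = [(0, 1, 0.6), (0, 2, 0.4), (1, 3, 0.5), (2, 3, 1.0)]
total = path_sum(4, probs, 0, 3, INSIDE)    # about 0.7
best = path_sum(4, probs, 0, 3, VITERBI)    # about 0.4
reachable = path_sum(4, [(s, d, p > 0) for s, d, p in probs], 0, 3, BOOLEAN)      # True
cost = path_sum(4, [(s, d, -math.log(p)) for s, d, p in probs], 0, 3, TROPICAL)  # -log(0.4)
```

Only the weights and the semiring change between the four calls; the traversal order is shared.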

My students and I have worked in several ML settings, some of them novel. I have a relatively well-developed statistical philosophy, which has led to the design of novel training objectives. We've also offered techniques for optimizing such objectives. Once upon a time I was into neural networks, and I have now started to use deep learning again.

We have introduced new machine learning ideas in several settings, including unsupervised, supervised, and semi-supervised learning; domain adaptation; structure learning for graphical models; hybrids of probabilistic modeling with deep learning; cost-aware learning; reinforcement learning; and creative use of annotators.

Our novel contributions to unsupervised learning were developed primarily for grammar induction, including an approach that converts unsupervised learning into supervised learning on synthetically generated data. We have also proposed unsupervised bootstrapping and done a little work on clustering. Other papers, not listed below, also do unsupervised learning, but with traditional approaches such as EM and MCEM (Monte Carlo EM); there we develop such approaches for our transformation models and nonparametric models.

Intelligent systems may be structured to do approximate probabilistic inference under some carefully crafted model. However, they should be trained discriminatively, so that their actual decisions or predictions maximize task-specific performance. This allows the parameters to compensate for any approximations in modeling, inference, and decoding. My philosophy comes from Bayesian decision theory. The task of science is generative modeling: given a data sample and some knowledge of the underlying processes, what is the true data distribution D? The task of engineering is to find a decision rule that is expected to be accurate (on distribution D) and perhaps also fast. This is the proper relation between generative and discriminative modeling. One should design a sensible space of decision rules (policies) and explicitly search for one having high expected performance over D. In practice this can greatly improve accuracy. See also other papers on machine learning objectives.
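
A toy sketch of searching a space of decision rules for high expected performance over D: here D is a hypothetical, fully known joint distribution over an observation x and a label y, the rules are simple thresholds on x, and the hypothetical task utility penalizes a false positive three times as heavily as it rewards a true positive. Expected utilities are computed exactly with `fractions.Fraction`. Under these assumptions, the threshold that maximizes task utility differs from the one that maximizes plain accuracy.

```python
from fractions import Fraction

# Hypothetical distribution D: x uniform on 0..9, and P(y = 1 | x) = x / 9.
XS = range(10)

def p_x(x: int) -> Fraction:
    return Fraction(1, 10)

def p_y1_given_x(x: int) -> Fraction:
    return Fraction(x, 9)

# Utilities keyed by (decision, true label). Both tables are assumptions.
COSTLY_FP = {(1, 1): 1, (1, 0): -3, (0, 1): 0, (0, 0): 0}
ACCURACY = {(1, 1): 1, (1, 0): 0, (0, 1): 0, (0, 0): 1}

def expected_utility(rule, utility) -> Fraction:
    """Exact expectation over D of utility[(rule(x), y)]."""
    total = Fraction(0)
    for x in XS:
        p1 = p_y1_given_x(x)
        d = rule(x)
        total += p_x(x) * (p1 * utility[(d, 1)] + (1 - p1) * utility[(d, 0)])
    return total

def threshold_rule(t: int):
    """Decision rule: predict 1 iff x >= t."""
    return lambda x: 1 if x >= t else 0

def best_threshold(utility) -> int:
    """Explicitly search the (small) space of threshold policies."""
    return max(range(len(XS) + 1),
               key=lambda t: expected_utility(threshold_rule(t), utility))

t_task = best_threshold(COSTLY_FP)  # 7: predict 1 only when P(y=1|x) > 3/4
t_acc = best_threshold(ACCURACY)    # 5: predict 1 when P(y=1|x) > 1/2
```

With a realistic model, the exact sum over D would be replaced by an estimate (for example, over samples drawn from the generative model), and the policy space would be far richer than thresholds.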

We have designed various classes of generative models. These models are of general interest although they were motivated by linguistic problems. They include transformation models and variations on topic models. Some of our models are nonparametric or have deep learning architectures. We have also extended Markov random fields to string-valued random variables.

Each of the papers below uses finite-state machines to help model some linguistic domain. In most cases, the model combines multiple machines, or combines finite-state machines with deep learning. Many of these papers also present algorithmic methods.
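
Combining machines is typically done by composition. Here is a minimal sketch of composing two epsilon-free weighted finite-state transducers in the real (sum-product) semiring, together with a helper that scores an input/output string pair. The dict-based machine format, the function names, and the toy machines are hypothetical; toolkits such as OpenFst also handle epsilon transitions, which need extra care (composition filters) omitted here.

```python
from collections import deque

# A transducer is a dict (hypothetical format):
#   "start": start state
#   "final": {state: final weight}
#   "arcs":  {state: [(input_symbol, output_symbol, weight, next_state), ...]}

def compose(t1: dict, t2: dict) -> dict:
    """Epsilon-free composition: t1's output symbols must match t2's input symbols."""
    start = (t1["start"], t2["start"])
    arcs, final = {}, {}
    queue, seen = deque([start]), {start}
    while queue:  # explore only pair states reachable from the start pair
        state = queue.popleft()
        q1, q2 = state
        if q1 in t1["final"] and q2 in t2["final"]:
            final[state] = t1["final"][q1] * t2["final"][q2]
        out_arcs = arcs.setdefault(state, [])
        for a, b, w1, n1 in t1["arcs"].get(q1, []):
            for b2, c, w2, n2 in t2["arcs"].get(q2, []):
                if b == b2:  # middle symbols agree
                    nxt = (n1, n2)
                    out_arcs.append((a, c, w1 * w2, nxt))
                    if nxt not in seen:
                        seen.add(nxt)
                        queue.append(nxt)
    return {"start": start, "final": final, "arcs": arcs}

def pair_weight(fst: dict, inp: str, out: str) -> float:
    """Total weight of all accepting paths that read inp and write out."""
    if len(inp) != len(out):  # epsilon-free: one output symbol per input symbol
        return 0.0
    frontier = {fst["start"]: 1.0}
    for a, c in zip(inp, out):
        nxt = {}
        for q, w in frontier.items():
            for x, y, aw, r in fst["arcs"].get(q, []):
                if x == a and y == c:
                    nxt[r] = nxt.get(r, 0.0) + w * aw
        frontier = nxt
    return sum(w * fst["final"].get(q, 0.0) for q, w in frontier.items())

# Toy noisy channel: "a" is sometimes corrupted to "b" ...
channel = {"start": 0, "final": {0: 1.0},
           "arcs": {0: [("a", "a", 0.9, 0), ("a", "b", 0.1, 0)]}}
# ... and a second machine maps each letter to "x" with letter-specific weights.
mapper = {"start": 0, "final": {0: 1.0},
          "arcs": {0: [("a", "x", 0.7, 0), ("b", "x", 0.2, 0)]}}
both = compose(channel, mapper)
# pair_weight(both, "a", "x") sums over the hidden middle symbol:
# 0.9 * 0.7 + 0.1 * 0.2, about 0.65
```

The composed machine sums out the intermediate string, which is what lets a model be built as a cascade of simpler machines.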

Parsing a sentence is a key part of obtaining its meaning or translation. Leading systems for QA, IE, and MT now rely on parsing. We have devised fundamental, widely-used exact parsing algorithms for dependency grammars, combinatory categorial grammars, context-free grammars and tree-adjoining grammars. We also showed that different parsing algorithms are often interrelated by formal transformations that appear widely applicable. Beyond devising exact algorithms, we have developed several principled approximations for speeding up parsing, both for basic models and for enriched models where exact parsing would be impractical. A number of our papers (not all shown below) try to improve the actual models of linguistic syntax that are used in parsing. For example, several of these algorithms aim to preserve speed for lexicalized models of grammar, which acknowledge that two different verbs (for example) may behave differently.
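
For a flavor of such exact algorithms, here is a minimal Python sketch of the well-known O(n^3) dynamic program for edge-factored projective dependency parsing (commonly called Eisner's algorithm). The function name `eisner_decode`, the score-matrix input format, and the toy scores are illustrative assumptions. This simplified version lets ROOT take more than one child and leaves out the lexicalized and non-edge-factored refinements described in the next paragraph.

```python
from typing import List, Tuple

def eisner_decode(score: List[List[float]]) -> Tuple[float, List[int]]:
    """Best projective dependency tree under an edge-factored model.

    score[h][m] is the (assumed) score of an arc from head h to modifier m;
    index 0 is an artificial ROOT token. Returns (best total score, heads),
    where heads[m] is the head of token m and heads[0] == -1.
    """
    n = len(score)
    NEG = float("-inf")
    # Chart cells are indexed [s][t][d]: d == 0 means the head is the right
    # end t, d == 1 means the head is the left end s.
    complete = [[[0.0, 0.0] for _ in range(n)] for _ in range(n)]
    incomplete = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    bp_c = [[[0, 0] for _ in range(n)] for _ in range(n)]
    bp_i = [[[0, 0] for _ in range(n)] for _ in range(n)]

    for width in range(1, n):
        for s in range(n - width):
            t = s + width
            # Incomplete spans: join two complete halves and add an arc between s and t.
            best, r = max((complete[s][q][1] + complete[q + 1][t][0], q) for q in range(s, t))
            incomplete[s][t][0] = best + score[t][s]  # arc t -> s
            incomplete[s][t][1] = best + score[s][t]  # arc s -> t
            bp_i[s][t][0] = bp_i[s][t][1] = r
            # Complete spans: an incomplete span plus a complete span on the far side.
            best, r = max((complete[s][q][0] + incomplete[q][t][0], q) for q in range(s, t))
            complete[s][t][0], bp_c[s][t][0] = best, r
            best, r = max((incomplete[s][q][1] + complete[q][t][1], q) for q in range(s + 1, t + 1))
            complete[s][t][1], bp_c[s][t][1] = best, r

    heads = [-1] * n

    def backtrack(s: int, t: int, d: int, is_complete: bool) -> None:
        if s == t:
            return
        if is_complete:
            r = bp_c[s][t][d]
            if d == 0:
                backtrack(s, r, 0, True)
                backtrack(r, t, 0, False)
            else:
                backtrack(s, r, 1, False)
                backtrack(r, t, 1, True)
        else:
            r = bp_i[s][t][d]
            if d == 0:
                heads[s] = t
            else:
                heads[t] = s
            backtrack(s, r, 1, True)
            backtrack(r + 1, t, 0, True)

    backtrack(0, n - 1, 1, True)  # the whole sentence is a right-facing span headed by ROOT
    return complete[0][n - 1][1], heads

# Toy sentence "John saw Mary" with ROOT at index 0; only the intended arcs score well.
score = [[0.0] * 4 for _ in range(4)]
score[0][2] = 10.0  # ROOT -> saw
score[2][1] = 10.0  # saw -> John
score[2][3] = 10.0  # saw -> Mary
best, heads = eisner_decode(score)  # (30.0, [-1, 2, 0, 2])
```

Splitting each constituent at its head into left- and right-facing halves is what keeps the runtime cubic rather than quintic in sentence length.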

A parser is only as accurate as its linguistic model. Many existing grammar formalisms aim to capture different aspects of syntax (see parsing papers). We have tried to enrich these formalisms in appropriate ways, by explicitly modeling lexicalization, dependency length, non-local statistical interactions (beyond what the grammar formalism provides), and syntactic transformations. Remark: The probabilities under lexicalized models can capture some crude semantic preferences along with syntax (i.e., selectional preferences). In fact, in our very early work, we actually conditioned probabilities on words according to their role in a semantic representation. I subsequently argued for bilexical parsing as an approximation to this, and gave the first generative model for dependency parsing (which was also the first non-edge-factored model).

The 2008, 2009, and 2011 papers below built up an elegant model of inflectional morphology, with each paper building on the previous one. The work is gathered together in Dreyer's dissertation. Further work beginning in 2015 extended the approach to use latent underlying morphs, allowing it to treat derivational morphology as well.

Most syntax-based models of translation assume that in training data, a sentence and its translation have isomorphic syntactic structure. The papers below work to weaken that assumption, which often fails in practice. See also other papers on machine translation.

Learning French in high school was so slow and artificial compared to learning my native language, English. Why all these vocabulary lists and toy-data sentences? Why couldn't I pick up French words and constructions in context, by reading something interesting? In high school, I wanted to write a novel that gradually morphed from English to French, starting with English words in French order, and gradually dropping in French function words and content words when they were clear from context. Now with machine translation, we're starting to create this kind of hybrid text automatically ...

The Dyna language is our bid to provide a unifying framework for data and algorithms across many settings. Programming in Dyna is meant to be easy. A program is a short, high-level schematic description of the structure of a computation. It simply defines various values in terms of other values. The user can query and update values at runtime. It is the system's job to choose efficient data structures and computations to support the possible queries and updates. This leads to interesting algorithmic problems, connected to query planning, deductive databases, and adaptive systems. The forthcoming version of the language is described in Eisner & Filardo (2011), which illustrates its power on a wide range of problems in statistical AI and beyond. We released a prototype back in 2005, which was limited to semiring-weighted computations but has been used profitably in a number of NLP papers. The new working implementation under development is available here on GitHub.
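
To suggest what the system's job involves, here is a minimal Python sketch (not the Dyna implementation) of how a solver might maintain the values defined by shortest-path rules written in Dyna-style notation as `pathto(start) min= 0.` and `pathto(V) min= pathto(U) + edge(U,V).` It uses agenda-based forward chaining, lets the user update edges at runtime, and propagates lazily when a value is queried. The class and method names are hypothetical. For simplicity it assumes non-negative weights and supports only weight decreases, since retracting an edge or raising its weight requires more careful maintenance of derived values.

```python
import heapq
import math
from collections import defaultdict

class ShortestPathProgram:
    """Hypothetical solver for the Dyna-style rules
        pathto(start) min= 0.
        pathto(V) min= pathto(U) + edge(U, V).
    Edge weights are assumed non-negative."""

    def __init__(self, start):
        self.edges = defaultdict(dict)  # edges[u][v] = weight
        self.pathto = {start: 0.0}      # current derived values
        self.agenda = [(0.0, start)]    # items whose consequences are still pending

    def update_edge(self, u, v, w):
        """Add edge(u, v) or lower its weight; raising a weight is not handled."""
        old = self.edges[u].get(v)
        if old is not None and w > old:
            raise ValueError("this sketch only supports weight decreases")
        self.edges[u][v] = w
        if u in self.pathto:  # re-propagate u's value along the changed edge
            heapq.heappush(self.agenda, (self.pathto[u], u))

    def _propagate(self):
        while self.agenda:
            d, u = heapq.heappop(self.agenda)
            if d > self.pathto.get(u, math.inf):
                continue  # stale agenda entry, superseded by a better value
            for v, w in self.edges[u].items():
                if d + w < self.pathto.get(v, math.inf):
                    self.pathto[v] = d + w
                    heapq.heappush(self.agenda, (d + w, v))

    def query(self, v):
        """Bring pending updates up to date lazily, then answer."""
        self._propagate()
        return self.pathto.get(v, math.inf)

prog = ShortestPathProgram("a")
prog.update_edge("a", "b", 4.0)
prog.update_edge("b", "c", 1.0)
prog.update_edge("a", "c", 10.0)
first = prog.query("c")   # 5.0, via a -> b -> c
prog.update_edge("a", "c", 2.0)
second = prog.query("c")  # 2.0, after the runtime update
```

Deferring propagation until a query arrives is one simple policy; an eager solver would instead call `_propagate` inside `update_edge`, trading update cost for query latency.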

Privacy-preserving domain adaptation of semantic parsers. Fatemehsadat Mireshghallah, Richard Shin, Yu Su, Tatsunori Hashimoto, and Jason Eisner (2022). arXiv.

Task-oriented dialogue systems often assist users with personal or confidential matters. For this reason, the developers of such a system are generally prohibited from observing actual usage. So how can they know where the system is failing and needs more training data or new functionality? In this work, we study ways in which realistic user utterances can be generated synthetically, to help increase the linguistic and functional coverage of the system, without compromising the privacy of actual users. To this end, we propose a two-stage Differentially Private (DP) generation method which first generates latent semantic parses, and then generates utterances based on the parses. Our proposed approach improves MAUVE by 3.8 and parse tree node-type overlap by 1.4 relative to current approaches for private synthetic data generation, improving both fluency and semantic coverage. We further validate our approach on a realistic domain adaptation task of adding new functionality from private user data to a semantic parser, and show gains of 1.3 in its accuracy with the new feature.