What is Natural Language Processing?
Natural language processing (NLP) is a subfield of artificial intelligence in which computer algorithms are used to process natural language data, spanning both natural language understanding and natural language generation.
What methods are applied in Natural Language Processing?
Symbolic (or rule-based) NLP was the initial methodology applied to natural language, starting in the 1950s, and remained the predominant approach until the 1990s, when statistical methods gained dominance. Since the turn of the century, machine learning algorithms, and more recently artificial neural networks, have predominantly been applied to natural language processing problems.
What are the common high level Natural Language Processing tasks?
Automatic Summarization
Producing an automatic summary of a large body of text.
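As an illustration of one common approach, extractive summarization can be sketched as scoring sentences by the frequency of their words and keeping the highest-scoring ones. This is a toy sketch; the function name and scoring rule are my own simplifications, not a standard API.

```python
from collections import Counter

def summarize(sentences, k=1):
    """Extractive summarization sketch: score each sentence by the
    total corpus frequency of its words, keep the top k sentences
    in their original order."""
    freqs = Counter(w.lower() for s in sentences for w in s.split())
    ranked = sorted(range(len(sentences)),
                    key=lambda i: -sum(freqs[w.lower()] for w in sentences[i].split()))
    keep = sorted(ranked[:k])  # restore document order
    return [sentences[i] for i in keep]
```

Real summarizers weight terms (e.g. TF-IDF), penalize redundancy, or generate new sentences abstractively rather than extracting existing ones.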
Machine Translation
Translating text from one human language to another human language.
Chatbots
Holding a chat conversation with humans in order to, for example, gather information or help with queries.
Question Answering
Providing automatic answers to human-language questions, most often where a specific answer is present.
Sentiment Analysis
Computing polarity score for often subjective information such as reviews or tweets.
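A minimal way to see what a polarity score is: count matches against small positive and negative word lists. The lexicons here are tiny illustrative assumptions; real systems use large sentiment lexicons or trained classifiers.

```python
# Illustrative mini-lexicons (assumed for this sketch, not a real resource).
POSITIVE = {"good", "great", "excellent", "love", "happy"}
NEGATIVE = {"bad", "terrible", "awful", "hate", "sad"}

def polarity(text: str) -> float:
    """Score in [-1, 1]: +1 if all sentiment words are positive,
    -1 if all are negative, 0 if none are found."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total
```

Negation ("not good") and intensity ("very bad") are exactly the cases such lexicon counting gets wrong, which is why learned models dominate in practice.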
Topic Modelling
Automatic discovery of abstract topics occurring in a body of text.
Language Modelling
Assigning an occurrence probability to any sequence of words.
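The simplest concrete instance of this is a bigram model: estimate each word's probability from counts of the word pair that precedes it, then multiply along the sequence. The sketch below uses maximum-likelihood estimates with assumed `<s>`/`</s>` boundary markers and no smoothing.

```python
from collections import Counter

def train_bigram(corpus):
    """Count unigrams and bigrams over tokenized sentences,
    with <s> and </s> as start/end markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        unigrams.update(tokens[:-1])               # denominators
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    return unigrams, bigrams

def sequence_probability(words, unigrams, bigrams):
    """P(w1..wn) under the maximum-likelihood bigram model:
    the product of count(prev, cur) / count(prev)."""
    tokens = ["<s>"] + words + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens[:-1], tokens[1:]):
        if unigrams[prev] == 0:
            return 0.0                             # unseen history
        prob *= bigrams[(prev, cur)] / unigrams[prev]
    return prob
```

Unseen word pairs get probability zero here; practical n-gram models add smoothing, and the neural language models in the timeline below replace counts with learned representations.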
Information Retrieval
Searching for requested information and ranking of the results.
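The searching-and-ranking idea can be sketched with the crudest possible relevance score: the number of query-term occurrences in each document. The function name and scoring are illustrative stand-ins for real ranking functions such as TF-IDF or BM25.

```python
def rank(query, documents):
    """Return indices of documents containing query terms,
    ordered by raw term-occurrence count (a crude stand-in
    for TF-IDF / BM25 ranking)."""
    q_terms = query.lower().split()
    scored = []
    for i, doc in enumerate(documents):
        words = doc.lower().split()
        score = sum(words.count(t) for t in q_terms)
        scored.append((score, i))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, stable ties
    return [i for score, i in scored if score > 0]
```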
Word Sense Disambiguation
Identifying which sense of the word is used in a sentence.
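A classic baseline for this task is the simplified Lesk algorithm: choose the sense whose dictionary gloss shares the most words with the sentence's context. The glosses passed in below are assumed inputs; real systems draw them from a lexical resource such as WordNet.

```python
def lesk(context, glosses):
    """Simplified Lesk: pick the sense whose gloss has the largest
    word overlap with the context sentence."""
    ctx = set(context.lower().split())
    best, best_overlap = None, -1
    for sense, gloss in glosses.items():
        overlap = len(ctx & set(gloss.lower().split()))
        if overlap > best_overlap:
            best, best_overlap = sense, overlap
    return best
```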
What are the common low level Natural Language Processing tasks?
Word Segmentation (Tokenization)
Segmentation of a body of text into smaller tokens (often words).
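For space-delimited languages like English, a serviceable first approximation is a regular expression that pulls out runs of word characters and individual punctuation marks:

```python
import re

def tokenize(text: str):
    """Split text into word and punctuation tokens: runs of word
    characters, or single non-space non-word characters."""
    return re.findall(r"\w+|[^\w\s]", text)
```

Production tokenizers handle many more cases (contractions, URLs, hyphenation), and modern neural models often use learned subword tokenizers instead.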
Sentence Boundary Disambiguation
Determining the start and end of the sentences within a body of text.
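A naive version of this splits after sentence-final punctuation followed by whitespace; the hard part the simple rule misses is abbreviations like "Dr." and "e.g.", which is why the task is called disambiguation.

```python
import re

def split_sentences(text: str):
    """Naive splitter: break after '.', '!' or '?' when followed
    by whitespace. Abbreviations like 'Dr.' will be mis-split."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]
```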
Part of Speech Tagging
Associating every word with a part of speech based on its definition and context.
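The definition part of that can be sketched as a per-word tag lookup with a default; the lexicon below is a tiny assumed example, and real taggers additionally learn from annotated corpora and use surrounding context to resolve ambiguous words.

```python
# Tiny illustrative tag lexicon (assumed for this sketch).
TAG_LEXICON = {
    "the": "DET", "a": "DET",
    "cat": "NOUN", "dog": "NOUN",
    "sat": "VERB", "runs": "VERB",
    "quickly": "ADV",
}

def tag(tokens):
    """Assign each token its lexicon tag, defaulting to NOUN
    (the most common open-class fallback)."""
    return [(t, TAG_LEXICON.get(t.lower(), "NOUN")) for t in tokens]
```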
Named Entity Recognition
Determining which words in a body of text map to proper names such as names of people or places.
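One crude heuristic for English makes the task concrete: flag capitalized tokens that do not start the sentence as candidate proper names. Real NER systems use trained sequence models and also classify entity types (person, place, organization).

```python
def naive_ner(tokens):
    """Flag capitalized, non-sentence-initial tokens as candidate
    proper names (a heuristic sketch, not a trained model)."""
    return [t for i, t in enumerate(tokens) if i > 0 and t[:1].isupper()]
```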
Stemming
Identifying the root form or word stem for inflected or derived words.
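Stemmers typically work by rule-based suffix stripping; the toy rule set below gestures at the approach taken by the Porter stemmer, which applies many such rules in ordered passes with conditions on the remaining stem.

```python
def stem(word: str) -> str:
    """Strip the first matching common English suffix, keeping at
    least three characters of stem (a toy, single-pass version of
    rule-based stemming)."""
    for suffix in ("ingly", "edly", "ing", "ed", "ly", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word
```

Note that stems need not be dictionary words; that distinction is what separates stemming from lemmatization below.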
Lemmatization
Mapping all inflected or derived forms of a word to its single dictionary form (lemma).
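Unlike the suffix-stripping sketch above, lemmatization needs a lexicon, because irregular forms ("was" → "be", "mice" → "mouse") cannot be reached by rules alone. The dictionary here is a tiny assumed sample; real lemmatizers use full morphological lexicons such as WordNet, plus part-of-speech information.

```python
# Toy lemma dictionary (illustrative sample, not a real resource).
LEMMAS = {
    "am": "be", "is": "be", "are": "be", "was": "be", "were": "be",
    "better": "good", "best": "good",
    "mice": "mouse", "geese": "goose",
}

def lemmatize(word: str) -> str:
    """Map a word to its dictionary form (lemma) by lookup,
    falling back to the lowercased word itself."""
    return LEMMAS.get(word.lower(), word.lower())
```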
Constituency Parsing
Constructing a tree structure representing the syntactic structure according to phrase structure grammar.
Dependency Parsing
Constructing a tree structure representing the syntactic structure according to a dependency grammar.
Coreference Resolution
Identifying all expressions that refer to the same entity in the text.
Timeline of Natural Language Processing
Here we include a list of important events in the history of natural language processing:
2018 Jacob Devlin et al. introduce BERT: Bidirectional Encoder Representations from Transformers.
2017 Ashish Vaswani et al. introduce the Transformer, dispensing with recurrence and convolutions entirely.
2015 Dzmitry Bahdanau et al. introduce the attention mechanism for neural machine translation.
2013 Tomas Mikolov et al. at Google introduce word2vec using neural networks to learn word associations.
2008 Collobert and Weston apply multi-task learning to NLP.
2001 Bengio et al. propose the first neural language model.
1990s Statistical methods such as A tree-based statistical language model (Bahl et al., 1989), Deducing linguistic structure from the statistics of large corpora (Brill et al., 1990), Statistical parsing of messages (Chitrao and Grishman, 1990), A statistical approach to machine translation (Brown et al., 1991)
1988 IBM’s Thomas J. Watson Research Center reintroduces statistical machine translation.
1980s Symbolic methods such as Passing Markers (Charniak, 1983), In-Depth Understanding (Dyer, 1983), Direct Memory Access Parsing (Riesbeck and Martin, 1986), TEAM (Grosz et al., 1987), Semantic Interpretation and the Resolution of Ambiguity (Hirst, 1987)
1970s Conceptual ontologies such as MARGIE (Schank, 1975), SAM (Cullingford, 1978), PAM (Wilensky, 1978), TaleSpin (Meehan, 1976), QUALM (Lehnert, 1977), Politics (Carbonell, 1979)
1968 SHRDLU, an early natural language understanding program, is released.
1966 The early natural language processing computer program ELIZA is created at the MIT AI Lab.
1966 ALPAC issues its report on machine translation.