What is Natural Language Processing?


Natural language processing (NLP) is a subfield of artificial intelligence in which computer algorithms are used to process natural language data; it encompasses both natural language understanding and natural language generation.


What methods are applied in Natural Language Processing?

Symbolic (or rule-based) NLP was the initial methodology, applied to natural language starting in the 1950s, and it remained the predominant approach until the 1990s, when statistical methods gained dominance. Since the turn of the century, machine learning algorithms, and more recently artificial neural networks, have predominantly been applied to natural language processing problems.


What are the common high-level Natural Language Processing tasks?

  • Automatic Summarization

    Producing an automatic summary of a large body of text.

  • Machine Translation

    Translating text from one human language to another.

  • Chatbots

    Holding a chat conversation with humans in order to, for example, gather information or help with queries.

  • Question Answering

    Providing automatic answers to questions posed in human language, most often where a specific answer is present in a body of text.

  • Sentiment Analysis

    Computing a polarity score for often subjective information such as reviews or tweets (see the first sketch after this list).

  • Topic Modelling

    Automatic discovery of abstract topics occurring in a body of text.

  • Language Modelling

    Assigning an occurrence probability to any sequence of words (see the second sketch after this list).

  • Information Retrieval

    Searching for requested information and ranking the results.

  • Word Sense Disambiguation

    Identifying which sense of a word is used in a sentence.
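
Sentiment analysis lends itself to a concrete demonstration. Below is a minimal sketch using NLTK's VADER analyzer, assuming NLTK is installed and the vader_lexicon resource is available; it computes a polarity score for each input text.

    # Minimal sentiment analysis sketch using NLTK's VADER analyzer.
    # Assumes: pip install nltk (the lexicon is fetched on first run).
    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon")  # one-time download of the VADER lexicon

    sia = SentimentIntensityAnalyzer()
    for text in ["I absolutely loved this movie!",
                 "The plot was dull and predictable."]:
        scores = sia.polarity_scores(text)  # neg/neu/pos plus compound in [-1, 1]
        print(f"{scores['compound']:+.3f}  {text}")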
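
Language modelling can be sketched just as briefly. The toy bigram model below is plain Python; the tiny corpus and the add-one smoothing are illustrative assumptions of this sketch, not any particular library's method. It assigns a probability to a word sequence by chaining conditional bigram probabilities.

    # Toy bigram language model: P(word | previous word) estimated from counts.
    from collections import Counter

    corpus = "the cat sat on the mat . the dog sat on the rug .".split()
    vocab = set(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))
    unigrams = Counter(corpus)

    def bigram_prob(prev, word):
        # Add-one (Laplace) smoothing: unseen bigrams keep non-zero probability.
        return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

    def sequence_prob(words):
        # Chain rule over bigrams; the first word's unigram probability is
        # ignored to keep the sketch short.
        p = 1.0
        for prev, word in zip(words, words[1:]):
            p *= bigram_prob(prev, word)
        return p

    print(sequence_prob("the cat sat on the mat".split()))  # seen word order
    print(sequence_prob("the mat sat on the cat".split()))  # lower probability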


What are the common low-level Natural Language Processing tasks?

  • Word Segmentation (Tokenization)

    Segmenting a body of text into smaller tokens (often words); this and several of the tasks below are illustrated in the first sketch after this list.

  • Sentence Boundary Disambiguation

    Determining the start and end of sentences within a body of text.

  • Part of Speech Tagging

    Associating every word with a part of speech, based on its definition and context.

  • Named Entity Recognition

    Determining which words in a body of text map to proper names, such as the names of people or places.

  • Stemming

    Identifying the root form, or word stem, of inflected or derived words (see the second sketch after this list).

  • Lemmatization

    Grouping all inflected or derived forms of a word under its dictionary form, or lemma (contrasted with stemming in the second sketch after this list).

  • Constituency Parsing

    Constructing a tree structure representing the syntactic structure of a sentence according to a phrase structure grammar.

  • Dependency Parsing

    Constructing a tree structure representing the syntactic structure according to a dependency grammar.

  • Coreference Resolution

    Identifying all expressions that refer to the same entity in the text.
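
Several of these low-level tasks can be demonstrated in a single pass with an off-the-shelf pipeline. Below is a minimal sketch using spaCy, assuming spaCy and its small English model en_core_web_sm are installed; it prints sentence boundaries, then per-token part-of-speech tags, lemmas, and dependency relations, then named entities.

    # Minimal sketch of several low-level NLP tasks with spaCy.
    # Assumes: pip install spacy && python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Alan Turing worked in Cambridge. He founded computer science.")

    # Sentence boundary disambiguation
    for sent in doc.sents:
        print("sentence:", sent.text)

    # Tokenization, part-of-speech tagging, lemmatization, dependency parsing
    for token in doc:
        print(f"{token.text:10} pos={token.pos_:6} lemma={token.lemma_:10} "
              f"dep={token.dep_:10} head={token.head.text}")

    # Named entity recognition
    for ent in doc.ents:
        print("entity:", ent.text, ent.label_)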
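
To make the stemming/lemmatization distinction concrete, here is a small sketch with NLTK, assuming the wordnet resource has been downloaded: the stemmer strips suffixes heuristically and may produce non-words such as "studi", while the lemmatizer maps each word to a dictionary form.

    # Stemming vs. lemmatization with NLTK.
    # Assumes: pip install nltk, plus the wordnet resource for the lemmatizer.
    import nltk
    from nltk.stem import PorterStemmer, WordNetLemmatizer

    nltk.download("wordnet")  # one-time download for the WordNet lemmatizer

    stemmer = PorterStemmer()
    lemmatizer = WordNetLemmatizer()

    for word in ["studies", "studying", "cries"]:
        stem = stemmer.stem(word)                    # heuristic suffix stripping
        lemma = lemmatizer.lemmatize(word, pos="v")  # dictionary (verb) form
        print(f"{word:10} stem={stem:10} lemma={lemma}")

    # Lemmatization uses vocabulary and morphology, not just suffixes:
    print(lemmatizer.lemmatize("better", pos="a"))   # -> 'good'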


Timeline of Natural Language Processing

Here we include a list of important events in the history of natural language processing:

2018 Jacob Devlin et al. introduce BERT: Bidirectional Encoder Representations from Transformers.

2017 Ashish Vaswani et al. introduce the Transformer architecture, dispensing with recurrence and convolutions entirely.

2015 Dzmitry Bahdanau et al. introduce the attention mechanism for neural machine translation.

2013 Tomas Mikolov et al. at Google introduce word2vec using neural networks to learn word associations.

2008 Collobert and Weston apply multi-task learning to NLP.

2001 Bengio et al. propose the first neural language model.

1990s Statistical methods, such as: A tree-based statistical language model (Bahl et al., 1989); Deducing linguistic structure from the statistics of large corpora (Brill et al., 1990); Statistical parsing of messages (Chitrao and Grishman, 1990); A statistical approach to machine translation (Brown et al., 1990).

1988 IBM’s Thomas J. Watson Research Center reintroduces statistical machine translation.

1980s Symbolic methods, such as: Passing Markers (Charniak, 1983); In-Depth Understanding (Dyer, 1983); Direct Memory Access Parsing (Riesbeck and Martin, 1986); TEAM (Grosz et al., 1987); Semantic Interpretation and the Resolution of Ambiguity (Hirst, 1987).

1970s Conceptual ontologies, such as: MARGIE (Schank, 1975); SAM (Cullingford, 1978); PAM (Wilensky, 1978); TaleSpin (Meehan, 1976); QUALM (Lehnert, 1977); Politics (Carbonell, 1979).

1968 SHRDLU, an early natural language understanding program, is released.

1966 ELIZA, an early natural language processing computer program, is created at the MIT AI Lab.

1966 ALPAC issues its report on machine translation.

1957 Noam Chomsky publishes his book Syntactic Structures.

1954 The Georgetown-IBM experiment automatically translates 60 carefully selected Russian sentences into English.

1949 Warren Weaver writes the Translation Memorandum.