From Rules to Vectors: How NLP Changed Over Time

This is a quick summary of Manning’s article on the history and future of natural language processing and foundation models.

  • Manning divides NLP history into four eras, each defined by its dominant approach:

    • Machine translation: simple word-by-word translation using dictionary lookups and hand-written rules (1950-1969).

    • Rule-based systems: hand-built systems that handled syntax and reference, drawing on linguistic theory and knowledge-based AI (1970-1992).

    • Empirical machine learning models: statistical models trained on labeled or annotated data for specific tasks or domains (1993-2012).

    • Deep learning models: neural networks that learn from raw, unannotated data via self-supervision objectives and transfer to general tasks and domains (2013-present).

  • LPLMs (Large Pretrained Language Models) are foundation models: models trained on massive amounts of data via self-supervision that can then perform many different tasks. Foundation models can also work with other kinds of data, such as images, sounds, or actions. (A sketch of what self-supervision looks like appears after this summary.)

  • Foundation models have pros and cons for NLP and beyond:

    • Pros: strong performance on many NLP tasks; learning from diverse and rich data sources; easy adaptation (typically by fine-tuning or prompting) to new tasks or domains.

    • Cons: costly and slow to train; hard to interpret and evaluate; not yet fully reliable or robust.

  • Manning concludes that foundation models are an exciting and important direction for AI, while stressing that we still need to understand better how they work and how to improve them.
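
To make the "self-supervision" idea above concrete, here is a minimal, hypothetical sketch (not from Manning's article): a toy causal language model in PyTorch trained to predict the next token, so the raw text itself supplies the training labels. Real LPLMs apply the same objective with Transformer architectures over billions of tokens.

```python
# Toy illustration of self-supervised language modeling: predict the next
# token from raw text, so no human labels or annotations are needed.
# Hypothetical example, not Manning's method; real LPLMs use Transformers
# and vastly larger corpora.
import torch
import torch.nn as nn

text = "language models learn from raw text without human labels".split()
vocab = {w: i for i, w in enumerate(sorted(set(text)))}
ids = torch.tensor([vocab[w] for w in text])


class TinyCausalLM(nn.Module):
    def __init__(self, vocab_size: int, dim: int = 32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)  # stand-in for a Transformer
        self.head = nn.Linear(dim, vocab_size)

    def forward(self, x):
        h, _ = self.rnn(self.embed(x))
        return self.head(h)  # next-token logits at every position


model = TinyCausalLM(len(vocab))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Self-supervised objective: inputs are tokens 0..n-2, targets are tokens 1..n-1.
inputs, targets = ids[:-1].unsqueeze(0), ids[1:].unsqueeze(0)
for step in range(100):
    logits = model(inputs)
    loss = loss_fn(logits.reshape(-1, len(vocab)), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final next-token loss: {loss.item():.3f}")
```

Adapting a model pretrained this way to a downstream task (the "easy adaptation" listed among the pros) then usually means fine-tuning a small task-specific head on top of the pretrained representations, or simply prompting the model, rather than training a new system from scratch.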
