spaCy

Original author: Matthew Honnibal
Developers: Explosion AI, various
Initial release: February 2015[1]
Written in: Python, Cython
Operating system: Linux, Windows, macOS
Platform: Cross-platform
Type: Natural language processing
License: MIT License

    spaCy (/speɪˈsiː/ spay-SEE) is an open-source software library for advanced natural language processing, written in the programming languages Python and Cython.[2][3] The library is published under the MIT license, and its main developers are Matthew Honnibal and Ines Montani, the founders of the software company Explosion.

    Unlike NLTK, which is widely used for teaching and research, spaCy focuses on providing software for production usage.[4][5] spaCy also supports deep learning workflows that allow connecting statistical models trained by popular machine learning libraries like TensorFlow, PyTorch or MXNet through its own machine learning library Thinc.[6][7] Using Thinc as its backend, spaCy features convolutional neural network models for part-of-speech tagging, dependency parsing, text categorization and named entity recognition (NER). Prebuilt statistical neural network models to perform these tasks are available for 23 languages, including English, Portuguese, Spanish, Russian and Chinese, and there is also a multi-language NER model. Additional support for tokenization for more than 65 languages allows users to train custom models on their own datasets as well.[8]
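As a minimal illustration of the tokenization support described above, the following sketch uses a blank English pipeline, which contains only the rule-based tokenizer and needs no pretrained model. The example sentence is made up for this illustration, and the calls follow the spaCy 3 API. Tagging, parsing and NER would additionally need a trained pipeline (for example the separately downloaded `en_core_web_sm` package), which is not assumed here.

```python
import spacy

# A blank pipeline has a tokenizer but no trained components,
# so it runs without downloading any model package.
nlp = spacy.blank("en")

# Illustrative sentence (not taken from the spaCy documentation).
doc = nlp("spaCy is written in Python and Cython.")

# Each Token keeps its text and its position in the Doc.
tokens = [token.text for token in doc]
print(tokens)

# No statistical components are present in a blank pipeline.
print(nlp.pipe_names)
```

Loading a trained pipeline with `spacy.load(...)` in place of `spacy.blank("en")` would populate attributes such as part-of-speech tags and named entities on the same `Doc` object.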

    History

    • Version 1.0 was released on October 19, 2016, and included preliminary support for deep learning workflows by supporting custom processing pipelines.[9] It further included a rule matcher that supported entity annotations, and an officially documented training API.
    • Version 2.0 was released on November 7, 2017, and introduced convolutional neural network models for 7 different languages.[10] It also supported custom processing pipeline components and extension attributes, and featured a built-in trainable text classification component.
    • Version 3.0 was released on February 1, 2021, and introduced state-of-the-art transformer-based pipelines.[11] It also introduced a new configuration system and training workflow, as well as type hints and project templates. This version dropped support for Python 2.
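The custom pipeline components and extension attributes mentioned above can be sketched as follows. This is an illustration only: the component name `count_tokens` and the attribute `token_count` are hypothetical, and the registration style shown (`@Language.component` plus `nlp.add_pipe` with a string name) is the version 3 form, which differs from version 2, where the function object itself was passed to `add_pipe`.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Hypothetical extension attribute, reachable as doc._.token_count.
# force=True lets the snippet be re-run without a "already exists" error.
Doc.set_extension("token_count", default=0, force=True)


@Language.component("count_tokens")  # hypothetical component name
def count_tokens(doc):
    """Store the number of tokens on the Doc and pass it along."""
    doc._.token_count = len(doc)
    return doc


# A blank pipeline keeps the example free of model downloads.
nlp = spacy.blank("en")
nlp.add_pipe("count_tokens")

doc = nlp("Custom components receive a Doc and return it.")
print(nlp.pipe_names, doc._.token_count)
```

A component is simply a callable that takes a `Doc` and returns it, so several such steps can be chained in whatever order the application needs.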

    Main features


    Extensions and visualizers

    Dependency parse tree visualization generated with the displaCy visualizer

    spaCy comes with several extensions and visualizers that are available as free, open-source libraries, including Thinc, its machine learning library, and the displaCy dependency visualizer.
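A dependency visualization of the kind produced by displaCy can be sketched without a trained model by passing a hand-written parse in displaCy's manual mode. The words, tags and arcs below are invented for illustration and are not the output of any spaCy pipeline.

```python
from spacy import displacy

# Hand-written parse (an assumption for this sketch, not model output).
# "start" and "end" are word indices; "dir" gives the arrow direction.
parse = {
    "words": [
        {"text": "spaCy", "tag": "PROPN"},
        {"text": "parses", "tag": "VERB"},
        {"text": "text", "tag": "NOUN"},
    ],
    "arcs": [
        {"start": 0, "end": 1, "label": "nsubj", "dir": "left"},
        {"start": 1, "end": 2, "label": "dobj", "dir": "right"},
    ],
}

# manual=True skips the Doc conversion; jupyter=False forces the
# function to return the SVG markup as a string.
svg = displacy.render(parse, style="dep", manual=True, jupyter=False)
print(len(svg))
```

With a trained pipeline, a parsed `Doc` can be passed to `displacy.render` directly and the `manual` flag omitted.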

    References

    2. ^ Choi et al. (2015). It Depends: Dependency Parser Comparison Using A Web-based Evaluation Tool.
    15. ^ Trask et al. (2015). sense2vec - A Fast and Accurate Method for Word Sense Disambiguation In Neural Word Embeddings.