Main Functionalities:
- Tokenizer: Segmenting text into words, punctuation marks, etc. This is done by applying rules specific to each language. Each Doc consists of individual tokens, and we can iterate over them.
- Visualizer: Visualizing a dependency parse or named entities in a text is not only a fun NLP demo – it can also be incredibly helpful in speeding up development and debugging your code and training process.
- Lemmatization: Assigning the base forms of words. For example, the lemma of “was” is “be”, and the lemma of “rats” is “rat”.
- Similarity: Compares two texts and predicts how similar they are. Predicting similarity is useful for building recommendation systems or flagging duplicates.
- Named Entity Extraction: A named entity is a “real-world object” that’s assigned a name – for example, a person, a country, a product or a book title. spaCy can recognize various types of named entities in a document, by asking the model for a prediction.
- Language Detection: Detects the human language of any given text.
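To make the Tokenizer and Similarity items more concrete, here is a minimal standard-library sketch of both ideas. It is not spaCy's implementation: spaCy applies language-specific rule sets and, for similarity, typically compares word vectors, whereas this sketch assumes one simple regular-expression rule and cosine similarity over token counts. The names `TOKEN_RE`, `tokenize`, and `similarity` are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Assumed toy rule (not spaCy's): a token is either a run of word characters,
# optionally with one internal apostrophe, or a single punctuation character.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def tokenize(text):
    """Split text into word and punctuation tokens (hypothetical helper)."""
    return TOKEN_RE.findall(text)


def similarity(text_a, text_b):
    """Cosine similarity between lowercase token-count vectors, from 0 to 1."""
    counts_a = Counter(token.lower() for token in tokenize(text_a))
    counts_b = Counter(token.lower() for token in tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[token] * counts_b[token] for token in shared)
    norm_a = math.sqrt(sum(count * count for count in counts_a.values()))
    norm_b = math.sqrt(sum(count * count for count in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has nothing to compare
    return dot / (norm_a * norm_b)


# Iterate over the tokens of a text, one per line.
for token in tokenize("Hello, world!"):
    print(token)

print(similarity("I like cats.", "I like dogs."))  # prints 0.75
```

A count-based score like this only rewards exact word overlap, so it would rate "car" and "automobile" as unrelated. That limitation is one reason a pre-trained statistical model is useful for the similarity task described above.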
Key Features:
- Perform NLP tasks without complex formulas.
- Pre-trained statistical models.
- Ability to perform NLP operations in 6 languages.
- Visualization capabilities.
- No background NLP know-how needed.
- Predictive modeling.