Kadot: Unsupervised natural language processing.
⚠️ You are reading the documentation of Kadot 1.0, which is still under development.
Kadot is an open-source library for processing text documents easily. It relies on vector representations of documents or words to solve NLP tasks such as summarization, spellchecking, or classification.
# How to get n-grams using kadot.
>>> from kadot.tokenizers import regex_tokenizer
>>> hello_tokens = regex_tokenizer("Kadot just lets you process a text easily.")
>>> hello_tokens.ngrams(n=2)
[('Kadot', 'just'), ('just', 'lets'), ('lets', 'you'), ('you', 'process'), ('process', 'a'), ('a', 'text'), ('text', 'easily')]
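To make the n-gram output above concrete, here is a minimal pure-Python sketch of the same idea. It is not Kadot's implementation: `simple_tokenize` and `ngrams` are hypothetical helpers written only for illustration, and the `\w+` pattern is an assumption about how words are split.

```python
import re


def simple_tokenize(text):
    """Hypothetical stand-in for a tokenizer: extract runs of word characters."""
    return re.findall(r"\w+", text)


def ngrams(tokens, n=2):
    """Return every run of n consecutive tokens as a tuple."""
    # A sliding window of width n; yields an empty list when n > len(tokens).
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = simple_tokenize("Kadot just lets you process a text easily.")
bigrams = ngrams(tokens, n=2)
print(bigrams)
```

A sentence of eight tokens gives seven bigrams, because each window starts one token after the previous one.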
What’s 🆕 in 1.0?
⚠️ Some of these new features may not yet be available on GitHub.
- Vectorizers: We now offer the Word2Vec, Doc2Vec, and state-of-the-art fastText algorithms, using Gensim’s powerful backend.
- Performance: Thanks to a much more efficient algorithm, the new word vectorizer is up to 95% faster, and sparse vectors now take up to 94% less memory.
- Models: Kadot now includes an automatic text summarizer and an entity labeler, which can be useful in many projects.
- Bot Engine ?
- Dependencies 😞: To guarantee good performance without reinventing the wheel, we are adding Gensim and PyTorch to our list of dependencies. Although installed by default, these libraries (along with scikit-learn) will be optional; only NumPy and SciPy are strictly required to use Kadot.
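Since some of the dependencies are described as optional, it can be handy to check which ones are present in an environment. The snippet below is a generic sketch using only the standard library. It does not reflect how Kadot itself detects its dependencies, and `is_installed` is a hypothetical helper name.

```python
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if a top-level module can be found in this environment."""
    return find_spec(module_name) is not None


# Import names, not package names: "torch" is PyTorch, "sklearn" is scikit-learn.
OPTIONAL = ("gensim", "torch", "sklearn")

available = {name: is_installed(name) for name in OPTIONAL}
for name, found in available.items():
    print(f"{name}: {'available' if found else 'not installed'}")
```

`find_spec` only locates a module without importing it, so the check stays cheap even for heavy libraries.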
⚖️ License
Kadot is released under the MIT license.
I am not a native English speaker. If you see any language mistakes in the documentation or in the code, please open an issue on GitHub.