.. Kadot documentation master file, created by
   sphinx-quickstart on Wed Apr 11 14:09:58 2018.
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.

Kadot : Unsupervised natural language processing.
=================================================
*⚠️ You are reading the documentation of Kadot 1.0, which is still under development.*
* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
**Kadot** is an open-source library that makes it easy to process text documents. It relies on vector representations of documents or words to solve NLP tasks such as **summarization**, **spell checking**, or **classification**.
::

    # How to get n-grams using Kadot.
    >>> from kadot.tokenizers import regex_tokenizer
    >>> hello_tokens = regex_tokenizer("Kadot just lets you process a text easily.")
    >>> hello_tokens.ngrams(n=2)
    [('Kadot', 'just'), ('just', 'lets'), ('lets', 'you'), ('you', 'process'), ('process', 'a'), ('a', 'text'), ('text', 'easily')]

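
For readers curious about what n-gram extraction amounts to, here is a minimal plain-Python sketch. It is not Kadot's implementation: the ``simple_tokenize`` and ``ngrams`` helpers and the ``\w+`` pattern are assumptions made for illustration only.

::

    import re


    def simple_tokenize(text):
        # Hypothetical helper: keep runs of word characters, drop punctuation.
        return re.findall(r"\w+", text)


    def ngrams(tokens, n=2):
        # Slide a window of size n over the token list.
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


    tokens = simple_tokenize("Kadot just lets you process a text easily.")
    bigrams = ngrams(tokens, n=2)

For this sentence, ``bigrams`` holds the same seven pairs as the Kadot example, from ``('Kadot', 'just')`` to ``('text', 'easily')``.
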
What's 🆕 in 1.0 ?
------------------
*⚠️ These new features may not all be available on GitHub yet.*
* **Vectorizers**: Kadot now offers Word2Vec, the state-of-the-art fastText, and Doc2Vec algorithms, using the `Gensim `_ backend.
* **Performance**: Thanks to a much more efficient algorithm, the new word vectorizer is up to 95% faster, and sparse vectors now take up to 94% less memory.
* **Models**: Kadot now includes an *automatic text summarizer* and an *entity labeler*, which can be useful in many projects.
* **Bot Engine** ?
* **Dependencies** 😞: To guarantee good performance without reinventing the wheel, we are adding `Gensim `_ and `PyTorch `_ to our list of dependencies. Although they are installed by default, these libraries (along with scikit-learn) will be optional; only NumPy and SciPy are strictly required to use Kadot.
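
One common way to keep heavy libraries optional is to check whether they can be imported and to enable the related features only when they are present. The sketch below illustrates that general pattern with the standard library. It is an assumption about how such a check could look, not Kadot's actual code; the ``OPTIONAL_BACKENDS`` name and the ``available_backends`` function are hypothetical.

::

    from importlib.util import find_spec

    # Hypothetical list of optional backends, given by their import names.
    OPTIONAL_BACKENDS = ("gensim", "torch", "sklearn")


    def available_backends(names=OPTIONAL_BACKENDS):
        # find_spec returns None when a top-level module cannot be found,
        # so no optional library is actually imported by this check.
        return {name: find_spec(name) is not None for name in names}


    backends = available_backends()

A feature that depends on an optional library could then consult ``backends`` and fall back to a NumPy and SciPy code path when the library is missing.
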
⚖️ License
----------
Kadot is released under the `MIT license `_.
*I am not a native English speaker. If you spot any language mistakes in the documentation or in the code, please open an issue on GitHub.*