Installation


Wikipedia2Vec can be installed from PyPI:

% pip install wikipedia2vec

Alternatively, you can install the development version of this software from the GitHub repository:

% git clone https://github.com/studio-ousia/wikipedia2vec.git
% cd wikipedia2vec
% pip install Cython
% ./cythonize.sh
% pip install .

Wikipedia2Vec requires a 64-bit version of Python and runs on Linux, Windows, and macOS. It currently depends on the following Python libraries: click, jieba, joblib, lmdb, marisa-trie, mwparserfromhell, numpy, scipy, six, and tqdm.
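
If an installation misbehaves, it can help to check the two requirements above from Python itself. The following is a minimal sketch, not part of Wikipedia2Vec: the helper names `is_64bit_python` and `missing_modules` are hypothetical, and the module list assumes the usual import names for the listed packages (for example, marisa-trie is imported as `marisa_trie`).

```python
import importlib.util
import struct

# Import names for the libraries listed above (assumed; the PyPI name and the
# import name differ for marisa-trie, which is imported as marisa_trie).
REQUIRED_MODULES = [
    "click", "jieba", "joblib", "lmdb", "marisa_trie",
    "mwparserfromhell", "numpy", "scipy", "six", "tqdm",
]


def is_64bit_python():
    """Return True if the running interpreter uses 64-bit pointers."""
    return struct.calcsize("P") * 8 == 64


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("64-bit Python:", is_64bit_python())
    print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
```

The check only looks modules up without importing them, so it does not verify that a compiled extension actually loads.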

If you plan to train embeddings on your machine, installing a BLAS library is highly recommended; we recommend OpenBLAS or the Intel Math Kernel Library. Note that SciPy must be able to detect the BLAS library. You can confirm this with the following command:

% python -c 'import scipy; scipy.show_config()'
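
The output of the command above can be long, so a small script that scans it for a known library name may be convenient. This is a rough sketch under stated assumptions: the helper names are hypothetical, and matching on the substrings `openblas` and `mkl` is only a heuristic, since the exact format of the configuration output differs between SciPy versions.

```python
import contextlib
import io

# Substrings that suggest an optimized BLAS (an illustrative assumption,
# not an exhaustive or official list).
OPTIMIZED_BLAS_HINTS = ("openblas", "mkl")


def find_blas_hints(config_text):
    """Return the hints that occur in the given configuration text."""
    lowered = config_text.lower()
    return [hint for hint in OPTIMIZED_BLAS_HINTS if hint in lowered]


def scipy_config_text():
    """Capture what scipy.show_config() prints; return "" if SciPy is missing."""
    try:
        import scipy
    except ImportError:
        return ""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        scipy.show_config()
    return buffer.getvalue()


if __name__ == "__main__":
    hints = find_blas_hints(scipy_config_text())
    print("Optimized BLAS hints found:", hints or "none")
```

An empty result does not prove that no BLAS is linked; in that case, read the full configuration output directly.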

To process Japanese Wikipedia dumps, you also need to install MeCab and its Python binding. Furthermore, to use the ICU library to split words, sentences, or both, you need to install the C/C++ ICU library and the PyICU library.