Text Tokenization using Byte Pair Encoding and Unigram Modelling
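
The title names Byte Pair Encoding as one of the tokenization methods. The sketch below is a rough illustration of that idea only. It is a textbook, word-level BPE in Python, not code from this package or from the SentencePiece library, and the names `learn_bpe`, `apply_bpe` and `merge_pair` are hypothetical. BPE starts from single characters and repeatedly merges the most frequent adjacent pair of symbols. Note that SentencePiece works on raw text and treats whitespace as an ordinary symbol, whereas this sketch assumes the input is already split into words.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Return a new symbol list with every adjacent occurrence of `pair` fused."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (hypothetical helper)."""
    # Each word starts as a tuple of single characters, weighted by its frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs across the corpus, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties go to the pair that was counted first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Rewrite every word with the chosen pair fused into one symbol.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[tuple(merge_pair(symbols, best))] += freq
        vocab = new_vocab
    return merges


def apply_bpe(word, merges):
    """Segment a new word by replaying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols


words = ["hug", "hug", "hug", "pug", "pun", "bun"]
merges = learn_bpe(words, 2)
print(merges)                    # [('u', 'g'), ('h', 'ug')]
print(apply_bpe("hugs", merges)) # ['hug', 's']
```

Replaying merges in the order they were learned means an unseen word such as "hugs" is still split into known pieces plus leftover characters.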



Documentation for package ‘sentencepiece’ version 0.1.2

Help Pages

read_word2vec                  Read a word2vec embedding file
sentencepiece                  Construct a Sentencepiece model
sentencepiece_decode           Decode encoded sequences back to text
sentencepiece_download_model   Download a Sentencepiece model
sentencepiece_encode           Tokenise text alongside a Sentencepiece model
sentencepiece_load_model       Load a Sentencepiece model
wordpiece_encode               Wordpiece encoding
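
WordPiece encoding is commonly described as a greedy longest-match-first split of each word against a fixed vocabulary. The Python sketch below illustrates that algorithm under BERT-style conventions: continuation pieces carry a "##" prefix, and a word that cannot be covered becomes "[UNK]". These conventions, the `max_chars` cutoff, and the function name `wordpiece_split` are assumptions for illustration. They are not taken from `wordpiece_encode`, whose actual arguments and defaults are documented on its own help page.

```python
def wordpiece_split(word, vocab, unk_token="[UNK]", max_chars=100):
    """Greedy longest-match-first WordPiece split of one word (hypothetical helper)."""
    if len(word) > max_chars:
        return [unk_token]  # assumed cutoff: overly long words are treated as unknown
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, shrinking it until it is in vocab.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # assumed convention: non-initial pieces get "##"
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk_token]  # no vocabulary piece fits here: whole word is unknown
        pieces.append(match)
        start = end
    return pieces


vocab = {"un", "aff", "##aff", "##able"}
print(wordpiece_split("unaffable", vocab))  # ['un', '##aff', '##able']
print(wordpiece_split("xyz", vocab))        # ['[UNK]']
```

Because the search is greedy, the first piece chosen is always the longest vocabulary match. This keeps encoding fast but does not guarantee the segmentation with the fewest pieces.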