gotokenizer

A tokenizer based on dictionary and bigram language models for Go. (Currently only supports Chinese segmentation.)
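The dictionary side of this approach can be sketched with a simple forward maximum-matching segmenter: scan the text, try the longest dictionary entry starting at the current position, and fall back to a single character when nothing matches. This is an illustrative toy under assumed names (`maxMatch`, the sample dictionary, and the `maxLen` parameter are all invented for the example), not gotokenizer's actual API, and it omits the bigram scoring step.

```go
package main

import "fmt"

// maxMatch segments text by greedy forward maximum matching against dict:
// at each position it tries the longest candidate (up to maxLen runes) and
// shrinks until it finds a dictionary word, falling back to a single rune.
func maxMatch(text string, dict map[string]bool, maxLen int) []string {
	runes := []rune(text)
	var tokens []string
	for i := 0; i < len(runes); {
		end := i + maxLen
		if end > len(runes) {
			end = len(runes)
		}
		matched := false
		// Longest candidate first; stop before single-rune candidates,
		// which are handled by the fallback below.
		for j := end; j > i+1; j-- {
			if dict[string(runes[i:j])] {
				tokens = append(tokens, string(runes[i:j]))
				i = j
				matched = true
				break
			}
		}
		if !matched {
			// No dictionary word starts here: emit one rune and move on.
			tokens = append(tokens, string(runes[i]))
			i++
		}
	}
	return tokens
}

func main() {
	// Toy dictionary; a real segmenter would load a large word list.
	dict := map[string]bool{"中文": true, "分词": true}
	fmt.Println(maxMatch("中文分词", dict, 4)) // prints [中文 分词]
}
```

A bigram language model would extend this by scoring alternative segmentations and picking the most probable one instead of committing greedily.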

Tags: golang, segmentation, tokenizer

Stars: 21 | Forks: 7 | Issues: 0 | Watchers: 21

Similar Packages

gse (2,791 stars)

Go efficient text segmentation; supports English, Chinese, Japanese, and other languages.

gojieba (2,622 stars)

A Go implementation of [jieba](https://github.com/fxsjy/jieba), a Chinese word segmentation algorithm.

sentences (465 stars)

Sentence tokenizer: converts text into a list of sentences.

segment (88 stars)

Go library for performing Unicode Text Segmentation as described in [Unicode Standard Annex #29](https://www.unicode.org/reports/tr29/).

textcat (73 stars)

Go package for n-gram based text categorization, with support for UTF-8 and raw text.