Natural Language Processing / Tokenizers

9 packages

gse

Go efficient text segmentation library; supports English, Chinese, Japanese, and other languages.


gojieba

A Go implementation of [jieba](https://github.com/fxsjy/jieba), a Chinese word-segmentation algorithm.


sentences

Sentence tokenizer: converts text into a list of sentences.

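Sentence boundary detection is harder than it looks (abbreviations, quotes, and ellipses all break simple rules); the naive baseline is just splitting on terminal punctuation. A minimal stdlib-only sketch of that baseline follows — `splitSentences` is a hypothetical name, and this is not the sentences package's actual algorithm, which handles many more edge cases:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// splitSentences is a naive sentence tokenizer: it splits on
// sentence-ending punctuation followed by whitespace. It will
// mis-split text containing abbreviations like "Dr. Smith".
func splitSentences(text string) []string {
	re := regexp.MustCompile(`([.!?])\s+`)
	// Keep the punctuation, then cut at a sentinel byte.
	marked := re.ReplaceAllString(text, "$1\x00")
	var out []string
	for _, s := range strings.Split(marked, "\x00") {
		if s = strings.TrimSpace(s); s != "" {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	for _, s := range splitSentences("Go is fun. Tokenizers split text! Right?") {
		fmt.Println(s)
	}
}
```

A real tokenizer additionally needs an abbreviation list and handling for quotes and trailing punctuation, which is exactly what a dedicated package provides.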

segment

Go library for performing Unicode Text Segmentation as described in [Unicode Standard Annex #29](https://www.unicode.org/reports/tr29/)


textcat

Go package for n-gram based text categorization, with support for UTF-8 and raw text.

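N-gram based categorization works by building a ranked character n-gram frequency profile per category (or language) and classifying a text by how closely its profile matches each one. A stdlib-only sketch of the profile-building step — the function names here are illustrative, not textcat's API:

```go
package main

import (
	"fmt"
	"sort"
)

// ngrams counts the character n-grams of length n in s.
// Operating on runes keeps multi-byte UTF-8 characters intact.
func ngrams(s string, n int) map[string]int {
	runes := []rune(s)
	counts := make(map[string]int)
	for i := 0; i+n <= len(runes); i++ {
		counts[string(runes[i:i+n])]++
	}
	return counts
}

// topNgrams returns the n-grams by descending frequency (ties
// broken alphabetically); the ranked list is the category profile.
func topNgrams(counts map[string]int) []string {
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Slice(keys, func(i, j int) bool {
		if counts[keys[i]] != counts[keys[j]] {
			return counts[keys[i]] > counts[keys[j]]
		}
		return keys[i] < keys[j]
	})
	return keys
}

func main() {
	counts := ngrams("banana", 2)
	fmt.Println(topNgrams(counts)) // ranked bigram profile of "banana"
}
```

Classification then compares the rank positions of a document's profile against each stored category profile and picks the category with the smallest total rank distance.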

MMSEGO

A Go implementation of [MMSEG](http://technology.chtsai.org/mmseg/), a Chinese word-segmentation algorithm.

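The baseline behind MMSEG is forward maximum matching: at each position, greedily take the longest dictionary word that matches. A stdlib-only sketch over a toy dictionary (an assumed `maxMatch` helper, not MMSEGO's API) — note how the greedy choice mis-segments the classic example, which is precisely the ambiguity MMSEG's extra resolution rules are designed to fix:

```go
package main

import "fmt"

// maxMatch segments runes by greedy forward maximum matching:
// at each position, take the longest dictionary word (up to
// maxLen runes); fall back to a single rune if nothing matches.
func maxMatch(runes []rune, dict map[string]bool, maxLen int) []string {
	var words []string
	for i := 0; i < len(runes); {
		n := maxLen
		if i+n > len(runes) {
			n = len(runes) - i
		}
		matched := false
		for ; n > 1; n-- {
			if w := string(runes[i : i+n]); dict[w] {
				words = append(words, w)
				i += n
				matched = true
				break
			}
		}
		if !matched {
			words = append(words, string(runes[i:i+1]))
			i++
		}
	}
	return words
}

func main() {
	dict := map[string]bool{"研究": true, "生命": true, "研究生": true, "起源": true}
	// Greedy matching picks 研究生 first, splitting the rest badly —
	// the ideal segmentation is 研究 / 生命 / 起源.
	fmt.Println(maxMatch([]rune("研究生命起源"), dict, 3))
}
```

MMSEG layers four disambiguation rules (maximum length, largest average word length, smallest variance, largest morphemic freedom) on top of this matching step to choose among candidate segmentations.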

stemmer

Stemmer packages for the Go programming language; includes English and German stemmers.


gotokenizer

A tokenizer for Go based on dictionary and bigram language models. (Currently supports only Chinese segmentation.)


shamoji

shamoji is a word-filtering package written in Go.
