
This is an extremely simple tokenizer that breaks only and exactly on the space character. It is intended to work in tandem with prepare_text, which cleans up spaces and inserts them as necessary before the tokenizer runs. The two functions are combined in prepare_and_tokenize.

Usage

tokenize_space(text)

Arguments

text

A character vector to tokenize.

Value

The text as a list of character vectors (one vector per element of text). Each element of each vector is roughly equivalent to a word.

Examples

tokenize_space("This is some text.")
#> [[1]]
#> [1] "This"  "is"    "some"  "text."
#>
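
The space-only splitting described above can be approximated in base R. This is a minimal sketch, not the package's implementation: split_on_space is a hypothetical stand-in name, and it assumes the behavior amounts to a literal split on " ". It shows that other whitespace, such as a tab, does not produce a break, and that the result has one vector per input element.

```r
# Hypothetical stand-in for illustration only; not the package's own code.
split_on_space <- function(text) {
  # fixed = TRUE treats " " as a literal space, not a regular expression,
  # so tabs and newlines are left inside the tokens.
  strsplit(text, " ", fixed = TRUE)
}

# One character vector per element of the input.
tokens <- split_on_space(c("This is some text.", "Tabs\tare not spaces."))
print(tokens)
```

With this sketch, uncleaned input such as "a  b" (two spaces) yields an empty token between "a" and "b", which is one reason to normalize spacing before tokenizing. Whether tokenize_space handles repeated spaces the same way is not stated here.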