Add Spaces Around CJK Ideographs — space_cjk • piecemaker

To tokenize Chinese, Japanese, and Korean (CJK) text, it's convenient to add a space on each side of every ideograph, so that each one can later be split off as its own token.

Usage

space_cjk(text)

Arguments

text: A character vector to clean.

Value

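A character vector the same length as the input text, with a space added on each side of every CJK ideograph. Adjacent ideographs therefore end up separated by two spaces, and a string that starts or ends with an ideograph gains a leading or trailing space.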
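
To make the described behavior concrete, here is a minimal base R sketch of one way such padding could be done. It is not taken from piecemaker's source: `space_han_sketch` is a hypothetical helper, and it assumes that matching the PCRE Unicode script class `\p{Han}` is an acceptable stand-in for "CJK ideograph". The ranges the real `space_cjk()` covers may differ, and this sketch does not touch kana or hangul.

```r
# Hypothetical helper, not part of piecemaker: pad every Han ideograph
# with one space on each side. Requires a PCRE build with Unicode
# property support, which is the usual case for R.
space_han_sketch <- function(text) {
  stopifnot(is.character(text))
  # gsub() is vectorized, so the result has the same length as `text`.
  gsub("(\\p{Han})", " \\1 ", text, perl = TRUE)
}

# U+4E2D and U+6587, built from code points to avoid source-encoding issues.
space_han_sketch(intToUtf8(c(20013, 25991)))
#> [1] " 中  文 "

# Strings without ideographs pass through unchanged.
space_han_sketch("abc")
#> [1] "abc"
```

Because each match is replaced independently, two neighboring ideographs each contribute a space, which produces the doubled spaces seen in the output.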

Examples

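# Code points 13312 to 13320 are U+3400 to U+3408, the first nine
# ideographs of the CJK Unified Ideographs Extension A block.
to_space <- intToUtf8(13312:13320)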
to_space
#> [1] "㐀㐁㐂㐃㐄㐅㐆㐇㐈"
space_cjk(to_space)
#> [1] " 㐀  㐁  㐂  㐃  㐄  㐅  㐆  㐇  㐈 "
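
As a follow-up, the spaced string can be handed to a plain whitespace split. The snippet below is only an illustration of that downstream step, not a tokenizer shipped with the package: it assumes piecemaker is installed and that the padding consists of ordinary whitespace, as the output above suggests.

```r
library(piecemaker)

to_space <- intToUtf8(13312:13320)
spaced <- space_cjk(to_space)

# Trim the outer padding first (otherwise strsplit() returns an empty
# leading token), then split on runs of whitespace so that each
# ideograph becomes its own token.
tokens <- strsplit(trimws(spaced), "[[:space:]]+")[[1]]
length(tokens)
#> [1] 9
```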