To keep punctuation marks as separate tokens during tokenization, it is convenient to add spaces around them. This function does that, with options to keep certain types of punctuation attached to the word they belong to.

Usage

space_punctuation(text, space_hyphens = TRUE, space_abbreviations = TRUE)

Arguments

text

A character vector to clean.

space_hyphens

Logical; treat hyphens between letters and at the start/end of words as punctuation? If FALSE, these hyphens stay attached to the word. Other hyphens are always treated as punctuation.

space_abbreviations

Logical; treat apostrophes between letters as punctuation? If FALSE, these apostrophes stay attached to the word (for example, "Isn't" is left intact). Other apostrophes are always treated as punctuation.

Value

A character vector the same length as the input text, with spaces added around punctuation characters.

Examples

to_space <- "This is some 'gosh-darn' $5 text. Isn't it lovely?"
to_space
#> [1] "This is some 'gosh-darn' $5 text. Isn't it lovely?"
space_punctuation(to_space)
#> [1] "This is some  ' gosh - darn '   $ 5 text .  Isn ' t it lovely ? "
space_punctuation(to_space, space_hyphens = FALSE)
#> [1] "This is some  ' gosh-darn '   $ 5 text .  Isn ' t it lovely ? "
space_punctuation(to_space, space_abbreviations = FALSE)
#> [1] "This is some  ' gosh - darn '   $ 5 text .  Isn't it lovely ? "
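
The reason for the added spacing is that a plain whitespace split then yields punctuation marks as their own tokens. The sketch below illustrates that step in base R only. The helper `space_all_punct()` is hypothetical and is not the package's implementation: it pads every ASCII punctuation character with spaces, which approximates the default behavior and ignores the `space_hyphens` and `space_abbreviations` options.

```r
# Hypothetical base-R stand-in for the default behavior (an assumption, not
# the package's code): pad every ASCII punctuation character with spaces.
space_all_punct <- function(text) {
  gsub("([[:punct:]])", " \\1 ", text, perl = TRUE)
}

to_space <- "This is some 'gosh-darn' $5 text. Isn't it lovely?"
spaced <- space_all_punct(to_space)

# Split on runs of whitespace. trimws() removes the trailing space first so
# that no empty token appears at the end.
tokens <- strsplit(trimws(spaced), "[[:space:]]+")[[1]]
tokens
```

Each quote, hyphen, currency sign, and sentence-ending mark ends up as a separate element of `tokens`. Splitting on runs of whitespace rather than a single space matters because adjacent punctuation produces several consecutive spaces.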