Accented characters and other diacritical marks are often hard to type, so they
are frequently omitted from text. To normalize the different ways a user might
spell a word that should carry a diacritical mark, you can replace each such
character with its simpler, unaccented equivalent.
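One common way to strip diacritics is Unicode canonical decomposition (NFD), which splits a character such as "é" into a base letter plus a combining mark, followed by deleting the combining marks. The sketch below assumes the stringi package is installed and uses a hypothetical helper, strip_marks(); it illustrates the general technique and is not necessarily how remove_diacritics() is implemented.

```r
# Hypothetical helper, assuming the stringi package is installed.
# Not necessarily how remove_diacritics() works internally.
strip_marks <- function(text) {
  # NFD splits precomposed characters, e.g. "\u00e9" becomes "e" + U+0301.
  decomposed <- stringi::stri_trans_nfd(text)
  # Remove every nonspacing combining mark (Unicode category Mn).
  stringi::stri_replace_all_regex(decomposed, "\\p{Mn}", "")
}

strip_marks("fa\u00e7ile r\u00e9sum\u00e9")
#> [1] "facile resume"
```

Letters that have no canonical decomposition, such as "\u00f8" or "\u00df", pass through this approach unchanged; handling them would need an explicit mapping.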
Arguments
- text
A character vector to clean.
Value
The input character vector, with each accented character replaced by its simpler, unaccented equivalent.
Examples
# Non-ASCII characters in source code can be read differently depending on a
# machine's encoding, so the accented characters are written as Unicode escapes.
sample_text <- "fa\u00e7ile r\u00e9sum\u00e9"
sample_text
#> [1] "façile résumé"
remove_diacritics(sample_text)
#> [1] "facile resume"