
Removing Unwanted Characters from String

A little bit of backstory:

I am reading a SAV file into R using read_sav() from haven, and taking the variable labels found in the SAV file (accessed via attr(sav_file, "label")). I would like to use these labels as section headers in a LaTeX document.

Here's the issue: LaTeX does not accept certain characters. Rendering the R Markdown document produces the error "Package inputenc Error: Unicode char € (U+80) (inputenc) not set up for use with LaTeX."
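(Side note: if keeping the characters were acceptable, this particular inputenc error can often be avoided by rendering with a Unicode-aware engine instead of pdflatex. A minimal sketch of the R Markdown YAML header, assuming the rmarkdown pdf_document output format:

```yaml
output:
  pdf_document:
    latex_engine: xelatex   # xelatex/lualatex handle Unicode natively
```

In my case, though, I want the characters gone, not rendered.)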

Here's a small string sample that's causing the problem, along with some of the things I have tried:

unencoded_string <- "following statement? “Tourism is good"

Others have fixed this problem using methods like:

Encoding(unencoded_string) <- "UTF-8"

and

iconv(unencoded_string, to = "UTF-8")

These calls clean up some of the unwanted bytes, but I am still left with characters I do not want:

"following statement? “Tourism is good"

Other regular-expression approaches I have tried do not work either.

Does anyone have a fix, or can you point me in the right direction? I've run into this kind of problem before, but have always found a work-around.

Try stripping everything outside the ASCII range; this seems to work:

txt <- "following statement? “Tourism is good"
# perl = TRUE enables PCRE; [^\x00-\x7F]+ matches runs of non-ASCII characters
gsub("[^\\x00-\\x7F]+", "", txt, perl = TRUE)

> gsub("[^\\x00-\\x7F]+", "", txt, perl = TRUE)
[1] "following statement? Tourism is good"
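If deleting every non-ASCII character is too aggressive (e.g. a curly quote inside a label carries meaning), a variant is to first map common typographic punctuation to ASCII equivalents and only then drop the rest. This is a sketch, not the poster's method; the to_ascii name is mine, and iconv() with sub = "" is used here as an alternative to the gsub() call above:

```r
# Sketch: normalise "smart" punctuation to ASCII, then drop any remaining
# non-ASCII characters. Assumes the input is UTF-8 encoded text.
to_ascii <- function(x) {
  x <- gsub("[\u201c\u201d]", '"', x)  # curly double quotes -> "
  x <- gsub("[\u2018\u2019]", "'", x)  # curly single quotes -> '
  x <- gsub("[\u2013\u2014]", "-", x)  # en/em dashes -> -
  # sub = "" makes iconv delete any byte it cannot convert to ASCII
  iconv(x, from = "UTF-8", to = "ASCII", sub = "")
}

to_ascii("following statement? \u201cTourism is good")
```

This keeps the quote (as a plain ") instead of silently removing it, which may matter if the labels are shown to readers.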

