Using Linux system commands in R to remove special characters

I'm trying to clean files using Linux system commands from R.

I would like a command that removes special characters but leaves the field separator alone (the files are pipe-delimited).

In the example below, it's the backslashes and the additional quotation marks that I'm trying to get rid of:

1234|"PJDG"|"CHOCOLATES"|"CHOCOLATE CAKE"
1256|"GADG"|"CAKE \"HA"|"SPECIAL \"HAPPY CHRISTMAS\""
7657|"ASGD"|"WINE"|"RED WINE"
6777|"DAG"|"FRUIT"|"APPLES/LOOSE"

I've used the command below, but it doesn't appear to be removing the characters.

sed 's/\\"?//g' input_file.txt > output_file.txt;

If the file x.txt looks like this:

cat(readLines("x.txt"), sep = "\n")
# 1234|"PJDG"|"CHOCOLATES"|"CHOCOLATE CAKE"
# 1256|"GADG"|"CAKE \"HA"|"SPECIAL \"HAPPY CHRISTMAS\""
# 7657|"ASGD"|"WINE"|"RED WINE"
# 6777|"DAG"|"FRUIT"|"APPLES/LOOSE"

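Then you can call sed from system(), like this. The bracket expression matches any single backslash or double quote, so every occurrence of either is deleted. (The original pattern fails because ? is a literal character in sed's default basic regular expression syntax, so it only matches a backslash, a quote, and a question mark in sequence.)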

system("sed -e 's|[\\\"]||g' x.txt")
# 1234|PJDG|CHOCOLATES|CHOCOLATE CAKE
# 1256|GADG|CAKE HA|SPECIAL HAPPY CHRISTMAS
# 7657|ASGD|WINE|RED WINE
# 6777|DAG|FRUIT|APPLES/LOOSE

You can redirect that output to a file. If you would rather get the result back as an R character vector, add intern = TRUE to the call.
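
Both options can be sketched as follows. This is a minimal example that assumes sed is available on the PATH; the output file name clean.txt is a hypothetical choice, and the sample file is recreated first only so the snippet runs on its own.

```r
# Recreate the sample file so the sketch is self-contained.
writeLines(c(
  '1234|"PJDG"|"CHOCOLATES"|"CHOCOLATE CAKE"',
  '1256|"GADG"|"CAKE \\"HA"|"SPECIAL \\"HAPPY CHRISTMAS\\""',
  '7657|"ASGD"|"WINE"|"RED WINE"',
  '6777|"DAG"|"FRUIT"|"APPLES/LOOSE"'
), "x.txt")

# Option 1: let the shell redirect sed's output to a new file
# (clean.txt is a hypothetical name).
system("sed -e 's|[\\\"]||g' x.txt > clean.txt")

# Option 2: capture sed's output as a character vector,
# one element per line of output.
cleaned <- system("sed -e 's|[\\\"]||g' x.txt", intern = TRUE)
print(cleaned)
```

Redirecting to a separate file rather than back onto x.txt avoids truncating the input while sed is still reading it.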

R has its own functions for this, so there is no need to call system(). Look at sub and gsub. Read your file with readLines, edit the lines with sub or gsub, and then write the result back to a separate file, for example with writeLines.
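
A minimal sketch of that pure-R route, assuming the same x.txt as in the question; the output file name x_clean.txt is hypothetical. The second gsub call is a variant based on one reading of the question (drop only the escaped quotes and keep the quotes around each field), which may or may not be what is wanted.

```r
# Recreate the sample file so the sketch is self-contained.
writeLines(c(
  '1234|"PJDG"|"CHOCOLATES"|"CHOCOLATE CAKE"',
  '1256|"GADG"|"CAKE \\"HA"|"SPECIAL \\"HAPPY CHRISTMAS\\""',
  '7657|"ASGD"|"WINE"|"RED WINE"',
  '6777|"DAG"|"FRUIT"|"APPLES/LOOSE"'
), "x.txt")

lines <- readLines("x.txt")

# Remove every backslash and every double quote.
# The regex is [\\"], written as an R string with doubled backslashes.
cleaned <- gsub('[\\\\"]', "", lines)

# Variant (an assumption about the intent): remove only the escaped
# quotes, i.e. a backslash immediately followed by a double quote.
unescaped <- gsub('\\\\"', "", lines)

# Write the fully cleaned lines to a separate file (hypothetical name).
writeLines(cleaned, "x_clean.txt")
```

Neither pattern touches the pipe separators or the forward slash in APPLES/LOOSE, since only backslashes and double quotes are matched.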
