I'm trying to clean files using Linux system commands from R.
I would like to use a command that removes special characters other than the field separator (the files are pipe-delimited).
In the example below, it's the slashes and the additional quotation marks that I'm trying to get rid of:
1234|"PJDG"|"CHOCOLATES"|"CHOCOLATE CAKE"
1256|"GADG"|"CAKE \"HA"|"SPECIAL \"HAPPY CHRISTMAS\""
7657|"ASGD"|"WINE"|"RED WINE"
6777|"DAG"|"FRUIT"|"APPLES/LOOSE"
I've used the command below, but it doesn't appear to be removing the characters.
sed 's/\\"?//g' input_file.txt > output_file.txt;
If the file x.txt looks like this:
cat(readLines("x.txt"), sep = "\n")
# 1234|"PJDG"|"CHOCOLATES"|"CHOCOLATE CAKE"
# 1256|"GADG"|"CAKE \"HA"|"SPECIAL \"HAPPY CHRISTMAS\""
# 7657|"ASGD"|"WINE"|"RED WINE"
# 6777|"DAG"|"FRUIT"|"APPLES/LOOSE"
Then you can use sed in system(), like this:
system("sed -e 's|[\\\"]||g' x.txt")
# 1234|PJDG|CHOCOLATES|CHOCOLATE CAKE
# 1256|GADG|CAKE HA|SPECIAL HAPPY CHRISTMAS
# 7657|ASGD|WINE|RED WINE
# 6777|DAG|FRUIT|APPLES/LOOSE
You can write that to a file. Or, if you want to return an R vector, add intern = TRUE to the call.
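Both options might look roughly like the sketch below. It assumes sed is available on the system path; the temporary file paths and the variable names (in_file, out_file, sed_cmd) are made up for illustration and stand in for x.txt and whatever output file you choose.

```r
# Recreate the sample data in a temporary file so the sketch is self-contained.
in_file  <- tempfile(fileext = ".txt")  # hypothetical stand-in for x.txt
out_file <- tempfile(fileext = ".txt")  # hypothetical output path
writeLines(c(
  '1234|"PJDG"|"CHOCOLATES"|"CHOCOLATE CAKE"',
  '1256|"GADG"|"CAKE \\"HA"|"SPECIAL \\"HAPPY CHRISTMAS\\""',
  '7657|"ASGD"|"WINE"|"RED WINE"',
  '6777|"DAG"|"FRUIT"|"APPLES/LOOSE"'
), in_file)

# Same sed expression as above: delete every backslash and double quote.
# shQuote() protects the file path from the shell.
sed_cmd <- paste("sed -e 's|[\\\"]||g'", shQuote(in_file))

# Option 1: capture sed's output as a character vector, one element per line.
cleaned <- system(sed_cmd, intern = TRUE)

# Option 2: let the shell redirect sed's output to a file.
# Without intern = TRUE, system() returns the exit status (0 on success).
status <- system(paste(sed_cmd, ">", shQuote(out_file)))
```

With intern = TRUE the cleaned lines stay in the R session; with the redirect they go straight to disk and R only sees the exit status.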
R has its own functions for this, so it is not necessary to use system(). Look at sub and gsub. Read your file using readLines, edit the lines with sub or gsub, and then save the result back to a delimited file.
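A minimal sketch of that approach, assuming the goal is the same as in the question (drop every backslash and double quote, keep the pipes). The temporary file paths and variable names are hypothetical, and writeLines is just one way to save the result.

```r
# Recreate the sample data in a temporary file so the sketch is self-contained.
in_file  <- tempfile(fileext = ".txt")  # hypothetical input path
out_file <- tempfile(fileext = ".txt")  # hypothetical output path
writeLines(c(
  '1234|"PJDG"|"CHOCOLATES"|"CHOCOLATE CAKE"',
  '1256|"GADG"|"CAKE \\"HA"|"SPECIAL \\"HAPPY CHRISTMAS\\""',
  '7657|"ASGD"|"WINE"|"RED WINE"',
  '6777|"DAG"|"FRUIT"|"APPLES/LOOSE"'
), in_file)

# Read the file as a character vector, one element per line.
raw_lines <- readLines(in_file)

# The R string '[\\\\"]' is the regex [\\"], a character class that
# matches either a backslash or a double quote. gsub removes every match;
# sub would only remove the first match on each line.
cleaned <- gsub('[\\\\"]', "", raw_lines)

# Save the cleaned lines; the pipe separators are left untouched.
writeLines(cleaned, out_file)
```

This avoids the extra layer of shell quoting that the system() call needs, and it does not depend on sed being installed.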