
How to remove all NAs in character strings in a dataframe column in R?

I have a CSV file like

LocationList,Identity,Category
"New York,New York,United States","42","S"
"NA,California,United States","89","lyt"
"Hartford,Connecticut,United States","879","polo"
"San Diego,California,United States","45454","utyr"
"Seattle,Washington,United States","uytr","69"
"NA,NA,United States","87","tree"

I want to remove every 'NA' entry from the 'LocationList' column.

The Desired Result -

LocationList,Identity,Category
"New York,New York,United States","42","S"
"California,United States","89","lyt"
"Hartford,Connecticut,United States","879","polo"
"San Diego,California,United States","45454","utyr"
"Seattle,Washington,United States","uytr","69"
"United States","87","tree"

The number of columns is not fixed; it may increase or decrease. I also want to write the result to a CSV file without quotes and without escaping for the 'LocationList' column.

How can I achieve this in R? I am new to R, so any help is appreciated.

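In this case, you just want to replace each "NA," with nothing. Note that this is not the standard way to remove NA values: here "NA" is literal text inside a character string, not R's missing-value marker, so functions such as is.na() or na.omit() will not find it.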

Assuming dat is your data, use

dat$LocationList <- gsub("^(NA,)+", "", dat$LocationList)
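The anchored pattern ^(NA,)+ only strips NA entries at the start of a value, which is all the sample data needs. If an NA could also appear in the middle or at the end of a value (an assumption; the question shows no such row), splitting on commas and dropping the NA pieces is one option. A minimal sketch, where drop_na_tokens is a hypothetical helper name and not a base R function:

```r
# Hypothetical helper: drop every literal "NA" entry from comma-separated strings.
drop_na_tokens <- function(x) {
  pieces <- strsplit(as.character(x), ",", fixed = TRUE)
  vapply(pieces, function(parts) {
    # Keep only the entries that are not the literal text "NA".
    paste(parts[parts != "NA"], collapse = ",")
  }, character(1))
}

# The second value is made up to show an NA in the middle of a string.
locations <- c("NA,California,United States",
               "Hartford,NA,United States",
               "NA,NA,United States")
cleaned <- drop_na_tokens(locations)
# cleaned is now:
#   "California,United States", "Hartford,United States", "United States"
```

With the data from the question this could be applied as dat$LocationList <- drop_na_tokens(dat$LocationList).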

Try:

my.data <- read.table(text='LocationList,Identity,Category
                      "New York,New York,United States","42","S"
                      "NA,California,United States","89","lyt"
                      "Hartford,Connecticut,United States","879","polo"
                      "San Diego,California,United States","45454","utyr"
                      "Seattle,Washington,United States","uytr","69"
                      "NA,NA,United States","87","tree"', header=TRUE, sep=",")
my.data$LocationList <- gsub("NA,", "", my.data$LocationList)
my.data
#                         LocationList Identity Category
# 1    New York,New York,United States       42        S
# 2           California,United States       89      lyt
# 3 Hartford,Connecticut,United States      879     polo
# 4 San Diego,California,United States    45454     utyr
# 5   Seattle,Washington,United States     uytr       69
# 6                      United States       87     tree
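One caveat with the unanchored pattern "NA,": it also matches those three characters at the end of a longer entry. The sample data has no such entry, so the value "CHINA,Asia" below is purely hypothetical. A word boundary (\\b, used here with perl = TRUE) is one way to restrict the match to whole NA entries:

```r
# The second value is a made-up example; it is not in the question's data.
x <- c("NA,California,United States", "CHINA,Asia")

plain    <- gsub("NA,", "", x)                   # also eats the tail of "CHINA,"
bounded  <- gsub("\\bNA,", "", x, perl = TRUE)   # only matches a whole "NA" entry

# plain   is "California,United States", "CHIAsia"
# bounded is "California,United States", "CHINA,Asia"
```

Neither pattern removes an NA that is the last entry of a value, because no comma follows it.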

If you drop the quotes when you write a conventional CSV file, you will have trouble reading the data back in later. The values in LocationList already contain commas, so commas would appear both inside fields and as the separator between fields. You might try write.csv2() instead, which separates fields with a semicolon (;) and uses a comma as the decimal mark. You could use:

write.csv2(my.data, file="myFile.csv", quote=FALSE, row.names=FALSE)

Which yields the following file:

LocationList;Identity;Category
New York,New York,United States;42;S
California,United States;89;lyt
Hartford,Connecticut,United States;879;polo
San Diego,California,United States;45454;utyr
Seattle,Washington,United States;uytr;69
United States;87;tree
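To check that such a file can be read back unambiguously, here is a small round-trip sketch. It assumes a two-row stand-in for the cleaned data and writes to a temporary file; both are illustration choices, not part of the original answer. read.csv2() uses ";" as its default separator, so the commas inside LocationList are left alone:

```r
# Two-row stand-in for the cleaned data frame (values taken from the example).
my.data <- data.frame(
  LocationList = c("California,United States", "United States"),
  Identity     = c("89", "87"),
  Category     = c("lyt", "tree"),
  stringsAsFactors = FALSE
)

out_file <- tempfile(fileext = ".csv")  # temporary path, used only for this demo
write.csv2(my.data, file = out_file, quote = FALSE, row.names = FALSE)

# Read everything back as character so that Identity is not converted to numbers.
round_trip <- read.csv2(out_file, colClasses = "character")
file_lines <- readLines(out_file)
```

Here round_trip has the same three columns as my.data, and each LocationList value comes back as a single field.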

(I now notice that the values for Identity and Category in row 5 are presumably swapped. You may want to switch them back before writing to file.)

x             <- my.data[5, 2]   # save the Identity value
my.data[5, 2] <- my.data[5, 3]   # Identity takes the Category value
my.data[5, 3] <- x               # Category takes the saved Identity value
rm(x)
