
R: Replacing empty string with “0” in data frame column results in all column values being replaced with “0”

I have an R data frame called newdata, which I read in using read.csv():

PROPDMGEXP   EVTYPE

"K"          WIND
"M"          HAIL
"H"          TORNADO
"B"          WIND
"+"          HIGH WIND
"-"          TORNADO
"?"          HURRICANE
             WIND
             TORNADO
"k"          HAIL

The blank values in the PROPDMGEXP column were blank cells in the CSV file from which the data was read. I am assuming that they are empty strings. I want to replace the values in the PROPDMGEXP column with other values, and I have used regex to do so:

newdata$PROPDMGEXP[grepl("K", newdata$PROPDMGEXP, ignore.case = TRUE)] <- "10^3"

newdata$PROPDMGEXP[grepl("H", newdata$PROPDMGEXP, ignore.case = TRUE)] <- "10^2"

newdata$PROPDMGEXP[grepl("M", newdata$PROPDMGEXP, ignore.case = TRUE)] <- "10^6"

newdata$PROPDMGEXP[grepl("B", newdata$PROPDMGEXP, ignore.case = TRUE)] <- "10^9"

newdata$PROPDMGEXP[grepl("+", newdata$PROPDMGEXP, fixed = TRUE)] <- "1"

newdata$PROPDMGEXP[grepl("-", newdata$PROPDMGEXP, fixed = TRUE)] <- "0"

newdata$PROPDMGEXP[grepl("?", newdata$PROPDMGEXP, fixed = TRUE)] <- "0"

newdata$PROPDMGEXP[grepl("", newdata$PROPDMGEXP)] <- "0"

I have checked whether the values are being replaced by subsetting the data and printing it out:

mydata <- subset(newdata, PROPDMGEXP == "10^6", select=c(EVTYPE, PROPDMGEXP))

mydata1 <- subset(newdata, PROPDMGEXP == "10^3", select=c(EVTYPE, PROPDMGEXP))

mydata2 <- subset(newdata, PROPDMGEXP == "10^2", select=c(EVTYPE, PROPDMGEXP))

mydata3 <- subset(newdata, PROPDMGEXP == "10^9", select=c(EVTYPE, PROPDMGEXP))

mydata4 <- subset(newdata, PROPDMGEXP == "1", select=c(EVTYPE, PROPDMGEXP))

mydata5 <- subset(newdata, PROPDMGEXP == "0", select=c(EVTYPE, PROPDMGEXP))

mydata7 <- subset(newdata, PROPDMGEXP == "-", select=c(EVTYPE, PROPDMGEXP))

mydata8 <- subset(newdata, PROPDMGEXP == "?", select=c(EVTYPE, PROPDMGEXP))

mydata9 <- subset(newdata, PROPDMGEXP == "", select=c(EVTYPE, PROPDMGEXP))

print(head(mydata))

print(head(mydata1))

print(head(mydata2))

print(head(mydata3))

print(head(mydata4))

print("Printing 0...")
print(head(mydata5))
   
print("Printing -")
print(head(mydata7))

print("Printing ?")
print(head(mydata8))

print("Printing blank")
print(head(mydata9))

I have found that when I replace the empty string values in the PROPDMGEXP column with "0", all the other replaced values in that column that are not "0" (e.g. "10^3", "10^2", "10^6") are also replaced with "0".

I have verified this through the following:

print(dim(newdata))
> 902297      7

print(length(which(newdata$PROPDMGEXP == "0")))
> 902297

I am not sure why this is occurring. Any insights are appreciated.

The problem is that grepl("", newdata$PROPDMGEXP) is TRUE for every string in the column, because an empty pattern matches any string. That last assignment therefore overwrites the whole column with "0", including the values you had already replaced. To replace only the empty strings, compare with == instead:

newdata$PROPDMGEXP[newdata$PROPDMGEXP == ""] <- "0"

An exact comparison is the better choice whenever you are matching the complete value and know exactly which string you want to replace, including the empty string, because it selects only the elements equal to "" instead of matching everything.
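As a minimal sketch of the difference, the snippet below uses a small made-up vector x (an assumption standing in for newdata$PROPDMGEXP, not the real data) to compare the empty-pattern grepl() call with an exact comparison. The variable names are illustrative only; the anchored pattern "^$" and nzchar() are shown as alternatives that should also select only the empty strings.

```r
# Hypothetical sample values standing in for newdata$PROPDMGEXP
x <- c("K", "M", "", "+", "")

# An empty pattern matches every string, so this flags all elements
empty_pattern <- grepl("", x)

# Exact comparison flags only the elements that are empty strings
exact_match <- x == ""

# Alternatives that also flag only empty strings
anchored_match <- grepl("^$", x)  # anchored regex: start immediately followed by end
no_chars <- !nzchar(x)            # nzchar() is FALSE for zero-length strings

print(empty_pattern)
print(exact_match)

# Replace only the empty strings
x[exact_match] <- "0"
print(x)
```

If the column could also contain NA values (for example when read.csv() is called with na.strings = ""), those would need separate handling with is.na(), since they are not equal to "".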


 