R: Replacing empty string with "0" in data frame column results in all column values being replaced with "0"
I have an R data frame called newdata, which I read in using read.csv():
PROPDMGEXP EVTYPE
"K" WIND
"M" HAIL
"H" TORNADO
"B" WIND
"+" HIGH WIND
"-" TORNADO
"?" HURRICANE
WIND
TORNADO
"k" HAIL
The blank values in the PROPDMGEXP column were blank cells in the CSV file from which the data was read. I am assuming that they are empty strings. I want to replace the values in the PROPDMGEXP column with other values, and I have used regex to do so:
newdata$PROPDMGEXP[grepl("K", newdata$PROPDMGEXP, ignore.case = TRUE)] <- "10^3"
newdata$PROPDMGEXP[grepl("H", newdata$PROPDMGEXP, ignore.case = TRUE)] <- "10^2"
newdata$PROPDMGEXP[grepl("M", newdata$PROPDMGEXP, ignore.case = TRUE)] <- "10^6"
newdata$PROPDMGEXP[grepl("B", newdata$PROPDMGEXP, ignore.case = TRUE)] <- "10^9"
newdata$PROPDMGEXP[grepl("+", newdata$PROPDMGEXP, fixed = TRUE)] <- "1"
newdata$PROPDMGEXP[grepl("-", newdata$PROPDMGEXP, fixed = TRUE)] <- "0"
newdata$PROPDMGEXP[grepl("?", newdata$PROPDMGEXP, fixed = TRUE)] <- "0"
newdata$PROPDMGEXP[grepl("", newdata$PROPDMGEXP)] <- "0"
I have checked whether the values are being replaced by subsetting the data and printing it out:
mydata <- subset(newdata, PROPDMGEXP == "10^6", select=c(EVTYPE, PROPDMGEXP))
mydata1 <- subset(newdata, PROPDMGEXP == "10^3", select=c(EVTYPE, PROPDMGEXP))
mydata2 <- subset(newdata, PROPDMGEXP == "10^2", select=c(EVTYPE, PROPDMGEXP))
mydata3 <- subset(newdata, PROPDMGEXP == "10^9", select=c(EVTYPE, PROPDMGEXP))
mydata4 <- subset(newdata, PROPDMGEXP == "1", select=c(EVTYPE, PROPDMGEXP))
mydata5 <- subset(newdata, PROPDMGEXP == "0", select=c(EVTYPE, PROPDMGEXP))
mydata7 <- subset(newdata, PROPDMGEXP == "-", select=c(EVTYPE, PROPDMGEXP))
mydata8 <- subset(newdata, PROPDMGEXP == "?", select=c(EVTYPE, PROPDMGEXP))
mydata9 <- subset(newdata, PROPDMGEXP == "", select=c(EVTYPE, PROPDMGEXP))
print(head(mydata))
print(head(mydata1))
print(head(mydata2))
print(head(mydata3))
print(head(mydata4))
print("Printing 0...")
print(head(mydata5))
print("Printing -")
print(head(mydata7))
print("Printing ?")
print(head(mydata8))
print("Printing blank")
print(head(mydata9))
I have found that when I replace the empty string values in the PROPDMGEXP column with "0", all the other replaced values that are not "0" (e.g. "10^3", "10^2", "10^6", etc.) in the PROPDMGEXP column are also replaced with "0".
I have verified this through the following:
print(dim(newdata))
> 902297 7
print(length(which(newdata$PROPDMGEXP == "0")))
> 902297
I am not sure why this is occurring. Any insights are appreciated.
The problem is that grepl("", newdata$PROPDMGEXP) is TRUE for every element: an empty pattern matches any string, so the last assignment overwrites the whole column. If you ever need to use this kind of approach again, better use
newdata$PROPDMGEXP[newdata$PROPDMGEXP==""] <- "0"
especially if you are doing complete matching and/or you know exactly the string you're going to replace, or if it's empty. The == comparison is TRUE only for the empty strings instead of matching everything.
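To make the difference concrete, here is a minimal sketch on a small character vector. The vector x is a hypothetical stand-in for newdata$PROPDMGEXP, and it assumes the column holds character values with blanks stored as "" rather than NA. The anchored pattern "^$" and nzchar() are shown only as alternatives, not as something the original post used.

```r
# Hypothetical stand-in for newdata$PROPDMGEXP (assumed to be character, blanks as "")
x <- c("K", "M", "", "+", "", "k")

# An empty pattern matches every string, so this is TRUE for all elements
all_match <- grepl("", x)

# Exact comparison is TRUE only where the value is the empty string
is_blank <- x == ""

# Alternatives that also select only the empty strings:
# an anchored regex, and base R's nzchar()
is_blank_regex <- grepl("^$", x)
is_blank_nzchar <- !nzchar(x)

# Replace only the blanks; the other values are left untouched
fixed <- x
fixed[is_blank] <- "0"
print(fixed)
```

Because the index is built from an exact comparison, running this step before or after the other replacements should not disturb values such as "10^3" that were already assigned.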