
How to clean up a dataframe column with a regular expression?

Consider this dataframe:

df <- data.frame(Index=c(1:4),
                 Perc1=c("SC(23.43%","12.21%","","(18.44%"))
df
#  Index     Perc1
#1     1 SC(23.43%
#2     2    12.21%
#3     3
#4     4   (18.44%

The goal is to clean up the Perc1 column with a regex: extract the percentage value and convert it to a fraction.

Desired result:

  Index  Perc1
1     1 0.2343
2     2 0.1221
3     3       
4     4 0.1844

I tried the following code, but I get an error and a wrong result:

pattern <- ".*([0-9]+.[0-9]{2})%"
ind <- grep(pattern, df$Perc1, value = FALSE)
df$Perc1 <- sub(pattern, "\\1", df$Perc1)
df$Perc1[-ind] <- NA
df$Perc1 <- as.numeric(df$perc1)/100
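To see where the wrong result and the error come from, here is a small base-R sketch that re-runs the question's pattern on the sample data. The variable name `greedy` is just for illustration. The greedy `.*` keeps all but one digit for itself, and the lowercase `perc1` refers to a column that does not exist.

```r
# Sample data from the question
df <- data.frame(Index = 1:4,
                 Perc1 = c("SC(23.43%", "12.21%", "", "(18.44%"))

pattern <- ".*([0-9]+.[0-9]{2})%"

# The greedy .* grabs as many characters as it can, so the group
# ([0-9]+.[0-9]{2}) is left with only the last digit before the decimal point
greedy <- sub(pattern, "\\1", df$Perc1)
print(greedy)    # values: "3.43", "2.21", "", "8.44"

# R names are case-sensitive: there is no column perc1, so this is NULL
print(is.null(df$perc1))                    # TRUE
# as.numeric(NULL) / 100 has length 0 and cannot replace a 4-row column
print(length(as.numeric(df$perc1) / 100))   # 0
```

Assigning that length-0 result to `df$Perc1` is what triggers the error, while the greedy match explains values like 0.0343 instead of 0.2343.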

You can use readr::parse_number to get the number from Perc1 directly; it drops any non-numeric characters before and after the first number.

transform(df, Perc1 = readr::parse_number(Perc1)/100)

#  Index  Perc1
#1     1 0.2343
#2     2 0.1221
#3     3     NA
#4     4 0.1844

You can use regexpr and regmatches to extract the numbers. Note that perl = TRUE is required because the pattern uses the lookahead (?=%).

r <- regexpr("\\d*\\.?\\d*(?=%)", df$Perc1, perl=TRUE)
df$Perc1 <- as.numeric(`[<-`(rep(NA, length(r)), r!=-1, regmatches(df$Perc1, r))) / 100
df
#  Index  Perc1
#1     1 0.2343
#2     2 0.1221
#3     3     NA
#4     4 0.1844
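The one-liner above uses the replacement function `` `[<-` `` to scatter the matches into an all-NA vector. A more explicit sketch of the same idea, with hypothetical names `x` and `out`, might look like this:

```r
x <- c("SC(23.43%", "12.21%", "", "(18.44%")

# Start position of the number that precedes "%", or -1 when there is none;
# perl = TRUE is required for the (?=%) lookahead
r <- regexpr("\\d*\\.?\\d*(?=%)", x, perl = TRUE)

# Start with all NA, then fill only the elements that matched;
# regmatches() returns the matched substrings, in order, for those elements
out <- rep(NA_real_, length(x))
out[r != -1] <- as.numeric(regmatches(x, r)) / 100
print(out)  # values: 0.2343, 0.1221, NA, 0.1844
```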

And a fixed version of your own approach:

pattern <- ".*?([0-9]+\\.[0-9]{2})%"  # lazy .*? so it no longer swallows leading digits; \\. matches a literal dot
ind <- grepl(pattern, df$Perc1)       # grepl returns a logical vector (one value per element)
df$Perc1 <- sub(pattern, "\\1", df$Perc1)
df$Perc1[!ind] <- NA                  # invert the logical vector to blank out non-matches
df$Perc1 <- as.numeric(df$Perc1)/100  # fixed typo: perc1 -> Perc1 (R is case-sensitive)
df
#  Index  Perc1
#1     1 0.2343
#2     2 0.1221
#3     3     NA
#4     4 0.1844
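Switching from `grep` to `grepl` also avoids a classic negative-indexing pitfall: when nothing matches, `grep` returns `integer(0)`, and `x[-integer(0)]` selects nothing rather than everything. A minimal sketch with a hypothetical input `x` where no element contains a digit:

```r
x <- c("abc", "def")      # hypothetical input: no element contains a digit

# grep() returns the positions of matches; here that is integer(0)
ind <- grep("[0-9]", x)
y <- x
y[-ind] <- NA             # -integer(0) selects nothing, so no element becomes NA

# grepl() returns one logical per element, so negating it always works
ok <- grepl("[0-9]", x)
z <- x
z[!ok] <- NA              # every non-matching element becomes NA
```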

You can use str_extract and convert the extracted digits to numeric (note that this pattern assumes exactly two digits before and after the decimal point):

library(stringr)
df$Perc1 <- as.numeric(str_extract(df$Perc1, "\\d\\d\\.\\d\\d"))/100

Result:

df
  Index  Perc1
1     1 0.2343
2     2 0.1221
3     3     NA
4     4 0.1844
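Because `\\d\\d\\.\\d\\d` is fixed-width, it misses values such as 5.5% or 100%. As an illustration with hypothetical inputs, this base-R sketch compares it with a more general pattern, assumed here to be one or more digits, an optional decimal part, then `%`. The same pattern strings should also work with `str_extract`, whose ICU engine supports the `(?=%)` lookahead.

```r
x <- c("SC(23.43%", "5.5%", "100%")   # hypothetical inputs

# Fixed-width pattern from the answer: only "23.43" matches
fixed_width <- regmatches(x, regexpr("\\d\\d\\.\\d\\d", x))

# More general pattern: digits, optional decimal part, followed by "%"
general <- regmatches(x, regexpr("\\d+(\\.\\d+)?(?=%)", x, perl = TRUE))

print(fixed_width)  # "23.43"
print(general)      # "23.43" "5.5" "100"
```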

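Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.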
