Extracting numbers from a string in a dataframe

I was hoping somebody could show me a way to extract data from a character vector.

The data frame is as below:

structure(list(Sensitivity = structure(c(1L, 5L, 4L, 4L, 4L, 
4L, 3L, 5L, 2L), .Label = c("    1.01 [ 0.21, 2.91]", "   89.60 [ 85.56, 92.82]", 
"   92.95 [ 89.43, 95.59]", "   99.66 [ 98.14, 99.99]", "  100.00 [ 98.77, 100.00]"
), class = "factor"), Specificity = structure(c(8L, 1L, 3L, 4L, 
2L, 5L, 6L, 1L, 7L), .Label = c("   27.17 [ 25.15, 29.26]", "   44.96 [ 42.67,   47.26]", 
"   53.31 [ 51.00, 55.61]", "   69.90 [ 67.75, 71.99]", "   70.23 [ 68.08, 72.31]", 
"   90.18 [ 88.73, 91.50]", "   91.70 [ 90.35, 92.92]", "  100.00 [ 99.80, 100.00]"
), class = "factor")), .Names = c("Sensitivity", "Specificity"
), class = "data.frame", row.names = c(NA, -9L))

For example, for the first element of the first column I would ideally get three columns of data: 1.01, 0.21 and 2.91.

The first and second numeric values are separated by a "[" and the second and third by a ",". I am not au fait with grep; I have tried using it but am going wrong somewhere!

Here is a regular expression solution using str_extract_all from the stringr package. The pattern \\d+\\.\\d+ matches decimal numbers: one or more digits, followed by a literal ".", followed by one or more digits.

library(stringr)
lapply(df, function(col) do.call(rbind, str_extract_all(col, "\\d+\\.\\d+")))

$Sensitivity
      [,1]     [,2]    [,3]    
 [1,] "1.01"   "0.21"  "2.91"  
 [2,] "100.00" "98.77" "100.00"
 [3,] "99.66"  "98.14" "99.99" 
 [4,] "99.66"  "98.14" "99.99" 
 [5,] "99.66"  "98.14" "99.99" 
 [6,] "99.66"  "98.14" "99.99" 
 [7,] "92.95"  "89.43" "95.59" 
 [8,] "100.00" "98.77" "100.00"
 [9,] "89.60"  "85.56" "92.82" 

$Specificity
      [,1]     [,2]    [,3]    
 [1,] "100.00" "99.80" "100.00"
 [2,] "27.17"  "25.15" "29.26" 
 [3,] "53.31"  "51.00" "55.61" 
 [4,] "69.90"  "67.75" "71.99" 
 [5,] "44.96"  "42.67" "47.26" 
 [6,] "70.23"  "68.08" "72.31" 
 [7,] "90.18"  "88.73" "91.50" 
 [8,] "27.17"  "25.15" "29.26" 
 [9,] "91.70"  "90.35" "92.92" 
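Note that the matrices above hold character strings rather than numbers. As a rough sketch, the same pattern can also be applied with base R's regmatches and gregexpr (swapped in here for stringr) and the matches converted to numeric. The vector x below is a hypothetical two-row stand-in for one column of the question's data, not the full data frame.

```r
# Assumption: x is a small stand-in for one factor column of the data frame
x <- factor(c("    1.01 [ 0.21, 2.91]", "  100.00 [ 98.77, 100.00]"))

# Factors are converted to character before matching
chr <- as.character(x)

# Same pattern as in the answer: digits, a literal dot, digits
hits <- regmatches(chr, gregexpr("\\d+\\.\\d+", chr))

# Convert each row's matches to numeric and stack them into a matrix
m <- do.call(rbind, lapply(hits, as.numeric))
m
```

This relies on every string containing exactly three decimal numbers; rows with a different number of matches would not stack cleanly.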

Try this:

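# For each column: drop "]", split on "," and then on "[",
# convert to numeric and fill a 3-column matrix row by row
cbind(
 matrix(as.numeric(unlist(strsplit(unlist(strsplit(gsub("]","",
          dat$Sensitivity), ",")),"\\["))),ncol=3,byrow = TRUE)
 ,
 matrix(as.numeric(unlist(strsplit(unlist(strsplit(gsub("]","",
          dat$Specificity), ",")),"\\["))),ncol=3,byrow = TRUE)
)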

        [,1]  [,2]   [,3]   [,4]  [,5]   [,6]
 [1,]   1.01  0.21   2.91 100.00 99.80 100.00
 [2,] 100.00 98.77 100.00  27.17 25.15  29.26
 [3,]  99.66 98.14  99.99  53.31 51.00  55.61
 [4,]  99.66 98.14  99.99  69.90 67.75  71.99
 [5,]  99.66 98.14  99.99  44.96 42.67  47.26
 [6,]  99.66 98.14  99.99  70.23 68.08  72.31
 [7,]  92.95 89.43  95.59  90.18 88.73  91.50
 [8,] 100.00 98.77 100.00  27.17 25.15  29.26
 [9,]  89.60 85.56  92.82  91.70 90.35  92.92
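The nested gsub and strsplit calls above could probably be collapsed into a single split on all three delimiters. This is only a sketch of that idea, using a hypothetical two-row vector x in place of dat$Sensitivity.

```r
# Assumption: x is a small stand-in for dat$Sensitivity
x <- factor(c("    1.01 [ 0.21, 2.91]", "   89.60 [ 85.56, 92.82]"))

# Split on "[", "," or "]" in one pass; strsplit drops the empty piece
# that would follow the trailing "]"
parts <- strsplit(as.character(x), "[][,]")

# as.numeric() tolerates the surrounding spaces left in each piece
m <- matrix(as.numeric(unlist(parts)), ncol = 3, byrow = TRUE)
m
```

As with the original, this assumes each string yields exactly three pieces, so that the values fill the matrix row by row.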

Here is an option using base R that extracts the numeric parts and returns them as numeric columns:

lst <- lapply(d1, function(x) read.csv(text=gsub("[][]", ", ", x), header=FALSE)[-4])
lst
#$Sensitivity
#      V1    V2     V3
#1   1.01  0.21   2.91
#2 100.00 98.77 100.00
#3  99.66 98.14  99.99
#4  99.66 98.14  99.99
#5  99.66 98.14  99.99
#6  99.66 98.14  99.99
#7  92.95 89.43  95.59
#8 100.00 98.77 100.00
#9  89.60 85.56  92.82

#$Specificity
#      V1    V2     V3
#1 100.00 99.80 100.00
#2  27.17 25.15  29.26
#3  53.31 51.00  55.61
#4  69.90 67.75  71.99
#5  44.96 42.67  47.26
#6  70.23 68.08  72.31
#7  90.18 88.73  91.50
#8  27.17 25.15  29.26
#9  91.70 90.35  92.92

If needed, the list of data.frames can be converted to a single data.frame by cbinding:

do.call(cbind, lst)
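The combined data frame gets automatic column names built from the list names and V1 to V3. If more descriptive names are wanted, something like the following sketch might do. The data frame d1 here is a hypothetical two-row stand-in for the question's data, and the labels estimate, lower and upper are assumptions about what the three numbers mean.

```r
# Assumption: a two-row stand-in for the question's data frame
d1 <- data.frame(
  Sensitivity = c("    1.01 [ 0.21, 2.91]", "  100.00 [ 98.77, 100.00]"),
  Specificity = c("  100.00 [ 99.80, 100.00]", "   27.17 [ 25.15, 29.26]"),
  stringsAsFactors = TRUE
)

# Same approach as above: turn the brackets into commas and parse as CSV
lst <- lapply(d1, function(x) read.csv(text = gsub("[][]", ", ", x), header = FALSE)[-4])
res <- do.call(cbind, lst)

# Hypothetical labels for the three values extracted from each column
names(res) <- paste(rep(names(d1), each = 3),
                    c("estimate", "lower", "upper"), sep = "_")
res
```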
