Extracting numbers from a string in a dataframe
I was hoping somebody could show me a way to extract data from a character vector.
The dataframe is shown below:
structure(list(Sensitivity = structure(c(1L, 5L, 4L, 4L, 4L,
4L, 3L, 5L, 2L), .Label = c(" 1.01 [ 0.21, 2.91]", " 89.60 [ 85.56, 92.82]",
" 92.95 [ 89.43, 95.59]", " 99.66 [ 98.14, 99.99]", " 100.00 [ 98.77, 100.00]"
), class = "factor"), Specificity = structure(c(8L, 1L, 3L, 4L,
2L, 5L, 6L, 1L, 7L), .Label = c(" 27.17 [ 25.15, 29.26]", " 44.96 [ 42.67, 47.26]",
" 53.31 [ 51.00, 55.61]", " 69.90 [ 67.75, 71.99]", " 70.23 [ 68.08, 72.31]",
" 90.18 [ 88.73, 91.50]", " 91.70 [ 90.35, 92.92]", " 100.00 [ 99.80, 100.00]"
), class = "factor")), .Names = c("Sensitivity", "Specificity"
), class = "data.frame", row.names = c(NA, -9L))
As an example, for the first element of the first column I would ideally get three columns of data: 1.01, 0.21 and 2.91.
The first and second numerical values are separated by a "[" and the second and third by a ",". I am not au fait with grep; I have tried using it but am going wrong somewhere!
Here is a regular expression solution you can try, using str_extract_all from the stringr package (assuming the data frame is stored as df). The pattern "\\d+\\.\\d+" matches decimal numbers: one or more digits, followed by a literal ".", followed by another one or more digits.
library(stringr)
lapply(df, function(col) do.call(rbind, str_extract_all(col, "\\d+\\.\\d+")))
$Sensitivity
[,1] [,2] [,3]
[1,] "1.01" "0.21" "2.91"
[2,] "100.00" "98.77" "100.00"
[3,] "99.66" "98.14" "99.99"
[4,] "99.66" "98.14" "99.99"
[5,] "99.66" "98.14" "99.99"
[6,] "99.66" "98.14" "99.99"
[7,] "92.95" "89.43" "95.59"
[8,] "100.00" "98.77" "100.00"
[9,] "89.60" "85.56" "92.82"
$Specificity
[,1] [,2] [,3]
[1,] "100.00" "99.80" "100.00"
[2,] "27.17" "25.15" "29.26"
[3,] "53.31" "51.00" "55.61"
[4,] "69.90" "67.75" "71.99"
[5,] "44.96" "42.67" "47.26"
[6,] "70.23" "68.08" "72.31"
[7,] "90.18" "88.73" "91.50"
[8,] "27.17" "25.15" "29.26"
[9,] "91.70" "90.35" "92.92"
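The matrices above hold character strings (note the quotes), so they still need converting before any arithmetic. As a rough sketch, the same extraction can be done in base R with regmatches() and gregexpr(), converting to numeric along the way. The two-row df, the helper name extract_numbers and the column names estimate/lower/upper are all assumptions made for illustration, and the sketch assumes every element contains exactly three decimal numbers.

```r
# Hypothetical miniature of the question's data (two rows only).
df <- data.frame(
  Sensitivity = c("  1.01 [  0.21,   2.91]", "100.00 [ 98.77, 100.00]"),
  Specificity = c("100.00 [ 99.80, 100.00]", " 27.17 [ 25.15,  29.26]"),
  stringsAsFactors = TRUE
)

# Pull every decimal number out of one column and return a numeric matrix.
# Assumes each element holds exactly three decimal numbers.
extract_numbers <- function(col) {
  txt <- as.character(col)  # work on the labels, not the factor's integer codes
  hits <- regmatches(txt, gregexpr("[0-9]+\\.[0-9]+", txt))
  m <- do.call(rbind, lapply(hits, as.numeric))
  colnames(m) <- c("estimate", "lower", "upper")  # illustrative names only
  m
}

result <- lapply(df, extract_numbers)
result$Sensitivity
```

Because each matrix is numeric, something like result$Sensitivity[, "upper"] - result$Sensitivity[, "lower"] works directly.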
Try this (with the data frame stored as dat):
cbind(
  matrix(as.numeric(unlist(strsplit(unlist(strsplit(gsub("]", "",
    dat$Sensitivity), ",")), "\\["))), ncol = 3, byrow = TRUE),
  matrix(as.numeric(unlist(strsplit(unlist(strsplit(gsub("]", "",
    dat$Specificity), ",")), "\\["))), ncol = 3, byrow = TRUE)
)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 1.01 0.21 2.91 100.00 99.80 100.00
[2,] 100.00 98.77 100.00 27.17 25.15 29.26
[3,] 99.66 98.14 99.99 53.31 51.00 55.61
[4,] 99.66 98.14 99.99 69.90 67.75 71.99
[5,] 99.66 98.14 99.99 44.96 42.67 47.26
[6,] 99.66 98.14 99.99 70.23 68.08 72.31
[7,] 92.95 89.43 95.59 90.18 88.73 91.50
[8,] 100.00 98.77 100.00 27.17 25.15 29.26
[9,] 89.60 85.56 92.82 91.70 90.35 92.92
Here is an option using base R that extracts the numeric parts and returns them with type numeric (here the data frame is stored as d1):
lst <- lapply(d1, function(x) read.csv(text=gsub("[][]", ", ", x), header=FALSE)[-4])
lst
#$Sensitivity
# V1 V2 V3
#1 1.01 0.21 2.91
#2 100.00 98.77 100.00
#3 99.66 98.14 99.99
#4 99.66 98.14 99.99
#5 99.66 98.14 99.99
#6 99.66 98.14 99.99
#7 92.95 89.43 95.59
#8 100.00 98.77 100.00
#9 89.60 85.56 92.82
#$Specificity
# V1 V2 V3
#1 100.00 99.80 100.00
#2 27.17 25.15 29.26
#3 53.31 51.00 55.61
#4 69.90 67.75 71.99
#5 44.96 42.67 47.26
#6 70.23 68.08 72.31
#7 90.18 88.73 91.50
#8 27.17 25.15 29.26
#9 91.70 90.35 92.92
If needed, the list of data.frames can be converted to a single data.frame with cbind:
do.call(cbind, lst)
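When a named list of data frames is combined this way, cbind prefixes each column with its list name (for example Sensitivity.V1). The sketch below shows one way to give the combined columns more descriptive names. The two-row lst is a hypothetical stand-in for the list built above, and the suffixes estimate/lower/upper are assumptions, not part of the original answer.

```r
# Hypothetical stand-in for the `lst` built above (two rows per data frame).
lst <- list(
  Sensitivity = data.frame(V1 = c(1.01, 100.00), V2 = c(0.21, 98.77),
                           V3 = c(2.91, 100.00)),
  Specificity = data.frame(V1 = c(100.00, 27.17), V2 = c(99.80, 25.15),
                           V3 = c(100.00, 29.26))
)

# Bind the data frames side by side into one six-column data frame.
combined <- do.call(cbind, lst)

# Replace the default names with <measure>_<role>; the suffixes are assumptions.
names(combined) <- paste(rep(names(lst), each = 3),
                         c("estimate", "lower", "upper"), sep = "_")
combined
```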