R: Extract largest number from character string with mixed digits and letters
Preferably, I am looking for a dplyr solution.
I have
> str(p)
'data.frame': 25 obs. of 1 variable:
$ intram_size: chr "5" "4,7 x 6,6 mm" "4x6x7 mm" "5" ...
and
> head(p)
intram_size
1 5
2 4,7 x 6,6 mm
3 4x6x7 mm
4 5
5 4x11
6 1x4
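p$intram_size represents a two-dimensional measurement of a tumor. I need to extract the largest number, i.e. the largest diameter measured. One problem is that a comma (,) has been used as the decimal separator.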
Expected output
> head(p)
   intram_size  new
1            5    5
2 4,7 x 6,6 mm  6.6
3     4x6x7 mm    7
4            5    5
5         4x11   11
6          1x4    4
Sample data
p <- structure(list(intram_size = c("5", "4,7 x 6,6 mm", "4x6x7 mm",
"5", "4x11", "1x4", "7x10", "8", "3", "7", "7x4x3", "10x5", "8",
"7", "11", "7", "10", "5", "13", "5", "3,5", "10", "2,5", "7",
"11 x 6 x 4")), row.names = c(NA, 25L), class = "data.frame")
library(tidyverse)
p %>%
  mutate(intram_size = str_replace_all(intram_size, ',', '.'),
         new = str_extract_all(intram_size, '\\d+(\\.\\d+)?'),
         new = map_dbl(new, ~max(as.numeric(.x))))
# intram_size new
#1 5 5.0
#2 4.7 x 6.6 mm 6.6
#3 4x6x7 mm 7.0
#4 5 5.0
#5 4x11 11.0
#6 1x4 4.0
#7 7x10 10.0
#8 8 8.0
#9 3 3.0
#10 7 7.0
#11 7x4x3 7.0
#12 10x5 10.0
#13 8 8.0
#14 7 7.0
#15 11 11.0
#16 7 7.0
#17 10 10.0
#18 5 5.0
#19 13 13.0
#20 5 5.0
#21 3.5 3.5
#22 10 10.0
#23 2.5 2.5
#24 7 7.0
#25 11 x 6 x 4 11.0
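The three steps in the pipeline above can be traced on a single value. The following is only a minimal sketch using the same stringr calls; the variable names x, x_dot and nums are made up for illustration.

```r
library(stringr)

x <- "4,7 x 6,6 mm"

# Step 1: turn the decimal comma into a decimal point
x_dot <- str_replace_all(x, ",", ".")

# Step 2: pull out every number (integer or decimal) as a character vector
nums <- str_extract_all(x_dot, "\\d+(\\.\\d+)?")[[1]]

# Step 3: convert to numeric and keep the largest value
largest <- max(as.numeric(nums))
largest
```

str_extract_all() returns a list with one element per input string, which is why the pipeline above needs map_dbl() to reduce each element to a single number.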
Using dplyr (to add and modify columns) and stringr (to extract patterns), the procedure could look like this:
# sample data
p <- structure(list(intram_size = c("5", "4,7 x 6,6 mm", "4x6x7 mm",
"5", "4x11", "1x4", "7x10", "8", "3", "7", "7x4x3", "10x5", "8",
"7", "11", "7", "10", "5", "13", "5", "3,5", "10", "2,5", "7",
"11 x 6 x 4")), row.names = c(NA, 25L), class = "data.frame")
library(dplyr)
library(stringr)
mod <- p %>%
  # replace decimal separator
  mutate(intram_size = str_replace_all(intram_size, ",", "."),
         # extract numbers
         split = str_extract_all(intram_size, "[0-9\\.]+")) %>%
  rowwise() %>%
  # convert to right data type
  mutate(num = list(as.numeric(split)),
         # find maximum
         max = max(num, na.rm = TRUE))
head(mod)
#> # A tibble: 6 x 4
#> # Rowwise:
#> intram_size split num max
#> <chr> <list> <list> <dbl>
#> 1 5 <chr [1]> <dbl [1]> 5
#> 2 4.7 x 6.6 mm <chr [2]> <dbl [2]> 6.6
#> 3 4x6x7 mm <chr [3]> <dbl [3]> 7
#> 4 5 <chr [1]> <dbl [1]> 5
#> 5 4x11 <chr [2]> <dbl [2]> 11
#> 6 1x4 <chr [2]> <dbl [2]> 4
Created on 2020-12-03 by the reprex package (v0.3.0)
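Both approaches call max() on whatever numbers were extracted. In R, max() of an empty vector returns -Inf with a warning, so a row with no digits at all would produce -Inf. The sample data has no such row, so the sketch below rests on the assumption that messier data may. The helper extract_max_number is a hypothetical name, not part of either answer, and it uses only base R.

```r
# Hypothetical helper (not from the answers above): largest number found in
# each string, with NA for strings that contain no digits.
extract_max_number <- function(x) {
  # treat the comma as a decimal separator
  x <- gsub(",", ".", x, fixed = TRUE)
  # one character vector of matched numbers per input string
  matches <- regmatches(x, gregexpr("[0-9]+(\\.[0-9]+)?", x))
  vapply(matches, function(m) {
    if (length(m) == 0) NA_real_ else max(as.numeric(m))
  }, numeric(1))
}

res <- extract_max_number(c("4,7 x 6,6 mm", "4x6x7 mm", "unknown"))
res
```

Because the function is vectorised over its input, it could be used inside a dplyr pipeline without rowwise(), e.g. mutate(new = extract_max_number(intram_size)).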