Extracting numbers with units from a string
I have a series of strings like the one below:
x <- " 20 to 80% of the sward should be between 3 and 10cm tall,
with 20 to 80% of the sward between 10 and 30cm tall"
I want to extract the numeric values and keep the units. I have tried the following:
x <- lapply(x, function(x){gsub("[^\\d |cm\\b |mm\\b |% ]", "", x, perl = T)})
Which gives:
" 20 80% 3 10cm 20 80% 10 30cm "
What I need is:
"20 80%" "3 10cm" "20 80%" "10 30cm"
Thanks for reading.
We could use str_extract_all from library(stringr) to extract the elements that match the pattern (modified based on comments from @PierreLafortune):
library(stringr)
lst <- str_extract_all(x, '\\d+\\S*')
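To see what this extraction step produces, here is a minimal sketch that swaps in base R's regmatches() and gregexpr() for str_extract_all, so it runs without stringr. The pattern is the one from the answer: one or more digits followed by any run of non-whitespace characters, which is what keeps "%" and "cm" attached to the number. The variable name tokens is only illustrative.

```r
x <- " 20 to 80% of the sward should be between 3 and 10cm tall,
with 20 to 80% of the sward between 10 and 30cm tall"

# gregexpr() locates every match of the pattern; regmatches() pulls
# the matched substrings out. The result is a list with one character
# vector per input string, the same shape str_extract_all returns.
tokens <- regmatches(x, gregexpr("\\d+\\S*", x, perl = TRUE))

print(tokens[[1]])
# "20"   "80%"  "3"    "10cm" "20"   "80%"  "10"   "30cm"
```

Bare numbers such as "20" have no unit because the unit in the text is attached only to the second number of each range.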
If the list elements all have the same length, we can rbind them to create a matrix.
m1 <- do.call(rbind, lst)
Then paste the alternating columns together
v1 <- paste(m1[,c(TRUE, FALSE)], m1[,c(FALSE, TRUE)])
and convert the result back to a matrix.
dim(v1) <- c(nrow(m1), ncol(m1)/2)
v1
# [,1] [,2] [,3] [,4]
#[1,] "20 80%" "3 10cm" "20 80%" "10 30cm"
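The matrix route relies on every string yielding the same number of matches. As a hedged sketch of a variation for inputs where that does not hold, the pairing can be done per string and kept as a list. The helper pair_up and the two-element input vector are hypothetical, made up for illustration, and base R's regmatches()/gregexpr() stand in for str_extract_all.

```r
# Hypothetical helper: glue tokens 1+2, 3+4, ... of one string together.
pair_up <- function(tokens) {
  if (length(tokens) %% 2 != 0) stop("expected an even number of tokens")
  # c(TRUE, FALSE) is recycled, picking the odd positions;
  # c(FALSE, TRUE) picks the even positions.
  paste(tokens[c(TRUE, FALSE)], tokens[c(FALSE, TRUE)])
}

# Illustrative input: the two strings contain different numbers of ranges.
x <- c(
  "20 to 80% of the sward should be between 3 and 10cm tall",
  "sward between 10 and 30cm tall"
)

tokens <- regmatches(x, gregexpr("\\d+\\S*", x, perl = TRUE))
pairs <- lapply(tokens, pair_up)

print(pairs)
```

Here pairs[[1]] holds "20 80%" and "3 10cm", while pairs[[2]] holds only "10 30cm". A list avoids the need for equal lengths, at the cost of losing the matrix layout.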
Not particularly elegant, but...
library(magrittr)
library(stringr)
library(dplyr)
library(plyr)
" 20 80% 3 10cm 20 80% 10 30cm " %>%
str_split(" ") %>%
unlist %>%
as.data.frame %>%
plyr::rename(replace = c("." = "string")) %$%
gsub(string, replacement = "", pattern = " ") %>%
as.data.frame %>%
plyr::rename(replace = c("." = "string")) %>%
filter(string != "") -> etc_etc
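For comparison, the pipeline above amounts to splitting on spaces and dropping the empty strings. A rough base R sketch of that same idea follows, assuming the goal is a one-column data frame called etc_etc with a column named string; it is an alternative for illustration, not the answerer's method.

```r
s <- " 20 80% 3 10cm 20 80% 10 30cm "

# Split on single spaces; the leading space produces an empty first piece.
tokens <- strsplit(s, " ", fixed = TRUE)[[1]]

# Drop the empty pieces, which is what filter(string != "") does above.
tokens <- tokens[tokens != ""]

etc_etc <- data.frame(string = tokens, stringsAsFactors = FALSE)
print(etc_etc)
```

This leaves one token per row; pairing them into "20 80%", "3 10cm" and so on still needs a further step.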