Extracting numbers with units from a string

I have a series of strings like the one below:

x <- " 20 to 80% of the sward should be between 3 and 10cm tall, 
with 20 to 80% of the sward between 10 and 30cm tall"

I want to extract the numeric values and keep the units. I have tried the following:

x <- lapply(x, function(x){gsub("[^\\d |cm\\b |mm\\b |% ]", "", x, perl = T)})

Which gives:

" 20  80%       3  10cm   20  80%     10  30cm "

What I need is:

"20 80%" "3 10cm" "20 80%" "10 30cm" 

Thanks for reading.

We could use str_extract_all from library(stringr) to extract the elements that match the pattern (modified based on comments from @PierreLafortune). The regex \d+\S* matches a run of digits plus any non-whitespace characters attached to it, so a unit such as % or cm stays with its number.

library(stringr)
lst <-  str_extract_all(x, '\\d+\\S*')
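For readers without stringr, the same tokens can probably be obtained in base R. This is a minimal sketch, assuming the same pattern as above; the name `tokens` is introduced here for illustration only, and the input is a one-line copy of the question's string.

```r
# Base R sketch of the extraction step (an alternative, not the answerer's code)
x <- " 20 to 80% of the sward should be between 3 and 10cm tall, with 20 to 80% of the sward between 10 and 30cm tall"

# gregexpr() locates every match; regmatches() pulls the matched text out.
# The result is a list with one character vector per input string.
tokens <- regmatches(x, gregexpr("\\d+\\S*", x, perl = TRUE))
tokens[[1]]
# "20" "80%" "3" "10cm" "20" "80%" "10" "30cm"
```

Note that \S* also picks up punctuation glued to a unit (for example a comma in "10cm,"), which does not occur in this input.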

If the list elements all have the same length, we can rbind them to create a matrix.

m1 <- do.call(rbind, lst)

Then paste the alternating columns together

v1 <- paste(m1[,c(TRUE, FALSE)], m1[,c(FALSE, TRUE)])

and convert the result back to a matrix.

dim(v1) <- c(nrow(m1), ncol(m1)/2)
v1
#     [,1]     [,2]     [,3]     [,4]     
#[1,] "20 80%" "3 10cm" "20 80%" "10 30cm"
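The matrix route above relies on every string yielding the same number of tokens. When that does not hold, one option is to pair the tokens within each list element instead. The sketch below assumes an even number of tokens per string (an odd count would be recycled by paste and give misleading pairs); `pair_up` is a hypothetical helper and the second input string is made up for illustration.

```r
# Sketch: pair tokens per string, so strings may yield different numbers of pairs
x <- c("20 to 80% of the sward should be between 3 and 10cm tall",
       "between 10 and 30cm tall")   # second string is illustrative only

tokens <- regmatches(x, gregexpr("\\d+\\S*", x, perl = TRUE))

# Hypothetical helper: join the odd-position tokens with the even-position ones
pair_up <- function(v) paste(v[c(TRUE, FALSE)], v[c(FALSE, TRUE)])

pairs <- lapply(tokens, pair_up)
pairs
# [[1]] "20 80%" "3 10cm"
# [[2]] "10 30cm"
```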

Not particularly elegant but...

library(magrittr)  # provides %>% and %$%
library(stringr)
library(dplyr)
library(plyr)
# Start from the output of the gsub() attempt in the question
" 20  80%       3  10cm   20  80%     10  30cm " %>%
    str_split(" ") %>%    # split on single spaces; leaves many empty strings
    unlist %>%
    as.data.frame %>%
    plyr::rename(replace = c("." = "string")) %$%
    gsub(string, replacement = "", pattern = " ") %>%
    as.data.frame %>%
    plyr::rename(replace = c("." = "string")) %>%
    filter(string != "") -> etc_etc   # drop the empty strings
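A different route, not taken from either answer above, is to match each range in one go and then drop the joining word. This is a rough sketch that assumes ranges are always written as "<number> to <number><unit>" or "<number> and <number><unit>"; `extract_ranges` and its `units` argument are hypothetical names.

```r
# Hypothetical helper: return "<low> <high><unit>" for each range in each string
extract_ranges <- function(x, units = c("%", "cm", "mm")) {
  unit_alt <- paste(units, collapse = "|")
  # Matches e.g. "20 to 80%" or "3 and 10cm"
  pattern <- sprintf("\\d+\\s+(?:to|and)\\s+\\d+(?:%s)", unit_alt)
  hits <- regmatches(x, gregexpr(pattern, x, perl = TRUE))
  # Replace the joining word and its surrounding spaces with a single space
  lapply(hits, function(h) gsub("\\s+(?:to|and)\\s+", " ", h, perl = TRUE))
}

x <- " 20 to 80% of the sward should be between 3 and 10cm tall, with 20 to 80% of the sward between 10 and 30cm tall"
res <- extract_ranges(x)
res[[1]]
# "20 80%" "3 10cm" "20 80%" "10 30cm"
```

Unlike the token-then-pair approaches, this ignores stray numbers that are not part of a range, at the cost of hard-coding the joining words and the unit list.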
