
Extracting Multiple Numerical Values from String in R

I have a dataset, and I want to extract only the numeric values from the following string:

{  "What are the last three digits of your zip code?": "043",  "What are the last three digits of your phone number?": "681"}

Specifically, I want to extract them into two separate columns (043 and 681). Is there a way to do this given the symbols in the string?

We can use str_extract_all from stringr:

library(stringr)
str_extract_all(str1, "\\d+")[[1]]
#[1] "043" "681"
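
For comparison, the same extraction can be sketched in base R with gregexpr and regmatches, without loading stringr. The variable `s` is a stand-in for `str1` from the Data section, redefined here only so the snippet runs on its own.

```r
# Stand-in for str1 (copied from the Data section) so the snippet is self-contained
s <- "{ \"What are the last three digits of your zip code?\": \"043\", \"What are the last three digits of your phone number?\": \"681\"}"

# gregexpr() locates every run of digits; regmatches() pulls out the matched text
digits <- regmatches(s, gregexpr("[0-9]+", s))[[1]]
digits
#[1] "043" "681"
```

The values stay character, which preserves the leading zero in "043"; converting with as.integer(digits) would turn it into 43.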

If there are multiple elements, we can do this:

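library(dplyr)
library(tidyr)
library(purrr)   # provides set_names()
tibble(col1 = str2) %>%
    # extract every run of digits from each string into a list column
    mutate(col1 = str_extract_all(col1, "\\d+")) %>%
    # spread the list column into one column per match;
    # names_sep is needed because the extracted vectors are unnamed
    unnest_wider(col1, names_sep = "_") %>%
    # rename the columns to col1, col2, ...
    set_names(str_c('col', seq_along(.)))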

Output

# A tibble: 2 x 2
#  col1  col2 
#  <chr> <chr>
#1 043   681  
#2 313   681  

Data

str1 <- "{ \"What are the last three digits of your zip code?\": \"043\", \"What are the last three digits of your phone number?\": \"681\"}"

str2 <- c('{  "What are the last three digits of your zip code?": "043",  "What are the last three digits of your phone number?": "681"}', '{  "What are the last three digits of your zip code?": "313",  "What are the last three digits of your phone number?": "681"}')

Here is a base R option using strsplit:

> Map(function(x) x[nchar(x) > 0], strsplit(str1, "\\D+"))
[[1]]
[1] "043" "681"


> Map(function(x) x[nchar(x) > 0], strsplit(str2, "\\D+"))
[[1]]
[1] "043" "681"

[[2]]
[1] "313" "681"

Base R solution:

# Split numeric values into separate columns: df => data.frame
df <- data.frame(
  do.call(rbind, do.call(c, lapply(list(str1, str2), strsplit, "\\D+")))
)

# Subset data.frame to exclude blanks; correctly name vectors: res => data.frame
res <- setNames(df, paste0("col", seq_along(df)))[, colSums(df == "") < nrow(df)]
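
Because each string begins with non-digit characters, strsplit returns an empty first piece, so the filtered result above ends up with columns named col2 and col3. Below is a minimal sketch of a variant that drops the empty pieces before binding, so the columns come out as col1 and col2. The names `x`, `parts`, and `res2` are hypothetical; `x` mirrors `str2` from the Data section, and the approach assumes every string contains the same number of digit groups.

```r
# Hypothetical input mirroring str2 from the Data section
x <- c(
  '{  "What are the last three digits of your zip code?": "043",  "What are the last three digits of your phone number?": "681"}',
  '{  "What are the last three digits of your zip code?": "313",  "What are the last three digits of your phone number?": "681"}'
)

# Split on runs of non-digits, then drop the empty leading piece from each result
parts <- lapply(strsplit(x, "[^0-9]+"), function(p) p[nzchar(p)])

# Bind the pieces row-wise (assumes an equal number of digit groups per string)
res2 <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(res2) <- paste0("col", seq_along(res2))
res2
```

Here res2 has two character columns, col1 ("043", "313") and col2 ("681", "681"). If the strings could contain differing numbers of digit groups, the rbind step would need padding first.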
