Extract multiple numeric values from a string in R

Question

I have a dataset, and I want to extract only the numeric values from the following string:

{  "What are the last three digits of your zip code?": "043",  "What are the last three digits of your phone number?": "681"}

Specifically, I want to extract them into two separate columns (043 and 681). Is there a way to do this given all the symbols in the string?

Answer 1

We can use str_extract_all from stringr:

library(stringr)
str_extract_all(str1, "\\d+")[[1]]
#[1] "043" "681"

If there are multiple elements, we can do the following:

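library(dplyr)
library(tidyr)
library(stringr)
tibble(col = str2) %>%
    # extract every run of digits into a list column
    mutate(col = str_extract_all(col, "\\d+")) %>%
    # widen the list column to col1, col2, ...; names_sep is needed for unnamed vectors
    unnest_wider(col, names_sep = "")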

Output

# A tibble: 2 x 2
#  col1  col2 
#  <chr> <chr>
#1 043   681  
#2 313   681
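
The extracted columns are character (`<chr>`), which is what keeps the leading zero in 043. As a small sketch of what a numeric conversion would do, assuming the values are always three digits wide (the names `digits`, `as_int`, and `padded` are illustrative only):

```r
digits <- c("043", "681")

# Converting to integer drops the leading zero.
as_int <- as.integer(digits)

# Zero-pad back to width 3 if a string form is needed again.
padded <- sprintf("%03d", as_int)
```

Keeping the values as character avoids the round trip altogether, which is probably the safer choice for zip code or phone number fragments.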

Data

str1 <- "{ \"What are the last three digits of your zip code?\": \"043\", \"What are the last three digits of your phone number?\": \"681\"}"

str2 <- c('{  "What are the last three digits of your zip code?": "043",  "What are the last three digits of your phone number?": "681"}', '{  "What are the last three digits of your zip code?": "313",  "What are the last three digits of your phone number?": "681"}')

Answer 2

Here is a base R option using strsplit:

> Map(function(x) x[nchar(x) > 0], strsplit(str1, "\\D+"))
[[1]]
[1] "043" "681"


> Map(function(x) x[nchar(x) > 0], strsplit(str2, "\\D+"))
[[1]]
[1] "043" "681"

[[2]]
[1] "313" "681"
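
Splitting or matching on digits alone works here because the question text contains no digits. If it did, a stricter pattern could help. The following is a minimal base R sketch, assuming each value is written exactly as `: "digits"` (a colon, one space, then the quoted digits); the input `x` is a hypothetical string, not part of the original data:

```r
# Hypothetical input: the question text itself contains digits.
x <- '{ "Q1: last 3 digits of your zip code?": "043", "Q2: last 3 digits of your phone number?": "681"}'

# A plain \\d+ also picks up the digits inside the question text.
all_digits <- regmatches(x, gregexpr("\\d+", x))[[1]]

# Lookbehind and lookahead keep only digit runs that sit between `: "` and `"`.
values_only <- regmatches(x, gregexpr('(?<=: ")\\d+(?=")', x, perl = TRUE))[[1]]
```

The lookarounds need `perl = TRUE`, since R's default regex engine does not support them.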

Answer 3

Base R solution:

# Split numeric values into separate columns: df => data.frame
df <- data.frame(
  do.call(rbind, do.call(c, lapply(list(str1, str2), strsplit, "\\D+")))
)

# Subset data.frame to exclude blanks; correctly name vectors: res => data.frame
res <- setNames(df, paste0("col", seq_along(df)))[, colSums(df == "") < nrow(df)]
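
Since the input strings happen to be valid JSON, another route, not proposed in any of the answers above, would be to parse them rather than match digits. This is only a sketch and assumes the jsonlite package is installed; `vals` and `res` are illustrative names:

```r
library(jsonlite)  # assumption: installed; not used in the answers above

str2 <- c(
  '{ "What are the last three digits of your zip code?": "043", "What are the last three digits of your phone number?": "681"}',
  '{ "What are the last three digits of your zip code?": "313", "What are the last three digits of your phone number?": "681"}'
)

# Parse each string as JSON and keep only the values, dropping the question keys.
vals <- lapply(str2, function(s) unlist(fromJSON(s), use.names = FALSE))

# Bind one row per input string and name the columns col1, col2, ...
res <- as.data.frame(do.call(rbind, vals), stringsAsFactors = FALSE)
names(res) <- paste0("col", seq_along(res))
```

This approach does not depend on what characters appear in the question text, but it only works when every string is well-formed JSON with the same number of fields.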

3 answers

Answer 1: score 4, accepted, 2021-04-12 16:34:16
Answer 2: score 2, 2021-04-12 21:36:21
Answer 3: score 0, 2021-04-13 11:32:30
