Extracting Multiple Numerical Values from String in R
I have a dataset, and I want to extract only the numeric values from the following string:
{ "What are the last three digits of your zip code?": "043", "What are the last three digits of your phone number?": "681"}
Specifically, I want to extract them into two separate columns (043 and 681). Is there a way to do this given the symbols in the string?
We can use str_extract_all
library(stringr)
str_extract_all(str1, "\\d+")[[1]]
#[1] "043" "681"
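For readers who would rather not depend on stringr, roughly the same result can be obtained in base R with gregexpr and regmatches. This is a minimal sketch, assuming the input is the same single string as str1 (redefined here so the snippet runs on its own); the variable name digits is only illustrative.

```r
# Same input as str1, redefined so the snippet is self-contained
str1 <- '{ "What are the last three digits of your zip code?": "043", "What are the last three digits of your phone number?": "681"}'

# gregexpr() locates every run of digits; regmatches() pulls them out
digits <- regmatches(str1, gregexpr("[0-9]+", str1))[[1]]
digits
# [1] "043" "681"

# Converting to numeric drops the leading zero, so keep the values
# as character if "043" must stay "043"
as.numeric(digits)
# [1]  43 681
```

Keeping the values as character is usually the safer choice here, since zip code and phone digits are identifiers rather than quantities.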
If there are multiple elements, we can do this
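library(dplyr)
library(tidyr)
library(purrr)  # provides set_names()
tibble(col1 = str2) %>%
  mutate(col1 = str_extract_all(col1, "\\d+")) %>%
  # recent tidyr versions need names_sep when the list elements are unnamed
  unnest_wider(col1, names_sep = "_") %>%
  set_names(str_c('col', seq_along(.)))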
-output
# A tibble: 2 x 2
# col1 col2
# <chr> <chr>
#1 043 681
#2 313 681
str1 <- "{ \"What are the last three digits of your zip code?\": \"043\", \"What are the last three digits of your phone number?\": \"681\"}"
str2 <- c('{ "What are the last three digits of your zip code?": "043", "What are the last three digits of your phone number?": "681"}', '{ "What are the last three digits of your zip code?": "313", "What are the last three digits of your phone number?": "681"}')
Here is a base R option using strsplit
> Map(function(x) x[nchar(x) > 0], strsplit(str1, "\\D+"))
[[1]]
[1] "043" "681"
> Map(function(x) x[nchar(x) > 0], strsplit(str2, "\\D+"))
[[1]]
[1] "043" "681"
[[2]]
[1] "313" "681"
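The list returned above can be bound into the two columns the question asks for. The following is a rough sketch rather than part of the original answer; it assumes every string yields exactly two runs of digits, and the names parts and res are illustrative.

```r
# Same input as str2, redefined so the snippet is self-contained
str2 <- c(
  '{ "What are the last three digits of your zip code?": "043", "What are the last three digits of your phone number?": "681"}',
  '{ "What are the last three digits of your zip code?": "313", "What are the last three digits of your phone number?": "681"}'
)

# Split on non-digits and drop the empty pieces, as in the answer above
parts <- Map(function(x) x[nchar(x) > 0], strsplit(str2, "\\D+"))

# Stack the per-string vectors row-wise (assumes each has length 2)
res <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(res) <- paste0("col", seq_along(res))
res
#   col1 col2
# 1  043  681
# 2  313  681
```

If some strings could contain a different number of digit runs, rbind would recycle values silently, so a length check on parts would be worth adding first.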
Base R solution:
# Split numeric values into separate columns: df => data.frame
df <- data.frame(
do.call(rbind, do.call(c, lapply(list(str1, str2), strsplit, "\\D+")))
)
# Drop the all-blank columns, then name the remaining ones col1, col2, ...: res => data.frame
res <- df[, colSums(df == "") < nrow(df), drop = FALSE]
res <- setNames(res, paste0("col", seq_along(res)))