Extract a string of numbers (of varying length) from the middle of a larger string in R
I've seen a lot of similar questions (and got some ideas from them), but I can't seem to get my code to work. I'm using R.
I have a dataframe that looks like this:
df <- tribble(
~name, ~employment,
"Markisha", "{'url': 'https://api.zenefits.com/core/people/123/employments', 'object': '/meta/ref/list', 'ref_object': '/core/employments'}",
"Rickisha", "{'url': 'https://api.zenefits.com/core/people/1234/employments', 'object': '/meta/ref/list', 'ref_object': '/core/employments'}"
)
I want to extract the number between "people" and "employments" in the employment column. The result should look like this:
correct_df <- tribble(~name, ~employment_id,
"Markisha", "123",
"Rickisha", "1234"
)
I tried using this code chunk:
str_match(employment, "people/(.*?)/employments")
but my result looks like this:
incorrect_df <- tribble(
~name, ~employment, ~employment_id,
"Markisha", "{'url': 'https://api.zenefits.com/core/people/123/employments', 'object': '/meta/ref/list', 'ref_object': '/core/employments'}", "people/123/employments",
"Rickisha", "{'url': 'https://api.zenefits.com/core/people/1234/employments', 'object': '/meta/ref/list', 'ref_object': '/core/employments'}", "people/1234/employments"
)
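Any ideas on how to fix this?
If the strings do contain other digits, you will need lookarounds, as @akrun suggests, to define the context to the left and/or right of the relevant number. If they contain no other digits, this is enough: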
library(stringr)
df$num <- str_extract(df$employment, "\\d+")
df
# A tibble: 2 x 3
name employment num
<chr> <chr> <chr>
1 Markisha {'url': 'https://api.zenefits.com/core/people/123/employments', 'object': '/meta/ref/list', 'ref_object': '/core/employments'} 123
2 Rickisha {'url': 'https://api.zenefits.com/core/people/1234/employments', 'object': '/meta/ref/list', 'ref_object': '/core/employments'} 1234
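To illustrate the caveat about other digits, here is a minimal sketch. The URL below is hypothetical (the host name "api2.example.com" is invented so that a digit appears before the id); it is not from the question's data.

```r
library(stringr)

# Hypothetical string: note the extra digit in the host name "api2"
x <- "{'url': 'https://api2.example.com/core/people/123/employments'}"

# A bare \\d+ returns the first run of digits, which is not the id here
first_digits <- str_extract(x, "\\d+")

# Lookbehind and lookahead pin the digits between "people/" and "/employments"
id <- str_extract(x, "(?<=people/)\\d+(?=/employments)")
```

With a string like this, the bare pattern picks up the digit in the host name, while the version with context on both sides should return only the id.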
Use a regex lookaround in str_extract to extract one or more digits (\\d+) that come right after 'people/':
library(dplyr)
library(stringr)
df %>%
transmute(name, employment_id = str_extract(employment, '(?<=people/)\\d+'))
-output
# A tibble: 2 x 2
# name employment_id
# <chr> <chr>
#1 Markisha 123
#2 Rickisha 1234
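The str_match attempt from the question can probably be kept as well. str_match returns a character matrix whose first column is the complete match and whose following columns are the capture groups, so taking the second column should give just the id. A sketch, reusing the question's data and swapping (.*?) for (\\d+) to match digits only:

```r
library(dplyr)
library(stringr)
library(tibble)

# Same data as in the question
df <- tribble(
  ~name, ~employment,
  "Markisha", "{'url': 'https://api.zenefits.com/core/people/123/employments', 'object': '/meta/ref/list', 'ref_object': '/core/employments'}",
  "Rickisha", "{'url': 'https://api.zenefits.com/core/people/1234/employments', 'object': '/meta/ref/list', 'ref_object': '/core/employments'}"
)

# Column 1 of the str_match result is the full match,
# column 2 is the first capture group (the digits)
result <- df %>%
  transmute(
    name,
    employment_id = str_match(employment, "people/(\\d+)/employments")[, 2]
  )
```

Whether to use this or the lookbehind is mostly a matter of taste; the capture-group form can be easier to read when context is needed on both sides of the number.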