
Extract a string of numbers (of varying length) from the middle of a larger string in R

I have seen many similar questions (and took some ideas from them), but I can't get my code to work. I am using R.

I have a dataframe that looks like this:

library(tibble)

df <- tribble(
  ~name, ~employment, 
  "Markisha", "{'url': 'https://api.zenefits.com/core/people/123/employments', 'object': '/meta/ref/list', 'ref_object': '/core/employments'}",
  "Rickisha", "{'url': 'https://api.zenefits.com/core/people/1234/employments', 'object': '/meta/ref/list', 'ref_object': '/core/employments'}"
)

I want to extract the number between "people" and "employments" in the employment column. The result should look like this:

correct_df <- tribble(
  ~name, ~employment_id,
  "Markisha", "123",
  "Rickisha", "1234"
)

I tried this code:

str_match(employment, "people/(.*?)/employments")

But my result looks like this:

incorrect_df <- tribble(
  ~name, ~employment, ~employment_id, 
  "Markisha", "{'url': 'https://api.zenefits.com/core/people/123/employments', 'object': '/meta/ref/list', 'ref_object': '/core/employments'}", "people/123/employments",
  "Rickisha", "{'url': 'https://api.zenefits.com/core/people/1234/employments', 'object': '/meta/ref/list', 'ref_object': '/core/employments'}", "people/1234/employments"
)

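Any ideas on how to fix this?

If the string contains no other digits, a plain \\d+ is enough, as in the code below. If it does, you will need lookarounds to define the context to the left and/or right of the number you want, as suggested by @akrun: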

library(stringr)
df$num <- str_extract(df$employment, "\\d+")
df
# A tibble: 2 x 3
  name     employment                                                                                                                      num  
  <chr>    <chr>                                                                                                                           <chr>
1 Markisha {'url': 'https://api.zenefits.com/core/people/123/employments', 'object': '/meta/ref/list', 'ref_object': '/core/employments'}  123  
2 Rickisha {'url': 'https://api.zenefits.com/core/people/1234/employments', 'object': '/meta/ref/list', 'ref_object': '/core/employments'} 1234 
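
To illustrate the caveat above, here is a minimal sketch of what happens when another number comes first. The "v2" path segment is a hypothetical addition (it is not in the question's data), used only to show how a lookbehind and a lookahead together pin down the right digits:

```r
library(stringr)

# Hypothetical input: a version number ("v2") appears before the ID we want
x <- "{'url': 'https://api.zenefits.com/v2/core/people/123/employments'}"

# Plain \\d+ returns the first run of digits it finds, which is not the ID
first_digits <- str_extract(x, "\\d+")

# Lookbehind (?<=people/) and lookahead (?=/employments) fix the context
# on both sides without including it in the match
id <- str_extract(x, "(?<=people/)\\d+(?=/employments)")

first_digits  # "2"
id            # "123"
```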

Use a regex lookaround in str_extract to extract one or more digits (\\d+) that follow 'people/':

library(dplyr)
library(stringr)
df %>%
  transmute(name, employment_id = str_extract(employment, '(?<=people/)\\d+'))

-output

# A tibble: 2 x 2
#  name     employment_id
#   <chr>    <chr>        
#1 Markisha 123          
#2 Rickisha 1234      
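
As a side note, the str_match attempt from the question was close. str_match returns a character matrix whose first column is the full match and whose later columns are the capture groups, so selecting the second column gives the ID. A small sketch, using shortened versions of the question's strings (the vector name employment is assumed here for illustration):

```r
library(stringr)

# Shortened versions of the strings from the question
employment <- c(
  "{'url': 'https://api.zenefits.com/core/people/123/employments'}",
  "{'url': 'https://api.zenefits.com/core/people/1234/employments'}"
)

m <- str_match(employment, "people/(.*?)/employments")

full_match <- m[, 1]     # "people/123/employments" "people/1234/employments"
employment_id <- m[, 2]  # "123" "1234"
```

Inside a dplyr pipeline the same idea would be written as str_match(employment, "people/(.*?)/employments")[, 2].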

