
Extracting string after specific word

I tried looking around for a similar question, but did not find any. I'm trying to extract the string immediately after a specific word.

I have a bunch of strings, but I only want to extract the text that follows "TaskItem:". I tried using str_extract but was not able to get the output I need.

Here's some sample data:

sample <- structure(c(14L, 10L, 16L, 9L), .Label = c("", "crash: ae01531510acf7b30821ce9d3d28db889e6b1504; manufacture: samsung; cpu: arm64-v8a; opengl: 3; os: Android; orientation: Landscape; nonfatal: false; root: false; online: true; muted: false; background: false; app_version: 1.1.2; ram_current: 2468; ram_total: 3644; disk_current: 4649; disk_total: 4851; bat: 100; run: 1337;", 
"crash: ae01531510acf7b30821ce9d3d28db889e6b1504; manufacture: samsung; cpu: arm64-v8a; opengl: 3; os: Android; orientation: Landscape; nonfatal: false; root: false; online: true; muted: false; background: false; app_version: 1.1.2; ram_current: 2499; ram_total: 3644; disk_current: 4649; disk_total: 4851; bat: 100; run: 221;", 
"crash: ae01531510acf7b30821ce9d3d28db889e6b1504; manufacture: samsung; cpu: arm64-v8a; opengl: 3; os: Android; orientation: Landscape; nonfatal: false; root: false; online: true; muted: true; background: false; app_version: 1.1.2; ram_current: 2559; ram_total: 3644; disk_current: 4649; disk_total: 4851; bat: 100; run: 1215;", 
"crash: ae01531510acf7b30821ce9d3d28db889e6b1504; manufacture: samsung; cpu: arm64-v8a; opengl: 3; os: Android; orientation: Landscape; nonfatal: false; root: false; online: true; muted: true; background: false; app_version: 1.1.2; ram_current: 2627; ram_total: 3644; disk_current: 4649; disk_total: 4851; bat: 100; run: 235;", 
"crash: ae01531510acf7b30821ce9d3d28db889e6b1504; manufacture: samsung; cpu: arm64-v8a; opengl: 3; os: Android; orientation: Landscape; nonfatal: false; root: false; online: true; muted: true; background: false; app_version: 1.1.2; ram_current: 2655; ram_total: 3644; disk_current: 4649; disk_total: 4851; bat: 100; run: 115;", 
"crash: ae01531510acf7b30821ce9d3d28db889e6b1504; manufacture: samsung; cpu: arm64-v8a; opengl: 3; os: Android; orientation: Landscape; nonfatal: false; root: false; online: true; muted: true; background: false; app_version: 1.1.2; ram_current: 2656; ram_total: 3644; disk_current: 4649; disk_total: 4851; bat: 100; run: 1681;", 
"segment: Android; name: CalendarDetailActivity; visit: 1;", 
"segment: Android; name: MainActivity; visit: 1;", "segment: Android; name: OnBoardingActivity; visit: 1;", 
"segment: Android; name: SchedulePreferenceActivity; visit: 1;", 
"segment: Android; name: SplashActivity; start: 1; visit: 1;", 
"segment: Android; name: SplashActivity; visit: 1;", "TaskItem: CURATED_CONTENT;", 
"TaskItem: SCHEDULE_PREFERENCES;", "TaskItem: SCHEDULE;"), class = "factor")

So, in the above example, I would just like "TaskItem: CURATED_CONTENT;" to return "CURATED_CONTENT" (removing the semicolon would be great but not important) and "TaskItem: SCHEDULE;" to return "SCHEDULE"; the other two can be NA. Any suggestions would be great. Thank you!

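We can use str_extract with a regex lookbehind. The (?<=TaskItem:\\s) part requires the match to be preceded by "TaskItem:" and one whitespace character without including them in the result, and [^;]+ then takes everything up to, but not including, the next semicolon. Elements that do not contain "TaskItem:" return NA.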

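library(stringr)
# look behind for "TaskItem:" plus one whitespace character, then keep everything before the next ";"
str_extract(sample, "(?<=TaskItem:\\s)[^;]+")
#[1] "CURATED_CONTENT" NA                "SCHEDULE"        NA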
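If loading stringr is not an option, roughly the same result can be sketched in base R with grepl and sub. This is an alternative to the lookbehind call above, not part of the original answer. The vector x is a hypothetical stand-in holding the four values of sample as plain character strings, and the names pattern, hit and task are made up for illustration.

```r
# Hypothetical stand-in for the question's `sample`, written as a character vector.
x <- c("TaskItem: CURATED_CONTENT;",
       "segment: Android; name: OnBoardingActivity; visit: 1;",
       "TaskItem: SCHEDULE;",
       "segment: Android; name: MainActivity; visit: 1;")

# Capture group 1 is the text after "TaskItem:" up to (not including) the next ";".
pattern <- "^.*TaskItem:\\s*([^;]+).*$"

# Start with NA everywhere, then fill in only the elements that contain the key.
task <- rep(NA_character_, length(x))
hit <- grepl(pattern, x)
task[hit] <- sub(pattern, "\\1", x[hit])

task
#[1] "CURATED_CONTENT" NA                "SCHEDULE"        NA
```

The explicit grepl step matters because sub returns its input unchanged when the pattern does not match, so non-matching elements would otherwise come back as the full original string instead of NA. With the real data, as.character(sample) can be used in place of x.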
