
Regex for extracting all words between word and character

I know the basics of using regex in R, but I have a log file with lines like these:

[2016-04-28 14:00:06,603],,,,,SERVICE_ID=441,DEBUG,DBSEntryServlet,DBSEntryServlet: delegateToRequestManager:: SERVICE_ID=541,SERVICE_ID=9981

[2016-04-28 14:00:06,608],,,,,,DEBUG,DBSEntryServlet,10.91.39.143:60801 SERVICE_ID=00234,SERVICE_ID=11134,IMD=6767

I want to extract the timestamp along with every SERVICE_ID on each line.

So, my expected output is:

[2016-04-28 14:00:06,603] SERVICE_ID=441 SERVICE_ID=541 SERVICE_ID=9981

[2016-04-28 14:00:06,608] SERVICE_ID=00234 SERVICE_ID=11134

The code I tried extracts only one SERVICE_ID:

library(qdapRegex)

a <- readLines("C:\\MY_FOLDER\\vinita\\sample.txt")

testi <- rm_between(a, "SERVICE_ID", ",", extract = TRUE)

First, replace every run of two or more commas with a single space to get 'str2'. Then, using a regex lookbehind, match one or more whitespace characters ( \\s+ ) that follow the ] , together with everything after them ( .* ) up to the end of the string, and replace the match with "" . This leaves only the timestamp, e.g. [2016-04-28 14:00:06,603], in 'str3'. From 'str2', extract every substring "SERVICE_ID=" followed by digits ( \\d+ ) into a list , paste the elements of each list entry together, and finally paste the result onto 'str3'.

library(stringr)
str2 <- gsub(",{2,}", " ", str1)
str3 <- sub("(?<=\\])\\s+.*", "", str2, perl = TRUE)
paste(str3, sapply(str_extract_all(str2, "SERVICE_ID=\\d+"), paste, collapse=" "))
#[1] "[2016-04-28 14:00:06,603] SERVICE_ID=441 SERVICE_ID=541 SERVICE_ID=9981"
#[2] "[2016-04-28 14:00:06,608] SERVICE_ID=00234 SERVICE_ID=11134" 
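
For comparison, the same result can be obtained without stringr. What follows is a minimal base-R sketch, not the code from the answer above. The variable names (log_lines, timestamps, ids, result) are made up for illustration, and it assumes every line starts with a bracketed timestamp.

```r
# Sample lines copied from the question
log_lines <- c(
  "[2016-04-28 14:00:06,603],,,,,SERVICE_ID=441,DEBUG,DBSEntryServlet,DBSEntryServlet: delegateToRequestManager:: SERVICE_ID=541,SERVICE_ID=9981",
  "[2016-04-28 14:00:06,608],,,,,,DEBUG,DBSEntryServlet,10.91.39.143:60801 SERVICE_ID=00234,SERVICE_ID=11134,IMD=6767"
)

# Leading "[...]" chunk of each line (assumption: every line has one;
# regmatches() drops lines without a match, which would misalign the result)
timestamps <- regmatches(log_lines, regexpr("^\\[[^]]+\\]", log_lines))

# All "SERVICE_ID=<digits>" occurrences, as a list with one vector per line
ids <- regmatches(log_lines, gregexpr("SERVICE_ID=[0-9]+", log_lines))

# Join the IDs of each line with spaces and prepend the timestamp
result <- paste(timestamps, vapply(ids, paste, character(1), collapse = " "))
print(result)
```

Because gregexpr() returns every match rather than only the first, all SERVICE_ID entries on a line are captured, including the last one that has no trailing comma.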

data

 str1 <- c("[2016-04-28 14:00:06,603],,,,,SERVICE_ID=441,DEBUG,DBSEntryServlet,DBSEntryServlet: delegateToRequestManager:: SERVICE_ID=541,SERVICE_ID=9981",
"[2016-04-28 14:00:06,608],,,,,,DEBUG,DBSEntryServlet,10.91.39.143:60801 SERVICE_ID=00234,SERVICE_ID=11134,IMD=6767")
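# Alternative: keep the timestamp without brackets and bind it to the IDs
library(stringr)

str1 <- c("[2016-04-28 14:00:06,603],,,,,SERVICE_ID=441,DEBUG,DBSEntryServlet,DBSEntryServlet: delegateToRequestManager:: SERVICE_ID=541,SERVICE_ID=9981",
          "[2016-04-28 14:00:06,608],,,,,,DEBUG,DBSEntryServlet,10.91.39.143:60801   SERVICE_ID=00234,SERVICE_ID=11134,IMD=6767")
# Collapse each run of two or more commas into a single space
str2 <- gsub(",{2,}", " ", str1)
# Drop everything from the closing bracket onward, then the opening bracket
str4 <- sub("\\].*", "", str2, perl = TRUE)
str5 <- sub("\\[", "", str4, perl = TRUE)

# Collect every SERVICE_ID=<digits> per line and join them with spaces
service_ids <- sapply(str_extract_all(str2, "SERVICE_ID=\\d+"), paste, collapse = " ")
# Two-column character matrix: timestamp and service IDs
net <- cbind(str5, service_ids)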
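
If the IDs need to be processed individually afterwards, a long table with one row per SERVICE_ID may be easier to work with than one pasted string per line. Here is a rough base-R sketch under that assumption. The names log_lines, id_list and long are hypothetical, and the IDs are kept as character so that leading zeros (00234) survive.

```r
# Sample lines copied from the question
log_lines <- c(
  "[2016-04-28 14:00:06,603],,,,,SERVICE_ID=441,DEBUG,DBSEntryServlet,DBSEntryServlet: delegateToRequestManager:: SERVICE_ID=541,SERVICE_ID=9981",
  "[2016-04-28 14:00:06,608],,,,,,DEBUG,DBSEntryServlet,10.91.39.143:60801 SERVICE_ID=00234,SERVICE_ID=11134,IMD=6767"
)

# Timestamp text between the leading brackets
# (assumption: every line starts with "[...]")
timestamps <- sub("^\\[([^]]+)\\].*", "\\1", log_lines)

# One character vector of "SERVICE_ID=<digits>" matches per line
id_list <- regmatches(log_lines, gregexpr("SERVICE_ID=[0-9]+", log_lines))

# Repeat each timestamp once per ID found on its line, and strip the
# "SERVICE_ID=" prefix so only the digits remain (kept as character)
long <- data.frame(
  timestamp  = rep(timestamps, lengths(id_list)),
  service_id = sub("SERVICE_ID=", "", unlist(id_list), fixed = TRUE),
  stringsAsFactors = FALSE
)
print(long)
```

Lines without any SERVICE_ID simply contribute no rows, since rep() repeats their timestamp zero times.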

