I have the following string pattern:
Name_session_id:Owner:UUID, but sometimes it is just Name:Owner:UUID.
For example:
John_1:David:enfl43erl34r345
or
John:David:enfl43erl34r345
I want to use stringr and rebus to build a data frame that looks like this:
Name Session Owner UUID
John 1 David enfl43erl34r345
John NA David enfl43erl34r345
Please advise how to do this. Here is what I have done so far with the pattern:
capture(one_or_more(WRD)) %R%
optional("_") %R%
capture(optional(DGT)) %R%
":" %R%
capture(one_or_more(WRD)) %R%
":" %R%
capture(one_or_more(WRD))
The problem is with the first one_or_more(WRD): it matches _ too, so \\w+ grabs the whole chunk of letters, digits and underscores (John_1 in the first example), and the following optional _ and \\d? can then only match an empty string.
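A quick way to see this is to run a plain-regex equivalent of the question's pattern. The sketch below uses base R's regexec() and regmatches() so it runs without stringr or rebus installed; the pattern string is an assumed hand translation of the rebus expression (rebus itself may write the classes as [\\w]+, which matches the same text).

```r
x <- c("John_1:David:enfl43erl34r345", "John:David:enfl43erl34r345")

# Assumed plain-regex equivalent of the question's rebus pattern:
# capture(one_or_more(WRD)) %R% optional("_") %R% capture(optional(DGT)) ...
greedy <- "(\\w+)_?(\\d?):(\\w+):(\\w+)"

# One row per input string: column 1 is the full match, columns 2-5 the groups
m <- do.call(rbind, regmatches(x, regexec(greedy, x, perl = TRUE)))

# Name and Session columns: \\w+ has already swallowed "_1" in row 1
m[, 2:3]
```

In row 1, Group 1 holds John_1 and Group 2 holds an empty string, which is the greedy behaviour described above.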
Replace the first one_or_more(WRD) with one_or_more(ALNUM) so that Group 1 captures only one or more letters or digits:
capture(one_or_more(ALNUM)) %R%
optional("_") %R%
capture(optional(DGT)) %R%
":" %R%
capture(one_or_more(WRD)) %R%
":" %R%
capture(one_or_more(WRD))
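Here is a sketch of the ALNUM fix applied to the two example strings, again in base R with an assumed plain-regex translation of the rebus pattern. Because capture(optional(DGT)) matches an empty string when there is no session id, Session comes back as "" rather than NA, so the sketch converts it to reproduce the table from the question. With stringr, str_match(x, alnum) returns the same matrix layout (full match in column 1, then one column per group).

```r
x <- c("John_1:David:enfl43erl34r345", "John:David:enfl43erl34r345")

# Assumed plain-regex equivalent of the ALNUM version of the pattern
alnum <- "([[:alnum:]]+)_?(\\d?):(\\w+):(\\w+)"

m <- do.call(rbind, regmatches(x, regexec(alnum, x, perl = TRUE)))

df <- data.frame(
  Name    = m[, 2],
  Session = m[, 3],
  Owner   = m[, 4],
  UUID    = m[, 5],
  stringsAsFactors = FALSE
)
# optional(DGT) matched nothing in row 2, giving "", so turn that into NA
df$Session[df$Session == ""] <- NA
df
```

Note that optional(DGT) allows at most one digit, so a multi-digit session id such as 12 is not handled correctly by this pattern; the optional-group version with one_or_more(DGT) does not have that limit.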
Or, make it lazy with lazy(one_or_more(WRD)):
capture(lazy(one_or_more(WRD))) %R%
optional("_") %R%
capture(optional(DGT)) %R%
":" %R%
capture(one_or_more(WRD)) %R%
":" %R%
capture(one_or_more(WRD))
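To check that the lazy variant splits the examples the same way as the ALNUM one, here is a small base R comparison. Both pattern strings are assumed hand translations of the rebus code (lazy(one_or_more(WRD)) becomes \\w+?), and groups() is a hypothetical helper defined only for this sketch.

```r
x <- c("John_1:David:enfl43erl34r345", "John:David:enfl43erl34r345")

# Assumed plain-regex equivalents of the two rebus patterns
lazy_pat  <- "(\\w+?)_?(\\d?):(\\w+):(\\w+)"
alnum_pat <- "([[:alnum:]]+)_?(\\d?):(\\w+):(\\w+)"

# Hypothetical helper: full match plus capture groups, one row per string
groups <- function(pattern, strings) {
  do.call(rbind, regmatches(strings, regexec(pattern, strings, perl = TRUE)))
}

lazy_m <- groups(lazy_pat, x)
lazy_m[, 2:3]

# TRUE when both patterns split the sample strings identically
identical(lazy_m, groups(alnum_pat, x))
```

The lazy version works because the engine extends \\w+? one character at a time and stops at the first position where the rest of the pattern can match, which is just before _1: in the first string.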
However, I believe you should use
capture(one_or_more(ALNUM)) %R%
optional(
group("_" %R%
capture(one_or_more(DGT)))) %R%
":" %R%
capture(one_or_more(WRD)) %R%
":" %R%
capture(one_or_more(WRD))
It will create a regex like ([[:alnum:]]+)(?:_([\\d]+))?:([\\w]+):([\\w]+). That is, instead of using _ as an optional char followed by an optional DGT, you wrap these two consecutive patterns in an optional group while making them obligatory inside it, so the session id is captured only when both the _ and at least one digit are present.
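A sketch of this optional-group regex in use, building the requested data frame in base R. The pattern string is the regex quoted above written as an R literal; the third input string is hypothetical, added only to show a two-digit session id. With stringr you could call str_match(x, pat) instead, which returns NA for a group that did not take part in the match.

```r
# Two strings from the question plus one hypothetical string with a
# two-digit session id
x <- c("John_1:David:enfl43erl34r345",
       "John:David:enfl43erl34r345",
       "Mary_12:Ann:abc123")

# The regex quoted above, as an R string literal
pat <- "([[:alnum:]]+)(?:_([\\d]+))?:([\\w]+):([\\w]+)"

m <- do.call(rbind, regmatches(x, regexec(pat, x, perl = TRUE)))

df <- data.frame(Name = m[, 2], Session = m[, 3], Owner = m[, 4],
                 UUID = m[, 5], stringsAsFactors = FALSE)

# regmatches() can return "" for a group that did not take part in the
# match, so normalise empty sessions to NA (NA values are left as they are)
df$Session[!nzchar(df$Session)] <- NA
df
```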
With a few lookaround regexes, you can rely solely on stringr::str_extract():
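library(stringr)
# Input strings from the question
data <- c("John_1:David:enfl43erl34r345", "John:David:enfl43erl34r345")
data.frame(
  Name = str_extract(data, "^[^:_]+"),            # everything before the first : or _
  Session = str_extract(data, "(?<=_).*?(?=:)"),  # between _ and the next : (NA if there is no _)
  Owner = str_extract(data, "(?<=:).*(?=:)"),     # between the first and the last :
  UUID = str_extract(data, "[^:]*$"),             # everything after the last :
  stringsAsFactors = FALSE
)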
Name Session Owner UUID
1 John 1 David enfl43erl34r345
2 John <NA> David enfl43erl34r345
This doesn't use rebus, but here is a no-nonsense approach in base R:
data:
df1 <- data.frame(
  strings = c("John_1:David:enfl43erl34r345", "John:David:enfl43erl34r345"),
  stringsAsFactors = FALSE
)
code:
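fun1 <- function(x) {
  # Split off the session id: \\K drops the name from the reported match, so
  # only a "_" followed by a digit acts as the separator (\\K needs perl = TRUE)
  ans <- strsplit(x, "^[^:]+\\K_(?=\\d)", perl = TRUE)
  # Split every piece on ":" and flatten to a single character vector
  ans <- lapply(ans, strsplit, ":")
  ans <- unlist(ans)
  # No session id: insert NA after the name so every row has 4 fields
  if (length(ans) == 3) { ans <- append(ans, NA, 1) }
  return(ans)
}
# apply() runs fun1 on each row; t() gives one row per input string
result <- as.data.frame(t(apply(df1, 1, fun1)), stringsAsFactors = FALSE)
names(result) <- c("Name", "Session", "Owner", "UUID")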
result:
# Name Session Owner UUID
#1 John 1 David enfl43erl34r345
#2 John <NA> David enfl43erl34r345