R regex - extract strings between two characters for multiple instances
regex in R to extract value between two strings
My lines look like this:
01:04:43.064 [12439] <2> xyz
01:04:43.067 [12439] <2> a lmn
01:04:43.068 [12439] <4> j klm
x_times_wait to <3000>
01:04:43.068 [12439] <4> j klm
enter_object <5000> main k
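I want a regex that extracts the value after the angle brackets, but only for lines that start with a timestamp.
This is what I tried, assuming the lines are in a data frame named nn:
library(stringr)  # provides str_split_fixed
split <- str_split_fixed(nn[,1], ">", 2)
split2 <- data.frame(split[,2])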
The problem is that split2 gives:
xyz
a lmn
j klm
j klm
main k
How can I make sure the empty row and "main k" are not returned?
\d+(?::\d+){2}\.\d+\s+\[[^\]]+\]\s+<\d+>(.+)$
Instead of splitting, try matching and grab capture group 1. See the demo.
https://regex101.com/r/vN3sH3/16
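As a rough illustration of how the match-and-capture idea could be applied in R, here is a minimal sketch. It assumes the lines are in a character vector called x (a name not used in this answer), that PCRE is enabled via perl = TRUE, and that the leading space captured by group 1 should be trimmed; the backslashes are doubled because the pattern is written as an R string.

```r
# Sample lines from the question (assumed to be in a character vector)
x <- c("01:04:43.064 [12439] <2> xyz",
       "01:04:43.067 [12439] <2> a lmn",
       "01:04:43.068 [12439] <4> j klm",
       "x_times_wait to <3000>",
       "01:04:43.068 [12439] <4> j klm",
       "enter_object <5000> main k")

# The pattern from this answer, with backslashes doubled for an R string
pattern <- "\\d+(?::\\d+){2}\\.\\d+\\s+\\[[^\\]]+\\]\\s+<\\d+>(.+)$"

# regexec() reports the full match and the capture groups;
# lines that do not match yield character(0)
m <- regmatches(x, regexec(pattern, x, perl = TRUE))

# Drop non-matching lines, take group 1 (the second element), trim whitespace
values <- trimws(vapply(Filter(length, m), function(g) g[2], character(1)))
values
```

Lines without a timestamp never match, so the empty row and "main k" should not appear in values.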
Or split on (?<=<\\d>) and take split2.
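If you define a timestamp as one or more digits followed by a colon, then one or more digits and another colon, then one or more digits, this approach may work for you.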
x <- c("01:04:43.064 [12439] <2> xyz", "01:04:43.067 [12439] <2> a lmn",
"01:04:43.068 [12439] <4> j klm", "x_times_wait to <3000>",
"01:04:43.068 [12439] <4> j klm", "enter_object <5000> main k")
sub(".*> ", "", x[grepl("\\d+:\\d+:\\d+", x)])
# [1] "xyz" "a lmn" "j klm" "j klm"
This first drops every element that does not contain a timestamp, then takes the value after the > for the remaining elements.
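The same two steps could also be applied directly to the data frame from the question. The sketch below is only illustrative: the construction of nn and its column name line are assumptions, since the question does not show how nn was built, and the timestamp check is anchored with ^ here so that only lines starting with a timestamp are kept.

```r
# Hypothetical reconstruction of the asker's data frame nn
nn <- data.frame(
  line = c("01:04:43.064 [12439] <2> xyz",
           "01:04:43.067 [12439] <2> a lmn",
           "01:04:43.068 [12439] <4> j klm",
           "x_times_wait to <3000>",
           "01:04:43.068 [12439] <4> j klm",
           "enter_object <5000> main k"),
  stringsAsFactors = FALSE
)

# Step 1: flag the rows that start with digits:digits:digits
is_timestamped <- grepl("^\\d+:\\d+:\\d+", nn[, 1])

# Step 2: for those rows only, remove everything up to and including "> "
split2 <- data.frame(value = sub(".*> ", "", nn[is_timestamped, 1]),
                     stringsAsFactors = FALSE)
split2
```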
Here is a base R approach.
The regex:
^(\\d{2}:){2}\\d{2}\\.\\d{3}.*>\\s*\\K.+
You can use it with gregexpr and regmatches:
unlist(regmatches(vec, gregexpr("^(\\d{2}:){2}\\d{2}\\.\\d{3}.*>\\s*\\K.+",
vec, perl = TRUE)))
# [1] "xyz" "a lmn" "j klm" "j klm"
Here vec is the vector containing your strings.
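Because the pattern is anchored with ^, each string can match at most once, so regexpr might be enough in place of gregexpr. The following is a sketch under that assumption, with vec filled in from the question's sample lines. The \\K in the pattern discards everything matched so far, so only the text after the angle brackets is reported as the match.

```r
# Sample lines from the question, assumed to be stored in vec
vec <- c("01:04:43.064 [12439] <2> xyz",
         "01:04:43.067 [12439] <2> a lmn",
         "01:04:43.068 [12439] <4> j klm",
         "x_times_wait to <3000>",
         "01:04:43.068 [12439] <4> j klm",
         "enter_object <5000> main k")

# Same regex as above; \\K requires perl = TRUE
pattern <- "^(\\d{2}:){2}\\d{2}\\.\\d{3}.*>\\s*\\K.+"

# regexpr() finds the first match per string; regmatches() then returns
# only the matched substrings and skips strings without a match
res <- regmatches(vec, regexpr(pattern, vec, perl = TRUE))
res
```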
Using rex may make this type of task a little simpler.
string <- "01:04:43.064 [12439] <2> xyz
01:04:43.067 [12439] <2> a lmn
01:04:43.068 [12439] <4> j klm
x_times_wait to <3000>
01:04:43.068 [12439] <4> j klm
enter_object <5000> main k"
library(rex)
timestamp <- rex(n(digit, 2), ":", n(digit, 2), ":", n(digit, 2), ".", n(digit, 3))
re <- rex(timestamp, space,
          "[", digits, "]", space,
          "<", digits, ">", space,
          capture(anything))
re_matches(string, re, global = TRUE)
#> [[1]]
#> 1
#> 1 xyz
#> 2 a lmn
#> 3 j klm
#> 4 j klm