Accessing results of gregexpr
I would like to use the gregexpr function to find the start and finish positions of substrings within a string. The function works fine in the console, but I cannot access the results for either the start positions or the match lengths:
g <- gregexpr("e", "cheese")
g
[[1]]
[1] 3 4 6
attr(,"match.length")
[1] 1 1 1
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE
g[[1]][1]
merely returns the first value (3), but I need vectors containing ALL of the values, for both the start positions and the lengths. Thanks.
You can use unlist() to get a plain vector of the start positions. If you only need the first and last ones, min() and max() can be used.
unlist(g)
[1] 3 4 6
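A minimal sketch building on the unlist() suggestion, assuming a single input string with at least one match (gregexpr() reports -1 when nothing matches). Because unlist() drops the attributes of the list elements, the lengths have to be read separately from the "match.length" attribute. The variable names pos, lens, first and last are only illustrative.

```r
g <- gregexpr("e", "cheese")

# unlist() flattens the list and drops the attributes of its elements,
# leaving a plain integer vector of start positions
pos <- unlist(g)

# the match lengths live in the "match.length" attribute of each element
lens <- unlist(lapply(g, attr, "match.length"))

# first and last start position, via min() and max()
first <- min(pos)
last <- max(pos)

print(pos)            # [1] 3 4 6
print(lens)           # [1] 1 1 1
print(c(first, last)) # [1] 3 6
```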
You can extract them this way:
g <- gregexpr("e", "cheese")
# one-liner equivalent of: starts <- g[[1]]; attributes(starts) <- NULL
starts <- `attributes<-`(g[[1]],NULL)
lens <- attr(g[[1]],'match.length')
> starts
[1] 3 4 6
> lens
[1] 1 1 1
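The question also asks for finish positions. A minimal sketch, assuming "finish" means the index of the last character of each match, which is start + length - 1. The pattern is switched to "e+" here (an illustrative choice, not from the original) so that the match lengths differ; substring() then recovers the matched text as a check.

```r
text <- "cheese"
g <- gregexpr("e+", text)

# start positions with all attributes stripped, as in the answer above
starts <- `attributes<-`(g[[1]], NULL)

# length of each match ("ee" and "e")
lens <- attr(g[[1]], "match.length")

# index of the last character of each match
ends <- starts + lens - 1L

print(starts)                          # [1] 3 6
print(ends)
print(substring(text, starts, ends))   # [1] "ee" "e"
```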
Of course, this works only if text has length 1 (as in the example, where it contains only "cheese"). Otherwise you'll need to iterate over the elements of g using g[[2]], g[[3]], etc.
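A sketch of that iteration for a character vector with several elements, using lapply() over seq_along(g) instead of indexing g[[2]], g[[3]] by hand. The input strings are made up for illustration, and elements with no match (which gregexpr() reports as -1) are skipped.

```r
# hypothetical input with several strings; "milk" has no "e"
text <- c("cheese", "milk", "peel")
g <- gregexpr("e", text)

# one small data frame per input string, or NULL when there is no match
per_string <- lapply(seq_along(g), function(i) {
  m <- g[[i]]
  if (m[1] == -1L) return(NULL)
  data.frame(
    string = i,                          # index of the input string
    start  = as.vector(m),               # start positions, attributes dropped
    length = attr(m, "match.length")     # match lengths
  )
})

# rbind() ignores the NULL entries from strings without a match
matches <- do.call(rbind, per_string)
print(matches)
```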
Another approach would be:
g <- gregexpr("e", "cheese")
g[[1]][1:length(g[[1]])]
#[1] 3 4 6
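For comparison, a few other base R ways to get the same attribute-free vector; these are not part of the benchmark that follows. Subsetting with `[` drops the extra attributes, as.vector() removes all attributes from an atomic vector, and c() removes all attributes except names. seq_along() is used here instead of 1:length() as the more defensive idiom, although g[[1]] is never empty in this case.

```r
g <- gregexpr("e", "cheese")

# subsetting with a full index vector keeps the values but not the
# match.length / index.type / useBytes attributes
a <- g[[1]][seq_along(g[[1]])]

# as.vector() strips all attributes from an atomic vector
b <- as.vector(g[[1]])

# c() strips all attributes except names
d <- c(g[[1]])

print(identical(a, b) && identical(b, d))  # [1] TRUE
```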
And a microbenchmark comparing it with the unlist approach:
microbenchmark::microbenchmark(
g[[1]][1:length(g[[1]])],
unlist(g)
)
#Unit: nanoseconds
#                      expr min  lq   mean median  uq   max neval
#  g[[1]][1:length(g[[1]])] 378 378 653.80    379 756  8307   100
#                 unlist(g)   0 378 544.32    378 378 15104   100