Accessing results of gregexpr

I would like to use the gregexpr function to find the start and end positions of substrings within a string. The function works fine in the console, but I cannot access the results for either the start positions or the match lengths:

g <- gregexpr("e", "cheese")

g

[[1]]
[1] 3 4 6
attr(,"match.length")
[1] 1 1 1
attr(,"index.type")
[1] "chars"
attr(,"useBytes")
[1] TRUE

g[[1]][1] returns only the first value (3), but I need to create vectors containing ALL the values, for both the start positions and the match lengths. Thanks.

You can use unlist to get a vector of the positions. If you only need the first and last positions, you can then apply min and max to that vector.

unlist(g)

[1] 3 4 6
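
As a minimal sketch of the min/max suggestion above (the variable names pos, first_pos and last_pos are illustrative, not from the original answer), the first and last match positions could be taken like this. Note that gregexpr reports -1 when there is no match, so a real script would probably want to check for that first.

```r
g <- gregexpr("e", "cheese")

# unlist() drops the match.length / index.type / useBytes attributes
# and leaves a plain integer vector of start positions
pos <- unlist(g)

first_pos <- min(pos)  # 3
last_pos  <- max(pos)  # 6
```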

You can extract them this way:

g <- gregexpr("e", "cheese")

# one liner for : starts <- g[[1]]
#                 attributes(starts) <- NULL
starts <- `attributes<-`(g[[1]],NULL) 

lens <- attr(g[[1]],'match.length')

> starts
[1] 3 4 6
> lens
[1] 1 1 1
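
Since the question asks for both start and end positions, here is a hedged sketch of how the end positions might be derived from the starts and lengths extracted above. The pattern "e+" and the names x, ends and matched are assumptions chosen for illustration, so that one match is longer than a single character.

```r
x <- "cheese"
g <- gregexpr("e+", x)

starts <- as.vector(g[[1]])              # as.vector() strips the attributes
lens   <- attr(g[[1]], "match.length")

# an end position is the start plus the match length, minus one
ends <- starts + lens - 1

# substring() is vectorised over start/end, so this recovers each match
matched <- substring(x, starts, ends)

starts   # 3 6
ends     # 4 6
matched  # "ee" "e"
```

If only the matched text is needed, base R's regmatches(x, g) returns it directly without computing positions by hand.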

Of course, this works only if the text argument has length 1 (as in the example, where it contains only "cheese"). Otherwise, g has one element per input string, and you'll need to iterate over them using g[[2]], g[[3]], etc.
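
For the multi-string case described above, one possible approach is to let lapply do the iteration instead of indexing g[[2]], g[[3]], ... by hand. This is only a sketch; the input vector words is made up for illustration.

```r
words <- c("cheese", "tree", "sky")
g <- gregexpr("e", words)   # a list with one element per input string

# start positions per string, with attributes stripped
starts <- lapply(g, as.vector)

# match lengths per string
lens <- lapply(g, attr, "match.length")

names(starts) <- names(lens) <- words

starts$cheese  # 3 4 6
starts$tree    # 3 4
starts$sky     # -1, meaning no match
```

Strings without a match yield -1 for both the position and the length, so those entries may need filtering before further arithmetic.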

Another approach would be:

g <- gregexpr("e", "cheese")

g[[1]][1:length(g[[1]])]
#[1] 3 4 6

And a microbenchmark comparison with the unlist approach:

microbenchmark::microbenchmark(
   g[[1]][1:length(g[[1]])], 
   unlist(g)
)

#Unit: nanoseconds
#                     expr min  lq   mean median  uq   max neval
# g[[1]][1:length(g[[1]])] 378 378 653.80    379 756  8307   100
#                unlist(g)   0 378 544.32    378 378 15104   100
