R regular expression lookbehind

Question

I have a vector filled with strings of the following format: <year1><year2><id1><id2>

The first entries of the vector look like this:

199719982001
199719982002
199719982003
199719982003

For the first entry we have: year1 = 1997, year2 = 1998, id1 = 2, id2 = 001.

I want to write a regular expression that pulls out year1, id1, and the digits of id2 that are not zero. So for the first entry the regex should output: 199721.

I have tried doing this with the stringr package and created the following regex:

"^\\d{4}|\\d{1}(?<=\\d{3}$)"

to pull out year1 and id1. However, when using the lookbehind I get an "invalid regular expression" error. This is a bit puzzling to me; can R not handle lookaheads and lookbehinds?

Answer 1

You will need to use gregexpr from the base package. This works:

> s <- "199719982001"
> gregexpr("^\\d{4}|\\d{1}(?<=\\d{3}$)",s,perl=TRUE)
[[1]]
[1]  1 12
attr(,"match.length")
[1] 4 1
attr(,"useBytes")
[1] TRUE

Note the perl=TRUE setting. For more details, see ?regex.

Judging from the output, your regular expression does not catch id1, though. The second match is at position 12, the last character of the string, whereas id1 sits at position 9.
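To make that concrete, here is a minimal sketch that turns the match positions into substrings with regmatches. The second pattern is my own assumption about what was intended, not something from the question: it swaps the lookbehind for a lookahead that requires exactly three digits between the matched digit and the end of the string.

```r
s <- "199719982001"

# Pattern from the question: the lookbehind anchors the match to the end
# of the string, so the second match is the last digit of id2, not id1.
m_orig <- gregexpr("^\\d{4}|\\d{1}(?<=\\d{3}$)", s, perl = TRUE)
orig_parts <- regmatches(s, m_orig)[[1]]
print(orig_parts)   # "1997" "1"

# Assumed fix: match a digit that is followed by exactly three digits
# and then the end of the string, which is where id1 sits.
m_fix <- gregexpr("^\\d{4}|\\d(?=\\d{3}$)", s, perl = TRUE)
fix_parts <- regmatches(s, m_fix)[[1]]
year1_id1 <- paste0(fix_parts, collapse = "")
print(year1_id1)    # "19972"
```

This only covers year1 and id1; the non-zero part of id2 still has to be handled separately.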

Answer 2

Since this is fixed format, why not use substr? year1 is extracted using substr(s,1,4), id1 using substr(s,9,9), and id2 using as.numeric(substr(s,10,13)). In the last case I used as.numeric to get rid of the zeroes.
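Put together, the substr approach might look like the sketch below, applied to the sample entries from the question. The variable names and the final paste0 step are mine. I stop at position 12 because the sample strings are 12 characters long; substr also accepts a stop position past the end of the string.

```r
s <- c("199719982001", "199719982002", "199719982003", "199719982003")

year1 <- substr(s, 1, 4)                 # characters 1 to 4
id1   <- substr(s, 9, 9)                 # the single id1 digit
id2   <- as.numeric(substr(s, 10, 12))   # "001" becomes 1, dropping leading zeros

result <- paste0(year1, id1, id2)
print(result)   # "199721" "199722" "199723" "199723"
```

Since substr is vectorized, this works on the whole vector without a loop.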

Answer 3

You can use sub.

sub("^(.{4}).{4}(.{1}).*([1-9]{1,3})$","\\1\\2\\3",s)
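For reference, here is the call above applied to the sample entries, followed by a variant of my own. The pattern above keeps only the trailing run of non-zero digits, so an id2 such as 012 would lose its 1, and an id2 ending in zero would not match at all. The variant assumes that "digits that are not zero" means "id2 without its leading zeros", which is what as.numeric does; that reading is an assumption, as is the made-up test string ending in 012.

```r
s <- c("199719982001", "199719982002", "199719982003", "199719982003")

# The pattern from this answer, applied to the whole vector.
out <- sub("^(.{4}).{4}(.{1}).*([1-9]{1,3})$", "\\1\\2\\3", s)
print(out)   # "199721" "199722" "199723" "199723"

# Hypothetical variant: skip leading zeros of id2 and keep the remaining digits.
strip_pattern <- "^(\\d{4})\\d{4}(\\d)0*(\\d*)$"
out2 <- sub(strip_pattern, "\\1\\2\\3", s, perl = TRUE)

# Made-up entry with id2 = 012, to show where the two patterns differ.
extra <- sub(strip_pattern, "\\1\\2\\3", "199719982012", perl = TRUE)
print(extra)   # "1997212"
```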


Answer 1
9 2012-01-12 12:07:22

Answer 2
8 accepted 2012-01-12 11:51:59

Answer 3
1 2012-01-14 08:52:09
