
Limit character length after specific word in R

I have a vector of names that I would like to clean. I would like to shorten each string:

Example:

x <- c("LambMa, a.b.c., LaMa (shorter wording), LambM, abc , a.b.c",
       "LambMa, a.b.c., LaMa (shorter wording)") 

In this example I would like to keep only the first LambMa abc and cut off the rest. If a string does not contain abc twice, do nothing (skip it).

So the specific word or expression to look for is "abc": cut everything after its first occurrence.

EDIT: I would like to keep only the characters up to and including abc in each element of vector x, but only when abc occurs twice in that string.

The solution to the example above would be:

solution <- c("LambMa, a.b.c.","LambMa, a.b.c., LaMa (shorter wording)") 

EDIT 2: A partial solution would also be very helpful and would be accepted. Thanks

x <- c("LambMa, a.b.c., LaMa (shorter wording), LambM, abc , a.b.c",
       "LambMa, a.b.c., LaMa (shorter wording)") 

occ_abc <- gregexpr("a.b.c", x, fixed = TRUE) # find the occurrences of the literal text "a.b.c"
for (i in seq_along(occ_abc)) { # for each item of x
    if (length(occ_abc[[i]]) >= 2) { # if there are 2 or more occurrences
      # keep everything up to the first match and the "." that follows it
      x[i] <- substr(x[i], 1, occ_abc[[i]][1] + 5)
    } else { # else leave the item untouched
      x[i]
    }
}

> x
[1] "LambMa, a.b.c."                         "LambMa, a.b.c., LaMa (shorter wording)"

The if...else part can very probably be replaced by a vectorized ifelse() call.
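A minimal sketch of what such an ifelse() version might look like, assuming the same example vector x as in the question. The helper names n_abc, first_abc and shortened are made up for illustration, and fixed = TRUE is an assumption that the search text should be treated literally rather than as a regular expression.

```r
x <- c("LambMa, a.b.c., LaMa (shorter wording), LambM, abc , a.b.c",
       "LambMa, a.b.c., LaMa (shorter wording)")

# Positions of the literal text "a.b.c" in every element of x
occ_abc <- gregexpr("a.b.c", x, fixed = TRUE)

# Number of matches per element (gregexpr returns -1 when nothing matches)
n_abc <- vapply(occ_abc, function(m) sum(m > 0), integer(1))

# Start position of the first match per element (-1 if there is none)
first_abc <- vapply(occ_abc, function(m) m[1], integer(1))

# Truncate only the elements with two or more matches; keep the others as they are
shortened <- ifelse(n_abc >= 2, substr(x, 1, first_abc + 5), x)
shortened
```

The "+ 5" keeps the five characters of "a.b.c" plus the "." that follows it in the example data, so it would need adjusting for other search terms.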

You can use gsub to make the replacement only when the pattern you specify matches. To avoid using a look-behind, you can capture the first a.b.c. and replace the whole match with it:

gsub("(a\\.b\\.c\\.).+(a\\.b\\.c)","\\1",x)
[1] "LambMa, a.b.c."                        
[2] "LambMa, a.b.c., LaMa (shorter wording)"
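The pattern above leaves any text that follows the last a.b.c in place. A hedged variant of the same capture-group idea, assuming you also want to drop such trailing text, anchors the pattern to the whole string and uses a lazy quantifier with perl = TRUE. The third element of x2 and the names x2 and trimmed are invented here purely to illustrate that case.

```r
x2 <- c("LambMa, a.b.c., LaMa (shorter wording), LambM, abc , a.b.c",
        "LambMa, a.b.c., LaMa (shorter wording)",
        "A a.b.c. foo a.b.c bar")  # hypothetical extra case with trailing text

# ^(.*?a\.b\.c\.?)  capture as little as possible up to the first "a.b.c" (and an optional ".")
# .*a\.b\.c.*$      require a second "a.b.c" somewhere later, then consume the rest
# Strings with only one "a.b.c" do not match, so sub() returns them unchanged.
trimmed <- sub("^(.*?a\\.b\\.c\\.?).*a\\.b\\.c.*$", "\\1", x2, perl = TRUE)
trimmed
```

Because the pattern is anchored at both ends it can match at most once per string, so sub() is enough here and gsub() is not needed.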
