
Matching a word after another word in R regex

I have a dataframe in R with one column (called 'city') containing a text string. My goal is to extract only one word, i.e. the city name, from the text string. The city name always follows the word 'in', e.g. the text might be:

'in London'
'in Manchester'

I tried to create a new column ('municipality'):

df$municipality <- gsub(".*in ?([A-Z]+).*$", "\\1", df$city)

This gives me the first letter following 'in', but I need the next word (and ONLY the next word).

I then tried:

gsub(".*in ?([A-Z]\w+))")

which worked on a regex checker, but not in R. Can someone please help me? I know this is probably very simple, but I can't crack it. Thanks in advance.

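We can use str_extract from the stringr package, with a lookbehind so that only the word following 'in' and a whitespace character is returned: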

library(stringr)
str_extract(df$city, '(?<=in\\s)\\w+')
#[1] "London"     "Manchester"
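
If loading stringr is not an option, roughly the same extraction can be sketched in base R with regexpr and regmatches. This is only an illustration under assumptions: the sample vector (including the third entry, which has no 'in') is made up, and perl = TRUE is passed because lookbehind is a PCRE feature.

```r
# Hypothetical sample data; the last entry deliberately has no "in " prefix
city <- c("in London", "in Manchester", "Leeds")

# Locate the first word preceded by "in" plus one whitespace character
m <- regexpr("(?<=in\\s)\\w+", city, perl = TRUE)

# regmatches() drops non-matching elements, so fill a full-length vector
# and leave NA where nothing matched (regexpr returns -1 there)
municipality <- rep(NA_character_, length(city))
municipality[m > 0] <- regmatches(city, m)

print(municipality)
```

Filling a pre-allocated vector keeps the result the same length as the input, so it can be assigned back to a dataframe column.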

The following regular expression will match the second word from your city column:

^in\\s([^ ]*).*$

This matches the word 'in' followed by a single whitespace character, then a capture group of any non-space characters, which holds the city name. The trailing .*$ consumes the rest of the string, so replacing the whole match with \\1 leaves only the city.

Example:

df <- data.frame(city=c("in London town", "in Manchester city"))

df$municipality <- gsub("^in\\s([^ ]*).*$", "\\1", df$city)

> df$municipality
[1] "London"     "Manchester"
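
For comparison, the second attempt from the question can probably be made to work in R as well. The likely problem is that "\w" is not a valid escape inside an R string literal, so the backslash has to be doubled; the call also needs a replacement and an input vector. A minimal sketch, assuming the same sample data as above:

```r
df <- data.frame(city = c("in London town", "in Manchester city"))

# "\\w" in the R string reaches the regex engine as \w;
# the trailing .* consumes whatever follows the captured word
df$municipality <- sub(".*in ?([A-Z]\\w+).*", "\\1", df$city)

print(df$municipality)
```

Unlike the anchored pattern, this version relies on the city starting with a capital letter, so it would return the input unchanged for a lowercase city name.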
