
Extract words from a factor in R

I have a dataset like this:

df <- data.frame(
  text = c("Update AV Line 204 to Los Angeles will be ...",
           "91 Line 700 to RiversideDowntown is delayed 15 minutes ...",
           "VC Line 102 to Los Angeles is delayed 1520 minutes ...",
           "Update AV Line 227 to Lancaster is terminated  Via Princessa ",
           "RIV Line 411 to Los Angeles is delayed 10 minutes ...",
           "SB Line 312 to San Bernardino is delayed up to ...",
           "SB Line 327 to Los Angeles is delayed up to 15..."),
  stringsAsFactors = TRUE)

df

I need to extract the key words into a new field, so that the end result looks something like this:

> df
  text                                                            LinesExtracted
1 Update AV Line 204 to Los Angeles will be ...                   Line 204 to Los Angeles
2 91 Line 700 to RiversideDowntown is delayed 15 minutes ...      Line 700 to Riverside Downtown
3 VC Line 102 to Los Angeles is delayed 1520 minutes ...          Line 102 to Los Angeles
4 Update AV Line 227 to Lancaster is terminated  Via Princessa    Line 227 to Lancaster
5 RIV Line 411 to Los Angeles is delayed 10 minutes ...           Line 411 to Los Angeles
6 SB Line 312 to San Bernardino is delayed up to ...              Line 312 to San Bernardino
7 SB Line 327 to Los Angeles is delayed up to 15...               Line 327 to Los Angeles

Thanks.

Since regex can be difficult to read, I have split it into steps:

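# 1. Drop everything before "Line" (gsub coerces the factor to character)
df$LinesExtracted <- gsub("^.*Line", "Line", df$text)
# 2. Drop the trailing " will be ..." part
df$LinesExtracted <- gsub(" will be .*$", "", df$LinesExtracted)
# 3. Drop the trailing " is ..." part
df$LinesExtracted <- gsub(" is .*$", "", df$LinesExtracted)
# 4. Insert a space at lower-to-upper case boundaries ("RiversideDowntown")
df$LinesExtracted <- gsub("([a-z])([A-Z])", "\\1 \\2", df$LinesExtracted, perl = TRUE)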
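
If you would rather do the extraction in one pass, the steps above can probably be folded into a single capture group. This is only a sketch, not a drop-in replacement. It assumes every status message continues with " is " or " will be " after the destination, and the two-row sample data and the `txt`, `pattern`, and `extracted` names are made up for illustration. Rows that do not match the pattern are set to NA instead of being passed through unchanged.

```r
# Two sample rows mirroring the question's data; `text` is a factor here.
df <- data.frame(
  text = c("Update AV Line 204 to Los Angeles will be ...",
           "91 Line 700 to RiversideDowntown is delayed 15 minutes ..."),
  stringsAsFactors = TRUE)

# Convert the factor to character explicitly before doing string work.
txt <- as.character(df$text)

# Capture "Line <number> to <destination>", stopping lazily at the first
# " is " or " will be " (an assumption about how the messages are phrased).
pattern <- "^.*?(Line [0-9]+ to .+?) (?:is|will be) .*$"
extracted <- sub(pattern, "\\1", txt, perl = TRUE)

# Split run-together names such as "RiversideDowntown".
extracted <- gsub("([a-z])([A-Z])", "\\1 \\2", extracted)

# sub() returns its input unchanged when nothing matches, so mark those rows NA.
df$LinesExtracted <- ifelse(grepl(pattern, txt, perl = TRUE), extracted, NA_character_)

df$LinesExtracted
```

The step-by-step version is easier to debug. The single pattern mainly helps when you want non-matching rows flagged rather than silently kept.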


 