Extracting everything after first two words in R
I am trying to use a regular expression in R to extract everything after the first number and the first word of an entry in a data frame.
For example:
Header =
c("2006 Volvo XC70",
"2012 Ford Econoline Cargo Van E-250 Commercial",
"2012 Nissan Frontier",
"2012 Kia Soul 5dr Wagon Automatic")
I want to write a pattern that will grab XC70, or Econoline Cargo Van E-250 Commercial (everything after the year and make), from an entry in my "Header" column, so that I can run the function on my data frame and create a new "model" column. I can't figure out a pattern that skips the first string of digits, then a space, then the first string of characters, then another space, and grabs everything that follows.
Any help would be appreciated. Thanks!
Just use sub.
sub("^\\d+\\s+\\w+\\s+", "", df$x)
Example:
x <- "2012 Ford Econoline Cargo Van E-250 Commercial"
sub("^\\d+\\s+\\w+\\s+", "", x)
# [1] "Econoline Cargo Van E-250 Commercial"
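To tie this back to the question, here is a minimal sketch that applies the same pattern to the whole Header vector and stores the result in a new column. The data frame name df and the column name model are assumptions taken from the question's wording, not from the original answer.

```r
# Sample data from the question
Header <- c("2006 Volvo XC70",
            "2012 Ford Econoline Cargo Van E-250 Commercial",
            "2012 Nissan Frontier",
            "2012 Kia Soul 5dr Wagon Automatic")

# Assumed data frame layout: a single character column named Header
df <- data.frame(Header = Header, stringsAsFactors = FALSE)

# Drop the leading digits (year), the first word (make), and the
# whitespace after each; sub() is vectorised over the column
df$model <- sub("^\\d+\\s+\\w+\\s+", "", df$Header)

print(df$model)
# [1] "XC70"
# [2] "Econoline Cargo Van E-250 Commercial"
# [3] "Frontier"
# [4] "Soul 5dr Wagon Automatic"
```

Note that \\w+ only covers a single run of word characters, so a make containing a space or hyphen (for example "Land Rover" or "Mercedes-Benz") would not be removed correctly by this pattern.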
For this task, I would fetch a list of car makes using the XML package:
library(XML)
doc <- xmlParse('http://www.fueleconomy.gov/ws/rest/ympg/shared/menu/make')
Now that we have fetched the XML data, we can create a vector of car makes:
mk <- xpathSApply(doc, '//value', xmlValue)
Finally, I'll build the pattern with sprintf and apply it with sub:
df$Makes <- sub(sprintf('\\d+ (?:%s) ', paste(mk, collapse='|')), '', df$Header)
Output:
## Header
# 1 2006 Volvo XC70
# 2 2012 Ford Econoline Cargo Van E-250 Commercial
# 3 2012 Nissan Frontier
# 4 2012 Kia Soul 5dr Wagon Automatic
## Makes
# 1 XC70
# 2 Econoline Cargo Van E-250 Commercial
# 3 Frontier
# 4 Soul 5dr Wagon Automatic
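The make-list approach can also be tried without a network call. The sketch below is only an illustration: the vector mk is a small hard-coded stand-in for the list fetched from the web service, the second Header entry is made up to show a two-word make, and perl = TRUE is an added assumption so that the (?:...) group is handled by the PCRE engine.

```r
# Hypothetical sample entries; the second one has a two-word make
Header <- c("2006 Volvo XC70",
            "2012 Land Rover Range Rover Sport")

# Hard-coded stand-in for the fetched list of makes (assumption)
mk <- c("Volvo", "Land Rover", "Ford")

# Build "^\d+ (?:Volvo|Land Rover|Ford) " from the make list
pattern <- sprintf("^\\d+ (?:%s) ", paste(mk, collapse = "|"))

# Remove the year and the known make; perl = TRUE selects PCRE
model <- sub(pattern, "", Header, perl = TRUE)

print(model)
# [1] "XC70"              "Range Rover Sport"
```

Because the alternation lists whole make names, a multi-word make is stripped in one piece, which a plain \\w+ cannot do. If a make could contain regex metacharacters, it would need escaping before being pasted into the pattern.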