
String Pattern Manipulation in R

I am trying to extract the host and visitor team names from a set of text strings in R.

Sample data:

dat = data.frame(Series = c('England in Australia ODI Match',
'Prudential Trophy (Australia in England)',
'Pakistan in New Zealand ODI Match',
'Prudential Trophy (New Zealand in England)',
'Prudential Trophy (West Indies in England)',
'Australia in New Zealand ODI Series',
'Texaco Trophy (Australia in England)'))

I want to create two new columns. The desired output looks like this:

Visitor     Host
England     Australia
Australia   England
Pakistan    New Zealand
New Zealand England
West Indies England
Australia   New Zealand
Australia   England

I tried the following, but it is incomplete:

dat$Host = sub(" in.*", "", dat$Series)

This should give you what you want:

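# Find the first "<visitor> in <host>" match in each string; the optional
# prefix handles the two-word names "New Zealand" and "West Indies"
re = regexpr("((New |West )?\\w+) in ((New |West )?\\w+)", dat$Series)
matched = regmatches(dat$Series, re)  # named matched so it does not mask base::rm
# Split each match on " in " and stack the pieces into a two-column matrix
d = do.call(rbind, strsplit(matched, " in "))
colnames(d) = c("Visitor", "Host")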

Output:

     Visitor       Host         
[1,] "England"     "Australia"  
[2,] "Australia"   "England"    
[3,] "Pakistan"    "New Zealand"
[4,] "New Zealand" "England"    
[5,] "West Indies" "England"    
[6,] "Australia"   "New Zealand"
[7,] "Australia"   "England"    
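
Since the question asks for two new columns rather than a separate matrix, here is a minimal sketch of one way to attach the results to `dat`. It reuses the same pattern but reads the names from capture groups with `regexec`. The names `pattern` and `parts` are introduced here for illustration only. It also assumes, as the pattern above does, that the only two-word team names begin with "New " or "West "; a name such as "South Africa" would be cut down to "Africa" unless its prefix is added to the pattern.

```r
# Sample data copied from the question
dat <- data.frame(Series = c('England in Australia ODI Match',
                             'Prudential Trophy (Australia in England)',
                             'Pakistan in New Zealand ODI Match',
                             'Prudential Trophy (New Zealand in England)',
                             'Prudential Trophy (West Indies in England)',
                             'Australia in New Zealand ODI Series',
                             'Texaco Trophy (Australia in England)'),
                  stringsAsFactors = FALSE)

# Same pattern as above; group 1 is the visitor and group 3 is the host
# (assumption: two-word names only start with "New " or "West ")
pattern <- "((New |West )?\\w+) in ((New |West )?\\w+)"
parts <- regmatches(dat$Series, regexec(pattern, dat$Series))

# Each element of parts holds the full match followed by groups 1 to 4,
# so the visitor is element 2 and the host is element 4
dat$Visitor <- vapply(parts, function(p) p[2], character(1))
dat$Host    <- vapply(parts, function(p) p[4], character(1))

dat[, c("Visitor", "Host")]
```

A string with no " in " match would produce `NA` in both columns instead of silently dropping the row, which keeps the new columns aligned with `dat$Series`.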
