[英]String Pattern Manipulation in R
我試圖從R中的一堆文本中找到主持人和訪客姓名。
示范文本 -
dat = data.frame(Series = c('England in Australia ODI Match',
'Prudential Trophy (Australia in England)',
'Pakistan in New Zealand ODI Match',
'Prudential Trophy (New Zealand in England)',
'Prudential Trophy (West Indies in England)',
'Australia in New Zealand ODI Series',
'Texaco Trophy (Australia in England)'))
我想要創建兩個新列。所需的輸出如下所示 -
Visitor Host
England Australia
Australia England
Pakistan New Zealand
New Zealand England
West Indies England
Australia New Zealand
我正在嘗試以下功能,但它不完整。
dat$Host = sub(" in.*", "", dat$Series)
這是你想要的東西:
re = regexpr("((New |West )?\\w+) in ((New |West )?\\w+)", dat$Series)
rm = regmatches(dat$Series, re)
d = do.call(rbind,strsplit(rm, " in "))
colnames(d) = c("Visitor","Host")
輸出:
Visitor Host
[1,] "England" "Australia"
[2,] "Australia" "England"
[3,] "Pakistan" "New Zealand"
[4,] "New Zealand" "England"
[5,] "West Indies" "England"
[6,] "Australia" "New Zealand"
[7,] "Australia" "England"
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.