How to select multiple patterns from a dataset in R
I have a dataset (data) containing a list of email IDs:
email=c("susgho.agency@gmail.com","suagencyter.m@gmail.com",
"duff.abcnkhgt@gmail.com","ftyhabcdfg@gmail.com",
"gjhfhg1-ail.com","gjhgkjhgbrt.gh@aol.com")
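I also have another dataset (disp) with a list of patterns:
pattern=c(".agency",".abc","1-ail.com")
I want to check whether any of the patterns matches each email. The expected output should look like this: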
email pattern
susgho.agency@gmail.com .agency
suagencyter.m@gmail.com
duff.abcnkhgt@gmail.com .abc
ftyhabcdfg@gmail.com
gjhfhg1-ail.com 1-ail.com
gjhgkjhgbrt.gh@aol.com
I am using a for loop, but it takes a long time to run.
w <- NULL
for (i in 1:nrow(disp)) {
  w1 <- as.character(disp[i, 1])
  w2 <- data[grep(w1, data$email), ]
  if (nrow(w2) > 0) {
    w2$pattern <- w1
    w <- rbind(w, w2)
  } else {
    w <- rbind(w, w2)
  }
}
Any help would be greatly appreciated. TIA!
You can do it like this:
df$pattern[max.col(-attr(adist(df2$pattern,df$email,counts = T),'counts')[,,3])] = as.character(df2$pattern)
df
email pattern
1 susgho.agency@gmail.com .agency
2 suagencyter.m@gmail.com <NA>
3 duff.abcnkhgt@gmail.com .abc
4 ftyhabcdfg@gmail.com <NA>
5 gjhfhg1-ail.com 1-ail.com
Or you can do:
merge(df,stack(setNames(Vectorize(grep)(df2$pattern,df,value=T,fixed=T),df2$pattern)),by.x="email",by.y = "values",all=T)
email ind
1 duff.abcnkhgt@gmail.com .abc
2 ftyhabcdfg@gmail.com <NA>
3 gjhfhg1-ail.com 1-ail.com
4 suagencyter.m@gmail.com <NA>
5 susgho.agency@gmail.com .agency
Data:
df=read.table(text="email
susgho.agency@gmail.com
suagencyter.m@gmail.com
duff.abcnkhgt@gmail.com
ftyhabcdfg@gmail.com
gjhfhg1-ail.com",h=T)
df2=read.table(text=" pattern
.agency
.abc
1-ail.com",h=T)
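The second one-liner above relies on fixed=T so that each pattern is treated as a literal string rather than a regular expression. As a rough sketch of the same idea in more explicit steps (not the answerer's code): build a logical matrix of literal matches with grepl, then pick one pattern per email. The names hits and first_hit are illustrative, and keeping only the first matching pattern per email is an assumption, since the question does not say what should happen when several patterns match.

```r
# Same sample data as above, rebuilt so the snippet runs on its own.
df <- data.frame(email = c("susgho.agency@gmail.com", "suagencyter.m@gmail.com",
                           "duff.abcnkhgt@gmail.com", "ftyhabcdfg@gmail.com",
                           "gjhfhg1-ail.com"),
                 stringsAsFactors = FALSE)
df2 <- data.frame(pattern = c(".agency", ".abc", "1-ail.com"),
                  stringsAsFactors = FALSE)

# hits[i, j] is TRUE when pattern j occurs literally (fixed = TRUE) in email i.
hits <- vapply(df2$pattern, grepl, logical(nrow(df)),
               x = df$email, fixed = TRUE)

# Assumption: when several patterns match one email, keep the first one.
# which(row)[1] is NA when no pattern matched, which yields NA below.
first_hit <- apply(hits, 1, function(row) which(row)[1])
df$pattern <- df2$pattern[first_hit]
df
```

This calls grepl once per pattern instead of growing a data frame with rbind inside a loop, and emails without a match get NA in the pattern column.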
A slightly different approach uses stringr::str_match, although you first need to escape the special characters in the pattern strings by prefixing them with backslashes:
email=c("susgho.agency@gmail.com","suagencyter.m@gmail.com",
"duff.abcnkhgt@gmail.com","ftyhabcdfg@gmail.com",
"gjhfhg1-ail.com","gjhgkjhgbrt.gh@aol.com")
pattern=c("\\.agency","\\.abc","1-ail\\.com")
data.frame(email, pattern = stringr::str_match(email, paste(pattern, collapse = "|")))
This produces the following output:
email pattern
1 susgho.agency@gmail.com .agency
2 suagencyter.m@gmail.com <NA>
3 duff.abcnkhgt@gmail.com .abc
4 ftyhabcdfg@gmail.com <NA>
5 gjhfhg1-ail.com 1-ail.com
6 gjhgkjhgbrt.gh@aol.com <NA>
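If escaping the patterns by hand is inconvenient, the escaping step can be automated. The following is a minimal base R sketch under stated assumptions, not part of the answer above: escape_regex is a hypothetical helper written for this example, and regexpr/regmatches are used in place of stringr::str_match. As with the alternation above, only the leftmost match in each email is kept.

```r
email <- c("susgho.agency@gmail.com", "suagencyter.m@gmail.com",
           "duff.abcnkhgt@gmail.com", "ftyhabcdfg@gmail.com",
           "gjhfhg1-ail.com", "gjhgkjhgbrt.gh@aol.com")
pattern <- c(".agency", ".abc", "1-ail.com")

# Hypothetical helper: prefix every regex metacharacter with a backslash
# so that the pattern is matched literally.
escape_regex <- function(x) {
  gsub("([.\\\\|()\\[\\]{}^$*+?-])", "\\\\\\1", x, perl = TRUE)
}

# Combine the escaped patterns into a single alternation.
rx <- paste(escape_regex(pattern), collapse = "|")

# regexpr() returns -1 where nothing matched; regmatches() returns only
# the matched substrings, so fill them into a vector of NAs.
m <- regexpr(rx, email, perl = TRUE)
found <- rep(NA_character_, length(email))
found[m > 0] <- regmatches(email, m)

data.frame(email, pattern = found)
```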