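Extract names from a string using a list of names with grepl and a loop and add them to a new column in R
I have a dataset in which one column contains names and another describes what that person did during the day. I am trying to use R to find out who met whom on that day. I created a vector containing the names in the dataset and used grepl in a loop to find where those names appear in the column that details people's activities.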
name <- c("Dupont","Dupuy","Smith")
activity <- c("On that day, he had lunch with Dupuy in London.",
"She had lunch with Dupont and then went to Brighton to meet Smith.",
"Smith remembers that he was tired on that day.")
met_with <- c("Dupont","Dupuy","Smith")
df<-data.frame(name, activity, met_with=NA)
for (i in 1:length(met_with)) {
df$met_with<-ifelse(grepl(met_with[i], df$activity), met_with[i], df$met_with)
}
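However, this solution is unsatisfactory for two reasons. First, I cannot extract more than one name when the person met several people (Dupuy in my example). Second, I cannot tell R not to return the person's own name when the activity column uses that name instead of a pronoun (Smith, for example).
Ideally, I would like df to look like this: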
name    activity                                          met_with
Dupont  On that day, he had lunch with Dupuy in London.   Dupuy
Dupuy   She had lunch with Dupont and then (...).         Dupont Smith
Smith   Smith remembers that he was tired on that day.    NA
I am cleaning the strings in order to build an edge list and a node list for a later network analysis.
Thanks.
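You can use setdiff to exclude the name that belongs to the row being matched, and gregexpr together with regmatches to extract the matched names. You might also consider surrounding the names with \\b (a word boundary) so that only whole words match.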
for(i in seq_len(nrow(df))) {
  # Match every known name except the one belonging to row i
  df$met_with[i] <- paste(regmatches(df$activity[i],
    gregexpr(paste(setdiff(name, df$name[i]), collapse="|"),
             df$activity[i]))[[1]], collapse = " ")
}
df
# name activity met_with
#1 Dupont On that day, he had lunch with Dupuy in London. Dupuy
#2 Dupuy She had lunch with Dupont and then went to Brighton to meet Smith. Dupont Smith
#3 Smith Smith remembers that he was tired on that day.
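The suggestion to wrap the names in \\b can be sketched as follows. This is only an illustration under a few assumptions: the variable names `others`, `pattern` and `hits` are made up here, `perl = TRUE` is an optional choice, and a row with no match is set to NA (as in the desired output in the question) instead of an empty string.

```r
name <- c("Dupont", "Dupuy", "Smith")
activity <- c("On that day, he had lunch with Dupuy in London.",
              "She had lunch with Dupont and then went to Brighton to meet Smith.",
              "Smith remembers that he was tired on that day.")
df <- data.frame(name, activity, met_with = NA_character_,
                 stringsAsFactors = FALSE)

for (i in seq_len(nrow(df))) {
  # All known names except the person described in row i
  others <- setdiff(name, df$name[i])
  # \\b anchors each name at word boundaries, so "Smith" would not
  # match inside a longer word such as "Smithson"
  pattern <- paste0("\\b(", paste(others, collapse = "|"), ")\\b")
  hits <- regmatches(df$activity[i],
                     gregexpr(pattern, df$activity[i], perl = TRUE))[[1]]
  # Assumption: rows without any match become NA rather than ""
  df$met_with[i] <- if (length(hits) > 0) {
    paste(unique(hits), collapse = " ")
  } else {
    NA_character_
  }
}
df$met_with
```

Using `unique(hits)` is a further assumption: it keeps a name only once if it is mentioned several times in the same activity.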
Another option, using Reduce, could be:
df$met_with <- Reduce(function(x, y) {
  # Rows whose activity mentions name y and whose own name is not y
  i <- grepl(y, df$activity, fixed = TRUE) & y != df$name
  x[i] <- lapply(x[i], `c`, y)
  x
}, unique(name), vector("list", nrow(df)))
df
# name activity met_with
#1 Dupont On that day, he had lunch with Dupuy in London. Dupuy
#2 Dupuy She had lunch with Dupont and then went to Brighton to meet Smith. Dupont, Smith
#3 Smith Smith remembers that he was tired on that day. NULL
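Note that the Reduce approach stores met_with as a list column, with NULL for rows without a match. If a plain character column with NA is preferred, as in the desired output in the question, one possible way to flatten it is sketched below. The intermediate name `met` is hypothetical and the use of vapply is just one choice among several.

```r
name <- c("Dupont", "Dupuy", "Smith")
activity <- c("On that day, he had lunch with Dupuy in London.",
              "She had lunch with Dupont and then went to Brighton to meet Smith.",
              "Smith remembers that he was tired on that day.")
df <- data.frame(name, activity, stringsAsFactors = FALSE)

# Same Reduce call as above, kept in a separate list (hypothetical name `met`)
met <- Reduce(function(x, y) {
  i <- grepl(y, df$activity, fixed = TRUE) & y != df$name
  x[i] <- lapply(x[i], `c`, y)
  x
}, unique(name), vector("list", nrow(df)))

# Collapse each list element to one string; NULL (no match) becomes NA
df$met_with <- vapply(met, function(v) {
  if (is.null(v)) NA_character_ else paste(v, collapse = " ")
}, character(1))
df$met_with
```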
Same logic as @Gki, but using stringr functions and mapply instead of a loop.
library(stringr)
pat <- str_c('\\b', df$name, '\\b', collapse = '|')
df$met_with <- mapply(function(x, y) str_c(setdiff(x, y), collapse = ' '),
str_extract_all(df$activity, pat), df$name)
df
# name activity
#1 Dupont On that day, he had lunch with Dupuy in London.
#2 Dupuy She had lunch with Dupont and then went to Brighton to meet Smith.
#3 Smith Smith remembers that he was tired on that day.
# met_with
#1 Dupuy
#2 Dupont Smith
#3
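Since the question mentions building an edge list afterwards, here is a minimal base R sketch of how a space-separated met_with column might be turned into one. It assumes that names never contain spaces and that an empty string means "met nobody". The column names `from` and `to` and the helper variables `has_meeting` and `targets` are illustrative, not part of any answer above.

```r
# Result in the shape produced by the answers above (typed in by hand here)
df <- data.frame(
  name = c("Dupont", "Dupuy", "Smith"),
  met_with = c("Dupuy", "Dupont Smith", ""),
  stringsAsFactors = FALSE
)

# Treat empty strings as missing, to match the desired NA in the question
df$met_with[!nzchar(df$met_with)] <- NA_character_

# Keep only rows with at least one meeting, then split on spaces
has_meeting <- !is.na(df$met_with)
targets <- strsplit(df$met_with[has_meeting], " ", fixed = TRUE)

# One row per (person, person met) pair
edges <- data.frame(
  from = rep(df$name[has_meeting], lengths(targets)),
  to = unlist(targets),
  stringsAsFactors = FALSE
)
edges
```

If names can contain spaces, a different separator or a list column would be a safer intermediate representation.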