R: loop through a column and keep only rows containing '&' or 'and'

Question

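I have a data frame with multiple columns. Column A contains a repeated number, and Column B contains a name. For each group of equal values in Column A, I want to keep only the rows whose Column B contains an '&' symbol or the word 'and'. If no entry in a group has either of these, I want to keep just one row from that group; it doesn't matter which. Sample data: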

Column A           Column B     
12345                John
12345                Mary and Bob
12345                Ben
44444                Jim
44444                Larry & Meg
55555                Tommy

Expected output:

Column A            Column B
12345               Mary and Bob
44444               Larry & Meg
55555               Tommy

Answer 1

You can use ave and grepl to get the matching rows:

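# ave() applies FUN within each ColumnA group. Because ColumnB is character,
# the logical results come back as "TRUE"/"FALSE" strings, hence == "TRUE".
dat[ave(dat$ColumnB, dat$ColumnA, FUN=function(x) {
  # TRUE for names containing " & " or " and "
  g <- grepl("( & )|( and )", x)
  if (all(!g)) {
    # no match in this group: keep only its first row
    seq_along(x) == 1
  } else {
    g
  }
}) == "TRUE",]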
#   ColumnA      ColumnB
# 2   12345 Mary and Bob
# 5   44444  Larry & Meg
# 6   55555        Tommy

Data:

dat = data.frame(ColumnA=c(12345, 12345, 12345, 44444, 44444, 55555), ColumnB=c("John", "Mary and Bob", "Ben", "Jim", "Larry & Meg", "Tommy"), stringsAsFactors=FALSE)
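The pattern above, "( & )|( and )", requires a space on both sides of the separator, while Answer 2 below uses '\\band\\b|&' (whole-word "and", or "&" anywhere). Both give the expected result on the sample data, but they disagree on other inputs. A small comparison on hypothetical test strings (not from the question):

```r
# Hypothetical test strings chosen to show where the two patterns differ
x <- c("Mary and Bob", "Larry & Meg", "Larry&Meg", "Sandy", "and Bob")

# Answer 1's pattern: needs a space on both sides of "&" or "and"
spaced <- grepl("( & )|( and )", x)

# Answer 2's pattern: "and" as a whole word (\\b = word boundary), or "&" anywhere
bounded <- grepl("\\band\\b|&", x)

data.frame(x, spaced, bounded)
```

The spaced pattern misses "&" written without spaces and "and" at the very start of a string; the word-boundary pattern catches both and still rejects "and" inside a word such as "Sandy".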

Answer 2

Try:

library(data.table)
setDT(df1)[ , {tmp <- grepl('\\band\\b|&', ColumnB)   # whole-word "and", or "&"
               .SD[tmp|all(!tmp)]}, ColumnA]           # matching rows; the whole group if none match
#   ColumnA      ColumnB
#1:   12345 Mary and Bob
#2:   44444  Larry & Meg
#3:   55555        Tommy

Or, using dplyr:

library(dplyr)
df1 %>% 
   group_by(ColumnA) %>% 
   mutate(tmp= grepl('\\band\\b|&', ColumnB)) %>%   # flag whole-word "and" or "&"
   filter(tmp|all(!tmp))%>%                          # keep flagged rows, or all rows if none flagged
   select(-tmp)

#  ColumnA      ColumnB
#1   12345 Mary and Bob
#2   44444  Larry & Meg
#3   55555        Tommy

Data:

df1 <- structure(list(ColumnA = c(12345L, 12345L, 12345L, 44444L, 44444L, 
55555L), ColumnB = c("John", "Mary and Bob", "Ben", "Jim", "Larry & Meg", 
"Tommy")), .Names = c("ColumnA", "ColumnB"), class = "data.frame",
row.names = c(NA, -6L))
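The filter tmp|all(!tmp) keeps the matching rows of a group, and keeps every row of a group that has no match. In the sample data the only such group (55555) has a single row, so the output matches the question; for a no-match group with several rows, however, all of them would be kept, whereas the question asks for just one. The helper names keep_all and keep_first below are hypothetical, used only to show the two rules on plain logical vectors:

```r
# Rule used in the answer above: keep the matches, or every row if nothing matches
keep_all <- function(tmp) tmp | all(!tmp)

# Hypothetical variant: if nothing matches, keep only the first row of the group
keep_first <- function(tmp) tmp | (all(!tmp) & seq_along(tmp) == 1)

keep_all(c(FALSE, TRUE, FALSE))   # returns c(FALSE, TRUE, FALSE)
keep_all(c(FALSE, FALSE))         # returns c(TRUE, TRUE): both rows survive
keep_first(c(FALSE, FALSE))       # returns c(TRUE, FALSE): only the first row
```

As a suggested variant (not part of the original answer), the keep_first expression can be written inline in place of tmp|all(!tmp), e.g. .SD[tmp | (all(!tmp) & seq_along(tmp) == 1)] in the data.table call.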

Answer 3

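You want to split the dataset into couples and singles, de-duplicate on the ID, and then return all the couples plus the singles whose ID has no couple.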

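# Reproducible Example!
df <- data.frame(a=c(rep(12345,3),rep(44444,2),55555),
                 b=c("John","Mary and Bob","Ben","Jim","Larry & Meg","Tommy")
)
# Logical flag for "couple" rows; a logical index (rather than which() plus
# negative indexing) keeps df[!is_couple, ] correct even when no row matches
is_couple <- grepl("&| and ", df$b, ignore.case=TRUE)

# First couple row per ID, and first single row per ID
df_couples <- df[is_couple,][!duplicated(df$a[is_couple]),]
df_singles <- df[!is_couple,][!duplicated(df$a[!is_couple]),]

# All couples, plus the singles for IDs that have no couple
rbind(df_couples, df_singles[!df_singles$a %in% df_couples$a,])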
# 
#       a            b
# 2 12345 Mary and Bob
# 5 44444  Larry & Meg
# 6 55555        Tommy
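The split step is sensitive to how the non-couple rows are selected. With couples <- which(...), a data set that has no couples gives couples = integer(0), and df[-couples, ] then drops every row rather than none. A minimal illustration on hypothetical three-row data (not from the question), contrasted with a logical index:

```r
# Hypothetical data with no couples at all
df <- data.frame(a = c(1, 1, 2), b = c("John", "Ben", "Tommy"))

couples <- which(grepl("&| and ", df$b))   # integer(0): nothing matches
nrow(df[-couples, ])                       # 0 rows: -integer(0) selects nothing

is_couple <- grepl("&| and ", df$b)        # logical index instead
nrow(df[!is_couple, ])                     # 3 rows: every single is kept
```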

Question

3 answers

Answer 1
3 2015-06-23 19:42:29

Answer 2
2 2015-06-23 19:31:28

Data

Answer 3
0 2015-06-23 19:52:43
