How do I remove all rows of a dataframe that have the same string value across a subset of columns in R?

Question

I have a dataframe that looks like this:

ID	Time	Q1	Q2	Q3	Q4
1	2 min	Agree	NA	Neutral	NA
2	5 min	NA	Disagree	Agree	NA
3	3 min	Agree	NA	Neutral	NA
4	5 min	Disagree	Disagree	NA	NA
5	6 min	NA	Agree	Agree	Agree
6	1 min	NA	NA	NA	NA

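I want to keep only the rows whose answers are not all equal across the question columns (Q1:Q4). In this example I would keep the rows with IDs 1-3 and drop rows 4-6, because rows 4 and 5 each contain a single repeated string and row 6 is entirely missing. I do want to keep the information in the first two columns, but I don't want to use it in the logic that decides whether a row is kept. Every row has NAs, but the NAs are in different positions, so I want to drop rows where all columns that have a value hold the same value, and also drop rows where every value across the columns is missing.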

I found another answer that does something similar and tried this:

keep <- apply(df[3:6], 1, function(x) length(unique(x[!is.na(x)])) != 1)
df[keep, ]

But this only seems to remove the rows that are entirely NA.

Answer 1

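Here is a dplyr-based solution: use rowwise() to treat each row as its own group, then filter to the rows that have more than one distinct value across the question columns.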

library(dplyr)

df %>% 
  rowwise() %>% 
  filter(n_distinct(c_across(Q1:Q4), na.rm = TRUE) > 1) %>% 
  ungroup()

# A tibble: 3 × 6
     ID Time  Q1    Q2       Q3      Q4   
  <int> <chr> <chr> <chr>    <chr>   <chr>
1     1 2min  Agree NA       Neutral NA   
2     2 5min  NA    Disagree Agree   NA   
3     3 3min  Agree NA       Neutral NA
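
To see why the filter keeps only the first three rows, it can help to look at the per-row count before filtering. The following is a minimal sketch, assuming dplyr 1.0.0 or later (for c_across()) and a data frame built from the question's table; the column name n_answers is made up for illustration.

```r
library(dplyr)

# Data frame rebuilt from the table in the question
df <- data.frame(
  ID   = 1:6,
  Time = c("2 min", "5 min", "3 min", "5 min", "6 min", "1 min"),
  Q1   = c("Agree", NA, "Agree", "Disagree", NA, NA),
  Q2   = c(NA, "Disagree", NA, "Disagree", "Agree", NA),
  Q3   = c("Neutral", "Agree", "Neutral", NA, "Agree", NA),
  Q4   = c(NA, NA, NA, NA, "Agree", NA)
)

# Store the number of distinct non-missing answers per row
# (n_answers is a hypothetical helper column, not part of the answer above)
counts <- df %>%
  rowwise() %>%
  mutate(n_answers = n_distinct(c_across(Q1:Q4), na.rm = TRUE)) %>%
  ungroup()

counts$n_answers
```

Rows with a single repeated answer get a count of 1 and the all-NA row gets 0, so the condition > 1 removes both kinds of row in one step.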

Answer 2

data <- data.frame(
  ID = 1:6,
  Time = c("2 min", "5 min", "3 min", "5 min", "6 min", "1 min"),
  Q1 = c("Agree", NA, "Agree", "Disagree", NA, NA),
  Q2 = c(NA, "Disagree", NA, "Disagree", "Agree", NA),
  Q3 = c("Neutral", "Agree", "Neutral", NA, "Agree", NA),
  Q4 = c(NA, NA, NA, NA, "Agree", NA)
)

rows <- apply(data[3:6], 1, \(x) all(x[!is.na(x)][1] == x[!is.na(x)][-1]))
data[!rows, ]
#>   ID  Time    Q1       Q2      Q3   Q4
#> 1  1 2 min Agree     <NA> Neutral <NA>
#> 2  2 5 min  <NA> Disagree   Agree <NA>
#> 3  3 3 min Agree     <NA> Neutral <NA>
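
This answer comes without an explanation, so here is a small sketch of why the comparison also drops rows that are entirely NA or that hold only one answer. It relies on the base R behaviour that all() on an empty logical vector returns TRUE. The helper name all_same and the example vectors are made up for illustration.

```r
# Hypothetical helper wrapping the same expression used in the answer
all_same <- function(x) {
  v <- x[!is.na(x)]   # non-missing answers only
  all(v[1] == v[-1])  # compare the first answer with all the others
}

# All missing: v is empty, v[1] is NA, and NA == <empty> has length 0
all_same(c(NA, NA, NA, NA))

# One answer: v[-1] is empty, so the comparison again has length 0
all_same(c("Agree", NA, NA, NA))

# Repeated answer versus differing answers
all_same(c("Disagree", "Disagree", NA, NA))
all_same(c("Agree", NA, "Neutral", NA))

# The empty case on its own
all(logical(0))
```

Only the row with differing answers returns FALSE, which is why negating the result keeps exactly the rows the question asks for.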

Answer 3

To keep everyone who gave the same response throughout (ignoring NAs), you can try this slight variation on the code you attempted:

keeps <- apply(df[3:6], 1, function(x) length(unique(x[!is.na(x)])) == 1)
df[keeps, ]

# ID Time       Q1       Q2    Q3    Q4
# 4  5  min Disagree Disagree  <NA>  <NA>
# 5  6  min     <NA>    Agree Agree Agree

If you want everyone who did not give the same response throughout (again ignoring NAs):

keeps <- apply(df[3:6], 1, function(x) length(unique(x[!is.na(x)])) != 1 & !all(is.na(x)))
df[keeps, ]

# ID Time    Q1       Q2      Q3   Q4
# 1  2  min Agree     <NA> Neutral <NA>
# 2  5  min  <NA> Disagree   Agree <NA>
# 3  3  min Agree     <NA> Neutral <NA>
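
The extra !all(is.na(x)) condition is needed because a row with no answers has zero distinct values, and 0 != 1 is TRUE. A minimal sketch of the three cases follows; the helper count_distinct and the example vectors are made up for illustration.

```r
# Hypothetical helper: number of distinct non-missing values in a row
count_distinct <- function(x) length(unique(x[!is.na(x)]))

all_missing <- c(NA_character_, NA_character_, NA_character_, NA_character_)
same_answer <- c("Disagree", "Disagree", NA, NA)
mixed       <- c("Agree", NA, "Neutral", NA)

count_distinct(all_missing)  # zero distinct values
count_distinct(same_answer)  # one distinct value
count_distinct(mixed)        # two distinct values

# `!= 1` alone lets the all-missing row through
count_distinct(all_missing) != 1

# `> 1` rejects both the all-missing and the single-answer row
count_distinct(all_missing) > 1
count_distinct(same_answer) > 1
count_distinct(mixed) > 1
```

Since the count is 0 exactly when every value is missing, testing for > 1 should be equivalent to combining != 1 with !all(is.na(x)).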

Data

df <- read.table(text = "ID Time    Q1  Q2  Q3  Q4
1   2 min   Agree   NA  Neutral NA
2   5 min   NA  Disagree    Agree   NA
3   3 min   Agree   NA  Neutral NA
4   5 min   Disagree    Disagree    NA  NA
5   6 min   NA  Agree   Agree   Agree
6   1 min   NA  NA  NA  NA", header = TRUE)
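
A note on how this text is parsed, which explains the ID and Time values in the printed output above: the header has six fields while each data line splits into seven on whitespace ("2 min" becomes two fields), so read.table() uses the first field as row names. The sketch below repeats the same call and inspects the result.

```r
df <- read.table(text = "ID Time Q1 Q2 Q3 Q4
1 2 min Agree NA Neutral NA
2 5 min NA Disagree Agree NA
3 3 min Agree NA Neutral NA
4 5 min Disagree Disagree NA NA
5 6 min NA Agree Agree Agree
6 1 min NA NA NA NA", header = TRUE)

rownames(df)  # the original ID values end up as row names
df$ID         # holds the number of minutes
df$Time       # holds the string "min" in every row
names(df)[3:6]  # positions 3 to 6 are still Q1 to Q4
```

Because the question columns stay in positions 3 to 6, df[3:6] still selects Q1 to Q4 and the filtering logic is unaffected.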

3 solutions

Solution 1
1 2022-12-28 22:39:04

Solution 2
0 2022-12-28 21:31:58

Solution 3
0 2022-12-28 21:33:10
