How to collect members of a column based on the value of a specific member in that column in R

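In the data frame below, within each ID, I want to collect the members of B1 whose value in B2 is equal to or greater than the B2 value of "b". Then, from this new information, I want to count how many times each member of B1 occurs.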

dataframe:

ID  B1  B2
z1  a   2.5
z1  b   1.7
z1  c   170
z1  c   9
z1  d   3
y2  a   0
y2  b   21
y2  c   15
y2  c   101
y2  d   30
y2  d   3
y2  d   15.5
x3  a   30.8
x3  a   54
x3  a   0
x3  b   30.8
x3  c   30.8
x3  d   7

So the results would be:

ID  B1  B2
z1  a   2.5
z1  c   170
z1  c   9
z1  d   3
y2  c   101
y2  d   30
x3  a   30.8
x3  a   54
x3  c   30.8

ID  B1  count
z1  a   1
z1  c   2
z1  d   1
y2  a   0
y2  c   1
y2  d   1
x3  a   2
x3  c   1
x3  d   0

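Group by 'ID', then `filter` the rows where 'B2' is greater than or equal to the 'B2' value of the row where 'B1' is "b", and add a second condition that 'B1' is not equal to "b" so the reference rows themselves are dropped.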

library(dplyr)
out1 <- df1 %>%
    group_by(ID) %>% 
    filter(any(B1 == "b") & B2 >= min(B2[B1 == "b"]), B1 != 'b') 

-output

> out1
# A tibble: 9 × 3
# Groups:   ID [3]
  ID    B1       B2
  <chr> <chr> <dbl>
1 z1    a       2.5
2 z1    c     170  
3 z1    c       9  
4 z1    d       3  
5 y2    c     101  
6 y2    d      30  
7 x3    a      30.8
8 x3    a      54  
9 x3    c      30.8
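
For readers who want to see the per-group threshold spelled out, here is a minimal base R sketch of the same filter. It is not part of the original answer; the names `b_rows`, `threshold`, `keep`, and `out_base` are made up for illustration, and the data frame is rebuilt inline so the snippet runs on its own. It assumes every ID has at least one "b" row.

```r
# Same values as df1 in the "data" section, rebuilt so the sketch is self-contained
df1 <- data.frame(
  ID = rep(c("z1", "y2", "x3"), times = c(5, 7, 6)),
  B1 = c("a", "b", "c", "c", "d",
         "a", "b", "c", "c", "d", "d", "d",
         "a", "a", "a", "b", "c", "d"),
  B2 = c(2.5, 1.7, 170, 9, 3,
         0, 21, 15, 101, 30, 3, 15.5,
         30.8, 54, 0, 30.8, 30.8, 7),
  stringsAsFactors = FALSE
)

# Per-ID threshold: the B2 value of the "b" row (the smallest one if several exist)
b_rows <- df1[df1$B1 == "b", ]
threshold <- tapply(b_rows$B2, b_rows$ID, min)

# Keep non-"b" rows whose B2 is at or above the threshold of their own ID
keep <- df1$B1 != "b" & df1$B2 >= as.numeric(threshold[df1$ID])
out_base <- df1[which(keep), ]
out_base
```

On this data the sketch keeps the same nine rows as the grouped `filter`, in the original row order.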

The second output is obtained by grouping and using `summarise` to get the number of rows, then filling in the missing combinations with `complete`

library(tidyr)
out1 %>% 
  group_by(B1, .add = TRUE) %>%
  summarise(count = n(), .groups = "drop_last") %>% 
  complete(B1 = unique(.$B1), fill = list(count = 0)) %>%
  ungroup
# A tibble: 9 × 3
  ID    B1    count
  <chr> <chr> <int>
1 x3    a         2
2 x3    c         1
3 x3    d         0
4 y2    a         0
5 y2    c         1
6 y2    d         1
7 z1    a         1
8 z1    c         2
9 z1    d         1

data

df1 <- structure(list(ID = c("z1", "z1", "z1", "z1", "z1", "y2", "y2", 
"y2", "y2", "y2", "y2", "y2", "x3", "x3", "x3", "x3", "x3", "x3"
), B1 = c("a", "b", "c", "c", "d", "a", "b", "c", "c", "d", "d", 
"d", "a", "a", "a", "b", "c", "d"), B2 = c(2.5, 1.7, 170, 9, 
3, 0, 21, 15, 101, 30, 3, 15.5, 30.8, 54, 0, 30.8, 30.8, 7)), 
class = "data.frame", row.names = c(NA, 
-18L))

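Using tidyverse (here `df` is the question's data frame; note that the comparison is a strict `>`, so rows that tie with "b", such as the 30.8 values in x3, are dropped, and combinations with a count of 0 are not listed):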

library(tidyverse)

df %>% 
  group_by(ID) %>% 
  filter(B2 > B2[B1 == "b"]) %>%
  group_by(ID, B1) %>%
  count(name = "count") %>%
  as.data.frame()
#>   ID B1 count
#> 1 x3  a     1
#> 2 y2  c     1
#> 3 y2  d     1
#> 4 z1  a     1
#> 5 z1  c     2
#> 6 z1  d     1

Created on 2022-04-26 by the reprex package (v2.0.1)
