
Split data frame conditional on factor level summarise based on duplicated values using dplyr

I have a data frame like this:

df <- data.frame(
  region = c("1","1","1","1","1","1","1","1","2","2"),
  loc    = c("A","A","A","B","B","B","C","D","E","F"),
  sp1    = c("a","a","b","a","e","e","e","e","a","a"),
  sp2    = c("b","b","c","b","f","f","f","f","b","b"),
  inter  = c("a_b","a_b","b_c","a_b","e_f","e_f","e_f","e_f","a_b","a_b")
)

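Grouping by region, I want to find each level of inter that is duplicated within a region and then count how many plots (loc) it occurs in. The output data frame should look like this: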

df <- data.frame(
  region = c("1","1","2"),
  sp1    = c("a","e","a"),
  sp2    = c("b","f","b"),
  inter  = c("a_b","e_f","a_b"),
  freq   = c("2","3","2")
)

I tried the following:

df %>%
  group_by(region, inter) %>%
  filter(duplicated(inter))

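You can filter to the groups that have more than one row within each region/inter combination, then use n_distinct to count the number of unique locations. I included the species variables as grouping variables so that they are kept in the result.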

library(dplyr)

df %>%
  group_by(region, sp1, sp2, inter) %>%
  filter(n() > 1) %>%             # keep groups with more than one row
  summarise(n = n_distinct(loc))  # count unique locations per group

# A tibble: 3 x 5
# Groups:   region, sp1, sp2 [?]
  region    sp1    sp2  inter     n
  <fctr> <fctr> <fctr> <fctr> <int>
1      1      a      b    a_b     2
2      1      e      f    e_f     3
3      2      a      b    a_b     2
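
For comparison, here is a minimal alternative sketch, not the approach from the answer above. It first reduces the data to one row per location and interaction with distinct(), then counts locations with count(). The column name freq is taken from the desired output in the question; stringsAsFactors = FALSE and the result name res are assumptions made only for this example. Note one behavioural difference: an interaction that is repeated only within a single location is dropped here, whereas the filter(n() > 1) version would keep it with a count of 1.

```r
library(dplyr)

# Example data from the question; stringsAsFactors = FALSE is an assumption
# made here so the columns are plain character vectors.
df <- data.frame(
  region = c("1","1","1","1","1","1","1","1","2","2"),
  loc    = c("A","A","A","B","B","B","C","D","E","F"),
  sp1    = c("a","a","b","a","e","e","e","e","a","a"),
  sp2    = c("b","b","c","b","f","f","f","f","b","b"),
  inter  = c("a_b","a_b","b_c","a_b","e_f","e_f","e_f","e_f","a_b","a_b"),
  stringsAsFactors = FALSE
)

res <- df %>%
  # one row per region/location/interaction combination
  distinct(region, loc, sp1, sp2, inter) %>%
  # number of distinct locations per interaction within each region
  count(region, sp1, sp2, inter, name = "freq") %>%
  # keep interactions found in more than one location
  filter(freq > 1)

print(res)
```

On the example data this should give the same three rows as the answer above, with the count column named freq instead of n.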

