dplyr summarise logical condition
I have the following data frame:
df <- data.frame(Gender = c(rep(c("M","F"),each=4)),
DiffA=c(1,1,-1,-1,1,1,1,-1),
DiffB=c(1,-1,1,-1,1,1,1,-1))
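I want to create two new variables, summarised for each gender: i) the number of rows where both DiffA and DiffB are positive; ii) the number of rows where both DiffA and DiffB are negative, so as to obtain: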
df2 <- data.frame(Gender = c("M","F"),
Diff_Pos=c(1,3),
Diff_Neg=c(1,1))
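I have not managed to combine dplyr's n(), which returns the number of rows, with the required logical condition inside summarise. Thanks in advance.
I would consider doing: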
library(dplyr)
library(tidyr)
df %>% filter(DiffA == DiffB) %>% count(Gender, DiffA) %>% spread(DiffA, n)
#   Gender    -1     1
#   (fctr) (int) (int)
# 1      F     1     3
# 2      M     1     1
The analogous data.table code (assuming data.table is loaded and df has been converted with setDT(df)) is:
dcast(df[DiffA == DiffB, .N, by=.(Gender, DiffA)], Gender ~ DiffA)
# Gender -1 1
# 1: F 1 3
# 2: M 1 1
If the actual data take values other than -1 and 1, wrap the relevant columns in sign().
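A minimal sketch of the sign() idea, in case it helps. The data frame df_wide and its values are made up for illustration, and the zero check is an added assumption (rows where both differences are exactly 0 would otherwise count as matching):

```r
library(dplyr)

# Hypothetical data whose magnitudes are not limited to -1 and 1
df_wide <- data.frame(
  Gender = rep(c("M", "F"), each = 4),
  DiffA  = c(2.5, 0.3, -4, -1, 7, 1, 0.2, -3),
  DiffB  = c(1, -2, 6, -0.5, 3, 9, 4, -8)
)

sign_counts <- df_wide %>%
  # Reduce each difference to -1, 0 or 1 before comparing
  mutate(SignA = sign(DiffA), SignB = sign(DiffB)) %>%
  # Keep rows where both signs agree, excluding the both-zero case
  filter(SignA == SignB, SignA != 0) %>%
  # One row per Gender/sign combination, with the row count in n
  count(Gender, SignA)

print(sign_counts)
```

From here the result can be reshaped to one column per sign in the same way as above.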
Here is a base R option:
with(subset(df, DiffA==DiffB), table(Gender, DiffA))
# DiffA
#Gender -1 1
# F 1 3
# M 1 1
This should work:
df %>%
dplyr::mutate(
Diff_Pos = DiffA > 0 & DiffB > 0,
Diff_Neg = DiffA < 0 & DiffB < 0) %>%
dplyr::group_by(Gender) %>%
dplyr::summarise(
Diff_Pos = sum(Diff_Pos),
Diff_Neg = sum(Diff_Neg))
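As a possible variation on the same idea, the logical conditions can go straight into summarise() without the intermediate mutate() step. This works because sum() on a logical vector counts the TRUE values, so n() is not needed. The sketch below rebuilds the question's data frame so that it runs on its own; the name res is just for illustration:

```r
library(dplyr)

# Data frame from the question
df <- data.frame(Gender = rep(c("M", "F"), each = 4),
                 DiffA = c(1, 1, -1, -1, 1, 1, 1, -1),
                 DiffB = c(1, -1, 1, -1, 1, 1, 1, -1))

res <- df %>%
  group_by(Gender) %>%
  summarise(
    # TRUE counts as 1 and FALSE as 0, so sum() gives the number of matching rows
    Diff_Pos = sum(DiffA > 0 & DiffB > 0),
    Diff_Neg = sum(DiffA < 0 & DiffB < 0)
  )

print(res)
```

If the columns can contain NA, adding na.rm = TRUE to each sum() call is probably what you want, since a single NA would otherwise make the whole count NA.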