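How to aggregate R dataframe of two columns based on values of another
My dataframe looks like the one below, where gender == "1" means men and gender == "2" means women, occupations go from A to U, and years run from 2010 to 2018 (I am giving a small example here).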
Gender Occupation Year
1 A 2010
1 A 2010
2 A 2010
1 B 2010
2 B 2010
1 A 2011
2 A 2011
1 C 2011
2 C 2011
I would like an output that counts the number of rows for each combination of year and occupation, split by gender, as shown below:
Year | Occupation | Men | Woman
2010 | A | 2 | 1
2010 | B | 1 | 1
2011 | A | 1 | 1
2011 | C | 1 | 1
I tried the following:
Nr_gender_occupation <- data %>%
  group_by(year, occupation) %>%
  summarise(
    Men = aggregate(gender=="1" ~ occupation, FUN= count),
    Women = aggregate(gender=="2" ~ occupation, FUN=count)
  )
We can use the values in 'Gender' as an index to recode them, and then use pivot_wider from tidyr to reshape the data into 'wide' format
library(dplyr)
library(tidyr)
data %>%
  mutate(Gender = c("Male", "Female")[Gender]) %>%
  pivot_wider(names_from = Gender, values_from = Gender, values_fn = length)
-output
# A tibble: 4 x 4
# Occupation Year Male Female
# <chr> <int> <int> <int>
#1 A 2010 2 1
#2 B 2010 1 1
#3 A 2011 1 1
#4 C 2011 1 1
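To make the indexing step concrete, here is a minimal base R sketch. The vector names `gender`, `labels` and `recoded` are made up for illustration; the point is only that indexing a character vector by the integer codes 1 and 2 picks the matching label for each row.

```r
# Hypothetical integer codes, as in the Gender column (1 = male, 2 = female)
gender <- c(1L, 1L, 2L)

# Position 1 holds the label for code 1, position 2 the label for code 2
labels <- c("Male", "Female")

# Indexing the label vector by the codes recodes every element at once
recoded <- labels[gender]
print(recoded)
```

This only works as written because the codes are exactly 1 and 2; with other codes a named lookup vector or a factor would probably be safer.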
Or use table together with unnest_wider
library(tidyr)
data %>%
  group_by(Year, Occupation) %>%
  summarise(out = list(table(Gender)), .groups = 'drop') %>%
  unnest_wider(out)
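As a rough illustration of what table contributes here, the sketch below tabulates the gender codes of a single group in base R. The vector `one_group` is a made-up stand-in for the rows of Year 2010, Occupation A.

```r
# Hypothetical gender codes for one (Year, Occupation) group
one_group <- c(1L, 1L, 2L)

# table() counts how often each distinct value occurs
tab <- table(Gender = one_group)
print(tab)

# The counts are named after the original codes, "1" and "2"
print(names(tab))
```

Because the names are the codes themselves, the widened columns should come out as `1` and `2` rather than Men and Women unless Gender is recoded first.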
Or we can use count and pivot_wider
data %>%
  count(Gender, Occupation, Year) %>%
  pivot_wider(names_from = Gender, values_from = n)
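A variation on the count approach, sketched under two assumptions of mine: that readable column names are wanted, and that the full data (occupations A to U) may contain year/occupation combinations where one gender is absent. The labels "Men" and "Women" and the object name `counts` are my own choices; `values_fill` is an existing pivot_wider argument that replaces the NA produced for missing combinations.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
data <- data.frame(
  Gender = c(1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
  Occupation = c("A", "A", "A", "B", "B", "A", "A", "C", "C"),
  Year = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 2011L, 2011L, 2011L)
)

counts <- data %>%
  # Recode 1/2 to labels so the wide columns get readable names
  mutate(Gender = c("Men", "Women")[Gender]) %>%
  # One row per Year/Occupation/Gender combination, with its count in n
  count(Year, Occupation, Gender) %>%
  # Spread the gender labels into columns; absent combinations become 0
  pivot_wider(names_from = Gender, values_from = n, values_fill = 0)

print(counts)
```

In this small example every combination is present, so values_fill changes nothing; it should only matter on the larger dataset.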
data
data <- structure(list(Gender = c(1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
    Occupation = c("A", "A", "A", "B", "B", "A", "A", "C", "C"),
    Year = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 2011L,
    2011L, 2011L)), class = "data.frame", row.names = c(NA, -9L))
You can also count within your groups:
library(dplyr)
df %>%
  group_by(Occupation, Year) %>%
  summarize(Men = sum(Gender == 1),
            Woman = sum(Gender == 2), .groups = "drop")
Output
Occupation Year Men Woman
<chr> <dbl> <int> <int>
1 A 2010 2 1
2 A 2011 1 1
3 B 2010 1 1
4 C 2011 1 1
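The reason sum(Gender == 1) gives a count is that a comparison returns a logical vector and sum treats TRUE as 1 and FALSE as 0. A small base R sketch, with made-up vector names:

```r
# Hypothetical gender codes for one group
gender <- c(1L, 1L, 2L)

# The comparison yields TRUE for men and FALSE otherwise
is_man <- gender == 1

# Summing a logical vector counts its TRUE values
n_men <- sum(is_man)
n_women <- sum(gender == 2)
print(c(Men = n_men, Woman = n_women))
```

If Gender can contain NA, sum would return NA as well, so na.rm = TRUE may be needed on real data.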
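A data.table option using dcast
library(data.table)
# with no fun.aggregate supplied, dcast falls back to length, i.e. it counts rows
dcast(setDT(df), Year + Occupation ~ c("Men", "Woman")[Gender])
gives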
Year Occupation Men Woman
1: 2010 A 2 1
2: 2010 B 1 1
3: 2011 A 1 1
4: 2011 C 1 1
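For anyone who prefers the aggregation to be spelled out, here is a sketch of the same reshaping with value.var and fun.aggregate given explicitly. The helper column `GenderLabel` and the object names `dt` and `wide` are my own additions, not part of the answer above, and I have not compared the two variants on larger data.

```r
library(data.table)

# Same example data as in the question
df <- data.frame(
  Gender = c(1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
  Occupation = c("A", "A", "A", "B", "B", "A", "A", "C", "C"),
  Year = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 2011L, 2011L, 2011L)
)

# Work on a data.table copy so the original data frame is left untouched
dt <- as.data.table(df)

# Hypothetical helper column holding the label for each gender code
dt[, GenderLabel := c("Men", "Woman")[Gender]]

# Count rows per Year/Occupation and spread the labels into columns
wide <- dcast(dt, Year + Occupation ~ GenderLabel,
              value.var = "Gender", fun.aggregate = length)
print(wide)
```

Stating fun.aggregate = length explicitly should also avoid the message dcast prints when it has to pick the aggregation function itself.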