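How to aggregate R dataframe of two columns based on values of another
My dataframe looks like the one below, where gender == "1" means male and gender == "2" means female, occupations go from A to U, and years run from 2010 to 2018 (this is only a small example):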
Gender Occupation Year
1 A 2010
1 A 2010
2 A 2010
1 B 2010
2 B 2010
1 A 2011
2 A 2011
1 C 2011
2 C 2011
I want an output that counts the rows for each combination of year and occupation, with one column per gender, like this:
Year | Occupation | Men | Woman
2010 | A | 2 | 1
2010 | B | 1 | 1
2011 | A | 1 | 1
2011 | C | 1 | 1
I tried the following:
Nr_gender_occupation <- data %>%
group_by(year, occupation) %>%
summarise(
Men = aggregate(gender=="1" ~ occupation, FUN= count),
Women = aggregate(gender=="2" ~ occupation, FUN=count)
)
We can use the values in Gender as an index to relabel them, then use pivot_wider from tidyr to reshape the data into 'wide' format
library(dplyr)
library(tidyr)
data %>%
mutate(Gender = c("Male", "Female")[Gender]) %>%
pivot_wider(names_from = Gender, values_from = Gender, values_fn = length)
-output
# A tibble: 4 x 4
# Occupation Year Male Female
# <chr> <int> <int> <int>
#1 A 2010 2 1
#2 B 2010 1 1
#3 A 2011 1 1
#4 C 2011 1 1
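The relabelling step in the pipeline above works because Gender holds the integers 1 and 2, and R uses each integer as a position in the character vector. A minimal base R sketch of just that step, where the vector `gender` is a hypothetical stand-in for the column:

```r
# Hypothetical stand-in for the Gender column (1 = male, 2 = female)
gender <- c(1L, 1L, 2L, 1L, 2L)

# Each integer picks the label at that position in the lookup vector
labels <- c("Male", "Female")[gender]

print(labels)
```

If the column is stored as character ("1", "2") rather than integer, it would likely need as.integer() first, since character indices are matched against names and this lookup vector has none.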
Or use table with unnest_wider
library(tidyr)
data %>%
group_by(Year, Occupation) %>%
summarise(out = list(table(Gender)), .groups = 'drop') %>%
unnest_wider(out)
Or we can use count and pivot_wider
data %>%
count(Gender, Occupation, Year) %>%
pivot_wider(names_from = Gender, values_from = n)
data <- structure(list(Gender = c(1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
Occupation = c("A", "A", "A", "B", "B", "A", "A", "C", "C"
), Year = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 2011L,
2011L, 2011L)), class = "data.frame", row.names = c(NA, -9L
))
You can also do the counting within your groups:
library(dplyr)
df %>%
group_by(Occupation, Year) %>%
summarize(Men = sum(Gender == 1),
Woman = sum(Gender == 2), .groups = "drop")
Output
Occupation Year Men Woman
<chr> <dbl> <int> <int>
1 A 2010 2 1
2 A 2011 1 1
3 B 2010 1 1
4 C 2011 1 1
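The counting works because a comparison such as Gender == 1 returns a logical vector, and sum() treats TRUE as 1 and FALSE as 0. A small base R sketch with a hypothetical vector for a single group:

```r
# Hypothetical Gender values for one Occupation/Year group: two men, one woman
gender <- c(1, 1, 2)

is_man <- gender == 1        # logical vector: TRUE TRUE FALSE
n_men <- sum(is_man)         # TRUE counts as 1, so this is the number of men
n_women <- sum(gender == 2)  # same idea for women
```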
A data.table option using dcast
library(data.table)
dcast(setDT(df), Year + Occupation ~ c("Men", "Woman")[Gender])
gives
Year Occupation Men Woman
1: 2010 A 2 1
2: 2010 B 1 1
3: 2011 A 1 1
4: 2011 C 1 1
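For comparison, the same table can probably be produced without any packages. The sketch below is not taken from any of the answers; it rebuilds the example data, adds two logical helper columns, and sums them with base R's aggregate(). The names `d` and `res` are illustrative.

```r
# Example data from the question, rebuilt so the block is self-contained
data <- data.frame(
  Gender = c(1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
  Occupation = c("A", "A", "A", "B", "B", "A", "A", "C", "C"),
  Year = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 2011L, 2011L, 2011L)
)

# Logical helper columns, one per gender
d <- transform(data, Men = Gender == 1, Women = Gender == 2)

# Sum the helper columns within each Year/Occupation combination
res <- aggregate(cbind(Men, Women) ~ Year + Occupation, data = d, FUN = sum)

print(res)
```

Only combinations that actually occur in the data appear in the result, so there is no row for, say, occupation C in 2010.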