
How to aggregate an R dataframe of two columns based on the values of another

My dataframe is as follows, where Gender == 1 refers to men and Gender == 2 refers to women. Occupations go from A to U and years go from 2010 to 2018 (this is a small example):

Gender   Occupation    Year
1            A         2010
1            A         2010
2            A         2010
1            B         2010
2            B         2010
1            A         2011
2            A         2011
1            C         2011
2            C         2011

I want an output that counts the number of rows for each distinct combination of gender, year and occupation, with one column per gender, as shown next:

Year | Occupation | Men | Woman
2010 |      A     |  2  |   1
2010 |      B     |  1  |   1
2011 |      A     |  1  |   1
2011 |      C     |  1  |   1

I have tried the following:

Nr_gender_occupation <- data %>%
   group_by(year, occupation) %>%
   summarise(
      Men = aggregate(gender=="1" ~ occupation, FUN= count),
      Women = aggregate(gender=="2" ~ occupation, FUN=count)
)

We could use the integer values in 'Gender' as an index to relabel them, then reshape the data into 'wide' format with pivot_wider from tidyr, using values_fn = length to count the rows:

library(dplyr)
library(tidyr)
data %>%
 mutate(Gender = c("Male", "Female")[Gender]) %>%
 pivot_wider(names_from = Gender, values_from = Gender, values_fn = length)

-output

# A tibble: 4 x 4
#  Occupation  Year  Male Female
#  <chr>      <int> <int>  <int>
#1 A           2010     2      1
#2 B           2010     1      1
#3 A           2011     1      1
#4 C           2011     1      1
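The relabelling step works because Gender holds the integers 1 and 2, so c("Male", "Female")[Gender] picks the first label for every 1 and the second for every 2. As a sketch of the same relabel-and-count idea without tidyr, base R's table() can cross-tabulate the labels against a combined year/occupation key. This swaps in table() for pivot_wider(); the names gender_label, key and counts are illustrative, and the data is the sample from the question.

```r
# Sample data from the question (Gender: 1 = men, 2 = women)
data <- data.frame(
  Gender     = c(1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
  Occupation = c("A", "A", "A", "B", "B", "A", "A", "C", "C"),
  Year       = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 2011L, 2011L, 2011L)
)

# Integer indexing turns the codes into labels: 1 -> "Men", 2 -> "Woman"
gender_label <- c("Men", "Woman")[data$Gender]

# Combine year and occupation into one key so only observed pairs become rows
key <- paste(data$Year, data$Occupation)

# Two-way table: one row per year/occupation pair, one column per gender label
counts <- table(key, gender_label)
counts
```

If a regular data frame is needed, as.data.frame.matrix(counts) converts the table, keeping the year/occupation keys as row names.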

Or use table with unnest_wider:

library(dplyr)
library(tidyr)
data %>%
   group_by(Year, Occupation) %>%
   summarise(out = list(table(Gender)), .groups = 'drop') %>%     
   unnest_wider(out)
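One caveat with table(Gender) inside each group: if a year/occupation pair contains only one gender, that group's table has no entry for the other one, so unnest_wider leaves an NA there instead of 0 (the sample data happens to have both genders in every group). Converting Gender to a factor with both levels declared makes table() report the zero explicitly. A small base-R illustration, using a hypothetical group called only_men:

```r
# A hypothetical group that contains only men (Gender == 1)
only_men <- c(1L, 1L)

# Plain table(): only the value that occurs is counted, so "2" is absent
table(only_men)

# With both levels declared, the missing gender is counted as 0
full_counts <- table(factor(only_men, levels = 1:2))
full_counts
```

In the pipeline above this would mean writing table(factor(Gender, levels = 1:2)); for the pivot_wider answers, the values_fill = 0 argument fills missing combinations with 0 instead of NA.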

Or we can use count with pivot_wider:

data %>%
  count(Gender, Occupation, Year) %>%
  pivot_wider(names_from = Gender, values_from = n)
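The count + pivot_wider route is a two-step pattern: first count the rows per gender/occupation/year in long format, then spread the gender values into columns. A rough base-R sketch of the same two steps, swapping in aggregate() for count() and reshape() for pivot_wider(); the helper column n and the final renaming to Men/Woman are illustrative choices, not part of the answer.

```r
# Sample data from the question (Gender: 1 = men, 2 = women)
data <- data.frame(
  Gender     = c(1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
  Occupation = c("A", "A", "A", "B", "B", "A", "A", "C", "C"),
  Year       = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 2011L, 2011L, 2011L)
)

# Step 1 (long format): add a column of 1s and sum it per Gender/Occupation/Year
long_counts <- aggregate(n ~ Gender + Occupation + Year,
                         data = transform(data, n = 1L), FUN = sum)

# Step 2 (wide format): one row per Occupation/Year, one count column per Gender
wide_counts <- reshape(long_counts, direction = "wide",
                       idvar = c("Occupation", "Year"), timevar = "Gender")

# reshape() names the new columns n.1 and n.2; rename them to match the question
names(wide_counts)[names(wide_counts) == "n.1"] <- "Men"
names(wide_counts)[names(wide_counts) == "n.2"] <- "Woman"
wide_counts
```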

data

data <- structure(list(Gender = c(1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), 
    Occupation = c("A", "A", "A", "B", "B", "A", "A", "C", "C"
    ), Year = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 2011L, 
    2011L, 2011L)), class = "data.frame", row.names = c(NA, -9L
))

You can also count within your groups:

library(dplyr)

df %>% 
  group_by(Occupation, Year) %>% 
  summarize(Men = sum(Gender == 1),
            Woman = sum(Gender == 2), .groups = "drop")

Output

  Occupation  Year   Men Woman
  <chr>      <dbl> <int> <int>
1 A           2010     2     1
2 A           2011     1     1
3 B           2010     1     1
4 C           2011     1     1
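The sum(Gender == 1) trick works because a comparison returns a logical vector and sum() counts each TRUE as 1. The same idea is available in base R through aggregate(), which is roughly what the question's attempt was reaching for. A minimal sketch, assuming the column names of the sample data; the indicator columns Men and Woman are added for illustration.

```r
# Sample data from the question (Gender: 1 = men, 2 = women)
data <- data.frame(
  Gender     = c(1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
  Occupation = c("A", "A", "A", "B", "B", "A", "A", "C", "C"),
  Year       = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 2011L, 2011L, 2011L)
)

# Turn each gender comparison into a 0/1 indicator column
data$Men   <- as.integer(data$Gender == 1)
data$Woman <- as.integer(data$Gender == 2)

# Sum the indicators within each Year/Occupation group
result <- aggregate(cbind(Men, Woman) ~ Year + Occupation, data = data, FUN = sum)
result
```

The rows may come out in a different order than in the dplyr version, so sort the result if the order matters.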

A data.table option using dcast:

library(data.table)
dcast(setDT(df), Year + Occupation ~ c("Men", "Woman")[Gender])

gives

   Year Occupation Men Woman
1: 2010          A   2     1
2: 2010          B   1     1
3: 2011          A   1     1
4: 2011          C   1     1

