How to reshape an R dataframe from long to wide without using the long variable's values as new variable names?

Question

I have a long dataframe that lists the top 3 employers of each occupation code (3 rows per occupation code). It looks like this.

occcode	employer
1	top employer for occcode1
1	2nd employer for occcode 1
1	3rd employer for occcode 1
2	top employer for occcode2
2	2nd employer for occcode 2
2	3rd employer for occcode 1

I want to reshape it so that I have one row per occupation code, and columns named "emp1", "emp2", and "emp3" that are respectively populated with the 1st to 3rd employers of that occupation code.

occcode	employer1	employer2	employer3
1	top employer for occcode1	2nd employer for occcode 1	3rd employer for occcode 1
2	top employer for occcode2	2nd employer for occcode 2	3rd employer for occcode 1

I previously thought the spread() function would work. But after reading the documentation and testing it, it doesn't produce what I have in mind, because it requires the values in "employer" in the long version of the data to be standardized (such that there are only 3 employer names). That's not the case here, because employer names vary a lot across occupation codes. What is the best way to reshape the data in line with what I need?

Answer 1

I removed the last row of the source data to show that this should work for a variable number of employers per occcode:

library(tidyverse)
data.frame(
  stringsAsFactors = FALSE,
  occcode = c(1L, 1L, 1L, 2L, 2L),
  employer = c("top employer for occcode1",
               "2nd employer for occcode 1", "3rd employer for occcode 1",
               "top employer for occcode2",
               "2nd employer for occcode 2")
) %>%
  # Number the employers within each occcode to build the new column names
  group_by(occcode) %>%
  mutate(col = paste0("employer", row_number())) %>%
  ungroup() %>%
  # Spread on the generated names rather than on the employer values
  pivot_wider(names_from = col, values_from = employer)

Result

# A tibble: 2 × 4
  occcode employer1                 employer2                  employer3                 
    <int> <chr>                     <chr>                      <chr>                     
1       1 top employer for occcode1 2nd employer for occcode 1 3rd employer for occcode 1
2       2 top employer for occcode2 2nd employer for occcode 2 NA
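The same idea (number the rows within each occcode, then pivot on that number) can also be sketched in base R with reshape(). This is only an illustration, not part of the answer above; the names df, idx, and wide are made up for the example, and the data is the five-row sample used above.

```r
# Sample data: occcode 2 has only two employers
df <- data.frame(
  occcode = c(1L, 1L, 1L, 2L, 2L),
  employer = c("top employer for occcode1",
               "2nd employer for occcode 1", "3rd employer for occcode 1",
               "top employer for occcode2",
               "2nd employer for occcode 2"),
  stringsAsFactors = FALSE
)

# Position of each row within its occcode group (1, 2, 3, ...)
df$idx <- ave(seq_along(df$occcode), df$occcode, FUN = seq_along)

# Pivot on the position; sep = "" yields employer1, employer2, ...
wide <- reshape(df, idvar = "occcode", timevar = "idx",
                direction = "wide", sep = "")
print(wide)
```

Because the new column names come from the row position rather than from the employer values, the number of distinct employer names does not matter.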

Answer 2

Here is another approach:

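library(data.table)
# df is the data frame under "Input" below. The label is taken from the first
# character of employer ("t" for "top" maps to "1"), so it depends on the sample wording.
dcast(
  setDT(df)[, emp := {
    emp <- substr(employer, 1, 1)
    paste0("employer", fifelse(emp == "t", "1", emp))
  }],
  occcode ~ emp, value.var = "employer"
)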

Output:

   occcode                 employer1                  employer2                  employer3
1:       1 top employer for occcode1 2nd employer for occcode 1 3rd employer for occcode 1
2:       2 top employer for occcode2 2nd employer for occcode 2 3rd employer for occcode 2

Input:

df <- structure(list(occcode = c(1L, 1L, 1L, 2L, 2L, 2L), employer = c("top employer for occcode1", 
"2nd employer for occcode 1", "3rd employer for occcode 1", "top employer for occcode2", 
"2nd employer for occcode 2", "3rd employer for occcode 2")), row.names = c(NA, 
-6L), class = "data.frame")
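Since real employer names will not start with "top", "2nd", or "3rd", a variant that ignores the text and numbers the rows within each occcode may be more robust. The following is a minimal sketch, assuming the rows are already ordered by rank within each occcode; the names dt, col, and wide are illustrative and not from the answer.

```r
library(data.table)

dt <- data.table(
  occcode = c(1L, 1L, 1L, 2L, 2L, 2L),
  employer = c("top employer for occcode1",
               "2nd employer for occcode 1", "3rd employer for occcode 1",
               "top employer for occcode2",
               "2nd employer for occcode 2", "3rd employer for occcode 2")
)

# rowid() numbers the rows within each occcode in their current order
dt[, col := paste0("employer", rowid(occcode))]

# Cast to wide using the generated labels as the new column names
wide <- dcast(dt, occcode ~ col, value.var = "employer")
print(wide)
```

If the rows are not already in rank order, they would need to be sorted before the rowid() step.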
