How do I reshape an R dataframe from long to wide without using the long variable values as the new variable names?
I have a long dataframe that lists the top 3 employers of each occupation code (3 rows per occupation code). It looks like this:
occcode | employer |
---|---|
1 | top employer for occcode1 |
1 | 2nd employer for occcode 1 |
1 | 3rd employer for occcode 1 |
2 | top employer for occcode2 |
2 | 2nd employer for occcode 2 |
2 | 3rd employer for occcode 1 |
I want to reshape it so that I have one row per occupation code, and columns named "emp1", "emp2", and "emp3" that are populated with the 1st, 2nd, and 3rd employers of that occupation code, respectively:
occcode | employer1 | employer2 | employer3 |
---|---|---|---|
1 | top employer for occcode1 | 2nd employer for occcode 1 | 3rd employer for occcode 1 |
2 | top employer for occcode2 | 2nd employer for occcode 2 | 3rd employer for occcode 1 |
I previously thought the spread() function would work. But after reading the documentation and testing it, it doesn't produce what I have in mind: it requires the values in "employer" in the long version of the data to be standardized (such that there are only 3 distinct employer names). That's not the case here, because employer names vary a lot across occupation codes. What is the best way to reshape the data in line with what I need?
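To make the problem concrete, here is a minimal sketch of what happens when the employer names themselves become the new column names. It uses pivot_wider(), the newer tidyr replacement for spread(). The data frame `long`, its employer names, and the helper column `value` are made up for illustration and are not part of the original data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in the same shape as the table above
long <- data.frame(
  occcode  = c(1L, 1L, 1L, 2L, 2L, 2L),
  employer = c("Acme", "Globex", "Initech", "Umbrella", "Hooli", "Stark"),
  stringsAsFactors = FALSE
)

# Using the employer values as column names yields one column per
# distinct employer, not one column per rank within the occcode
too_wide <- long %>%
  mutate(value = employer) %>%
  pivot_wider(names_from = employer, values_from = value)

dim(too_wide)
```

With six distinct employer names, the result has six employer columns (mostly NA) instead of the three wanted, which is why a per-occcode position label is needed as the source of the column names.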
I removed the last row of the source data to show that this works for a variable number of employers per occcode:
library(tidyverse)

data.frame(
  stringsAsFactors = FALSE,
  occcode = c(1L, 1L, 1L, 2L, 2L),
  employer = c("top employer for occcode1",
               "2nd employer for occcode 1", "3rd employer for occcode 1",
               "top employer for occcode2",
               "2nd employer for occcode 2")
) %>%
  group_by(occcode) %>%
  # label each row by its position within its occcode: employer1, employer2, ...
  mutate(col = paste0("employer", row_number())) %>%
  ungroup() %>%
  # turn the position labels into columns, one row per occcode
  pivot_wider(names_from = col, values_from = employer)
Result:
# A tibble: 2 × 4
occcode employer1 employer2 employer3
<int> <chr> <chr> <chr>
1 1 top employer for occcode1 2nd employer for occcode 1 3rd employer for occcode 1
2 2 top employer for occcode2 2nd employer for occcode 2 NA
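The key step is the helper column built inside `mutate()`. As a rough sketch of what it contains, the pipeline can be stopped before `pivot_wider()`; the object names `long` and `long_indexed` are mine, added only for illustration.

```r
library(dplyr)

# Same data as above (last row removed)
long <- data.frame(
  stringsAsFactors = FALSE,
  occcode = c(1L, 1L, 1L, 2L, 2L),
  employer = c("top employer for occcode1",
               "2nd employer for occcode 1", "3rd employer for occcode 1",
               "top employer for occcode2",
               "2nd employer for occcode 2")
)

# row_number() restarts at 1 within each occcode group
long_indexed <- long %>%
  group_by(occcode) %>%
  mutate(col = paste0("employer", row_number())) %>%
  ungroup()

long_indexed$col
# "employer1" "employer2" "employer3" "employer1" "employer2"
```

Because the labels come from row position rather than from the employer text, the approach depends on the rows already being in rank order within each occcode.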
Here is another approach:
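library(data.table)
# Build the column label from the first character of employer ("t" for "top" maps to "1");
# this assumes every employer string starts with "top", "2nd", or "3rd".
dcast(
  setDT(df)[, emp := {
    emp <- substr(employer, 1, 1)
    paste0("employer", fifelse(emp == "t", "1", emp))
  }],
  occcode ~ emp, value.var = "employer"
)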
Output:
occcode employer1 employer2 employer3
1: 1 top employer for occcode1 2nd employer for occcode 1 3rd employer for occcode 1
2: 2 top employer for occcode2 2nd employer for occcode 2 3rd employer for occcode 2
Input:
df <- structure(list(occcode = c(1L, 1L, 1L, 2L, 2L, 2L), employer = c("top employer for occcode1",
"2nd employer for occcode 1", "3rd employer for occcode 1", "top employer for occcode2",
"2nd employer for occcode 2", "3rd employer for occcode 2")), row.names = c(NA,
-6L), class = "data.frame")
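Since the question says real employer names vary a lot, a variant that does not read the employer text may be more robust. The following is a sketch only, assuming the rows are already in rank order within each occcode; it swaps the first-character trick for data.table's `rowid()`. The employer names and the object names `dt` and `wide` are made up for illustration.

```r
library(data.table)

# Hypothetical data with arbitrary employer names; occcode 2 has only two rows
dt <- data.table(
  occcode  = c(1L, 1L, 1L, 2L, 2L),
  employer = c("Acme", "Globex", "Initech", "Umbrella", "Hooli")
)

# rowid() numbers the rows within each occcode: 1, 2, 3, 1, 2
dt[, emp := paste0("employer", rowid(occcode))]

# One row per occcode, one column per position label
wide <- dcast(dt, occcode ~ emp, value.var = "employer")
wide
```

An occcode with fewer than three employers ends up with NA in the missing columns, as in the tidyverse result above.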