How can I transpose data in each variable from long to wide using group_by in R?

I have a dataframe with an id variable, name. I'm trying to figure out a way to transpose each variable in the dataframe by name.

My current df is below:

name   jobtitle companyname datesemployed empduration joblocation jobdescrip 

1 David… Project… EOS IT Man… Aug 2018 – P… 1 yr 9 mos  San Franci… Coordinati…
2 David… Technic… Options Te… Sep 2017 – J… 5 mos       Belfast, U… Working wi…
3 David… Data An… NA          Jan 2018 – J… 6 mos       Belfast, U… Working wi…

However, I'd like a dataframe with only one row per name, in which every observation for that name becomes its own set of columns, like below:

name   jobtitle_1 companyname_1 datesemployed_1 empduration_1 joblocation_1 jobdescrip_1 jobtitle_2 companyname_2 datesemployed_2 empduration_2 joblocation_2 jobdescrip_2

1 David… Project… EOS IT Man… Aug 2018 – P… 1 yr 9 mos  San Franci… Coordinati… Technic… Options Te… Sep 2017 – J… 5 mos       Belfast, U… Working wi…

I have used commands like gather_by and melt in the past to reshape from long to wide, but in this case I'm not sure how to apply them, since every observation for the id variable will need to become its own column.

It sounds like you are looking for gather and pivot_wider.

I used my own sample data with two names:

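library(dplyr)  # tibble(), %>%, group_by(), mutate(), row_number()
library(tidyr)  # gather(), pivot_wider()

df <- tibble(name = c('David', 'David', 'David', 'Bill', 'Bill'),
             jobtitle = c('PM', 'TPM', 'Analyst', 'Dev', 'Eng'),
             companyname = c('EOS', 'Options', NA, 'Microsoft', 'Nintendo'))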

First add an index column to distinguish the different positions for each name.

indexed <- df %>%
  group_by(name) %>%
  mutate(.index = row_number())
indexed
#   name  jobtitle companyname .index
#   <chr> <chr>    <chr>        <int>
# 1 David PM       EOS              1
# 2 David TPM      Options          2
# 3 David Analyst  NA               3
# 4 Bill  Dev      Microsoft        1
# 5 Bill  Eng      Nintendo         2

Then it is possible to use gather to get a long form, with one value per row.

gathered <- indexed %>% gather('var', 'val', -c(name, .index))
gathered
#    name  .index var         val      
#    <chr>  <int> <chr>       <chr>    
#  1 David      1 jobtitle    PM       
#  2 David      2 jobtitle    TPM      
#  3 David      3 jobtitle    Analyst  
#  4 Bill       1 jobtitle    Dev      
#  5 Bill       2 jobtitle    Eng      
#  6 David      1 companyname EOS      
#  7 David      2 companyname Options  
#  8 David      3 companyname NA       
#  9 Bill       1 companyname Microsoft
# 10 Bill       2 companyname Nintendo 
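
As a side note, the tidyr documentation now lists gather as superseded by pivot_longer. The following is a minimal sketch of the same reshaping step with pivot_longer, assuming the sample df from above and tidyr 1.0.0 or later; the names var and val simply mirror the gather call, and gathered2 is an illustrative name. The rows come out in a different order (row by row rather than column by column), which should not matter for the later pivot_wider step.

```r
library(dplyr)
library(tidyr)

# Same sample data as above, repeated so the sketch runs on its own
df <- tibble(name = c('David', 'David', 'David', 'Bill', 'Bill'),
             jobtitle = c('PM', 'TPM', 'Analyst', 'Dev', 'Eng'),
             companyname = c('EOS', 'Options', NA, 'Microsoft', 'Nintendo'))

gathered2 <- df %>%
  group_by(name) %>%
  mutate(.index = row_number()) %>%  # position of each job within a name
  ungroup() %>%
  # every column except name and .index is stacked into var/val pairs
  pivot_longer(cols = -c(name, .index), names_to = 'var', values_to = 'val')
```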

Now pivot_wider can be used to create a column for each variable and index.

gathered %>% pivot_wider(names_from = c(var, .index), values_from = val)
#   name  jobtitle_1 jobtitle_2 jobtitle_3 companyname_1 companyname_2 companyname_3
#   <chr> <chr>      <chr>      <chr>      <chr>         <chr>         <chr>        
# 1 David PM         TPM        Analyst    EOS           Options       NA           
# 2 Bill  Dev        Eng        NA         Microsoft     Nintendo      NA    

Alternatively, get the data into long format, create a unique column identifier, and pivot it back to wide format.

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(cols = -name, names_to = 'col') %>%
  group_by(name, col) %>%
  mutate(row = row_number()) %>%
  pivot_wider(names_from = c(col, row), values_from = value)
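
If the intermediate long format is not needed, pivot_wider can also spread several value columns in a single call. This is a minimal sketch, assuming the same sample df as above and tidyr 1.0.0 or later; the helper column row and the result name wide are only illustrative.

```r
library(dplyr)
library(tidyr)

# Same sample data as above, repeated so the sketch runs on its own
df <- tibble(name = c('David', 'David', 'David', 'Bill', 'Bill'),
             jobtitle = c('PM', 'TPM', 'Analyst', 'Dev', 'Eng'),
             companyname = c('EOS', 'Options', NA, 'Microsoft', 'Nintendo'))

wide <- df %>%
  group_by(name) %>%
  mutate(row = row_number()) %>%  # position of each job within a name
  ungroup() %>%
  # spread both value columns at once, one set of columns per row index
  pivot_wider(names_from = row, values_from = c(jobtitle, companyname))
```

When values_from names more than one column, the new column names are built from the value column name and the index, joined by names_sep ("_" by default), for example jobtitle_1 and companyname_2.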
