简体   繁体   English

如何将多列长数据重塑为宽?

[英]How can I reshape multiple columns of long data to wide?

structure(tibble(c("top", "jng", "mid", "bot", "sup"), c("369", "Karsa", "knight", "JackeyLove", "yuyanjia"), 
           c("Malphite", "Rek'Sai",  "Zoe", "Aphelios", "Braum"), c("1", "1", "1", "1", "1"), c("7", "5", "7", "5", "0"), 
           c("6079-7578", "6079-7578", "6079-7578", "6079-7578", "6079-7578")), .Names = c("position", "player", "champion", "result", "kills", "gameid"))

Output:输出:

# A tibble: 5 x 6
  position player     champion result kills gameid   
* <chr>    <chr>      <chr>    <chr>  <chr> <chr>    
1 top      369        Malphite 1      7     6079-7578
2 jng      Karsa      Rek'Sai  1      5     6079-7578
3 mid      knight     Zoe      1      7     6079-7578
4 bot      JackeyLove Aphelios 1      5     6079-7578
5 sup      yuyanjia   Braum    1      0     6079-7578

My desired output would be:我想要的输出是:

structure(list(gameid = "6079-7578", result = "1", player_top = "369", 
    player_jng = "Karsa", player_mid = "knight", player_bot = "JackeyLove", 
    player_sup = "yuyanjia", champion_top = "Malphite", champion_jng = "Rek'Sai", 
    champion_mid = "Zoe", champion_bot = "Aphelios", champion_sup = "Braum", 
    kills_top = "7", kills_jng = "5", kills_mid = "7", kills_bot = "5", 
    kills_sup = "0"), row.names = c(NA, -1L), class = c("tbl_df", 
"tbl", "data.frame")) 

which looks like this:看起来像这样:

     gameid result player_top player_jng player_mid player_bot player_sup champion_top champion_jng champion_mid champion_bot champion_sup
1 6079-7578      1        369      Karsa     knight JackeyLove   yuyanjia     Malphite      RekSai          Zoe     Aphelios        Braum
  kills_top kills_jng kills_mid kills_bot kills_sup
1         7         5         7         5         0

I know I should use pivot_wider() and something like drop_na, but I don't know how to do pivot_wider() with mutiple columns and collapse the rows at the same time.我知道我应该使用pivot_wider() 和drop_na 之类的东西,但我不知道如何使用多个列执行pivot_wider() 并同时折叠行。 Any help would be appreciated.任何帮助,将不胜感激。

You can use pivot_wider() for this, defining the "position" variable as the variable that the new column names come from in names_from and the three variables with values you want to use to fill those columns with as values_from .您可以pivot_wider()使用pivot_wider() ,将“位置”变量定义为新列名称来自names_from的变量,以及三个变量,其中包含要用于填充这些列的值 as values_from

By default the multiple values_from variables are pasted on to the front of new columns names.默认情况下,多个values_from变量被粘贴到新列名称的前面。 This can be changed, but in this case that matches the naming structure you want.这可以改变,但在这种情况下,匹配你想要的命名结构。

All other variables in the original dataset will be used as the id_cols in the order that they appear.原始数据集中的所有其他变量将按照它们出现的顺序用作id_cols

library(tidyr)
pivot_wider(dat,  
            names_from = "position", 
            values_from = c("player", "champion", "kills"))
#>   result    gameid player_top player_jng player_mid player_bot player_sup
#> 1      1 6079-7578        369      Karsa     knight JackeyLove   yuyanjia
#>   champion_top champion_jng champion_mid champion_bot champion_sup kills_top
#> 1     Malphite      Rek'Sai          Zoe     Aphelios        Braum         7
#>   kills_jng kills_mid kills_bot kills_sup
#> 1         5         7         5         0

You can control the order of your id columns in the output by explicitly writing them out via id_cols .您可以通过id_cols显式写出它们来控制输出中 id 列的顺序。 Here's an example, matching your desired output.这是一个示例,匹配您想要的输出。

pivot_wider(dat, id_cols = c("gameid", "result"), 
            names_from = "position", 
            values_from = c("player", "champion", "kills"))
#>      gameid result player_top player_jng player_mid player_bot player_sup
#> 1 6079-7578      1        369      Karsa     knight JackeyLove   yuyanjia
#>   champion_top champion_jng champion_mid champion_bot champion_sup kills_top
#> 1     Malphite      Rek'Sai          Zoe     Aphelios        Braum         7
#>   kills_jng kills_mid kills_bot kills_sup
#> 1         5         7         5         0

Created on 2021-06-24 by the reprex package (v2.0.0)reprex 包( v2.0.0 ) 于 2021 年 6 月 24 日创建

Using data.table might help here.在这里使用data.table可能会有所帮助。 In dcast() each row will be identified by a unique combo of gameid and result, the columns will be spread by position, and filled with values from the variables listed in value.var.dcast()每一行都由一个唯一的 gameid 和 result 组合标识,列将按位置分布,并填充 value.var 中列出的变量的值。

library(data.table)
library(dplyr)
df <- structure(tibble(c("top", "jng", "mid", "bot", "sup"), c("369", "Karsa", "knight", "JackeyLove", "yuyanjia"), 
                 c("Malphite", "Rek'Sai",  "Zoe", "Aphelios", "Braum"), c("1", "1", "1", "1", "1"), c("7", "5", "7", "5", "0"), 
                 c("6079-7578", "6079-7578", "6079-7578", "6079-7578", "6079-7578")), .Names = c("position", "player", "champion", "result", "kills", "gameid"))

df2 <- dcast(setDT(df), gameid + result~position, value.var = list('player','champion','kills'))

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM