
How to loop over the columns in a dataframe, apply spread, and create a new dataframe in R?

I have a dataframe that looks like this example, just much larger:

Name  date         var1  var2  var3 
Peter 2020-03-30   0.4   0.5   0.2
Ben   2020-10-14   0.6   0.4   0.1
Mary  2020-12-06   0.7   0.2   0.9

I want to create a new dataframe for each variable (i.e., var1, var2, var3), which should look like this, e.g., for var1:

date         Peter    Ben    Mary
2020-03-30   0.4      NA     NA
2020-10-14   NA       0.6    NA
2020-12-06   NA       NA     0.7

I can do it with spread for one variable at a time:

# keep Name, date and var1, then spread Name into columns
df_new <- tidyr::spread(df[, c("Name", "date", "var1")], Name, var1)

But as I am new to R, I could not figure out how to loop this over all the columns.

Thank you!
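The one-variable spread call can be wrapped in a loop over the column names. The sketch below is one possible way to do it, not taken from the answers: it assumes the data frame is called df and that every value column's name starts with "var", and it swaps spread() for tidyr::pivot_wider(), its newer replacement, using tidyselect::all_of() to pass the column name as a string.

```r
library(tidyr)

# Example data from the question
df <- data.frame(
  Name = c("Peter", "Ben", "Mary"),
  date = as.Date(c("2020-03-30", "2020-10-14", "2020-12-06")),
  var1 = c(0.4, 0.6, 0.7),
  var2 = c(0.5, 0.4, 0.2),
  var3 = c(0.2, 0.1, 0.9)
)

# Value columns to loop over (assumption: they all start with "var")
vars <- grep("^var", names(df), value = TRUE)

# One wide table per variable: rows = dates, columns = names, NA where absent
res <- lapply(setNames(vars, vars), function(v) {
  pivot_wider(
    df[, c("date", "Name", v)],
    names_from = Name,
    values_from = tidyselect::all_of(v)
  )
})

res$var1
```

Each element of res has the shape of the desired var1 table. Keeping the results in a named list, rather than creating many separate objects, makes it easy to loop over them again later.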

First we reshape to long format and split the data into a list of data frames, one per variable, and then pivot each one wider:

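library(tidyverse)

res_list <- df %>% 
   pivot_longer(cols = contains("var")) %>%      # long format: Name, date, name, value
   split(., .$name) %>%                          # one data frame per variable
   map(. %>% pivot_wider(names_from = "Name"))   # spread Name into columns
res_list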
$var1
# A tibble: 3 × 5
  date       name  Peter   Ben  Mary
  <date>     <chr> <dbl> <dbl> <dbl>
1 2020-03-30 var1    0.4  NA    NA  
2 2020-10-14 var1   NA     0.6  NA  
3 2020-12-06 var1   NA    NA     0.7

$var2
# A tibble: 3 × 5
  date       name  Peter   Ben  Mary
  <date>     <chr> <dbl> <dbl> <dbl>
1 2020-03-30 var2    0.5  NA    NA  
2 2020-10-14 var2   NA     0.4  NA  
3 2020-12-06 var2   NA    NA     0.2

$var3
# A tibble: 3 × 5
  date       name  Peter   Ben  Mary
  <date>     <chr> <dbl> <dbl> <dbl>
1 2020-03-30 var3    0.2  NA    NA  
2 2020-10-14 var3   NA     0.1  NA  
3 2020-12-06 var3   NA    NA     0.9

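Then you can access each one with [[ or $ (single brackets, res_list["var1"], would return a one-element list rather than the tibble itself):

res_list[["var1"]]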

# A tibble: 3 × 5
  date       name  Peter   Ben  Mary
  <date>     <chr> <dbl> <dbl> <dbl>
1 2020-03-30 var1    0.4  NA    NA  
2 2020-10-14 var1   NA     0.6  NA  
3 2020-12-06 var1   NA    NA     0.7
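For comparison, here is a base-R-only sketch of the same per-variable reshape, without tidyverse packages. It swaps pivoting for tapply(), which cross-tabulates one value column by date and Name and fills absent combinations with NA. wide_one() is a hypothetical helper name, and FUN = identity assumes at most one row per date/Name pair (with duplicates, tapply() would return a list matrix instead of a numeric one).

```r
# Example data from the question
df <- data.frame(
  Name = c("Peter", "Ben", "Mary"),
  date = as.Date(c("2020-03-30", "2020-10-14", "2020-12-06")),
  var1 = c(0.4, 0.6, 0.7),
  var2 = c(0.5, 0.4, 0.2),
  var3 = c(0.2, 0.1, 0.9)
)

# Hypothetical helper: one value column -> dates as rows, names as columns
wide_one <- function(data, v) {
  nm <- as.character(data$Name)
  # Numeric matrix (dates x names); date/name pairs with no row become NA
  m <- tapply(data[[v]], list(as.character(data$date), nm), identity)
  # tapply() sorts the columns alphabetically; restore the original name order
  m <- m[, unique(nm), drop = FALSE]
  data.frame(date = as.Date(rownames(m)), m, row.names = NULL)
}

# Loop over every column whose name starts with "var"
vars <- grep("^var", names(df), value = TRUE)
res_base <- lapply(setNames(vars, vars), function(v) wide_one(df, v))

res_base$var1
```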

We can do it this way. The beginning is similar to user438383's solution, but then we name each tibble in the list and save them to the global environment within the pipe. For this we need massign from the collapse package (thanks to @akrun: "How to save each named tibble in a list, as a separate tibble or dataframe in one run").


library(tidyverse)
library(collapse)

df %>% 
  pivot_longer(cols = contains("var")) %>%       # one row per Name/date/variable
  group_split(name) %>%                          # one tibble per variable
  set_names(map_chr(., ~ .x$name[1])) %>%        # name each tibble after its variable
  map(. %>% pivot_wider(names_from = Name)) %>%  # spread Name into columns
  map(. %>% select(-name)) %>%                   # drop the helper "name" column
  massign(names(.), ., .GlobalEnv)               # create var1, var2, var3

> var1
# A tibble: 3 x 4
  date       Peter   Ben  Mary
  <chr>      <dbl> <dbl> <dbl>
1 2020-03-30   0.4  NA    NA  
2 2020-10-14  NA     0.6  NA  
3 2020-12-06  NA    NA     0.7
> var2
# A tibble: 3 x 4
  date       Peter   Ben  Mary
  <chr>      <dbl> <dbl> <dbl>
1 2020-03-30   0.5  NA    NA  
2 2020-10-14  NA     0.4  NA  
3 2020-12-06  NA    NA     0.2
> var3
# A tibble: 3 x 4
  date       Peter   Ben  Mary
  <chr>      <dbl> <dbl> <dbl>
1 2020-03-30   0.2  NA    NA  
2 2020-10-14  NA     0.1  NA  
3 2020-12-06  NA    NA     0.9
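If you would rather not add the collapse dependency, base R's list2env() copies every element of a named list into an environment under its own name, which is what massign is used for here. A minimal sketch, with a hypothetical res_list of tiny data frames standing in for the pivoted tibbles:

```r
# Hypothetical named list standing in for the list of pivoted tables
res_list <- list(
  var1 = data.frame(date = as.Date("2020-03-30"), Peter = 0.4),
  var2 = data.frame(date = as.Date("2020-03-30"), Peter = 0.5),
  var3 = data.frame(date = as.Date("2020-03-30"), Peter = 0.2)
)

# Create objects var1, var2, var3 in the global environment
# (invisible() only suppresses printing of the returned environment)
invisible(list2env(res_list, envir = .GlobalEnv))

var2$Peter
```

Keeping the tables in one named list is often easier to work with than many loose objects, so copying them out is only worth it if later code expects separate variables.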

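Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.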

 
© 2020-2024 STACKOOM.COM