简体   繁体   中英

How to manipulate cases and variables in a data frame

I am just learning data wrangling. Currently I am data wrangling a data frame of 6422 observation and 20 variables.

I have added an example to show my problem. I have several columns which have identical cases, which i want to summarize to 1 case (in the example column x1,x2,x3,x4 and row 1,2,3). The corresponding columns value1 (in the example 1,2,3) and value2 (in the example 7,8,9) should be manipulated in such a way that they are transformed into the same row.

My goal is to manipulate the data frame df1 into the data frame df7

df1 <- data.frame(x1,x2,x3,x4,stats.title,value1,value2)

#   x1 x2 x3 x4 stats.title value1 value2
#1  A  B  C  D           I      1      7
#2  A  B  C  D           J      2      8
#3  A  B  C  D           K      3      9
#4  E  F  G  H           I      4     10
#5  E  F  G  H           J      5     11
#6  E  F  G  H           K      6     12

df 7 <- rbind(df5,df6)
#   x1 x2 x3 x4 value1_I value1_J value1_K value2_I value2_J value2_K
#1  A  B  C  D        1        2        3        7        8        9
#4  E  F  G  H        4        5        6       10       11       12

Attached the example:

library(magrittr)
library(dplyr)
library(tidyr)

x1 <- rep(c("A","E"), each=3)
x2<-  rep(c("B","F"), each=3)
x3<-  rep(c("C","G"), each=3)
x4<-  rep(c("D","H"), each=3)
stats.title <- rep(c("I","J","K"), times=2)
value1 <- (1:6)
value2 <- (7:12)

df1 <- data.frame(x1,x2,x3,x4,stats.title,value1,value2)

#   x1 x2 x3 x4 stats.title value1 value2
#1  A  B  C  D           I      1      7
#2  A  B  C  D           J      2      8
#3  A  B  C  D           K      3      9
#4  E  F  G  H           I      4     10
#5  E  F  G  H           J      5     11
#6  E  F  G  H           K      6     12

df2 <- df1[1,] %>% select(-stats.title,-value1,-value2)
#   x1 x2 x3 x4
#1  A  B  C  D
df2.1 <- df1[4,] %>% select(-stats.title,-value1,-value2)
#  x1 x2 x3 x4
#4  E  F  G  H

df3 <- df1 %>% select(stats.title,value1,value2)
#   stats.title value1 value2
#1           I      1      7
#2           J      2      8
#3           K      3      9
#4           I      4     10
#5           J      5     11
#6           K      6     12

df4 <- pivot_wider(df3[1:3,], names_from = stats.title, values_from = c("value1","value2")) 
df4 <- as.data.frame(df4)
df4.1 <- pivot_wider(df3[4:6,], names_from = stats.title, values_from = c("value1","value2")) 
df4.1 <- as.data.frame(df4.1)

df5 <- data.frame(df2,df4)
df6 <- data.frame(df2.1,df4.1)
df 7 <- rbind(df5,df6)
#   x1 x2 x3 x4 value1_I value1_J value1_K value2_I value2_J value2_K
#1  A  B  C  D        1        2        3        7        8        9
#4  E  F  G  H        4        5        6       10       11       12

Thanks for the help in advance.

I tried the example shown above. Due to the fact that i have 6422 rows, i am searching for an effective way to manipulate the data frame.

You showed enough efforts: Here is the solution: The trick is first to pivot_longer :

library(dplyr)
library(tidyr)

df1 %>% 
  pivot_longer(c(value1, value2)) %>% 
  pivot_wider(
    id_cols = x1:x4, # you can omit
    names_from = c(name, stats.title),
    values_from = value)
  x1    x2    x3    x4    value1_I value2_I value1_J value2_J value1_K value2_K
  <chr> <chr> <chr> <chr>    <int>    <int>    <int>    <int>    <int>    <int>
1 A     B     C     D            1        7        2        8        3        9
2 E     F     G     H            4       10        5       11        6       12

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM