简体   繁体   中英

How do I transform a data frame to the desired structure and form?

My data is a .xlsx pivot table. There are several sheets there, but I need only one for my analysis. On this sheet I have a data frame which looks like this

df <- data.frame(ind = c("ind1", "ind1", "ind1", "ind1", 
                         "ind2", "ind2", "ind2", "ind2",
                         "ind3", "ind3", "ind3", "ind3",
                         "ind4", "ind4", "ind4", "ind4"),
                 shr = c(-0.23, 0, 0.12, 0.68,
                         -0.54, 0.80, 0.14, -0.23,
                          0.48, 0.94, -0.01, 0.31,
                          0.18, 0.11, 0.98, 0.05))

And other columns with different types of data. I don't need them, only these two I have presented in an example. So, the df is:

df
#    ind   shr
#1  ind1 -0.23
#2  ind1  0.00
#3  ind1  0.12
#4  ind1  0.68
#5  ind2 -0.54
#6  ind2  0.80
#7  ind2  0.14
#8  ind2 -0.23
#9  ind3  0.48
#10 ind3  0.94
#11 ind3 -0.01
#12 ind3  0.31
#13 ind4  0.18
#14 ind4  0.11
#15 ind4  0.98
#16 ind4  0.05

What I need is to transform this dataframe to this form:

df
#      shr
# ind1 -0.23 0.00 0.12 0.68
# ind2 -0.54 0.80 0.14 -0.23
# ind3 .....
# ind4 .....

Or even it would be also convenient if my data have looked like this:

df
# ind1   ind2   ind3   ind4
# -0.23   .      .
#  0.00   .      .
#  0.12   .      .
#  0.68   .      .

In short, I want to make my data compact and comfortable for further analysis. The main difficulties are that my initial file with data is the .xlsx with different sheets and pivot table.

(1) How do I extract data from .xlsx file with several sheets? (2) How can I get desirable df structure?

Here's how to tranform your data. pivot_wider from tidyr requires an ID column. Here I create one using mutate(row = row_number()) . To read the data from excel, I suggest the readxl package. The read_xlsx function allows you to specify the excel sheet and the range.

library(dplyr)
df %>%
  group_by(ind) %>%
  mutate(row = row_number()) %>%
  pivot_wider(names_from= ind, values_from = shr) %>%
  select(-row)

# A tibble: 4 x 4
   ind1  ind2  ind3  ind4
  <dbl> <dbl> <dbl> <dbl>
1 -0.23 -0.54  0.48  0.18
2  0     0.8   0.94  0.11
3  0.12  0.14 -0.01  0.98
4  0.68 -0.23  0.31  0.05

you can use below code:

list1<-c(1:4)
df$col<-1:nrow(df)
df$remainder<-df$col%%4
df$col<-NULL

dcast(df,ind~remainder, value.var = "shr" )

>   ind     0     1    2     3
1 ind1  0.68 -0.23 0.00  0.12
2 ind2 -0.23 -0.54 0.80  0.14
3 ind3  0.31  0.48 0.94 -0.01
4 ind4  0.05  0.18 0.11  0.98

dcast(df,remainder~ind, value.var = "shr" )

>  remainder  ind1  ind2  ind3 ind4
1         0  0.68 -0.23  0.31 0.05
2         1 -0.23 -0.54  0.48 0.18
3         2  0.00  0.80  0.94 0.11
4         3  0.12  0.14 -0.01 0.98

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM