My data is an .xlsx pivot table. There are several sheets in the file, but I need only one for my analysis. On this sheet I have a data frame that looks like this:
df <- data.frame(ind = c("ind1", "ind1", "ind1", "ind1",
"ind2", "ind2", "ind2", "ind2",
"ind3", "ind3", "ind3", "ind3",
"ind4", "ind4", "ind4", "ind4"),
shr = c(-0.23, 0, 0.12, 0.68,
-0.54, 0.80, 0.14, -0.23,
0.48, 0.94, -0.01, 0.31,
0.18, 0.11, 0.98, 0.05))
There are other columns with different types of data, but I don't need them, only the two shown in the example. So df is:
df
# ind shr
#1 ind1 -0.23
#2 ind1 0.00
#3 ind1 0.12
#4 ind1 0.68
#5 ind2 -0.54
#6 ind2 0.80
#7 ind2 0.14
#8 ind2 -0.23
#9 ind3 0.48
#10 ind3 0.94
#11 ind3 -0.01
#12 ind3 0.31
#13 ind4 0.18
#14 ind4 0.11
#15 ind4 0.98
#16 ind4 0.05
What I need is to transform this data frame into this form:
df
# shr
# ind1 -0.23 0.00 0.12 0.68
# ind2 -0.54 0.80 0.14 -0.23
# ind3 .....
# ind4 .....
It would also be convenient if my data looked like this:
df
# ind1 ind2 ind3 ind4
# -0.23 . .
# 0.00 . .
# 0.12 . .
# 0.68 . .
In short, I want to make my data compact and convenient for further analysis. The main difficulty is that my initial data file is an .xlsx with several sheets and a pivot table.
(1) How do I extract data from an .xlsx file with several sheets? (2) How can I get the desired df structure?
Here's how to transform your data. pivot_wider from tidyr requires an ID column; here I create one using mutate(row = row_number()). To read the data from Excel, I suggest the readxl package. The read_xlsx function allows you to specify the sheet and the cell range.
library(dplyr)
library(tidyr)  # provides pivot_wider()
df %>%
group_by(ind) %>%
mutate(row = row_number()) %>%
pivot_wider(names_from= ind, values_from = shr) %>%
select(-row)
# A tibble: 4 x 4
ind1 ind2 ind3 ind4
<dbl> <dbl> <dbl> <dbl>
1 -0.23 -0.54 0.48 0.18
2 0 0.8 0.94 0.11
3 0.12 0.14 -0.01 0.98
4 0.68 -0.23 0.31 0.05
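The reading step mentioned above could look roughly like the sketch below. It assumes readxl is installed and uses the example workbook bundled with the package so that it runs as-is; the sheet name and cell range are purely illustrative and would need to be replaced with those of your own file.

```r
library(readxl)

# Path to an example workbook shipped with readxl;
# substitute the path to your own .xlsx file here.
path <- readxl_example("datasets.xlsx")

# List the sheet names to find the one you need.
excel_sheets(path)

# Read a single sheet, restricted to a cell range
# (header row plus four data rows, first three columns).
# Sheet name and range are illustrative assumptions.
sheet_data <- read_xlsx(path, sheet = "mtcars", range = "A1:C5")
sheet_data
```

Restricting the range is one way to pick up only the two columns of interest and skip anything else on the sheet.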
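You can use the code below (dcast comes from the reshape2 package):
library(reshape2)
df$col<-1:nrow(df)
df$remainder<-df$col%%4  # position within each group of 4; every 4th row gets 0
df$col<-NULL
dcast(df,ind~remainder, value.var = "shr" )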
> ind 0 1 2 3
1 ind1 0.68 -0.23 0.00 0.12
2 ind2 -0.23 -0.54 0.80 0.14
3 ind3 0.31 0.48 0.94 -0.01
4 ind4 0.05 0.18 0.11 0.98
dcast(df,remainder~ind, value.var = "shr" )
> remainder ind1 ind2 ind3 ind4
1 0 0.68 -0.23 0.31 0.05
2 1 -0.23 -0.54 0.48 0.18
3 2 0.00 0.80 0.94 0.11
4 3 0.12 0.14 -0.01 0.98
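Note that in the output above the column (or row) labelled 0 holds the fourth value of each group, so the original order is rotated. If the within-group order matters, a dependency-free alternative is base R's unstack(). This is a different technique from the dcast approach, sketched here under the assumption that every ind has the same number of rows:

```r
# Rebuild the example data from the question.
df <- data.frame(
  ind = rep(c("ind1", "ind2", "ind3", "ind4"), each = 4),
  shr = c(-0.23, 0, 0.12, 0.68,
          -0.54, 0.80, 0.14, -0.23,
          0.48, 0.94, -0.01, 0.31,
          0.18, 0.11, 0.98, 0.05)
)

# One column per ind, values kept in their original order.
wide_cols <- unstack(df, shr ~ ind)
wide_cols

# One row per ind: transpose the result (this gives a matrix).
wide_rows <- t(wide_cols)
wide_rows
```

If the groups had unequal sizes, unstack() would return a list rather than a data frame, so this shortcut only fits balanced data like the example.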