I have a really large data set, and I'm sharing it via a link because I don't know any other way to show it to you. I need the file to look like this . The second link is only an example of the total file, because the whole thing is too long to produce "by hand".
It has been suggested to me to try to do this, but it seems that my example in that post wasn't enough, because none of the proposals gives me the result I need. I've been trying for a week and I really don't know how to solve it, so I have decided to post my real data via a link in case that is more helpful. Here is what I have tried.
Using dplyr and tidyr:
d<-read.csv("m.tot3.csv",header=TRUE, sep=",",dec=".")
df<-data.frame(d)
library(dplyr)
library(tidyr)
library(data.table)
sub1 <- df[c(TRUE, FALSE),]
sub2 <- df[c(FALSE, TRUE),]
tibble(ind = c(row(sub1)), col1 = factor(unlist(sub1), levels = letters[1:1688]),
col2 = as.integer(unlist(sub2))) %>%
pivot_wider(names_from = col1, values_from = col2,
values_fill = list(col2 = 0)) %>%
select(-ind)
I get this error message
Error: Can't convert <double> to <list>.
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning message:
Values in `col2` are not uniquely identified; output will contain list-cols.
Use `values_fn = list(col2 = list)` to suppress this warning.
Use `values_fn = list(col2 = length)` to identify where the duplicates arise
Use `values_fn = list(col2 = summary_fun)` to summarise duplicates
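The warning means that some ind/col1 pairs occur more than once, so pivot_wider() has several col2 values for a single cell and falls back to list-columns. A minimal sketch for locating those pairs, using dplyr's count() on a small made-up tibble (the data below are hypothetical, not taken from m.tot3.csv):

```r
library(dplyr)

# Hypothetical toy data: the pair (ind = 1, col1 = "69") appears twice
toy <- tibble(
  ind  = c(1L, 1L, 2L, 2L),
  col1 = c("69", "69", "70", "71"),
  col2 = c(2L, 3L, 4L, 6L)
)

# Count every ind/col1 pair and keep those occurring more than once;
# these are the cells pivot_wider() cannot fill with a single value
dups <- toy %>%
  count(ind, col1) %>%
  filter(n > 1)

print(dups)
```

Any row returned by this check is a cell that needs either a summary function (values_fn) or an extra identifying column before pivoting.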
Using reshape:
sub1 <- df[c(TRUE, FALSE),]
sub2 <- df[c(FALSE, TRUE),]
out <- reshape(
data.frame(ind = c(row(sub1)),
col1 = factor(unlist(sub1), levels = letters[1:1688]),
col2 = as.integer(unlist(sub2))),
idvar = 'ind', direction = 'wide', timevar = 'col1')[-1]
names(out) <- sub("col2\\.", "", names(out))
out[is.na(out)] <- 0
row.names(out) <- NULL
I get these warning messages:
Warning messages:
1: In reshapeWide(data, idvar = idvar, timevar = timevar, varying = varying, :
there are records with missing times, which will be dropped.
2: In reshapeWide(data, idvar = idvar, timevar = timevar, varying = varying, :
multiple rows match for col1=NA: first taken
Finally, using data.table:
d_test<-melt(
setDT(
setnames(
data.table::transpose(df),
paste(rep(1:(nrow(d)/2), each = 2), c("name", "value"), sep = "_"))),
measure = patterns("name", "value"))[
, dcast(.SD, variable ~ value1, value.var = "value2", fill = 0)]
I get this
I really don't know how to solve it; any answer is welcome. Regards
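One of the issues is that the factor conversion with levels returns all NA, because the levels do not match the unique values in the dataset: letters has only 26 elements, so letters[1:1688] is a to z followed by NA, while the data contain numeric codes such as 69 and 70. Using the unique values of sub1 as the levels fixes this: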
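The level mismatch can be reproduced in isolation. This is a small sketch on hypothetical values that mimic the numeric codes in the data; it shows that factor() silently turns every value into NA when none of them is among the supplied levels, and that using unique() of the values as levels keeps them.

```r
# letters holds only 26 lowercase letters; indices past 26 return NA
lv <- letters[1:1688]
length(lv)        # 1688
sum(!is.na(lv))   # 26

# Hypothetical values resembling the numeric codes in the data
x <- c(69, 70, 71, 69)

# None of the values is a level, so every element becomes NA
bad <- factor(x, levels = lv)

# Using the values actually present as levels keeps them (first-seen order)
good <- factor(x, levels = unique(x))
print(levels(good))
```

The NA entries in lv do not cause an error because factor() drops NA levels by default (exclude = NA); the real problem is that no numeric code matches a letter.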
library(dplyr)
library(tidyr)
library(data.table)
df1 <- tibble(ind = c(row(sub1)),
              # use the values actually present in sub1 as the factor levels
              col1 = factor(unlist(sub1), levels = unique(unlist(sub1))),
              col2 = as.integer(unlist(sub2)))
The second issue is that there are duplicates, so we create a sequence column by 'col1' (rowid() from data.table) so that every cell of the wide output is uniquely identified:
out <- df1 %>%
  # number repeated occurrences of each col1 value: 1, 2, ...
  mutate(rn = rowid(col1)) %>%
  pivot_wider(names_from = col1, values_from = col2,
              values_fill = list(col2 = 0)) %>%
  select(-rn)
dim(out)
#[1] 23 3704
out[1:5, 1:5]
# A tibble: 5 x 5
# ind `69` `70` `71` `82`
# <int> <int> <int> <int> <int>
#1 1 2 0 0 0
#2 2 0 4 0 0
#3 3 0 0 6 0
#4 4 0 0 0 8
#5 5 0 0 0 0
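The effect of the sequence column can be checked on a tiny made-up example (names and values are hypothetical). Here the value "69" appears twice for ind = 1; numbering repeated col1 values with data.table::rowid() makes ind + rn + col1 unique, so pivot_wider() fills each cell with a single integer instead of producing list-columns.

```r
library(dplyr)
library(tidyr)
library(data.table)  # only needed for rowid()

# Hypothetical toy data: ind = 1 contains the value "69" twice
toy <- tibble(
  ind  = c(1L, 1L, 2L),
  col1 = c("69", "69", "70"),
  col2 = c(2L, 3L, 4L)
)

# Without rn, (ind = 1, col1 = "69") maps to two col2 values, which is
# what triggers the "not uniquely identified" warning.
# rowid(col1) numbers repeated occurrences of each col1 value (1, 2, ...),
# so ind + rn + col1 identifies every cell.
wide <- toy %>%
  mutate(rn = rowid(col1)) %>%
  pivot_wider(names_from = col1, values_from = col2,
              values_fill = list(col2 = 0L)) %>%
  select(-rn)

print(wide)
```

Note the trade-off of this design: after dropping rn, ind = 1 spans two rows (one per repeat), which is why the real output can have more rows than distinct ind values.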