
Reshaping from wide to long in R

I am trying to learn R and have a question on reshaping the following dataset.

bankname,date,year,month,quarter,totalliabilities,corr1,amt1,corr2,amt2
Bank of Pittsgurgh,2/7/1950,1950,2,1,237991,#N/A,#N/A,#N/A,#N/A
Bank of Pittsgurgh,5/2/1950,1950,5,2,258865,#N/A,#N/A,#N/A,#N/A
Bank of Pittsgurgh,8/7/1950,1950,8,3,218524,#N/A,#N/A,#N/A,#N/A,#N/A
Bank of Pittsgurgh,11/6/1950,1950,11,4,237520,First Bank,17472,Third Bank,30711
The Arsenal Bank,2/2/1950,1950,2,1,218508,#N/A,#N/A,#N/A,#N/A
The Arsenal Bank,5/3/1950,1950,5,2,224110,#N/A,#N/A,#N/A,#N/A
The Arsenal Bank,8/2/1950,1950,8,3,216071,#N/A,#N/A,#N/A,#N/A
The Arsenal Bank,11/1/1950,1950,11,4,226166,National Bank,20966,Trust Company,873

When I run the code below to reshape the data, I get the error shown after it. How can I fix this? I would also like to convert the amt variables from strings to numeric and remove the #N/A values from the dataset. How can I do that conversion?

-First I tried to create "id"

bank_test2$id<-as.numeric(as.factor(bank_test2$bankname))

-Then I tried to create a unique time variable using year and quarter

bank_test2$yq<-as.factor(paste(as.character(bank_test2$year),as.character(bank_test2$quarter)))   
bank_test2<-bank_test2[with(bank_test2, order(yq,id)),]   

-reshape the data

v <- outer(c("corr", "amt"), c(1:2), FUN=paste0)   
bank_test2<-reshape(bank_test2, direction='long', varying=c(v), sep='')      


Error in `row.names<-.data.frame`(`*tmp*`, value = paste(d[, idvar], times[1L],  : 
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘1.1’, ‘2.1’ 

id, bankname,   date,   year,   month,  quarter,    totalliabilities,   node,   corr,   amt      
1,  Bank of Pittsgurgh, 2/7/1950,   1950,   2,  1,  237991, 1,  #N/A,   #N/A      
1,  Bank of Pittsgurgh, 5/2/1950,   1950,   5,  2,  258865, 1,  #N/A,   #N/A   
1,  Bank of Pittsgurgh, 8/7/1950,   1950,   8,  3,  218524, 1,  #N/A,   #N/A   
1,  Bank of Pittsgurgh, 11/6/1950,  1950,   11, 4,  237520, 1,  First Bank, 21906   
1,  Bank of Pittsgurgh, 2/7/1950,   1950,   2,  1,  237991, 2,  #N/A,   #N/A   
1,  Bank of Pittsgurgh, 5/2/1950,   1950,   5,  2,  258865, 2,  #N/A,   #N/A   
1,  Bank of Pittsgurgh, 8/7/1950,   1950,   8,  3,  218524, 2,  #N/A,   #N/A   
1,  Bank of Pittsgurgh, 11/6/1950,  1950,   11, 4,  237520, 2,  Third Bank, 4442   
2,  The Arsenal Bank,   2/2/1950,   1950,   2,  1,  218508, 1,  #N/A,   #N/A   
2,  The Arsenal Bank,   5/3/1950,   1950,   5,  2,  224110, 1,  #N/A,   #N/A   
2,  The Arsenal Bank,   8/2/1950,   1950,   8,  3,  216071, 1,  #N/A,   #N/A   
2,  The Arsenal Bank,   11/1/1950,  1950,   11, 4,  226166, 1,  National Bank, 43224      
2,  The Arsenal Bank,   2/2/1950,   1950,   2,  1,  218508, 2,  #N/A,   #N/A   
2,  The Arsenal Bank,   5/3/1950,   1950,   5,  2,  224110, 2,  #N/A,   #N/A   
2,  The Arsenal Bank,   8/2/1950,   1950,   8,  3,  216071, 2,  #N/A,   #N/A   
2,  The Arsenal Bank,   11/1/1950,  1950,   11, 4,  226166, 2,  Trust Company,  3682   

I want the data organized as shown above, with a new bank id created from "bankname" and unique row names built from the id and the time value. I then want to remove all the #N/A entries from the dataset.
How should I do it?

Thank you in advance.

That particular error is complaining about row names not being unique. By default reshape uses a column named "id" as idvar and builds the new row names from idvar and the time value. The id you created repeats for every row of the same bank, so those row names collide. To avoid this, pass reshape an "idvar" that is unique for each row. The best approach is to add a new column with such a unique id to the original data frame, but any other field that is unique will also work. For example, totalliabilities is unique in your data frame, so you can use that:

bank_test2<-reshape(bank_test2, direction='long', varying=c(v), sep='',idvar="totalliabilities")

That is obviously not the best choice for an id but I hope that points you in the right direction.
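
A minimal sketch of the "new column" approach, assuming a data frame shaped like the sample in the question. The small bank_test2 below is rebuilt by hand from a few of its rows, and rowid and node are names chosen here for illustration, not anything from the original code:

```r
# Small stand-in for bank_test2, built from rows of the sample in the question.
bank_test2 <- data.frame(
  bankname = c("Bank of Pittsgurgh", "Bank of Pittsgurgh",
               "The Arsenal Bank", "The Arsenal Bank"),
  year     = c(1950, 1950, 1950, 1950),
  quarter  = c(3, 4, 3, 4),
  corr1    = c("#N/A", "First Bank", "#N/A", "National Bank"),
  amt1     = c("#N/A", "17472", "#N/A", "20966"),
  corr2    = c("#N/A", "Third Bank", "#N/A", "Trust Company"),
  amt2     = c("#N/A", "30711", "#N/A", "873"),
  stringsAsFactors = FALSE
)

# One id per original row (hypothetical column name), so that every
# idvar/time combination in the long data is unique.
bank_test2$rowid <- seq_len(nrow(bank_test2))

long <- reshape(bank_test2,
                direction = "long",
                varying   = list(c("corr1", "corr2"), c("amt1", "amt2")),
                v.names   = c("corr", "amt"),
                timevar   = "node",
                times     = 1:2,
                idvar     = "rowid")

# Row names are built as rowid.node, so they no longer collide.
rownames(long)
```

Spelling out varying as a list together with v.names avoids relying on reshape guessing the split between name and number, which may be easier to read when there are several variable pairs.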

I have tried to put your data in a form that's easy to use and reproduce. Then I took a subset of your data, b, and tried to put it into a long format. I am not sure whether this is the desired output.

library(reshape2)
library(stringr)

a <- structure(list(
  bankname = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
    .Label = c("Bank of Pittsgurgh", "The Arsenal Bank_Pittsburgh"), class = "factor"),
  date = structure(c(2L, 3L, 6L, 8L, 9L, 12L, 13L, 15L, 1L, 4L, 5L, 7L, 10L, 11L, 14L, 16L),
    .Label = c("1950/02/02", "1950/02/07", "1950/05/02", "1950/05/03", "1950/08/02", "1950/08/07",
               "1950/11/01", "1950/11/06", "1951/02/05", "1951/02/06", "1951/05/01", "1951/05/07",
               "1951/08/06", "1951/08/07", "1951/11/03", "1951/11/06"), class = "factor"),
  year = c(1950L, 1950L, 1950L, 1950L, 1951L, 1951L, 1951L, 1951L, 1950L, 1950L, 1950L, 1950L, 1951L, 1951L, 1951L, 1951L),
  month = c(2L, 5L, 8L, 11L, 2L, 5L, 8L, 11L, 2L, 5L, 8L, 11L, 2L, 5L, 8L, 11L),
  quarter = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L),
  totalliabilities = c(237991.5469, 258865.6563, 218524, 237520.5469, 276052.1875, 255812.7031, 62426.625, 272447.375,
                       218508.4844, 224110.5156, 216071.9063, 226166.7969, 244241.625, 228508.0625, 254008.8594, 268540.1563),
  corr1 = structure(c(1L, 1L, 1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 3L),
    .Label = c("#N/A", "First National Bank", "National Bank of Commerce"), class = "factor"),
  amt1 = structure(c(1L, 1L, 1L, 2L, 1L, 1L, 1L, 4L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 5L),
    .Label = c("#N/A", "17472.98047", "20966.50977", "21906.07031", "43224.62891"), class = "factor"),
  corr2 = structure(c(1L, 1L, 1L, 2L, 1L, 1L, 1L, 3L, 1L, 1L, 1L, 5L, 1L, 1L, 1L, 4L),
    .Label = c("#N/A", "Third National Bank", "Third National Bank", "Union Trust Company",
               "Unit Trust Company Of New York"), class = "factor"),
  amt2 = structure(c(1L, 1L, 1L, 2L, 1L, 1L, 1L, 4L, 1L, 1L, 1L, 5L, 1L, 1L, 1L, 3L),
    .Label = c("#N/A", "30711.35938", "3682.449951", "4442.399902", "873.1699829"), class = "factor"),
  X = structure(c(1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
    .Label = c("", "#N/A"), class = "factor"),
  id = c(1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2)),
  .Names = c("bankname", "date", "year", "month", "quarter", "totalliabilities",
             "corr1", "amt1", "corr2", "amt2", "X", "id"),
  row.names = c(NA, 16L), class = "data.frame")


b<- a[c(8,12,16),c(1,2,7,8,9,10)]
b
# combine corr1 and amt1 into a single column type1, and corr2 and amt2 into type2
b$type1 <-  paste0(b$corr1,"|",b$amt1)
b$type2 <- paste0(b$corr2,"|",b$amt2)

# melt the types together
c<- melt(b, measure.vars=c(7,8))

c
# split them back into separate name and amount columns
long <- data.frame(str_split_fixed(c$value,"\\|",2))
d <- cbind(c,long)

d[,c(1,9,10)]


#                     bankname                             X1          X2
#1          Bank of Pittsgurgh            First National Bank 21906.07031
#2 The Arsenal Bank_Pittsburgh      National Bank of Commerce 20966.50977
#3 The Arsenal Bank_Pittsburgh      National Bank of Commerce 43224.62891
#4          Bank of Pittsgurgh            Third National Bank 4442.399902
#5 The Arsenal Bank_Pittsburgh Unit Trust Company Of New York 873.1699829
#6 The Arsenal Bank_Pittsburgh            Union Trust Company 3682.449951
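
For the remaining part of the question (making amt numeric and dropping the #N/A entries), one possible sketch follows. It assumes a long data frame with columns named corr and amt as in the desired output; the data below is a hand-built stand-in rather than the real reshaped result:

```r
# Hand-built stand-in for the long data; "#N/A" marks missing entries.
long <- data.frame(
  bankname = c("Bank of Pittsgurgh", "Bank of Pittsgurgh", "The Arsenal Bank"),
  corr     = factor(c("#N/A", "First National Bank", "National Bank of Commerce")),
  amt      = factor(c("#N/A", "21906.07031", "43224.62891"))
)

# Work on character vectors, then turn the "#N/A" placeholders into real NA.
long$corr <- as.character(long$corr)
long$amt  <- as.character(long$amt)
long$corr[long$corr == "#N/A"] <- NA
long$amt[long$amt == "#N/A"]   <- NA

# Convert from character: as.numeric() applied directly to a factor
# would return the internal level codes instead of the amounts.
long$amt <- as.numeric(long$amt)

# Keep only rows that have both a correspondent and an amount.
long <- long[!is.na(long$corr) & !is.na(long$amt), ]
long
```

If the file is read with read.csv, passing na.strings = "#N/A" should mark these values as NA at import, in which case the amount columns will likely come in as numeric without a separate conversion step.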
