I'd appreciate some help with the R code to use in the following situation.
These are the first 11 rows of the dataset:
Sa1_main11 Sa1_main11_2
20401106101 20401106101
20401106101 21105128609
20401106101 21105128653
20601110501 20601110501
20601110501 20601110530
20601110501 20601110531
20601110501 20601110532
20601110501 20601110533
20601110501 20601110534
20601110501 20601110614
20601110502 20601110502
SA1s are a geographical unit used by the Australian Bureau of Statistics.
This file lists which SA1s are contiguous: column 1 is the base SA1, and column 2 is an SA1 that adjoins it.
For example, take the first 3 rows, which all share the same base SA1.
What I need to do is produce a dataset where the first line is of the format
20401106101 21105128609 21105128653
I've tried the reshape2 package, but the lack of row labels (which would all be identical) makes that not possible for me.
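For anyone who wants to reproduce this, the 11 rows shown above can be rebuilt as a data frame. This is only a sketch: the names DF and dat are the ones the answers use, and storing the IDs as plain numerics (rather than character strings) is an assumption, since the real file was not posted as text.

```r
# Rebuild the 11 sample rows from the question.
# Assumption: IDs are stored as numeric; character would also work.
DF <- data.frame(
  Sa1_main11 = c(rep(20401106101, 3), rep(20601110501, 7), 20601110502),
  Sa1_main11_2 = c(20401106101, 21105128609, 21105128653,
                   20601110501, 20601110530, 20601110531, 20601110532,
                   20601110533, 20601110534, 20601110614,
                   20601110502)
)

# The answers refer to the same data under two names.
dat <- DF
```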
Edit - here is a link to what the data looks like
https://www.dropbox.com/s/tigqdevybskm1bs/Original.JPG
here is a link to what the top 3 rows should look like
It looks like split might help you:
split(DF[,2], DF[,1])
#$`20401106101`
#[1] 20401106101 21105128609 21105128653
#
#$`20601110501`
#[1] 20601110501 20601110530 20601110531 20601110532 20601110533 20601110534 20601110614
#
#$`20601110502`
#[1] 20601110502
It's unclear what you intend to do with the data. Neither data.frames nor matrices can hold rows of different length. So replicating the exact result is a bit complicated (and not very useful). Anyway, this would come close:
res <- split(DF[,2], DF[,1])
res <- lapply(res, function(x) {
length(x) <- max(sapply(res, length))
x
})
do.call(rbind, res)
#            [,1]        [,2]        [,3]        [,4]        [,5]        [,6]        [,7]
#20401106101 20401106101 21105128609 21105128653 NA NA NA NA
#20601110501 20601110501 20601110530 20601110531 20601110532 20601110533 20601110534 20601110614
#20601110502 20601110502 NA NA NA NA NA NA
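If the goal is literally one line of text per base SA1, as in the format the question asks for, padding with NA may not be needed at all. Here is a minimal sketch of that idea, assuming a cut-down stand-in for DF (the first five sample rows only) and that a space-separated string per group is an acceptable result; the name lines is just for illustration.

```r
# Stand-in for DF: the first five sample rows only (assumption for brevity).
DF <- data.frame(
  Sa1_main11   = c(20401106101, 20401106101, 20401106101,
                   20601110501, 20601110501),
  Sa1_main11_2 = c(20401106101, 21105128609, 21105128653,
                   20601110501, 20601110530)
)

# Group the adjoining SA1s by base SA1, as in the split() answer.
res <- split(DF[, 2], DF[, 1])

# Collapse each group into one space-separated string.
lines <- vapply(res, function(x) paste(x, collapse = " "), character(1))

writeLines(lines)
# 20401106101 21105128609 21105128653
# 20601110501 20601110530
```

Because each group starts with the base SA1 itself in this data, the collapsed string already begins with the base SA1. writeLines() also accepts a file path as its second argument if the result should go to disk.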
Check if this works (dat is the dataset):
library(reshape2)
dat$indx <- with(dat, ave(seq_along(Sa1_main11), Sa1_main11, FUN=seq_along))
dcast(dat, Sa1_main11~indx, value.var="Sa1_main11_2")
# Sa1_main11 1 2 3 4 5
#1 20401106101 20401106101 21105128609 21105128653 NA NA
#2 20601110501 20601110501 20601110530 20601110531 20601110532 20601110533
#3 20601110502 20601110502 NA NA NA NA
# 6 7
#1 NA NA
#2 20601110534 20601110614
#3 NA NA
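The key step in the code above is the indx column, which numbers the rows within each base SA1 so that dcast has something to spread across columns. A small base-R sketch of just that step, assuming a cut-down stand-in for dat with six rows:

```r
# Stand-in for dat: six of the sample rows (assumption for brevity).
dat <- data.frame(
  Sa1_main11   = c(20401106101, 20401106101, 20401106101,
                   20601110501, 20601110501, 20601110502),
  Sa1_main11_2 = c(20401106101, 21105128609, 21105128653,
                   20601110501, 20601110530, 20601110502)
)

# ave() applies seq_along() within each group of Sa1_main11,
# giving a running counter that restarts for every base SA1.
dat$indx <- with(dat, ave(seq_along(Sa1_main11), Sa1_main11, FUN = seq_along))

dat$indx
# [1] 1 2 3 1 2 1
```

These counters become the column names 1, 2, 3, ... in the dcast output, which is why no explicit row labels are needed.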