Turn different sized rows into columns
I am reading in a data file with many rows, each of which can have a different length, like so:
dataFile <- read.table("file.txt", as.is=TRUE);
The rows can be as follows:
1 5 2 6 2 1
2 6 24
2 6 1 5 2 7 982 24 6
25 2
I need the rows to be transformed into columns. I'll then be using the columns for a violin plot like so:
names(dataCol)[1] <- "x";
jpeg("violinplot.jpg", width = 1000, height = 1000);
do.call(vioplot,c(dataCol,))
dev.off()
I'm assuming there will be an empty string/placeholder for any column with fewer entries than the column with the maximum number of entries. How can it be done?
Use the fill = TRUE argument in read.table. Then to change rows to columns, use t to transpose. Using your data this would look like...
df <- read.table( text = "1 5 2 6 2 1
2 6 24
2 6 1 5 2 7 982 24 6
25 2
" , header = FALSE , fill = TRUE )
df
# V1 V2 V3 V4 V5 V6 V7 V8 V9
#1 1 5 2 6 2 1 NA NA NA
#2 2 6 24 NA NA NA NA NA NA
#3 2 6 1 5 2 7 982 24 6
#4 25 2 NA NA NA NA NA NA NA
t(df)
# [,1] [,2] [,3] [,4]
#V1 1 2 2 25
#V2 5 6 6 2
#V3 2 24 1 NA
#V4 6 NA 5 NA
#V5 2 NA 2 NA
#V6 1 NA 7 NA
#V7 NA NA 982 NA
#V8 NA NA 24 NA
#V9 NA NA 6 NA
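One caveat, going by the read.table documentation: the number of columns is worked out from the first five lines of input unless col.names is longer, so a file whose longest row appears further down may not be read as intended. Below is a minimal sketch of one way around that, using count.fields to size col.names. The file contents and the names path, n_col, ragged and cols are made up for illustration.

```r
# Hypothetical ragged file whose longest row is on line 6,
# i.e. beyond the five lines read.table inspects by default.
path <- tempfile(fileext = ".txt")
writeLines(c("1 5 2", "2 6", "2 6 1", "25 2", "7", "3 1 4 1 5 9 2 6"), path)

# Count the whitespace-separated fields on every line of the file.
n_col <- max(count.fields(path))

# Supplying enough column names makes read.table allocate n_col columns;
# fill = TRUE pads the shorter rows with NA.
ragged <- read.table(path, header = FALSE, fill = TRUE,
                     col.names = paste0("V", seq_len(n_col)))

# Rows become columns.
cols <- t(ragged)
dim(cols)
```

For the four-line sample in the question this is not needed, since the longest row is within the first five lines.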
EDIT: apparently read.table has a fill=TRUE option, which is WAYYYY easier than my answer.
I've never used vioplot before, and that seems like a weird way to make a function call (instead of something like vioplot(dataCol)), but I have worked with ragged arrays before, so I'll try that.
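On the do.call point, here is a rough sketch of why it might be used. It assumes the plotting function takes the first group as x and any further groups through ..., which is how I understand the older vioplot interface. The function count_groups below is a hypothetical stand-in, not part of vioplot.

```r
# Hypothetical stand-in for a function that takes one group as x
# and any number of further groups through ...
count_groups <- function(x, ...) {
  groups <- list(x, ...)
  sapply(groups, length)
}

cols <- list(x = c(1, 5, 2), c(2, 6), c(25, 2, 7, 9))

# Passing the list directly: the whole list is treated as a single group.
direct <- count_groups(cols)

# do.call spreads the list so that each element becomes its own argument,
# equivalent to count_groups(x = c(1, 5, 2), c(2, 6), c(25, 2, 7, 9)).
spread <- do.call(count_groups, cols)
```

Here direct sees one group while spread sees three, which is presumably what the question's do.call is after.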
Have you read the data in yet? That tends to be the hardest part. The code below reads the above data from a file called temp.txt into a matrix called out2:
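file = 'temp.txt'
# Read the whole file into a single string
dat = readChar(file, file.info(file)$size)
# Split into lines, then split each line into its elements
split1 = strsplit(dat, "\n")
split2 = strsplit(split1[[1]], " ")
# Length of the longest row
n = max(lengths(split2))
# Blank (NA-filled) matrix with one column per input row
out = matrix(nrow = n, ncol = length(split2))
for (i in seq_along(split2)) {
  vect = as.numeric(split2[[i]])
  length(vect) = n   # pad the row with NA up to the longest length
  out[, i] = vect    # the row becomes column i
}
out2 = out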
I'll try and explain what I've done: the first step is to read in every character via readChar. You then split the lines, then the elements within each line to get the list split2, where each element of the list is a row of the input file.
From there you create a blank matrix that would be the right size for your data, then iterate through the list, pad each row with NA up to the length of the longest one, and assign it to a column.
It's not pretty, but it works!
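For the plotting step, the NA padding may need to be dropped again so that each group only contains real values. A minimal sketch, assuming the padded matrix is rebuilt from the question's sample rows; the names rows, padded and dataCol are used here only for illustration, and the vioplot call is left as a comment because it needs the vioplot package.

```r
# Sample rows from the question, split into their elements.
rows <- strsplit(c("1 5 2 6 2 1", "2 6 24", "2 6 1 5 2 7 982 24 6", "25 2"), " ")
n <- max(lengths(rows))

# Pad every row with NA to length n; sapply binds them as matrix columns.
padded <- sapply(rows, function(r) {
  v <- as.numeric(r)
  length(v) <- n
  v
})

# One list element per column, with the NA placeholders removed.
dataCol <- lapply(seq_len(ncol(padded)), function(j) {
  column <- padded[, j]
  column[!is.na(column)]
})

# Name only the first element "x", as in the question; the rest stay unnamed.
names(dataCol) <- c("x", rep("", length(dataCol) - 1))

lengths(dataCol)

# The list could then be handed to the call from the question, e.g.
# do.call(vioplot, dataCol)   # requires library(vioplot)
```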