Extracting factors from a row in a data frame
I'm dealing with a ragged data frame that has timepoints in the first column, a list of serial numbers in the first row, and the actual inventory data (number of items) in the rest of the data frame.
> mydf
     V1             V2             V3             V4             V5
1 month item_serial123 item_serial234 item_serial345 item_serial456
2     0            234            120            302            500
3     1            344            125            350            450
4     2            235            129            400            300
5     3            453            145            450            330
6     4            200            130            500            200
7     5            201                           501
8     6                                          504            202
I'm trying to reshape the data into a 'long' format so that I can run an analysis on each item's serial number. I can make sure the data is imported as character objects by setting the stringsAsFactors=FALSE flag in read.table, and then discard the non-numeric data by transforming mydf into a data matrix:
> mydf.new<-data.matrix(mydf)
Warning in data.matrix(mydf) : NAs introduced by coercion
Warning in data.matrix(mydf) : NAs introduced by coercion
Warning in data.matrix(mydf) : NAs introduced by coercion
Warning in data.matrix(mydf) : NAs introduced by coercion
Warning in data.matrix(mydf) : NAs introduced by coercion
> mydf.new
     V1  V2  V3  V4  V5
[1,] NA  NA  NA  NA  NA
[2,]  0 234 120 302 500
[3,]  1 344 125 350 450
[4,]  2 235 129 400 300
[5,]  3 453 145 450 330
[6,]  4 200 130 500 200
[7,]  5 201  NA 501  NA
[8,]  6  NA  NA 504 202
Changing the variable V1 to "time" is trivial. What I'm really struggling with is how to extract the serial numbers from mydf[1,2:5] and assign them to the appropriate data when I melt/cast mydf.new. What I'd like to wind up with is something like this:
time count serial_number
0 234 item_serial123
1 344 item_serial123
2 235 item_serial123
3 453 item_serial123
4 200 item_serial123
5 201 item_serial123
6 NA item_serial123
and so on for the other serial numbers. Any suggestions?
If I understood your question correctly, you have a data.frame like this:
> df
  month item_serial123 item_serial234 item_serial345 item_serial456
1     0            234            120            302            500
2     1            344            125            350            450
3     2            235            129            400            300
4     3            453            145            450            330
5     4            200            130            500            200
6     5            201             NA            501             NA
7     6             NA             NA            504            202
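If your data still looks like mydf, with the serial numbers sitting in the first row, one way to arrive at the df above is to promote that row to column names. This is only a sketch under a few assumptions: mydf was read with stringsAsFactors=FALSE so every column is character, empty cells came in as "", and the small mydf built here is a hand-typed stand-in for your real import.

```r
# Stand-in for the imported data (assumption): header in row 1, all character.
mydf <- data.frame(
  V1 = c("month", "0", "1", "2", "3", "4", "5", "6"),
  V2 = c("item_serial123", "234", "344", "235", "453", "200", "201", ""),
  V3 = c("item_serial234", "120", "125", "129", "145", "130", "", ""),
  V4 = c("item_serial345", "302", "350", "400", "450", "500", "501", "504"),
  V5 = c("item_serial456", "500", "450", "300", "330", "200", "", "202"),
  stringsAsFactors = FALSE
)

# Take the first row as the column names, then drop it from the data.
header <- unlist(mydf[1, ], use.names = FALSE)
df <- mydf[-1, ]
names(df) <- header

# Convert every column to numeric; empty strings become NA.
df[] <- lapply(df, as.numeric)

# Reset the row names so they run from 1 again.
rownames(df) <- NULL
```

Depending on how the file is laid out, passing header = TRUE to read.table may give you the same result directly at import time.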
Now you can use reshape to get the following:
> df_new <- reshape(df, idvar = "month", varying = list(2:5),
v.names="item_serial", direction = "long",
new.row.names=1:(prod(dim(df[,-1]))))
> df_new$time <- factor(df_new$time, labels=names(df)[-1])
> df_new
  month           time item_serial   # you may want to use `colnames` to change them
1     0 item_serial123         234
2     1 item_serial123         344
3     2 item_serial123         235
4     3 item_serial123         453
5     4 item_serial123         200
6     5 item_serial123         201
7     6 item_serial123          NA
8     0 item_serial234         120
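To get the time / count / serial_number layout asked for in the question, the columns can be renamed and reordered after the reshape. A minimal sketch, which rebuilds df from the values shown above so that it runs on its own; the new column names are simply the ones from the question:

```r
# Rebuild the wide data frame from the values shown above.
df <- data.frame(
  month          = 0:6,
  item_serial123 = c(234, 344, 235, 453, 200, 201, NA),
  item_serial234 = c(120, 125, 129, 145, 130, NA, NA),
  item_serial345 = c(302, 350, 400, 450, 500, 501, 504),
  item_serial456 = c(500, 450, 300, 330, 200, NA, 202)
)

# Same reshape call as above: one row per (month, serial number) pair.
df_new <- reshape(df, idvar = "month", varying = list(2:5),
                  v.names = "item_serial", direction = "long",
                  new.row.names = 1:(prod(dim(df[, -1]))))

# Replace the numeric time index (1..4) with the original column names.
df_new$time <- factor(df_new$time, labels = names(df)[-1])

# Columns are currently month, time, item_serial; rename and reorder them.
names(df_new) <- c("time", "serial_number", "count")
df_new <- df_new[, c("time", "count", "serial_number")]
```

After this, head(df_new, 7) should show the seven rows for item_serial123 in the order requested.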