Extracting factors from a row in a data frame

Question

I'm dealing with a ragged data frame that contains timepoints in the first column, a list of serial numbers in the first row, and the actual inventory data (number of items) in the rest of the data frame.

> mydf
    V1             V2             V3             V4             V5
1 month item_serial123 item_serial234 item_serial345 item_serial456
2     0            234            120            302            500
3     1            344            125            350            450
4     2            235            129            400            300
5     3            453            145            450            330
6     4            200            130            500            200
7     5            201                           501               
8     6                                          504            202

I'm trying to reshape the data into a 'long' format so that I can run an analysis on each item's serial number. I can discard the non-numeric data and make sure the data is imported as character objects by setting the stringsAsFactors=FALSE flag in read.table, then transforming mydf into a data matrix:

> mydf.new<-data.matrix(mydf)
Warning in data.matrix(mydf) : NAs introduced by coercion
Warning in data.matrix(mydf) : NAs introduced by coercion
Warning in data.matrix(mydf) : NAs introduced by coercion
Warning in data.matrix(mydf) : NAs introduced by coercion
Warning in data.matrix(mydf) : NAs introduced by coercion
> mydf.new
     V1  V2  V3  V4  V5
[1,] NA  NA  NA  NA  NA
[2,]  0 234 120 302 500
[3,]  1 344 125 350 450
[4,]  2 235 129 400 300
[5,]  3 453 145 450 330
[6,]  4 200 130 500 200
[7,]  5 201  NA 501  NA
[8,]  6  NA  NA 504 202

Changing the variable V1 to "time" is trivial. What I'm really struggling with is how to extract the serial numbers from mydf[1,2:5] and assign them to the appropriate data when I melt/cast mydf.new. What I'd like to wind up with is something like this:

   time count serial_number
   0    234 item_serial123
   1    344 item_serial123
   2    235 item_serial123
   3    453 item_serial123
   4    200 item_serial123
   5    201 item_serial123
   6    NA  item_serial123

etc. Any suggestions?

Answer 1

If I understood your question correctly, you have a data.frame like this:

> df
  month item_serial123 item_serial234 item_serial345 item_serial456
1     0            234            120            302            500
2     1            344            125            350            450
3     2            235            129            400            300
4     3            453            145            450            330
5     4            200            130            500            200
6     5            201             NA            501             NA
7     6             NA             NA            504            202
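
The data frame above already has the serial numbers as column names, whereas the question's mydf still carries them in its first row. A minimal sketch of one way to bridge that gap, assuming mydf was read with stringsAsFactors=FALSE so that every column is character (the name `header` is introduced here only for illustration; if the file can be re-read, passing header = TRUE to read.table may make this step unnecessary):

```r
# Rebuild the question's raw data: all columns are character,
# and the first row holds the intended column names
mydf <- data.frame(
  V1 = c("month", "0", "1", "2", "3", "4", "5", "6"),
  V2 = c("item_serial123", "234", "344", "235", "453", "200", "201", ""),
  V3 = c("item_serial234", "120", "125", "129", "145", "130", "", ""),
  V4 = c("item_serial345", "302", "350", "400", "450", "500", "501", "504"),
  V5 = c("item_serial456", "500", "450", "300", "330", "200", "", "202"),
  stringsAsFactors = FALSE
)

# Pull the first row out as a plain character vector
header <- unlist(mydf[1, ], use.names = FALSE)

# Drop the header row and convert each remaining column to numeric;
# empty strings become NA
df <- as.data.frame(lapply(mydf[-1, ], as.numeric))

# Promote the saved first row to column names and reset the row names
names(df) <- header
rownames(df) <- NULL
```

Converting only after the header row has been removed keeps the serial numbers from being coerced to NA, which is what happened with data.matrix(mydf) in the question.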

Now you can use reshape to get the following:

> df_new <- reshape(df, idvar = "month",  varying = list(2:5), 
                    v.names="item_serial", direction = "long",
                    new.row.names=1:(prod(dim(df[,-1]))))
> df_new$time <- factor(df_new$time, labels=names(df)[-1])
> df_new
   month           time item_serial  # you may want to use `colnames` to change them
1      0 item_serial123         234
2      1 item_serial123         344
3      2 item_serial123         235
4      3 item_serial123         453
5      4 item_serial123         200
6      5 item_serial123         201
7      6 item_serial123          NA
8      0 item_serial234         120
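
As an alternative sketch, not part of the original answer, base R's stack() can produce the exact column layout asked for in the question (time, count, serial_number). It assumes the wide df shown above; the names `stacked` and `long` are made up for this example:

```r
# Same wide data as in the answer above
df <- data.frame(
  month = 0:6,
  item_serial123 = c(234, 344, 235, 453, 200, 201, NA),
  item_serial234 = c(120, 125, 129, 145, 130, NA, NA),
  item_serial345 = c(302, 350, 400, 450, 500, 501, 504),
  item_serial456 = c(500, 450, 300, 330, 200, NA, 202)
)

# stack() concatenates the item columns into one `values` column and
# records the source column name in a factor called `ind`
stacked <- stack(df[-1])

# Repeat the month column once per item column so it lines up with `values`
long <- data.frame(
  time = rep(df$month, times = ncol(df) - 1),
  count = stacked$values,
  serial_number = stacked$ind
)

head(long, 7)
```

Because serial_number comes back as a factor, it can be used directly for per-item analysis, for example with split(long, long$serial_number).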
