
R: Transpose variables in column into row by date

[Solution] I found my own solution to the problem.

    library(data.table)
    dt <- data.table(dataframe)
    # number restarts at 1 for each Date; := adds the column to dt by reference
    dt[, number := seq_len(.N), by = Date]
    data <- as.data.frame(dt)

    data_wide <- reshape(data, direction = "wide", idvar = "Date", timevar = "number")

    data_wide

    6/26/2015   209.3     230.2     80.4     s2
    6/27/2015   209.1     227.2     239.2    s2

Edit 2: I think the solutions provided by others would work if I could figure out how to create a new column in my original dataframe that numbers the rows (i.e. the valid values of Variable 1) within each date. In other words, I would like the row numbering to restart at each change of date. For example,

6/26/2015   1   209.3
6/26/2015   2   230.2
6/26/2015   3   80.4
6/26/2015   4   s2
6/27/2015   1 ....
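In base R, a counter like this can be sketched with ave() and seq_along(), without data.table. The sample data frame below is hypothetical, built from the rows shown in this post; `Variable 1` is stored as character because of the "s2" entries, and check.names = FALSE keeps the space in the column name.

```r
# Hypothetical sample data modelled on the rows in this post
long <- data.frame(
  Date = rep(c("6/26/2015", "6/27/2015"), each = 4),
  `Variable 1` = c("209.3", "230.2", "80.4", "s2",
                   "209.1", "227.2", "239.2", "s2"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# ave() splits the row indices by Date, applies seq_along to each group
# (giving 1, 2, ..., n) and puts the results back in the original row order
long$number <- ave(seq_len(nrow(long)), long$Date, FUN = seq_along)
long
```

ave() groups by value rather than by consecutive run, so if the same date reappears later in the data, its numbering continues instead of restarting.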

Then I could use the reshape methods described in the other posts.

Edit: This is close to "How to reshape data from long to wide format?", but for those answers to fit my data I would need an extra column that numbers the Variable 1 values for each day, from 1 up to the number of values on that day, and I don't have such a column.

In other words, if I use

    data_wide <- reshape(data, direction = "wide", idvar = "Date", timevar = "Variable 1")
    data_wide

Then, because Variable 1 has 200+ unique values, data_wide has 200+ columns for each date. Most of them are NA, because a given value of Variable 1 typically occurs on only a single date, and the data is a time series of more than 5000 dates.

~~~~~~~~~~~~

I have a dataframe where column 1 = Date and column 2 = Variable 1, where Variable 1 is usually numeric. E.g.

6/26/2015   209.3    
6/26/2015   230.2    
6/26/2015   80.4     
6/26/2015   s2       
6/27/2015   209.1    
6/27/2015   227.2    
6/27/2015   239.2    
6/27/2015   s2       

I would like to be able to label each row with a new value that is simply its row number within that date.

6/26/2015   209.3    1
6/26/2015   230.2    2
6/26/2015   80.4     3
6/26/2015   s2       4
6/27/2015   209.1    1
6/27/2015   227.2    2
6/27/2015   239.2    3
6/27/2015   s2       4

[Original Post]

I have a dataframe where column 1 = Date and column 2 = Variable 1, where Variable 1 is usually numeric. E.g.

6/26/2015   209.3
6/26/2015   230.2
6/26/2015   80.4
6/26/2015   s2
6/27/2015   209.1
6/27/2015   227.2
6/27/2015   239.2
6/27/2015   s2
6/28/2015   230.2
6/28/2015   228.2
6/28/2015   36.4
6/28/2015   s2
6/29/2015   209.3
6/29/2015   15.3
6/29/2015   15.4
6/29/2015   s2

I would like to be able to "transpose" the data such that each date has its own row, and all Variable 1 values for the same date are in the same row. E.g.

6/26/2015   209.3     230.2     80.4     s2
6/27/2015   209.1     227.2     239.2    s2

And so on. Although this example shows the same number of Variable 1 entries per date, this is not always the case. I would like to allow any number of values to be collapsed onto a date.

A complicating factor is that there are actually two more columns, Variable 2 and Variable 3, which are constant within a date but may vary between dates. I would like those to be collapsed onto the date as well, but I only need one column for each of them in the new dataframe.
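One way to handle Variable 2 and Variable 3 with base R's reshape() is to number the values within each date and then pass v.names = "Variable 1", so that only Variable 1 is spread across columns while the other non-ID columns are kept once per date. A minimal sketch under that assumption, using hypothetical sample data in which the two dates have different numbers of values:

```r
# Hypothetical sample data: Variable 2 and Variable 3 are constant within a
# date, and the dates have different numbers of Variable 1 values
long <- data.frame(
  Date = c(rep("6/26/2015", 4), rep("6/27/2015", 3)),
  `Variable 1` = c("209.3", "230.2", "80.4", "s2", "209.1", "227.2", "s2"),
  `Variable 2` = c(rep("a", 4), rep("b", 3)),
  `Variable 3` = c(rep(1, 4), rep(2, 3)),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Number the Variable 1 values within each date (restarts at 1 per date)
long$number <- ave(seq_len(nrow(long)), long$Date, FUN = seq_along)

# Only Variable 1 is spread into Variable 1.1, Variable 1.2, ...;
# Variable 2 and Variable 3 are carried over once per Date
wide <- reshape(long, direction = "wide", idvar = "Date",
                timevar = "number", v.names = "Variable 1")
wide
```

Dates with fewer values get NA in the trailing columns, and reshape() emits a warning if Variable 2 or Variable 3 turn out not to be constant within a date.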

I have tried the dcast and reshape functions, but they do not give the intended result. Does anyone have suggestions?

This is best handled via tapply; something like

    tapply(data$`Variable 1`, data$Date, c)

which creates a ragged array (an array of mode list with one element per date, each holding that date's values). The shape of a ragged array matches your description of the expected result. Note that the original order might be lost, because tapply returns the groups in sorted factor-level order (alphabetical for character dates), but you can restore a sensible order by, e.g., ordering by date.
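A small sketch of this approach on hypothetical data, including one way to restore chronological order by parsing the names with as.Date() (the "%m/%d/%Y" format is an assumption based on the dates shown in the question):

```r
# Hypothetical sample data; "10/1/2015" sorts before "6/28/2015" as text
data <- data.frame(
  Date = c("6/29/2015", "6/29/2015", "10/1/2015",
           "6/28/2015", "6/28/2015", "6/28/2015"),
  `Variable 1` = c("209.3", "s2", "15.3", "230.2", "36.4", "s2"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# FUN = c returns each date's values unchanged; because the groups have
# different lengths, tapply() cannot simplify and returns a list-array
ragged <- tapply(data$`Variable 1`, data$Date, c)

# Groups come back in sorted factor-level order (here "10/1/2015" first);
# parse the names as dates and reorder chronologically
dates <- as.Date(names(ragged), format = "%m/%d/%Y")
ragged <- ragged[order(dates)]
ragged
```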

You CANNOT (sensibly) transform your result into a legal data frame, because your data type implies a variable number of columns per row. Data frames are not intended for this use, and if you approach it this way, you will immediately run into more problems.


Alternatively, what's wrong with the original sparse matrix with lots of NAs? That's another valid representation of the data type you are discussing.
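In the same spirit, a sketch of turning a ragged per-date list into an NA-padded character matrix (one row per date, as many columns as the date with the most values); the list is written by hand as a hypothetical stand-in for the tapply() result.

```r
# Hypothetical stand-in for the ragged per-date result
ragged <- list(
  "6/26/2015" = c("209.3", "230.2", "80.4", "s2"),
  "6/27/2015" = c("209.1", "227.2", "s2")
)

# Pad each date's values with NA up to the longest length
width <- max(lengths(ragged))
padded_rows <- lapply(ragged, function(v) {
  length(v) <- width  # extending a vector's length fills the new slots with NA
  v
})

# Bind the padded vectors as rows; the list names become row names
padded <- do.call(rbind, padded_rows)
padded
```

Using do.call(rbind, ...) rather than sapply() keeps one row per date even when every date has a single value.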


If you are just interested in the count of valid values, just do this:

    aggregate(data = data, `Variable 1` ~ Date, length)

For the mtcars dataset, this happens:

    aggregate(data = mtcars, wt ~ cyl, length)
      cyl wt
    1   4 11
    2   6  7
    3   8 14

Note that the wt column in the result is just a count of values: it doesn't care about the type or value of wt, only how many there are (length).
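Relatedly, the formula method of aggregate() defaults to na.action = na.omit, so rows in which Variable 1 is NA are dropped before length() is applied, and the result counts non-missing values only. A quick check on hypothetical data:

```r
# Hypothetical data with one missing Variable 1 value on 6/26/2015
data <- data.frame(
  Date = c("6/26/2015", "6/26/2015", "6/26/2015", "6/27/2015", "6/27/2015"),
  `Variable 1` = c("209.3", NA, "s2", "209.1", "227.2"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# The NA row is removed by the default na.action, so both dates count 2
counts <- aggregate(`Variable 1` ~ Date, data = data, FUN = length)
counts
```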

This solution takes a dataframe where column 1 contains date values that are repeated over several rows and column 2 contains a value for each row. The goal is a new dataframe where column 1 contains no repeated dates and each row (date) contains all of the column 2 values for that date, essentially collapsing the dates and transposing column 2 onto them. To do this with reshape, each value for the same date first needs a number.

    library(data.table)
    dt <- data.table(dataframe)
    # number restarts at 1 for each Date; := adds the column to dt by reference
    dt[, number := seq_len(.N), by = Date]
    data <- as.data.frame(dt)

    data_wide <- reshape(data, direction = "wide", idvar = "Date", timevar = "number")

    data_wide

    6/26/2015   209.3     230.2     80.4     s2
    6/27/2015   209.1     227.2     239.2    s2

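Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM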