Diagonals to rows in data.frame

Question

I have a matrix-like data frame with an additional column denoting time. It contains the number of enrolled students in a given school, from grade 5 (column A) to grade 9 (column E).

  time    A    B    C    D    E
1   13 1842 1844 1689 1776 1716
2   14 1898 1785 1807 1617 1679
3   15 2065 1865 1748 1731 1590
4   16 2215 1994 1811 1708 1703
5   17 2174 2122 1903 1765 1699

I need to trace the size of each cohort over time, meaning that I need row-wise information on how many fifth graders from each starting year remained in the school through grades 6 to 9. For example, for the cohort that began fifth grade in 2013, I want to know how many remained in sixth grade in 2014, and so on.

Expected output

This is what I would like to end up with:

  start.time point.A point.B point.C point.D point.E
1         13    1842    1785    1748    1708    1699
2         14    1898    1865    1811    1765      NA
3         15    2065    1994    1903      NA      NA
4         16    2215    2122      NA      NA      NA
5         17    2174      NA      NA      NA      NA

I have looked at diag() from base R, but I could only get the data from the main diagonal. Ideally, I'd like to accomplish this using dplyr syntax and the pipe.

Data

structure(list(time = 13:17, A = c(1842, 1898, 2065, 2215, 2174), B = c(1844, 1785, 1865, 1994, 2122), C = c(1689, 1807, 1748, 1811, 1903), D = c(1776, 1617, 1731, 1708, 1765), E = c(1716, 1679, 1590, 1703, 1699)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), row.names = c(NA, -5L), vars = "time", drop = TRUE, indices = list(
0L, 1L, 2L, 3L, 4L), group_sizes = c(1L, 1L, 1L, 1L, 1L), biggest_group_size = 1L, labels = structure(list(
time = 13:17), class = "data.frame", row.names = c(NA, -5L), vars = "time", drop = TRUE, .Names = "time"), .Names = c("time", "A", "B", "C", "D", "E"))

Answer 1

Convert the input DF, except for its first column, to a matrix mat. Because row(mat) - col(mat) is constant along each diagonal, split mat by that quantity, which gives a list L of ts class series, one per diagonal. The ts class is used because such series can later be combined with cbind even when they have different lengths. Only the diagonals for which row(mat) - col(mat) >= 0 are wanted, so pick those out, cbind them together and transpose the result. Finally, replace all columns of DF except the first with that. No packages are used.

mat <- as.matrix(DF[-1])
L <- lapply(split(mat, row(mat) - col(mat)), ts)
replace(DF, -1, t(do.call("cbind", L[as.numeric(names(L)) >= 0])))

giving:

  time    A    B    C    D    E
1   13 1842 1785 1748 1708 1699
2   14 1898 1865 1811 1765   NA
3   15 2065 1994 1903   NA   NA
4   16 2215 2122   NA   NA   NA
5   17 2174   NA   NA   NA   NA
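
To see why splitting on row(mat) - col(mat) isolates the diagonals, here is a minimal sketch on a small 3 x 3 matrix. The matrix m and the names idx and diags are made up for illustration and are not part of the original answer.

```r
# Hypothetical 3 x 3 matrix, filled column by column
m <- matrix(1:9, nrow = 3)

# Each cell holds its row index minus its column index;
# the value is constant along every diagonal
idx <- row(m) - col(m)
idx
#      [,1] [,2] [,3]
# [1,]    0   -1   -2
# [2,]    1    0   -1
# [3,]    2    1    0

# split() groups the cells by that value, one list element per diagonal
diags <- split(m, idx)
diags[["0"]]  # main diagonal: 1 5 9
diags[["1"]]  # first sub-diagonal: 2 6
```

The list names are the differences themselves ("-2" to "2"), which is why the answer can filter with as.numeric(names(L)) >= 0 to keep the main diagonal and everything below it.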

Answer 2

Since you mentioned dplyr in your question, you could use dplyr::lead to shift the values of columns B to E up by 1, 2, 3 and 4 rows respectively, and then bind the result to the columns time and A from your original data, as follows:

library(tidyverse)
bind_cols(df[, 1:2], map2_df(.x = df[, c(3:ncol(df))],
                             .y = seq_along(df[, 3:ncol(df)]), 
                             .f = ~dplyr::lead(x = .x, n = .y)))
#  A tibble: 5 x 6
#  Groups:   time [5]
#   time     A     B     C     D     E
#  <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#1    13  1842  1785  1748  1708  1699
#2    14  1898  1865  1811  1765    NA
#3    15  2065  1994  1903    NA    NA
#4    16  2215  2122    NA    NA    NA
#5    17  2174    NA    NA    NA    NA

Note that your data, in the form you provided it, is grouped by time.
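
The grouping matters if lead() is called inside mutate() rather than on the bare column vectors as above. The following is a small sketch, assuming dplyr is installed; the data frame df_small and the result names are hypothetical and use only the time and B columns of the question's data.

```r
library(dplyr)

# Hypothetical two-column extract of the question's data, grouped by time
df_small <- data.frame(time = 13:17,
                       B = c(1844, 1785, 1865, 1994, 2122)) %>%
  group_by(time)

# Every group has a single row, so lead() finds no later value to pull up
grouped <- df_small %>% mutate(B = lead(B, n = 1))

# After ungroup(), lead() shifts across the whole column
ungrouped <- df_small %>% ungroup() %>% mutate(B = lead(B, n = 1))

grouped$B    # NA NA NA NA NA
ungrouped$B  # 1785 1865 1994 2122 NA
```

So if you prefer a mutate()-based variant of this approach, calling ungroup() first is probably what you want.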

Answer 3

With some grouping and arranging and row_number(), we can do this with dplyr and tidyr, and we don't lose values.

It looks a bit messy, but here I create a two-dimensional index in which the second dimension is inverted. When the two index positions are summed, every cell on the same diagonal gets the same value.

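library(dplyr)
library(tidyr)
library(ggplot2)

# `data` is the data frame from the question
data %>% 
  ungroup() %>% 
  mutate(row = row_number()) %>%        # position of each year
  gather(class, stud, A:E) %>%          # long format: one row per year and grade
  arrange(row, desc(class)) %>% 
  group_by(row) %>% 
  mutate(time_left = row_number()) %>%  # grade position counted from E
  ungroup() %>% 
  # start_year is an index that is constant along each cohort diagonal
  transmute(time, class, stud, start_year = time_left + row - 1) %>% 
  ggplot(aes(time, stud, color = factor(start_year))) +
  geom_line() +
  geom_point()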
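
The index trick can be checked in isolation with base R. This is only a sketch of the same arithmetic on a 5 x 5 grid, not the pipeline itself; the matrix m and the name cohort are hypothetical.

```r
# Hypothetical 5 x 5 grid: rows are years 13 to 17, columns are grades A to E
m <- matrix(0, nrow = 5, ncol = 5, dimnames = list(13:17, LETTERS[1:5]))

# Column position counted from the right (E = 1, ..., A = 5),
# mirroring arrange(row, desc(class)) followed by row_number()
time_left <- ncol(m) - col(m) + 1

# Same formula as start_year in the pipeline
cohort <- time_left + row(m) - 1

diag(cohort)  # 5 5 5 5 5: the cohort that starts grade A in year 13
cohort[2, 1]  # 6: the cohort that starts grade A in year 14
```

Note that the resulting value is a cohort label rather than a calendar year: larger values correspond to later starting years.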

Answer 4

Replace the mirrored upper triangle of "d" with the values from the lower triangle.

m <- as.matrix(d[-1])
d[-1] <- NA
d[-1][upper.tri(m, diag = TRUE)[ , ncol(m):1]] <- m[lower.tri(m, diag = TRUE)]

#   time    A    B    C    D    E
# 1   13 1842 1785 1748 1708 1699
# 2   14 1898 1865 1811 1765   NA
# 3   15 2065 1994 1903   NA   NA
# 4   16 2215 2122   NA   NA   NA
# 5   17 2174   NA   NA   NA   NA
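
For readers unfamiliar with the two logical masks, here is a small sketch of the same assignment on a 3 x 3 matrix. The objects m, target and out are made up for illustration.

```r
# Hypothetical 3 x 3 matrix, filled column by column
m <- matrix(1:9, nrow = 3)

# Target cells: the upper triangle with its columns reversed,
# which is the top-left corner of the matrix
target <- upper.tri(m, diag = TRUE)[, ncol(m):1]

# Source cells: the lower triangle including the main diagonal
out <- matrix(NA_integer_, nrow = 3, ncol = 3)
out[target] <- m[lower.tri(m, diag = TRUE)]
out
#      [,1] [,2] [,3]
# [1,]    1    5    9
# [2,]    2    6   NA
# [3,]    3   NA   NA
```

Both masks are read in column-major order and have the same number of TRUE cells per column, so each diagonal of m ends up in one row of out.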

4 answers

Answer 1: score 5, posted 2018-02-09 12:04:21
Answer 2: score 2, accepted, posted 2018-02-09 12:30:51
Answer 3: score 1, posted 2018-02-09 13:24:41
Answer 4: score 0, posted 2018-02-09 12:55:33
