R - Function to create a data.frame containing manipulated data from another data.frame
R: create new data.frame with time series from another data.frame
I have a data.frame with the following structure:
> str(prv)
'data.frame': 13184 obs. of 7 variables:
$ date : Factor w/ 103 levels "2020-01-01",..: 1 1 1 1 1 1 1 1 1 1 ...
$ code : int 13 13 13 13 13 17 17 17 21 21 ...
$ region : Factor w/ 21 levels "loc1","loc2",..: 1 1 1 1 1 2 2 2 12 12 ...
$ codprv : int 69 66 68 67 979 77 76 980 21 981 ...
$ denprv : Factor w/ 108 levels "city1","city2",..: 25 44 70 93 42 55 75 42 16 42 ...
$ shortprv : Factor w/ 107 levels "","C1","C2","C3",..: 24 7 65 92 1 58 74 1 20 1 ...
$ sum : int 0 0 0 0 0 0 0 0 0 0 ...
and the data.frame looks like this:
date code region codprv denprv shortprv sum
2020-01-01 13 loc1 69 city1 C1 0
2020-01-01 13 loc1 66 city2 C2 0
2020-01-01 14 loc2 70 city3 C3 0
...
2020-01-02 13 loc1 68 city1 C3 0
2020-01-02 13 loc1 66 city2 C2 5
2020-01-02 14 loc2 70 city3 C3 1
...
2020-01-03 13 loc1 68 city1 C3 15
2020-01-03 13 loc1 66 city2 C2 7
2020-01-03 14 loc2 70 city3 C3 5
...
and so on...
I need to get:
date city1 city2 city3 ... cityN
2020-01-01 0 0 0 ... n1
2020-01-02 0 5 1 ... n2
2020-01-03 15 7 5 ... n3
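I only recently learned R, and so far I have used it just for statistical analysis, not for time series analysis.
Doing this by hand is not hard, but I would like to know the proper way to do the transformation (and learn how to (re)use it on my own).
Sorry for my language.
Thank you for your attention.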
You need pivot_wider from tidyr:
df <- data.frame(date = rep(seq(as.Date("2020/1/1"), by = "day", length.out = 4), each = 3),
denprv = rep(c("city1", "city2", "city3"), 4),
sum = 1:12)
library(tidyr)
pivot_wider(df, names_from = denprv, values_from = sum)
# A tibble: 4 x 4
date city1 city2 city3
<date> <int> <int> <int>
1 2020-01-01 1 2 3
2 2020-01-02 4 5 6
3 2020-01-03 7 8 9
4 2020-01-04 10 11 12
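The toy example above has only three columns and one row per date and city. The str() output in the question suggests the real data differs in three ways: date is a factor, there are extra columns (code, region, codprv, shortprv), and the same denprv level appears under several codprv codes on a single date. The following is a minimal sketch of how those cases might be handled. prv_demo is a hypothetical stand-in for prv, and summing the duplicate rows is an assumption about what the asker wants, not something stated in the question.

```r
library(tidyr)

# Hypothetical stand-in for `prv`: same column names as in the question,
# `date` stored as a factor, and "city2" occurring twice on each date.
prv_demo <- data.frame(
  date   = factor(rep(c("2020-01-01", "2020-01-02"), each = 4)),
  code   = rep(c(13L, 13L, 14L, 17L), times = 2),
  region = factor(rep(c("loc1", "loc1", "loc2", "loc3"), times = 2)),
  denprv = factor(rep(c("city1", "city2", "city3", "city2"), times = 2)),
  sum    = c(0L, 0L, 0L, 0L, 3L, 5L, 1L, 2L)
)

# Turn the factor dates into real Date values
prv_demo$date <- as.Date(as.character(prv_demo$date))

wide <- pivot_wider(
  prv_demo,
  id_cols     = date,    # one output row per date; other columns are dropped
  names_from  = denprv,  # one output column per city
  values_from = sum,     # the column named `sum`
  values_fn   = sum,     # the function sum(): adds up duplicate date/city rows (assumption)
  values_fill = 0L       # a city with no row on a given date gets 0
)
wide
```

Without id_cols, the extra columns would be treated as identifiers and each date would be split across several rows. Without values_fn, duplicated date/city pairs would produce list-columns and a warning instead of single numbers.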
Your data is in long format, and you want it in wide format. Have a look at the material on tidy data.
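To make the long versus wide distinction concrete, here is a small sketch that converts a long table to wide and back again with pivot_longer, the tidyr counterpart of pivot_wider. The data frame is made up for illustration and only reuses the column names from the question.

```r
library(tidyr)

# Long format: one row per (date, city) pair
long <- data.frame(
  date   = rep(as.Date(c("2020-01-01", "2020-01-02")), each = 3),
  denprv = rep(c("city1", "city2", "city3"), times = 2),
  sum    = 1:6
)

# Wide format: one row per date, one column per city
wide <- pivot_wider(long, names_from = denprv, values_from = sum)

# Back to long format: every column except `date` becomes a (denprv, sum) pair
back <- pivot_longer(wide, cols = -date, names_to = "denprv", values_to = "sum")
```

The wide table is usually easier to read or plot as a set of time series, while the long table is what most tidyverse functions expect as input.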