Any way to construct a new data.frame by merging multiple data.frames in R?
I have multiple data.frames, each with the same weather stations' coordinates but a different year's temperature observations. I want to construct a new data.frame in which the station coordinates are kept and each year's annual temperature is added programmatically as its own column, taken from the original data.frames. Perhaps the dplyr package could help, but I am having trouble combining Year and Annual_Temp to build the new column names programmatically. I have 35 data.frames; each has the same ID, long, and lat, but Annual_Temp differs from one to the next. I need to produce a clean table by merging them. How can I do this in R, ideally with dplyr?
For example, here is the head of the first three data.frames:
> multiple_DF
$air_temp.1980
Year ID long lat Annual_Temp
34090 1980 6.25_51.75 6.25 51.75 10.709091
34091 1980 6.25_51.25 6.25 51.25 10.581818
34092 1980 6.25_50.75 6.25 50.75 9.500000
34224 1980 6.75_51.75 6.75 51.75 10.354545
34225 1980 6.75_51.25 6.75 51.25 10.636364
34226 1980 6.75_50.75 6.75 50.75 9.872727
$air_temp.1981
Year ID long lat Annual_Temp
119884 1981 6.25_51.75 6.25 51.75 10.727273
119885 1981 6.25_51.25 6.25 51.25 10.563636
119886 1981 6.25_50.75 6.25 50.75 9.654545
120018 1981 6.75_51.75 6.75 51.75 10.409091
120019 1981 6.75_51.25 6.75 51.25 10.654545
120020 1981 6.75_50.75 6.75 50.75 9.954545
$air_temp.1982
Year ID long lat Annual_Temp
205678 1982 6.25_51.75 6.25 51.75 11.80909
205679 1982 6.25_51.25 6.25 51.25 11.58182
205680 1982 6.25_50.75 6.25 50.75 10.61818
205812 1982 6.75_51.75 6.75 51.75 11.44545
205813 1982 6.75_51.25 6.75 51.25 11.73636
205814 1982 6.75_50.75 6.75 50.75 10.85455
Desired output (UPDATE):
I want to produce a new data.frame in which each year's Annual_Temp becomes a new column whose name combines Annual_Temp and Year. Here is the data.frame I want:
ID long lat Ann_temp_1980 Ann_temp_1981 Ann_temp_1982
1 6.25_51.75 6.25 51.75 10.709091 10.727273 11.80909
2 6.25_51.25 6.25 51.25 10.581818 10.563636 11.58182
3 6.25_50.75 6.25 50.75 9.500000 9.654545 10.61818
4 6.75_51.75 6.75 51.75 10.354545 10.409091 11.44545
5 6.75_51.25 6.75 51.25 10.636364 10.654545 11.73636
6 6.75_50.75 6.75 50.75 9.872727 9.954545 10.85455
How can I do this programmatically in R?
To reproduce the example data:
multiple_DF = structure(list(air_temp.1980 = structure(list(Year = c(1980L,
1980L, 1980L, 1980L, 1980L, 1980L), ID = c("6.25_51.75", "6.25_51.25",
"6.25_50.75", "6.75_51.75", "6.75_51.25", "6.75_50.75"), long = c(6.25,
6.25, 6.25, 6.75, 6.75, 6.75), lat = c(51.75, 51.25, 50.75, 51.75,
51.25, 50.75), Annual_Temp = c(10.709091, 10.581818, 9.5, 10.354545,
10.636364, 9.872727)), .Names = c("Year", "ID", "long", "lat",
"Annual_Temp"), row.names = c(NA, -6L), class = "data.frame"),
air_temp.1981 = structure(list(Year = c(1981L, 1981L, 1981L,
1981L, 1981L, 1981L), ID = c("6.25_51.75", "6.25_51.25",
"6.25_50.75", "6.75_51.75", "6.75_51.25", "6.75_50.75"),
long = c(6.25, 6.25, 6.25, 6.75, 6.75, 6.75), lat = c(51.75,
51.25, 50.75, 51.75, 51.25, 50.75), Annual_Temp = c(10.727273,
10.563636, 9.654545, 10.409091, 10.654545, 9.954545)), .Names = c("Year",
"ID", "long", "lat", "Annual_Temp"), row.names = c(NA, -6L
), class = "data.frame"), air_temp.1982 = structure(list(
Year = c(1982L, 1982L, 1982L, 1982L, 1982L, 1982L), ID = c("6.25_51.75",
"6.25_51.25", "6.25_50.75", "6.75_51.75", "6.75_51.25",
"6.75_50.75"), long = c(6.25, 6.25, 6.25, 6.75, 6.75,
6.75), lat = c(51.75, 51.25, 50.75, 51.75, 51.25, 50.75
), Annual_Temp = c(11.80909, 11.58182, 10.61818, 11.44545,
11.73636, 10.85455)), .Names = c("Year", "ID", "long",
"lat", "Annual_Temp"), row.names = c(NA, -6L), class = "data.frame")), .Names = c("air_temp.1980",
"air_temp.1981", "air_temp.1982"))
First, combining the tables in long form:
library(data.table)
L = lapply(multiple_DF, data.table)
bigDT = rbindlist(L, idcol="src")
src Year ID long lat Annual_Temp
1: air_temp.1980 1980 6.25_51.75 6.25 51.75 10.709091
2: air_temp.1980 1980 6.25_51.25 6.25 51.25 10.581818
3: air_temp.1980 1980 6.25_50.75 6.25 50.75 9.500000
4: air_temp.1980 1980 6.75_51.75 6.75 51.75 10.354545
5: air_temp.1980 1980 6.75_51.25 6.75 51.25 10.636364
6: air_temp.1980 1980 6.75_50.75 6.75 50.75 9.872727
7: air_temp.1981 1981 6.25_51.75 6.25 51.75 10.727273
8: air_temp.1981 1981 6.25_51.25 6.25 51.25 10.563636
9: air_temp.1981 1981 6.25_50.75 6.25 50.75 9.654545
10: air_temp.1981 1981 6.75_51.75 6.75 51.75 10.409091
11: air_temp.1981 1981 6.75_51.25 6.75 51.25 10.654545
12: air_temp.1981 1981 6.75_50.75 6.75 50.75 9.954545
13: air_temp.1982 1982 6.25_51.75 6.25 51.75 11.809090
14: air_temp.1982 1982 6.25_51.25 6.25 51.25 11.581820
15: air_temp.1982 1982 6.25_50.75 6.25 50.75 10.618180
16: air_temp.1982 1982 6.75_51.75 6.75 51.75 11.445450
17: air_temp.1982 1982 6.75_51.25 6.75 51.25 11.736360
18: air_temp.1982 1982 6.75_50.75 6.75 50.75 10.854550
Then somewhat "normalizing" the data into multiple tables:
ID_attr = unique(bigDT[, c("ID", "lat", "long")])
ID lat long
1: 6.25_51.75 51.75 6.25
2: 6.25_51.25 51.25 6.25
3: 6.25_50.75 50.75 6.25
4: 6.75_51.75 51.75 6.75
5: 6.75_51.25 51.25 6.75
6: 6.75_50.75 50.75 6.75
meas_data = bigDT[, c("Year", "ID", "Annual_Temp")]
Year ID Annual_Temp
1: 1980 6.25_51.75 10.709091
2: 1980 6.25_51.25 10.581818
3: 1980 6.25_50.75 9.500000
4: 1980 6.75_51.75 10.354545
5: 1980 6.75_51.25 10.636364
6: 1980 6.75_50.75 9.872727
7: 1981 6.25_51.75 10.727273
8: 1981 6.25_51.25 10.563636
9: 1981 6.25_50.75 9.654545
10: 1981 6.75_51.75 10.409091
11: 1981 6.75_51.25 10.654545
12: 1981 6.75_50.75 9.954545
13: 1982 6.25_51.75 11.809090
14: 1982 6.25_51.25 11.581820
15: 1982 6.25_50.75 10.618180
16: 1982 6.75_51.75 11.445450
17: 1982 6.75_51.25 11.736360
18: 1982 6.75_50.75 10.854550
I think this format will be easier to work with than the wide format the OP requested (where the year is embedded in the column name). Hadley Wickham's tidy data paper may be a useful reference.
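As a rough illustration of that point, the sketch below computes a per-station summary directly on the long table, and then reshapes to the wide layout from the question with data.table's dcast in case it is still needed. The data is a trimmed-down copy of the question's example (two stations, three years), and make_year, station_mean, and wideDT are names made up for this sketch, not part of the original answer.

```r
library(data.table)

# Trimmed-down copy of the question's data (two stations, three years).
# make_year is a hypothetical helper used only to keep the example short.
ids  <- c("6.25_51.75", "6.25_51.25")
long <- c(6.25, 6.25)
lat  <- c(51.75, 51.25)
make_year <- function(year, temps) {
  data.frame(Year = year, ID = ids, long = long, lat = lat,
             Annual_Temp = temps, stringsAsFactors = FALSE)
}
multiple_DF <- list(
  air_temp.1980 = make_year(1980L, c(10.709091, 10.581818)),
  air_temp.1981 = make_year(1981L, c(10.727273, 10.563636)),
  air_temp.1982 = make_year(1982L, c(11.80909, 11.58182))
)

# Stack everything into one long table, keeping the list names in "src".
bigDT <- rbindlist(multiple_DF, idcol = "src")

# Long format: a per-station summary needs no knowledge of column names per year.
station_mean <- bigDT[, .(mean_temp = mean(Annual_Temp)), by = ID]

# Wide format, if it is really required: one column per year.
wideDT <- dcast(bigDT, ID + long + lat ~ Year, value.var = "Annual_Temp")

# Prefix the year columns so they match the requested names (Ann_temp_1980, ...).
year_cols <- setdiff(names(wideDT), c("ID", "long", "lat"))
setnames(wideDT, year_cols, paste0("Ann_temp_", year_cols))
```

With 35 years the same calls apply unchanged, since nothing above depends on the number of list elements.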
To do this in dplyr, use bind_rows instead of rbindlist; or just do.call(rbind, L) in base R.
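A minimal sketch of those two alternatives, again on a trimmed-down copy of the question's data; make_year, long_dplyr, and long_base are names invented for this example. Here the plain list of data.frames is passed in directly, which is an assumption of the sketch rather than something the answer spells out.

```r
library(dplyr)

# Trimmed-down copy of the question's data (two stations, three years).
# make_year is a hypothetical helper used only to keep the example short.
ids  <- c("6.25_51.75", "6.25_51.25")
long <- c(6.25, 6.25)
lat  <- c(51.75, 51.25)
make_year <- function(year, temps) {
  data.frame(Year = year, ID = ids, long = long, lat = lat,
             Annual_Temp = temps, stringsAsFactors = FALSE)
}
multiple_DF <- list(
  air_temp.1980 = make_year(1980L, c(10.709091, 10.581818)),
  air_temp.1981 = make_year(1981L, c(10.727273, 10.563636)),
  air_temp.1982 = make_year(1982L, c(11.80909, 11.58182))
)

# dplyr: .id stores each list element's name in a new "src" column.
long_dplyr <- bind_rows(multiple_DF, .id = "src")

# Base R: no source column is added, but Year already identifies the origin.
long_base <- do.call(rbind, multiple_DF)
```

The base R result carries row names derived from the list names (for example air_temp.1980.1); they can be dropped with rownames(long_base) <- NULL if they get in the way.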
As Frank points out, it would be easier with reproducible data, but I think the following will work:
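library(tidyverse)
# Stack the per-year data.frames into one long data.frame
DF <- do.call("rbind", multiple_DF)
# Turn Year into the future column names, e.g. "Ann_temp_1980"
DF$Year <- paste0("Ann_temp_", DF$Year)
# Spread to wide format: one Annual_Temp column per year
DF_final <- spread(DF, Year, Annual_Temp)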
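In current tidyr releases spread() is superseded by pivot_wider(), whose names_prefix argument can build the column names without editing Year first. The following is a sketch of that variant, not the answer's original code; it uses a trimmed-down copy of the question's data, and make_year, long_df, and wide are hypothetical names.

```r
library(tidyr)

# Trimmed-down copy of the question's data (two stations, three years).
# make_year is a hypothetical helper used only to keep the example short.
ids  <- c("6.25_51.75", "6.25_51.25")
long <- c(6.25, 6.25)
lat  <- c(51.75, 51.25)
make_year <- function(year, temps) {
  data.frame(Year = year, ID = ids, long = long, lat = lat,
             Annual_Temp = temps, stringsAsFactors = FALSE)
}
multiple_DF <- list(
  air_temp.1980 = make_year(1980L, c(10.709091, 10.581818)),
  air_temp.1981 = make_year(1981L, c(10.727273, 10.563636)),
  air_temp.1982 = make_year(1982L, c(11.80909, 11.58182))
)

# Stack the per-year data.frames into one long data.frame.
long_df <- do.call(rbind, multiple_DF)

# One row per station; Year values become column names with the given prefix.
wide <- pivot_wider(
  long_df,
  names_from   = Year,
  values_from  = Annual_Temp,
  names_prefix = "Ann_temp_"
)
```

ID, long, and lat are left out of names_from and values_from, so pivot_wider treats them as the identifying columns of each row.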