
Any way to reconstruct a new data.frame by merging multiple data.frames in R?

I have multiple data.frames. Each one has the same weather stations' coordinates but contains a different year's temperature observations. I want to construct a new data.frame in which the station coordinates are kept and each year's annual temperature is added programmatically as its own column, taken from the original data.frames. Perhaps the dplyr package could help, but I am having trouble combining the Year and Annual_Temp columns to construct the new columns programmatically. I have 35 data.frames; each has the same ID, long, and lat, but Annual_Temp differs from one to another. I need to build a clean table by merging them. How can I make this happen in R? Is there a way to do it with dplyr?

For example, here is the head of the first three data.frames:

> multiple_DF

$air_temp.1980
      Year         ID long   lat Annual_Temp
34090 1980 6.25_51.75 6.25 51.75   10.709091
34091 1980 6.25_51.25 6.25 51.25   10.581818
34092 1980 6.25_50.75 6.25 50.75    9.500000
34224 1980 6.75_51.75 6.75 51.75   10.354545
34225 1980 6.75_51.25 6.75 51.25   10.636364
34226 1980 6.75_50.75 6.75 50.75    9.872727

$air_temp.1981
       Year         ID long   lat Annual_Temp
119884 1981 6.25_51.75 6.25 51.75   10.727273
119885 1981 6.25_51.25 6.25 51.25   10.563636
119886 1981 6.25_50.75 6.25 50.75    9.654545
120018 1981 6.75_51.75 6.75 51.75   10.409091
120019 1981 6.75_51.25 6.75 51.25   10.654545
120020 1981 6.75_50.75 6.75 50.75    9.954545

$air_temp.1982
       Year         ID long   lat Annual_Temp
205678 1982 6.25_51.75 6.25 51.75    11.80909
205679 1982 6.25_51.25 6.25 51.25    11.58182
205680 1982 6.25_50.75 6.25 50.75    10.61818
205812 1982 6.75_51.75 6.75 51.75    11.44545
205813 1982 6.75_51.25 6.75 51.25    11.73636
205814 1982 6.75_50.75 6.75 50.75    10.85455

Desired output (UPDATE):

I want to produce a new data.frame in which each year's Annual_Temp becomes a new column whose name combines the temperature label with the Year (for example, Ann_temp_1980). Here is the data.frame I want:

      ID long   lat Ann_temp_1980 Ann_temp_1981 Ann_temp_1982
1 6.25_51.75 6.25 51.75     10.709091     10.727273        11.80909
2 6.25_51.25 6.25 51.25     10.581818     10.563636        11.58182
3 6.25_50.75 6.25 50.75      9.500000      9.654545        10.61818
4 6.75_51.75 6.75 51.75     10.354545     10.409091        11.44545
5 6.75_51.25 6.75 51.25     10.636364     10.654545        11.73636
6 6.75_50.75 6.75 50.75      9.872727      9.954545        10.85455

How can I make this happen programmatically in R?

To reproduce the example data:

multiple_DF = structure(list(air_temp.1980 = structure(list(Year = c(1980L, 
1980L, 1980L, 1980L, 1980L, 1980L), ID = c("6.25_51.75", "6.25_51.25", 
"6.25_50.75", "6.75_51.75", "6.75_51.25", "6.75_50.75"), long = c(6.25, 
6.25, 6.25, 6.75, 6.75, 6.75), lat = c(51.75, 51.25, 50.75, 51.75, 
51.25, 50.75), Annual_Temp = c(10.709091, 10.581818, 9.5, 10.354545, 
10.636364, 9.872727)), .Names = c("Year", "ID", "long", "lat", 
"Annual_Temp"), row.names = c(NA, -6L), class = "data.frame"), 
    air_temp.1981 = structure(list(Year = c(1981L, 1981L, 1981L, 
    1981L, 1981L, 1981L), ID = c("6.25_51.75", "6.25_51.25", 
    "6.25_50.75", "6.75_51.75", "6.75_51.25", "6.75_50.75"), 
        long = c(6.25, 6.25, 6.25, 6.75, 6.75, 6.75), lat = c(51.75, 
        51.25, 50.75, 51.75, 51.25, 50.75), Annual_Temp = c(10.727273, 
        10.563636, 9.654545, 10.409091, 10.654545, 9.954545)), .Names = c("Year", 
    "ID", "long", "lat", "Annual_Temp"), row.names = c(NA, -6L
    ), class = "data.frame"), air_temp.1982 = structure(list(
        Year = c(1982L, 1982L, 1982L, 1982L, 1982L, 1982L), ID = c("6.25_51.75", 
        "6.25_51.25", "6.25_50.75", "6.75_51.75", "6.75_51.25", 
        "6.75_50.75"), long = c(6.25, 6.25, 6.25, 6.75, 6.75, 
        6.75), lat = c(51.75, 51.25, 50.75, 51.75, 51.25, 50.75
        ), Annual_Temp = c(11.80909, 11.58182, 10.61818, 11.44545, 
        11.73636, 10.85455)), .Names = c("Year", "ID", "long", 
    "lat", "Annual_Temp"), row.names = c(NA, -6L), class = "data.frame")), .Names = c("air_temp.1980", 
"air_temp.1981", "air_temp.1982"))

First, combine the tables in long form:

library(data.table)
L = lapply(multiple_DF, data.table)

# idcol stores each list element's name in a new "src" column
bigDT = rbindlist(L, idcol = "src")

              src Year         ID long   lat Annual_Temp
 1: air_temp.1980 1980 6.25_51.75 6.25 51.75   10.709091
 2: air_temp.1980 1980 6.25_51.25 6.25 51.25   10.581818
 3: air_temp.1980 1980 6.25_50.75 6.25 50.75    9.500000
 4: air_temp.1980 1980 6.75_51.75 6.75 51.75   10.354545
 5: air_temp.1980 1980 6.75_51.25 6.75 51.25   10.636364
 6: air_temp.1980 1980 6.75_50.75 6.75 50.75    9.872727
 7: air_temp.1981 1981 6.25_51.75 6.25 51.75   10.727273
 8: air_temp.1981 1981 6.25_51.25 6.25 51.25   10.563636
 9: air_temp.1981 1981 6.25_50.75 6.25 50.75    9.654545
10: air_temp.1981 1981 6.75_51.75 6.75 51.75   10.409091
11: air_temp.1981 1981 6.75_51.25 6.75 51.25   10.654545
12: air_temp.1981 1981 6.75_50.75 6.75 50.75    9.954545
13: air_temp.1982 1982 6.25_51.75 6.25 51.75   11.809090
14: air_temp.1982 1982 6.25_51.25 6.25 51.25   11.581820
15: air_temp.1982 1982 6.25_50.75 6.25 50.75   10.618180
16: air_temp.1982 1982 6.75_51.75 6.75 51.75   11.445450
17: air_temp.1982 1982 6.75_51.25 6.75 51.25   11.736360
18: air_temp.1982 1982 6.75_50.75 6.75 50.75   10.854550

Then somewhat "normalize" the data into multiple tables:

ID_attr = unique(bigDT[, c("ID", "lat", "long")])

           ID   lat long
1: 6.25_51.75 51.75 6.25
2: 6.25_51.25 51.25 6.25
3: 6.25_50.75 50.75 6.25
4: 6.75_51.75 51.75 6.75
5: 6.75_51.25 51.25 6.75
6: 6.75_50.75 50.75 6.75

meas_data = bigDT[, c("Year", "ID", "Annual_Temp")]

    Year         ID Annual_Temp
 1: 1980 6.25_51.75   10.709091
 2: 1980 6.25_51.25   10.581818
 3: 1980 6.25_50.75    9.500000
 4: 1980 6.75_51.75   10.354545
 5: 1980 6.75_51.25   10.636364
 6: 1980 6.75_50.75    9.872727
 7: 1981 6.25_51.75   10.727273
 8: 1981 6.25_51.25   10.563636
 9: 1981 6.25_50.75    9.654545
10: 1981 6.75_51.75   10.409091
11: 1981 6.75_51.25   10.654545
12: 1981 6.75_50.75    9.954545
13: 1982 6.25_51.75   11.809090
14: 1982 6.25_51.25   11.581820
15: 1982 6.25_50.75   10.618180
16: 1982 6.75_51.75   11.445450
17: 1982 6.75_51.25   11.736360
18: 1982 6.75_50.75   10.854550

I think this format will be easier to work with than the wide format the OP requested (where the year is embedded in the column name). Hadley Wickham's tidy data paper may be a useful reference.
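If the wide layout from the question is still needed at the end, the normalized tables can be reshaped on demand. The following is only a sketch: it uses a cut-down stand-in for ID_attr and meas_data (two stations, two years) so that it runs on its own, and the names wide, year_cols, and wide_out are made up for illustration.

```r
library(data.table)

# Cut-down stand-in for the normalized tables above (assumed sample, not the full data).
ID_attr <- data.table(
  ID   = c("6.25_51.75", "6.25_51.25"),
  lat  = c(51.75, 51.25),
  long = c(6.25, 6.25)
)
meas_data <- data.table(
  Year        = c(1980L, 1980L, 1981L, 1981L),
  ID          = c("6.25_51.75", "6.25_51.25", "6.25_51.75", "6.25_51.25"),
  Annual_Temp = c(10.709091, 10.581818, 10.727273, 10.563636)
)

# Long to wide: one row per ID, one column per Year.
wide <- dcast(meas_data, ID ~ Year, value.var = "Annual_Temp")

# Rename the year columns to the Ann_temp_<year> pattern from the question.
year_cols <- setdiff(names(wide), "ID")
setnames(wide, year_cols, paste0("Ann_temp_", year_cols))

# Re-attach the station coordinates by ID.
wide_out <- merge(ID_attr, wide, by = "ID")
print(wide_out)
```

Keeping the long tables as the working copy and deriving the wide table only for display means a new year is just more rows rather than another column.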

To do this in dplyr, use bind_rows instead of rbindlist; or just do.call(rbind, L) in base R.
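A rough dplyr version of the same long-form steps might look like the sketch below. The list here is a shortened stand-in for multiple_DF (two stations, two years) so the block runs by itself, and big_df is a name chosen only for this example.

```r
library(dplyr)

# Shortened stand-in for multiple_DF (assumed sample, not the full data).
multiple_DF <- list(
  air_temp.1980 = data.frame(
    Year = 1980L, ID = c("6.25_51.75", "6.25_51.25"),
    long = 6.25, lat = c(51.75, 51.25),
    Annual_Temp = c(10.709091, 10.581818),
    stringsAsFactors = FALSE
  ),
  air_temp.1981 = data.frame(
    Year = 1981L, ID = c("6.25_51.75", "6.25_51.25"),
    long = 6.25, lat = c(51.75, 51.25),
    Annual_Temp = c(10.727273, 10.563636),
    stringsAsFactors = FALSE
  )
)

# Stack the list; .id plays the role of rbindlist's idcol.
big_df <- bind_rows(multiple_DF, .id = "src")

# Station attributes: one row per station.
ID_attr <- distinct(big_df, ID, lat, long)

# Measurements: one row per station and year.
meas_data <- select(big_df, Year, ID, Annual_Temp)

print(ID_attr)
print(meas_data)
```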

As Frank points out, it would be easier with reproducible data, but I think the following will work:

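library(tidyverse)
# Stack the list of data.frames into one long data.frame
DF <- do.call("rbind", multiple_DF)
# Turn each year into its future column name, e.g. "Ann_temp_1980"
DF$Year <- paste0("Ann_temp_", DF$Year)
# Spread the years into columns, leaving one row per station
DF_final <- spread(DF, Year, Annual_Temp)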
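In newer tidyr releases spread() is superseded by pivot_wider(), so the same reshape could also be written as below. This is a sketch, not the answer's original code; the list is a shortened stand-in for multiple_DF so the block is self-contained.

```r
library(dplyr)
library(tidyr)

# Shortened stand-in for multiple_DF (assumed sample, not the full data).
multiple_DF <- list(
  air_temp.1980 = data.frame(
    Year = 1980L, ID = c("6.25_51.75", "6.25_51.25"),
    long = 6.25, lat = c(51.75, 51.25),
    Annual_Temp = c(10.709091, 10.581818),
    stringsAsFactors = FALSE
  ),
  air_temp.1981 = data.frame(
    Year = 1981L, ID = c("6.25_51.75", "6.25_51.25"),
    long = 6.25, lat = c(51.75, 51.25),
    Annual_Temp = c(10.727273, 10.563636),
    stringsAsFactors = FALSE
  )
)

# Stack the years, then pivot Year into columns named Ann_temp_<year>.
DF_final <- bind_rows(multiple_DF) %>%
  pivot_wider(
    names_from   = Year,
    values_from  = Annual_Temp,
    names_prefix = "Ann_temp_"
  )

print(DF_final)
```

Here names_prefix takes over the job of the paste0() step, so the Year column can stay an integer until the reshape.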
