
Join data.frame with repeated lines in R

I have two data frames. One holds water-quality measurements taken on different dates throughout the year at the collection points of each station st. The other holds the land use of each station st for a given year.

Example

library(dplyr)

water_df <- read.table(text = "
st Date OD pH DBO
A 01/07/2005 8 6.3 3
A 02/06/2005 7 6.2 2.2
A 01/01/2005 7.3 6.5 3.1
A 03/05/2006 6 6.3 4
A 09/08/2006 6.8 7.1 1.1
A 12/12/2006 7.3 8.1 2.9
B 02/07/2005 6.8 5.4 2.6
B 03/06/2005 6.0 5.3 1.9
B 02/01/2005 6.2 5.5 2.6
B 04/05/2006 5.1 5.4 3.4
B 10/08/2006 5.8 6.0 0.9
B 13/12/2006 6.2 6.9 2.5
C 20/12/2006 6.5 7.2 2.6
C 27/12/2006 6.8 7.6 2.7
C 03/01/2007 7.2 8.0 2.9
C 10/01/2007 7.5 8.4 3.0
C 17/01/2007 7.9 8.8 3.1
C 24/01/2007 8.3 9.2 3.3
C 31/01/2007 8.7 9.7 3.5
C 07/02/2007 9.2 10.2 3.6
", sep = "", header = TRUE) %>% as.data.frame()

land_df <- read.table(text = "
st year Veg Water Soil Crop Grass
A 2005 100 200 80 130 70
B 2006 98 180 84 132 86
C 2007 93 175 79 127 106
", sep = "", header = TRUE) %>% as.data.frame()

I would like to add the land-use values for each station to the quality data.frame, repeating them on every row where the same station st appears.


I tried a few things, but they didn't work:

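# I tried cbind; it fails because the data frames have different numbers of rows (20 vs 3)
join_df<-cbind(water_df, land_df)

# I tried purrr::reduce; it expects a list and a function, not two data frames
library(purrr)
joind_df2<-purrr::reduce(water_df, land_df)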

This operation is commonly called a "join" or "merge". You need columns to join on, which means extracting the year from your Date column; after that it's a single left_join call. See this FAQ for more information about joining data in R.

library(dplyr)
library(lubridate)

water_df %>%
  mutate(year = year(dmy(Date))) %>%
  left_join(land_df, by = c("st", "year"))
#    st       Date  OD   pH DBO year Veg Water Soil Crop Grass
# 1   A 01/07/2005 8.0  6.3 3.0 2005 100   200   80  130    70
# 2   A 02/06/2005 7.0  6.2 2.2 2005 100   200   80  130    70
# 3   A 01/01/2005 7.3  6.5 3.1 2005 100   200   80  130    70
# 4   A 03/05/2006 6.0  6.3 4.0 2006  NA    NA   NA   NA    NA
# 5   A 09/08/2006 6.8  7.1 1.1 2006  NA    NA   NA   NA    NA
# 6   A 12/12/2006 7.3  8.1 2.9 2006  NA    NA   NA   NA    NA
# 7   B 02/07/2005 6.8  5.4 2.6 2005  NA    NA   NA   NA    NA
# 8   B 03/06/2005 6.0  5.3 1.9 2005  NA    NA   NA   NA    NA
# 9   B 02/01/2005 6.2  5.5 2.6 2005  NA    NA   NA   NA    NA
# 10  B 04/05/2006 5.1  5.4 3.4 2006  98   180   84  132    86
# 11  B 10/08/2006 5.8  6.0 0.9 2006  98   180   84  132    86
# 12  B 13/12/2006 6.2  6.9 2.5 2006  98   180   84  132    86
# 13  C 20/12/2006 6.5  7.2 2.6 2006  NA    NA   NA   NA    NA
# 14  C 27/12/2006 6.8  7.6 2.7 2006  NA    NA   NA   NA    NA
# 15  C 03/01/2007 7.2  8.0 2.9 2007  93   175   79  127   106
# 16  C 10/01/2007 7.5  8.4 3.0 2007  93   175   79  127   106
# 17  C 17/01/2007 7.9  8.8 3.1 2007  93   175   79  127   106
# 18  C 24/01/2007 8.3  9.2 3.3 2007  93   175   79  127   106
# 19  C 31/01/2007 8.7  9.7 3.5 2007  93   175   79  127   106
# 20  C 07/02/2007 9.2 10.2 3.6 2007  93   175   79  127   106
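
For comparison, roughly the same left join can be sketched in base R with merge() and all.x = TRUE. This is only an illustration, not part of the original answer: water_small and land_small are made-up stand-ins for a few rows of water_df and land_df, and the year is extracted with as.Date() and format() instead of lubridate. Note that merge() sorts the result by the key columns, so the row order differs from the left_join output.

```r
# Hypothetical miniature versions of water_df and land_df (illustration only)
water_small <- data.frame(
  st   = c("A", "A", "B"),
  Date = c("01/07/2005", "03/05/2006", "04/05/2006"),
  OD   = c(8, 6, 5.1)
)
land_small <- data.frame(
  st   = c("A", "B"),
  year = c(2005L, 2006L),
  Veg  = c(100, 98)
)

# Extract the year from the day/month/year date strings
water_small$year <- as.integer(
  format(as.Date(water_small$Date, format = "%d/%m/%Y"), "%Y")
)

# all.x = TRUE keeps every water row, filling NA where land_small has no match
merged <- merge(water_small, land_small, by = c("st", "year"), all.x = TRUE)
print(merged)
```

The A/2006 row has no matching land-use record, so its Veg value is NA, just as in the dplyr result.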

The NA values in the joined result appear where land_df has no observations for a particular station in a particular year. Have a look at ?tidyr::fill and this FAQ if you want to fill those in with, e.g., the previous observation.
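
A minimal sketch of that idea with tidyr::fill follows. It is not from the original answer: the joined data frame is a made-up stand-in for the joined result, with a single land-use column. Within each station, values are carried forward in time and then, as an assumption about what is wanted, backward for years that precede the first land-use record (.direction = "downup"). Use "down" if only earlier observations should be carried forward.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for the joined result (illustration only)
joined <- data.frame(
  st   = c("A", "A", "B", "B"),
  year = c(2005, 2006, 2005, 2006),
  Veg  = c(100, NA, NA, 98)
)

filled <- joined %>%
  group_by(st) %>%                          # never fill across stations
  arrange(year, .by_group = TRUE) %>%       # order each station by year
  fill(Veg, .direction = "downup") %>%      # carry forward, then backward
  ungroup()

print(filled)
```

Whether a neighbouring year's land use is an acceptable substitute depends on the analysis; the fill only makes that choice explicit.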

