[英]Join data.frame with repeated lines in R
I have two dataframes.我有两个数据框。 One is quality with different dates throughout the year of collection points at station
st
.一是
st
站收集点全年不同日期的质量。 The other is the land use of each st
season for that year.另一个是当年每个
st
的土地利用情况。
Example
例子
library(dplyr)
图书馆(dplyr)
water_df=read.table(text="st Date OD pH DBO A 01/07/2005 8 6.3 3 A 02/06/2005 7 6.2 2.2 A 01/01/2005 7.3 6.5 3.1 A 03/05/2006 6 6.3 4 A 09/08/2006 6.8 7.1 1.1 A 12/12/2006 7.3 8.1 2.9 B 02/07/2005 6.8 5.4 2.6 B 03/06/2005 6.0 5.3 1.9 B 02/01/2005 6.2 5.5 2.6 B 04/05/2006 5.1 5.4 3.4 B 10/08/2006 5.8 6.0 0.9 B 13/12/2006 6.2 6.9 2.5 C 20/12/2006 6.5 7.2 2.6 C 27/12/2006 6.8 7.6 2.7 C 03/01/2007 7.2 8.0 2.9 C 10/01/2007 7.5 8.4 3.0 C 17/01/2007 7.9 8.8 3.1 C 24/01/2007 8.3 9.2 3.3 C 31/01/2007 8.7 9.7 3.5 C 07/02/2007 9.2 10.2 3.6", sep="", header=TRUE)%>%as.data.frame() land_df=read.table(text = "st year Veg Water Soil Crop Grass A 2005 100 200 80 130 70 B 2006 98 180 84 132 86 C 2007 93 175 79 127 106", sep="", header = TRUE)%>%as.data.frame()
I would like to add to the quality data.frame the land use values of that station even if for the same station st
is repeated.我想在质量数据中添加该站的土地使用价值,即使对于同一站
st
重复。
I tried some things but it didn't work我尝试了一些东西,但没有奏效
#I tryed cbind
join_df<-cbind(water_df, land_df)
#I tryed
library(purrr)
joind_df2<-purrr::reduce(water_df, land_df)
This operation it commonly called a "join" or "merge".这种操作通常称为“加入”或“合并”。 You need columns to join on, which means we need to extract the year from your date, and then it's a
left_join
command.您需要加入列,这意味着我们需要从您的日期中提取年份,然后它是一个
left_join
命令。 See this FAQ for more information about joining data in R .有关在 R 中加入数据的更多信息,请参阅此常见问题解答。
library(dplyr)
library(lubridate)
water_df %>%
mutate(year = year(dmy(Date))) %>%
left_join(land_df, by = c("st", "year"))
# st Date OD pH DBO year Veg Water Soil Crop Grass
# 1 A 01/07/2005 8.0 6.3 3.0 2005 100 200 80 130 70
# 2 A 02/06/2005 7.0 6.2 2.2 2005 100 200 80 130 70
# 3 A 01/01/2005 7.3 6.5 3.1 2005 100 200 80 130 70
# 4 A 03/05/2006 6.0 6.3 4.0 2006 NA NA NA NA NA
# 5 A 09/08/2006 6.8 7.1 1.1 2006 NA NA NA NA NA
# 6 A 12/12/2006 7.3 8.1 2.9 2006 NA NA NA NA NA
# 7 B 02/07/2005 6.8 5.4 2.6 2005 NA NA NA NA NA
# 8 B 03/06/2005 6.0 5.3 1.9 2005 NA NA NA NA NA
# 9 B 02/01/2005 6.2 5.5 2.6 2005 NA NA NA NA NA
# 10 B 04/05/2006 5.1 5.4 3.4 2006 98 180 84 132 86
# 11 B 10/08/2006 5.8 6.0 0.9 2006 98 180 84 132 86
# 12 B 13/12/2006 6.2 6.9 2.5 2006 98 180 84 132 86
# 13 C 20/12/2006 6.5 7.2 2.6 2006 NA NA NA NA NA
# 14 C 27/12/2006 6.8 7.6 2.7 2006 NA NA NA NA NA
# 15 C 03/01/2007 7.2 8.0 2.9 2007 93 175 79 127 106
# 16 C 10/01/2007 7.5 8.4 3.0 2007 93 175 79 127 106
# 17 C 17/01/2007 7.9 8.8 3.1 2007 93 175 79 127 106
# 18 C 24/01/2007 8.3 9.2 3.3 2007 93 175 79 127 106
# 19 C 31/01/2007 8.7 9.7 3.5 2007 93 175 79 127 106
# 20 C 07/02/2007 9.2 10.2 3.6 2007 93 175 79 127 106
There's some missing values where your land_df
didn't have observations for a particular station in a particular year.有一些缺失值,您的
land_df
在特定年份没有对特定站点的观测。 Have a look at ?tidyr::fill
and this FAQ if you want to fill those in with, eg, the previous observation.看看
?tidyr::fill
和这个常见问题解答,如果你想用之前的观察来填写这些内容。
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.