Subset a dataframe based on reciprocity conditions
I have the following data frame structure:
date | latitude |
---|---|
1951-03-22 | 66.08106 |
1951-03-22 | 59.59117 |
1951-04-08 | 59.59117 |
1952-10-20 | 55.41972 |
1960-08-12 | 66.05653 |
1960-09-10 | 66.08106 |
What I would like to do is select the rows for all unique latitudes. If two (or more) rows have exactly the same latitude, I want to keep only the one with the earliest date, but separately for each year.
So, for the example above, it would give the following subset, with only the 3rd row removed:
date | latitude |
---|---|
1951-03-22 | 66.08106 |
1951-03-22 | 59.59117 |
1952-10-20 | 55.41972 |
1960-08-12 | 66.05653 |
1960-09-10 | 66.08106 |
Many thanks for the help.
PS: it may be important to specify that class(df$date) is "Date" and class(df$latitude) is "numeric".
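For reproducibility, the example data could be rebuilt along these lines. This is only a sketch: the values are copied from the table above, and the name `df` follows the PS.

```r
# Rebuild the example data from the question
# (values copied from the table; 'df' is the name used in the PS)
df <- data.frame(
  date = as.Date(c("1951-03-22", "1951-03-22", "1951-04-08",
                   "1952-10-20", "1960-08-12", "1960-09-10")),
  latitude = c(66.08106, 59.59117, 59.59117,
               55.41972, 66.05653, 66.08106)
)

# Check the column classes mentioned in the PS
class(df$date)
class(df$latitude)
```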
Group by 'latitude' and by the year extracted from 'date', use slice_min to keep the row with the earliest date in each group, and then remove the 'year' column.
library(dplyr)
library(lubridate)
df1 %>%
  # group by latitude and by the year extracted from 'date'
  # year() is from lubridate
  group_by(latitude, year = year(date)) %>%
  # keep the row with the earliest date in each group
  slice_min(n = 1, order_by = date) %>%
  # remove the grouping
  ungroup() %>%
  # remove the year column
  select(-year) %>%
  # sort the result by date
  arrange(date)
-output
# A tibble: 5 × 2
  date       latitude
  <chr>         <dbl>
1 1951-03-22     59.6
2 1951-03-22     66.1
3 1952-10-20     55.4
4 1960-08-12     66.1
5 1960-09-10     66.1
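If adding dplyr and lubridate is not an option, the same grouping idea can probably be expressed in base R. The sketch below is an assumption-laden illustration, not part of the original answer: it rebuilds the question's data under the name `df1`, sorts by date, and then drops every row whose (latitude, year) pair has already been seen, so the earliest date per pair survives. The names `ord`, `yr`, `keep`, and `result` are made up for this example.

```r
# Example data copied from the question (assumed to match the real 'df1')
df1 <- data.frame(
  date = as.Date(c("1951-03-22", "1951-03-22", "1951-04-08",
                   "1952-10-20", "1960-08-12", "1960-09-10")),
  latitude = c(66.08106, 59.59117, 59.59117,
               55.41972, 66.05653, 66.08106)
)

# Sort by date so the first row seen for each group is the earliest one
ord <- df1[order(df1$date), ]

# Year of each row, extracted from the Date column
yr <- format(ord$date, "%Y")

# Flag rows whose (latitude, year) pair has not appeared earlier
keep <- !duplicated(data.frame(latitude = ord$latitude, year = yr))

# Keep only the flagged rows and reset the row names
result <- ord[keep, ]
rownames(result) <- NULL
result
```

Like the grouped version, this relies on the latitudes being exactly equal, as stated in the question; values that differ only by rounding would be treated as different groups.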