Add elements in column A based on the date in column B and the mismatched values specified in column C in R

Question

I run a hunting program and have a data frame with these columns: Date, Species type, Effort, and several columns giving the number of animals harvested in a particular hunting area on that date. However, the Species column splits males, females, and juveniles of the same species into separate rows. I need to collapse the harvest numbers of the same species for each area, while retaining all other common information. Here is an example of my df:

Date        Species       Area.1.Harvest  Area.2.Harvest   Effort
2016-04-02  Wild Sheep-M        1              NA            30
2016-04-02  Wild Sheep-F        4              NA            30
2016-04-17  Feral Goat-M        NA             5             50
2016-04-17  Feral Goat-F        NA             3             50
2016-09-18  Wild Sheep-M        NA             6             60
2016-09-18  Wild Sheep-F        NA             1             60
2016-09-18  Wild Sheep-J        NA             1             60

Here is the result I am looking for:

Date        Species       Area.1.Harvest  Area.2.Harvest   Effort
2016-04-02  Wild Sheep          5              NA            30
2016-04-17  Feral Goat          NA             8             50
2016-09-18  Wild Sheep          NA             8             60

I have 6 different areas to do this for and 3 years' worth of harvest data.

Answer 1

You could also do this quite easily using the data.table library:

library(data.table)
# Rebuild the example data from the question
df <- data.table(
  Date = as.Date(c(rep('2016-04-02', 2), rep('2016-04-17', 2), rep('2016-09-18', 3))),
  Species = c('Wild Sheep-M', 'Wild Sheep-F', 'Feral Goat-M', 'Feral Goat-F',
              'Wild Sheep-M', 'Wild Sheep-F', 'Wild Sheep-J'),
  Area.1.Harvest = c(1, 4, NA, NA, NA, NA, NA),
  Area.2.Harvest = c(NA, NA, 5, 3, 6, 1, 1),
  Effort = c(30, 30, 50, 50, 60, 60, 60)
)

# Drop the two-character sex/age suffix ("-M", "-F", "-J") in place,
# then sum the harvest columns and average Effort per Date and Species
df[, Species := substr(Species, 1, nchar(Species) - 2)][
  , .(Area.1.Harvest = sum(Area.1.Harvest, na.rm = TRUE),
      Area.2.Harvest = sum(Area.2.Harvest, na.rm = TRUE),
      Effort = mean(Effort, na.rm = TRUE)),
  by = list(Date, Species)]

#         Date    Species Area.1.Harvest Area.2.Harvest Effort
#1: 2016-04-02 Wild Sheep              5              0     30
#2: 2016-04-17 Feral Goat              0              8     50
#3: 2016-09-18 Wild Sheep              0              8     60

Answer 2

Look at the dplyr library, whose functions group_by() and summarise() are very helpful for the kind of aggregation you are looking for.

Also look at the stringr library, where functions like str_sub() help you manage and transform strings (in this case, the column Species should be character and not factor).
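On the character-versus-factor point, here is a minimal base R sketch of converting the column and stripping the suffix. The vector name species and its values are made up for illustration. It assumes every label ends in a single-letter suffix such as "-M", "-F", or "-J", as in the question's example.

```r
# Hypothetical Species column stored as a factor
species <- factor(c("Wild Sheep-M", "Wild Sheep-F", "Feral Goat-M"))

# nchar() refuses factor input, so the position-based approach needs character data
factor_fails <- inherits(try(nchar(species), silent = TRUE), "try-error")

species_chr <- as.character(species)

# Option 1: cut off the last two characters by position
by_position <- substr(species_chr, 1, nchar(species_chr) - 2)

# Option 2: remove a trailing "-<capital letter>" by pattern
by_pattern <- sub("-[A-Z]$", "", species_chr)

print(by_position)
print(by_pattern)
```

The pattern-based version leaves a label unchanged when it has no suffix, while the position-based version always removes two characters.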

library(dplyr)
library(stringr)

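df %>%
  # Drop the two-character sex/age suffix from Species
  mutate(
    Species = str_sub(Species, 1, nchar(Species) - 2)
  ) %>%
  group_by(Date, Species) %>%
  # Sum harvests and average effort within each Date/Species group
  summarise(
    Area.1.Harvest = sum(Area.1.Harvest, na.rm = TRUE),
    Area.2.Harvest = sum(Area.2.Harvest, na.rm = TRUE),
    Effort         = mean(Effort, na.rm = TRUE)
  )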
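With six area columns, naming each one inside summarise() gets repetitive. The following is a sketch of one way to cover them all at once. It assumes dplyr 1.0 or later (for across()) and that every harvest column name starts with "Area", as in the question's example. The toy data frame only mirrors the two areas shown there.

```r
library(dplyr)

# Toy data mirroring the question's example (two of the six areas)
df <- data.frame(
  Date = as.Date(c(rep("2016-04-02", 2), rep("2016-04-17", 2), rep("2016-09-18", 3))),
  Species = c("Wild Sheep-M", "Wild Sheep-F", "Feral Goat-M", "Feral Goat-F",
              "Wild Sheep-M", "Wild Sheep-F", "Wild Sheep-J"),
  Area.1.Harvest = c(1, 4, NA, NA, NA, NA, NA),
  Area.2.Harvest = c(NA, NA, 5, 3, 6, 1, 1),
  Effort = c(30, 30, 50, 50, 60, 60, 60),
  stringsAsFactors = FALSE
)

result <- df %>%
  # Strip the trailing "-M" / "-F" / "-J" suffix
  mutate(Species = sub("-[A-Z]$", "", Species)) %>%
  group_by(Date, Species) %>%
  summarise(
    # Sum every column whose name starts with "Area"
    across(starts_with("Area"), ~ sum(.x, na.rm = TRUE)),
    Effort = mean(Effort, na.rm = TRUE),
    .groups = "drop"
  )

print(result)
```

As with the code above, groups with no harvest in an area come out as 0 rather than NA, because sum() with na.rm = TRUE returns 0 when every value is missing.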

Answer 3

You could do the following using only dplyr:

library(dplyr)

df %>%
  group_by(Species = gsub("-.*", "", Species), Date) %>%
  mutate_at(vars(contains("Area")), function(x) sum(x, na.rm = any(!is.na(x))))  %>%
  mutate_at(vars(contains("Effort")), function(x) mean(x, na.rm = any(!is.na(x)))) %>%
  distinct()

This would work regardless of the number of Area or Effort variables you have (since you mentioned you have several and your example is just a partial representation).

Output:

# A tibble: 3 x 5
# Groups:   Species, Date [3]
  Date       Species   Area.1.Harvest Area.2.Harvest Effort
  <chr>      <chr>              <int>          <int>  <dbl>
1 2016-04-02 WildSheep              5             NA     30
2 2016-04-17 FeralGoat             NA              8     50
3 2016-09-18 WildSheep             NA              8     60

Custom functions are used for sum and mean because the usual calls with na.rm = TRUE would not return NA for an all-NA group, as specified in your desired output: sum(x, na.rm = TRUE) returns 0 and mean(x, na.rm = TRUE) returns NaN.
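The difference is easy to check on small vectors. This is a minimal base R sketch. The helper names sum_keep_na and mean_keep_na are made up here and simply wrap the same na.rm = any(!is.na(x)) idea used in the code above.

```r
all_na  <- c(NA_real_, NA_real_)  # a group with no harvest recorded in this area
some_na <- c(1, 4, NA)            # a group with at least one recorded value

# Plain na.rm = TRUE drops every value of an all-NA group
sum_default  <- sum(all_na, na.rm = TRUE)   # 0
mean_default <- mean(all_na, na.rm = TRUE)  # NaN

# Only remove NAs when something non-missing is left to aggregate
sum_keep_na  <- function(x) sum(x, na.rm = any(!is.na(x)))
mean_keep_na <- function(x) mean(x, na.rm = any(!is.na(x)))

print(sum_keep_na(all_na))    # NA
print(sum_keep_na(some_na))   # 5
print(mean_keep_na(all_na))   # NA
print(mean_keep_na(some_na))  # 2.5
```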

Answer 1
1 2019-02-05 18:50:35

Answer 2
0 2019-02-05 18:30:06

Answer 3
0 Accepted 2019-02-05 18:51:56