Misunderstanding of the group_by and summarize functions in R [dplyr package]

Question

I need to plot fatalities per year. I extracted the year from Date, grouped by it, and then summarized to get the total fatalities for each year. But when I run the code, it returns a single total of fatalities across the whole dataset.

I don't understand why. Is there another way to get fatalities per year?

In the dataset, Fatalities is recorded per incident, and many incidents happened each year.

    > crash_data <- read.csv("https://raw.githubusercontent.com/gluque/analytics_task2/master/Airplane_Crashes_and_Fatalities_Since_1908.csv")
    > crash_data$Date <- as.Date(crash_data$Date, "%m/%d/%Y")
    > crash_data$Date <- format(crash_data$Date, '%Y')
    > cd <- subset(crash_data, select = c(Fatalities, Date))
    > ab <- group_by(cd, Date)
    > ef <- summarize(ab, Fatalities = sum(Fatalities, na.rm = TRUE))
    > ef
      Fatalities
    1     105479

Answer 1

    > group_by(cd, Date) %>% summarize(Fatalities = sum(Fatalities, na.rm = TRUE))
    # # A tibble: 98 x 2
    #     Date Fatalities
    #    <chr>      <int>
    #  1  1908          1
    #  2  1912          5
    #  3  1913         45
    #  4  1915         40
    #  5  1916        108
    #  6  1917        124
    #  7  1918         65
    #  8  1919          5
    #  9  1920         24
    # 10  1921         68
    # ... with 88 more rows
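
The same code gives per-year totals above but a single grand total in the question, which suggests the two sessions are not calling the same `summarize`. A common cause, though the question does not confirm it, is that the plyr package was attached after dplyr, so that `plyr::summarize` masks `dplyr::summarize` and the grouping is ignored. The sketch below assumes that is what happened and uses a small made-up data frame in place of `crash_data`. Qualifying the calls with `dplyr::` makes sure the dplyr versions run regardless of load order, and `aggregate` is shown as a base R alternative.

```r
library(dplyr)

# Toy data standing in for the real crash data; the values are made up
# for illustration only.
cd <- data.frame(
  Date       = c("1908", "1912", "1912", "1913", "1913"),
  Fatalities = c(1L, 2L, 3L, 45L, NA)
)

# Explicit dplyr:: prefixes ensure the dplyr functions are used even if
# another attached package (for example plyr) defines summarize() too.
per_year <- cd %>%
  dplyr::group_by(Date) %>%
  dplyr::summarise(Fatalities = sum(Fatalities, na.rm = TRUE))

print(per_year)

# Base R alternative: the formula interface of aggregate() drops rows
# with missing values by default, then sums Fatalities within each Date.
per_year_base <- aggregate(Fatalities ~ Date, data = cd, FUN = sum)

print(per_year_base)
```

Running `conflicts(detail = TRUE)` in the affected session lists which attached packages define `summarize`, which is one way to check whether masking is really the cause.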

Answer 1: 0, accepted, 2016-07-28 10:11:56
