How to use summarize and group by to create a summary table in r?

Question

So I have the following data set (this is a small sample/example of what it looks like, with the original being 7k rows and 30 columns over 7 decades):

Year,Location,Population Total, Median Age, Household Total
2000, Adak, 220, 45, 67
2000, Akiachak, 567, NA, 98
2000, Rainfall, 2, NA, 11
1990, Adak, NA, 33, 56
1990, Akiachak, 456, NA, 446
1990, Tioga, 446, NA, NA

I want to create a summary table that indicates how many years of data is available by location for each variable. So something like this would work (for the small example from before):

Location,Population Total, Median Age, Household Total
Adak,1,2,2
Akiachak,2,0,2
Rainfall,1,0,1
Tioga,1,0,0

I'm new to R and haven't used these two commands together so I'm unsure of the syntax. Any help would be wonderful or alternatives.

Answer 1

You can do something like this:

x %>%
  group_by(Location) %>%
  summarise(count_years = n(), 
            count_pop_total = sum(!is.na(Population_Total)),
            count_median_age = sum(!is.na(Median_Age)),
            count_house_total = sum(!is.na(Household_Total)))

where you can replace the mean with whatever operation you want to perform. You should take a look at the dplyr vignette for more general solutions.

Answer 2

A solution with summarize_all from dplyr :

library(dplyr)
df %>%
  group_by(Location) %>%
  summarize_all(funs(sum(!is.na(.)))) %>%
  select(-Year)

Or you can use summarize_at :

df %>%
  group_by(Location) %>%
  summarize_at(vars(-Year), funs(sum(!is.na(.))))

Result:

# A tibble: 4 x 4
   Location PopulationTotal MedianAge HouseholdTotal
      <chr>           <int>     <int>          <int>
1      Adak               1         2              2
2  Akiachak               2         0              2
3  Rainfall               1         0              1
4     Tioga               1         0              0

Data:

df = read.table(text = "Year,Location,PopulationTotal, MedianAge, HouseholdTotal
                2000, Adak, 220, 45, 67
                2000, Akiachak, 567, NA, 98
                2000, Rainfall, 2, NA, 11
                1990, Adak, NA, 33, 56
                1990, Akiachak, 456, NA, 446
                1990, Tioga, 446, NA, NA", header = TRUE, sep = ",", stringsAsFactors = FALSE)

library(dplyr)    
df = df %>%
  mutate_at(vars(PopulationTotal:HouseholdTotal), as.numeric)

How to use summarize and group by to create a summary table in r?

Question

2 answers

solution1
1 2017-10-04 17:56:58

solution2
1 ACCPTED 2017-10-04 17:58:53

How to use summarize and group by to create a summary table in r?

Question

2 answers

solution1 1 2017-10-04 17:56:58

solution2 1 ACCPTED 2017-10-04 17:58:53

solution1
1 2017-10-04 17:56:58

solution2
1 ACCPTED 2017-10-04 17:58:53