
How do I create and then apply a function over certain columns of a tibble?

I am trying to practice R and learn more in general. I would like to compute the rate of each crime per 100,000 people. The following is the head of my data. I decided to use only the 5 largest cities.

# A tibble: 6 x 13
City       Popula~ `Viol~ `Mur~ `Rap~ `Rap~ Robbe~ `Aggr~ `Prop~ Burgl~ `Larc~ `Moto~ Arson
 <chr>        <dbl>  <dbl> <dbl> <dbl> <lgl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <dbl>
1 Abingdon      8186  10.0   0     3.00 NA      1.00   6.00  233    20.0  198    15.0   4.00
2 Alexandria  148519 258     5.00 21.0  NA    118    114    2967   249    2427   291    13.0 
3 Altavista     3486   8.00  0     0    NA      2.00   6.00   56.0   4.00   52.0   0     0   
4 Amherst       2223   2.00  0     2.00 NA      0      0      27.0   6.00   19.0   2.00  0   
5 Appalachia    1728  12.0   0     2.00 NA      2.00   8.00   77.0  25.0    51.0   1.00  0   
6 Ashland       7310  26.0   0     1.00 NA      8.00  17.0   246    14.0   221    11.0   1.00

The following code is my attempt.

virginia_crime %>%
 filter(Population > 180000) %>%
 group_by(City) %>%
 summarise(ratio_violent = `Violent
 crime`/(Population/100000),
 ratio_murder = `Murder and
 nonnegligent
 manslaughter`/(Population/100000))

The output is:

# A tibble: 5 x 3
City           ratio_violent ratio_murder
<chr>                  <dbl>        <dbl>
1 Chesapeake               320         3.90
2 Newport News             439         8.28
3 Norfolk                  573        11.3 
4 Richmond                 624        17.4 
5 Virginia Beach           162         3.77

I realize that I should be able to write a function that computes a rate, something like rate <- crime_columns / (Population / 100000). Am I even close, or should I be using one of the apply functions (e.g. sapply(summarise()))? I feel this task could be automated somehow, but I cannot figure it out. I would appreciate some insight.
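For reference, the helper-function idea from the question can work. A minimal sketch, assuming dplyr 1.0 or later (for across()): the helper name per_100k is hypothetical, the sample tibble is a hand-built subset of the rows shown above with simplified, hypothetical column names, and the Population > 180000 filter is skipped because none of these sample cities would pass it.

```r
library(dplyr)
library(tibble)

# Rate of events per 100,000 people (hypothetical helper name)
per_100k <- function(count, population) {
  count / (population / 100000)
}

# Hand-built subset of the rows shown above; column names are simplified
crime <- tibble(
  City       = c("Abingdon", "Alexandria", "Altavista"),
  Population = c(8186, 148519, 3486),
  Violent    = c(10, 258, 8),
  Murder     = c(0, 5, 0),
  Robbery    = c(1, 118, 2)
)

# Apply the helper to every column except City and Population.
# Inside the lambda, .x is the current crime column and Population
# refers to the (unmodified) Population column of the same rows.
rates <- crime %>%
  mutate(across(-c(City, Population), ~ per_100k(.x, Population)))

rates
```

Because the rate is computed row by row, mutate() is used here rather than summarise(), and no apply-family function is needed.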

You can gather your columns (all besides city and population) first, which lets you operate on all of them at once:

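library(dplyr)
library(tidyr)

# Reshape to long form: one row per City/crime pair, count stored in Number
crime_rates <- virginia_crime %>%
  filter(Population > 180000) %>%
  gather(Crime, Number, -City, -Population) %>%
  mutate(Rate = Number / (Population / 100000))  # crimes per 100,000 people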

This will end up with one row for each pair of city and crime, alongside the population, number, and rate.
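In tidyr 1.0 and later, gather() is superseded by pivot_longer(). A sketch of the same reshaping with it, using a small hand-built tibble as a stand-in for virginia_crime (values taken from the head above, column names simplified and hypothetical, filter omitted):

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hand-built stand-in for virginia_crime; column names are simplified
crime <- tibble(
  City       = c("Abingdon", "Alexandria"),
  Population = c(8186, 148519),
  Violent    = c(10, 258),
  Murder     = c(0, 5)
)

# One row per (City, Crime) pair; City and Population are kept as id columns
crime_rates <- crime %>%
  pivot_longer(-c(City, Population), names_to = "Crime", values_to = "Number") %>%
  mutate(Rate = Number / (Population / 100000))

crime_rates
```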

If you want to turn it back into a wide form, you can use spread (after removing the Number column):

crime_rates %>%
  select(-Number) %>%
  spread(Crime, Rate)
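Likewise, spread() is superseded by pivot_wider() in tidyr 1.0 and later. A self-contained sketch; the small long-format tibble below only mimics the shape of crime_rates, using counts from the head above:

```r
library(dplyr)
library(tidyr)
library(tibble)

# Long-format table in the shape produced by the gather() step
crime_rates <- tibble(
  City       = c("Abingdon", "Abingdon", "Alexandria", "Alexandria"),
  Population = c(8186, 8186, 148519, 148519),
  Crime      = c("Violent", "Murder", "Violent", "Murder"),
  Number     = c(10, 0, 258, 5)
) %>%
  mutate(Rate = Number / (Population / 100000))

# Drop Number first: otherwise it would act as an id column, and the rows
# would not collapse back to one row per city
wide <- crime_rates %>%
  select(-Number) %>%
  pivot_wider(names_from = Crime, values_from = Rate)

wide
```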

It's worth noting that the gathered (tidied) version is still quite useful, for example if you want to find the cities with the highest rates of each crime (perhaps to use in a graph):

crime_rates %>%
  group_by(Crime) %>%   # one group per crime type
  top_n(1, Rate)        # keep the city with the highest rate in each group
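In dplyr 1.0 and later, top_n() is superseded by slice_max(). A sketch that finds, for each crime, the city with the highest rate by grouping on Crime; the sample rows use counts from the head above and only illustrate the technique:

```r
library(dplyr)
library(tibble)

crime_rates <- tibble(
  City       = rep(c("Abingdon", "Alexandria", "Altavista"), each = 2),
  Population = rep(c(8186, 148519, 3486), each = 2),
  Crime      = rep(c("Violent", "Murder"), times = 3),
  Number     = c(10, 0, 258, 5, 8, 0)
) %>%
  mutate(Rate = Number / (Population / 100000))

# For each crime, keep the row(s) with the highest rate (ties are all kept)
top_by_crime <- crime_rates %>%
  group_by(Crime) %>%
  slice_max(Rate, n = 1) %>%
  ungroup()

top_by_crime
```

With these rows, Altavista has the highest violent-crime rate (about 229 per 100,000) despite having the fewest incidents, which is exactly what normalizing by population is meant to reveal.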

Here is an option with mutate_at. The OP's code uses summarise, which is meant to collapse a group of n rows into a single summary row. Computing a ratio does not reduce the number of rows (each city keeps its own value), so mutate should be used in place of summarise.

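library(dplyr)

# Divide each crime column (positions 3 to 13) by the population in 100,000s.
# The parentheses matter: ./Population/100000 would divide by Population * 100000.
virginia_crime %>%
  filter(Population > 180000) %>%
  mutate_at(3:13, ~ . / (Population / 100000))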
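In dplyr 1.0 and later, mutate_at() and funs() are superseded by across(). A sketch of the same step that selects the crime columns by name rather than by position (more robust if columns are added or reordered) and, via the .names argument, keeps the raw counts alongside the rates. The sample tibble and its column names are hypothetical:

```r
library(dplyr)
library(tibble)

crime <- tibble(
  City       = c("Abingdon", "Alexandria"),
  Population = c(8186, 148519),
  Violent    = c(10, 258),
  Murder     = c(0, 5)
)

# Add one "<column>_rate" column per crime column, keeping the original counts.
# Columns are chosen as "everything except City and Population".
crime_with_rates <- crime %>%
  mutate(across(-c(City, Population),
                ~ .x / (Population / 100000),
                .names = "{.col}_rate"))

crime_with_rates
```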
