
How do I create and then apply a function over certain columns of a tibble?

I am trying to practice R and learn more in general. I would like to compute the rate of each crime per 100,000 people. The following is the head of my data. I decided to use only the 5 largest cities.

# A tibble: 6 x 13
City       Popula~ `Viol~ `Mur~ `Rap~ `Rap~ Robbe~ `Aggr~ `Prop~ Burgl~ `Larc~ `Moto~ Arson
 <chr>        <dbl>  <dbl> <dbl> <dbl> <lgl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <dbl>
1 Abingdon      8186  10.0   0     3.00 NA      1.00   6.00  233    20.0  198    15.0   4.00
2 Alexandria  148519 258     5.00 21.0  NA    118    114    2967   249    2427   291    13.0 
3 Altavista     3486   8.00  0     0    NA      2.00   6.00   56.0   4.00   52.0   0     0   
4 Amherst       2223   2.00  0     2.00 NA      0      0      27.0   6.00   19.0   2.00  0   
5 Appalachia    1728  12.0   0     2.00 NA      2.00   8.00   77.0  25.0    51.0   1.00  0   
6 Ashland       7310  26.0   0     1.00 NA      8.00  17.0   246    14.0   221    11.0   1.00

The following code is my attempt.

virginia_crime %>%
 filter(Population > 180000) %>%
 group_by(City) %>%
 summarise(ratio_violent = `Violent
 crime`/(Population/100000),
 ratio_murder = `Murder and
 nonnegligent
 manslaughter`/(Population/100000))

The output is:

# A tibble: 5 x 3
City           ratio_violent ratio_murder
<chr>                  <dbl>        <dbl>
1 Chesapeake               320         3.90
2 Newport News             439         8.28
3 Norfolk                  573        11.3 
4 Richmond                 624        17.4 
5 Virginia Beach           162         3.77

I realize that I should be able to write a function that essentially computes a rate, something like rate <- (crime columns)/(Population/100000). Am I even close with this idea, or should I be using one of the apply functions (sapply(summarise()))? I feel this task could be automated somehow, but I cannot figure out how. I would appreciate some insight.

You can gather your columns (all except City and Population) first, which lets you operate on all of them at once:

library(tidyr)

crime_rates <- virginia_crime %>%
  filter(Population > 180000) %>%
  gather(Crime, Number, -City, -Population) %>%
  mutate(Rate = Number / (Population / 100000))

This ends up with one row for each pair of city and crime, alongside the population, number, and rate.

If you want to turn it back into wide form, you can use spread (after removing the Number column):

crime_rates %>%
  select(-Number) %>%
  spread(Crime, Rate)
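
In tidyr 1.0.0 and later, gather and spread are superseded by pivot_longer and pivot_wider, so the same round trip could also be written as in the sketch below. This is one possible equivalent, not the code from the answer above. The two-row toy_crime tibble and its shortened column names (Violent, Murder) are made up for illustration, with counts taken from the head of the data in the question.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two rows from the head shown in the question,
# with shortened column names chosen only for this example.
toy_crime <- tibble(
  City       = c("Abingdon", "Alexandria"),
  Population = c(8186, 148519),
  Violent    = c(10, 258),
  Murder     = c(0, 5)
)

# Long form: one row per city/crime pair, plus the rate per 100,000 people.
toy_rates <- toy_crime %>%
  pivot_longer(cols = -c(City, Population),
               names_to = "Crime", values_to = "Number") %>%
  mutate(Rate = Number / (Population / 100000))

# Back to wide form: one column of rates per crime.
toy_wide <- toy_rates %>%
  select(-Number) %>%
  pivot_wider(names_from = Crime, values_from = Rate)
```

Here toy_rates has four rows (two cities times two crimes), and toy_wide has the original shape with rates in place of counts.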

It's worth noting that the gathered (tidied) version is still quite useful, for example if you want to find the cities with the highest rates of each crime (perhaps to use in a graph):

crime_rates %>%
  group_by(Crime) %>%   # one group per crime, so top_n keeps the top city for each
  top_n(1, Rate)
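
In dplyr 1.0.0 and later, top_n is superseded by slice_max, so picking the city with the highest rate of each crime might look like the sketch below. The toy_rates tibble is hypothetical: it mimics the long form described above, with shortened crime names and the rounded rates from the question's output for three of the cities.

```r
library(dplyr)

# Hypothetical long-form data; rates are the rounded values from the
# question's output, and the crime names are shortened for this example.
toy_rates <- tibble(
  City  = c("Chesapeake", "Norfolk", "Richmond",
            "Chesapeake", "Norfolk", "Richmond"),
  Crime = c("Violent", "Violent", "Violent",
            "Murder", "Murder", "Murder"),
  Rate  = c(320, 573, 624, 3.90, 11.3, 17.4)
)

# For each crime, keep the row (city) with the highest rate.
top_cities <- toy_rates %>%
  group_by(Crime) %>%
  slice_max(Rate, n = 1) %>%
  ungroup()
```

With this toy data, Richmond is the row kept for both crimes.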

Here is an option with mutate_at. The OP's code uses summarise, but summarise is meant to collapse a group of n rows into a single row. A ratio is computed for every row rather than collapsed, so mutate should be used in place of summarise.

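library(dplyr)
# df1 stands for the crime tibble (virginia_crime in the question)
df1 %>% 
   filter(Population > 180000) %>% 
   # divide each crime column (positions 3 to 13) by the population in units of 100,000;
   # the formula replaces the deprecated funs() and fixes the operator precedence
   mutate_at(3:13, ~ . / (Population / 100000))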
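
To connect this back to the question's idea of writing a rate function, the sketch below defines a small helper and applies it to every crime column with across, which is available in dplyr 1.0.0 and later. The helper name per_100k and the toy_crime tibble (two rows from the head of the question's data, with shortened column names) are assumptions made for illustration, not part of either answer.

```r
library(dplyr)

# Hypothetical helper: converts a raw count into a rate per 100,000 residents.
per_100k <- function(count, population) {
  count / (population / 100000)
}

# Hypothetical toy data: two rows from the head shown in the question,
# with shortened column names chosen only for this example.
toy_crime <- tibble(
  City       = c("Abingdon", "Alexandria"),
  Population = c(8186, 148519),
  Violent    = c(10, 258),
  Murder     = c(0, 5)
)

# Apply the helper to every column except City and Population.
# Inside across(), Population refers to the column of the same data frame.
toy_rates <- toy_crime %>%
  mutate(across(-c(City, Population), ~ per_100k(.x, Population)))
```

Keeping the arithmetic in a named function makes the parenthesization explicit in one place, so the per-column call stays short.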
