How do I create and then apply a function over certain columns of a tibble?
I am trying to practice R and learn more in general. I would like to compute the rate of each type of crime per 100,000 people. The following is the head of my data. I decided to only use the 5 largest cities.
# A tibble: 6 x 13
City Popula~ `Viol~ `Mur~ `Rap~ `Rap~ Robbe~ `Aggr~ `Prop~ Burgl~ `Larc~ `Moto~ Arson
<chr> <dbl> <dbl> <dbl> <dbl> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Abingdon 8186 10.0 0 3.00 NA 1.00 6.00 233 20.0 198 15.0 4.00
2 Alexandria 148519 258 5.00 21.0 NA 118 114 2967 249 2427 291 13.0
3 Altavista 3486 8.00 0 0 NA 2.00 6.00 56.0 4.00 52.0 0 0
4 Amherst 2223 2.00 0 2.00 NA 0 0 27.0 6.00 19.0 2.00 0
5 Appalachia 1728 12.0 0 2.00 NA 2.00 8.00 77.0 25.0 51.0 1.00 0
6 Ashland 7310 26.0 0 1.00 NA 8.00 17.0 246 14.0 221 11.0 1.00
The following code is my attempt.
virginia_crime %>%
filter(Population > 180000) %>%
group_by(City) %>%
summarise(ratio_violent = `Violent
crime`/(Population/100000),
ratio_murder = `Murder and
nonnegligent
manslaughter`/(Population/100000))
The output is:
# A tibble: 5 x 3
City ratio_violent ratio_murder
<chr> <dbl> <dbl>
1 Chesapeake 320 3.90
2 Newport News 439 8.28
3 Norfolk 573 11.3
4 Richmond 624 17.4
5 Virginia Beach 162 3.77
I realize that I should be able to write a function that essentially creates a rate, something like rate <- crime_column / (Population / 100000). Am I even close with this idea, or should I be using one of the apply functions (for example sapply() around summarise())? I feel this task could be automated somehow; I just cannot figure it out. I would appreciate some insight.
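The helper function the question is reaching for can be paired with mutate() and across() (dplyr 1.0.0 or later), which does the column-wise iteration so no apply function is needed. A minimal sketch, assuming simplified column names (violent, murder) in place of the original multi-line headers and using three rows copied from the head() shown above; per_100k() is a hypothetical name, not an existing function.

```r
library(dplyr)

# Hypothetical helper: turns a raw count into a rate per 100,000 residents
per_100k <- function(count, population) {
  count / (population / 100000)
}

# Three rows from the question's head(); column names simplified (assumption)
crime <- tibble(
  City       = c("Abingdon", "Alexandria", "Altavista"),
  Population = c(8186, 148519, 3486),
  violent    = c(10, 258, 8),
  murder     = c(0, 5, 0)
)

# across() applies per_100k() to each selected column; .names keeps the
# original counts and adds ratio_violent / ratio_murder next to them
crime_rates <- crime %>%
  mutate(across(c(violent, murder), ~ per_100k(.x, Population),
                .names = "ratio_{.col}"))
```

Because mutate() keeps one row per input row, this does not depend on summarise() happening to return one row per group.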
You can gather your columns (all except City and Population) into long format first, which lets you operate on all of them at once:
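library(dplyr)  # filter(), mutate()
library(tidyr)  # gather()
# Reshape to long format (one row per City/Crime pair), then compute rates
crime_rates <- virginia_crime %>%
filter(Population > 180000) %>%
gather(Crime, Number, -City, -Population) %>%
mutate(Rate = Number / (Population / 100000))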
This will end up with one row for each pair of city and crime, alongside the population, number, and rate.
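gather() still works, but tidyr 1.0.0 and later mark it as superseded by pivot_longer(). A sketch of the same reshaping with pivot_longer(), assuming the same three rows and simplified column names as stand-ins for the real data:

```r
library(dplyr)
library(tidyr)

# Three rows from the question's head(); column names simplified (assumption)
crime <- tibble(
  City       = c("Abingdon", "Alexandria", "Altavista"),
  Population = c(8186, 148519, 3486),
  violent    = c(10, 258, 8),
  murder     = c(0, 5, 0)
)

# Every column except City and Population becomes a (Crime, Number) pair
crime_rates <- crime %>%
  pivot_longer(-c(City, Population), names_to = "Crime", values_to = "Number") %>%
  mutate(Rate = Number / (Population / 100000))
```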
If you want to turn it back into a wide form, you can use spread (after removing the Number column):
crime_rates %>%
select(-Number) %>%
spread(Crime, Rate)
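The wide form can likewise be rebuilt with pivot_wider(), the successor to spread(). This sketch reuses the same assumed three-row example; the comment explains why Number has to be dropped first.

```r
library(dplyr)
library(tidyr)

# Three rows from the question's head(); column names simplified (assumption)
crime <- tibble(
  City       = c("Abingdon", "Alexandria", "Altavista"),
  Population = c(8186, 148519, 3486),
  violent    = c(10, 258, 8),
  murder     = c(0, 5, 0)
)

crime_rates <- crime %>%
  pivot_longer(-c(City, Population), names_to = "Crime", values_to = "Number") %>%
  mutate(Rate = Number / (Population / 100000))

# pivot_wider() identifies rows by every column not used in names_from or
# values_from, so leaving Number in would usually split each city over
# several rows (one per distinct Number)
crime_wide <- crime_rates %>%
  select(-Number) %>%
  pivot_wider(names_from = Crime, values_from = Rate)
```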
It's worth noting that the gathered (tidied) version is still quite useful, for example if you want to find the cities with the highest rates of each crime (perhaps to use in a graph):
crime_rates %>%
group_by(Crime) %>%   # one group per crime type
top_n(1, Rate)        # keep the city with the highest rate in each group
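In dplyr 1.0.0 and later, top_n() is superseded by slice_max(). A sketch of the per-crime maximum, again on the assumed three-row example with simplified column names:

```r
library(dplyr)
library(tidyr)

# Three rows from the question's head(); column names simplified (assumption)
crime <- tibble(
  City       = c("Abingdon", "Alexandria", "Altavista"),
  Population = c(8186, 148519, 3486),
  violent    = c(10, 258, 8),
  murder     = c(0, 5, 0)
)

crime_rates <- crime %>%
  pivot_longer(-c(City, Population), names_to = "Crime", values_to = "Number") %>%
  mutate(Rate = Number / (Population / 100000))

# Group by crime type, then keep the row (city) with the largest Rate
top_by_crime <- crime_rates %>%
  group_by(Crime) %>%
  slice_max(Rate, n = 1) %>%
  ungroup()
```

With these three rows, this keeps Altavista for violent crime and Alexandria for murder.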
Here is an option with mutate_at. The OP's code uses summarise, but summarise is meant to reduce an object with n rows to a single row (per group). A ratio is computed for every row rather than collapsing rows, so, based on the OP's code, mutate should be used in place of summarise.
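library(dplyr)
# Columns 3:13 hold the crime counts. The parentheses matter:
# ./Population/100000 would divide by Population * 100000.
# A ~ lambda replaces funs(), which is deprecated in current dplyr.
virginia_crime %>%
filter(Population > 180000) %>%
mutate_at(3:13, ~ . / (Population / 100000))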
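Two notes on the mutate_at() approach. First, R evaluates a / b / c as (a / b) / c, so ./Population/100000 divides by Population * 100000 rather than by Population / 100000, and the divisor needs parentheses. Second, in dplyr 1.0.0 and later mutate_at() is superseded by across(), which can also select columns by name instead of by position. A sketch with the same assumed three-row example and simplified column names:

```r
library(dplyr)

# Division is left-associative: a / b / c means (a / b) / c
without_parens <- 10 / 8186 / 100000    # 10 / (8186 * 100000)
with_parens    <- 10 / (8186 / 100000)  # 10 * 100000 / 8186, i.e. per 100k

# Three rows from the question's head(); column names simplified (assumption)
crime <- tibble(
  City       = c("Abingdon", "Alexandria", "Altavista"),
  Population = c(8186, 148519, 3486),
  violent    = c(10, 258, 8),
  murder     = c(0, 5, 0)
)

# Select the crime columns by exclusion instead of by position (3:13),
# so the code keeps working if the columns are reordered
crime_rates <- crime %>%
  mutate(across(-c(City, Population), ~ .x / (Population / 100000)))
```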
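Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.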