dplyr, purrr, dynamically generate/calculate new columns in R

I have the following problem. I have a data frame/tibble with many columns that each hold a value for a different year, e.g. the number of inhabitants of a city at different points in time. I now want to generate new columns that give me the growth rate (see the attached pictures). It should be something like using mutate() while looping over the columns. I think this should be a common task, but I can't find any hint on how to do it.

Edit:

A minimal example could look like this:

## Minimal example

library(tidyverse)

## Given data frame

df <- tibble(
        City = c("Melbourne", "Sydney", "Adelaide"),
        year_2000 = c(100, 100, 205),
        year_2001 = c(101, 100, 207),
        year_2002 = c(102, 100, 209)
        )

## Result

df <- df %>%
  mutate(
    gr_2000_2001 = year_2001/year_2000*100 - 100,
    gr_2001_2002 = year_2002/year_2001*100 - 100
  )

I want to find a way to automate the mutate() call, as I have to do this for 150 years.

[Images: two screenshots attached to the question, not reproduced here]

The easiest way in this example would probably be to make your data tidy and then, on a data frame grouped by City, apply whatever formula you use to calculate growth rates with dplyr's lag() function:

## Minimal example
library(tidyverse)

df <- data.frame(City = c("Melbourne", "Sydney"),
                 year_2000 = c(100, 100),
                 year_2001 = c(101, 100),
                 year_2002 = c(102, 102))

df %>%
  gather(year, value, 2:4) %>%   # reshape columns 2 to 4 into long (tidy) format
  group_by(City) %>%             # compute growth within each city
  mutate(growth = value / dplyr::lag(value, n = 1))   # ratio to the previous year

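The result is this (growth is the ratio to the previous year, so 1.01 means 1% growth, and the first year of each city is NA because it has no predecessor):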

# A tibble: 6 x 4
# Groups:   City [2]
  City      year      value growth
  <fct>     <chr>     <dbl>  <dbl>
1 Melbourne year_2000   100  NA   
2 Sydney    year_2000   100  NA   
3 Melbourne year_2001   101   1.01
4 Sydney    year_2001   100   1   
5 Melbourne year_2002   102   1.01
6 Sydney    year_2002   102   1.02
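For comparison, here is a sketch of the same idea using pivot_longer(), which tidyr now recommends over gather(), together with the percentage formula from the question. It is only one possible variant, not the answer's original code: the column name growth_pct and the object name growth_long are chosen here for illustration, and the names_transform argument assumes tidyr 1.1.0 or later.

```r
library(dplyr)
library(tidyr)

# Data from the question's minimal example
df <- tibble(
  City = c("Melbourne", "Sydney", "Adelaide"),
  year_2000 = c(100, 100, 205),
  year_2001 = c(101, 100, 207),
  year_2002 = c(102, 100, 209)
)

growth_long <- df %>%
  # One row per City/year; drop the "year_" prefix and store the year as an integer
  pivot_longer(
    cols = starts_with("year_"),
    names_to = "year",
    names_prefix = "year_",
    names_transform = list(year = as.integer),
    values_to = "value"
  ) %>%
  group_by(City) %>%
  # Make sure the years are in ascending order within each city before lagging
  arrange(year, .by_group = TRUE) %>%
  # Percentage growth relative to the previous year (NA for the first year)
  mutate(growth_pct = value / lag(value) * 100 - 100) %>%
  ungroup()

growth_long
```

Because the years are selected with starts_with("year_"), the same pipeline should work unchanged for 150 year columns, as long as they follow the year_YYYY naming pattern.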

If you absolutely need the data in the format shown in your screenshots, you can then apply spread() to reshape it back into the original wide format. This is not generally recommended, however.
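If the wide layout really is required, a round trip could look roughly like the sketch below. It uses pivot_wider(), the current tidyr counterpart of spread(), and builds the gr_<previous>_<current> column names from the question. The intermediate names (growth_wide, result, growth_pct, name) are assumptions made for this example, and it relies on the year columns already being in chronological order.

```r
library(dplyr)
library(tidyr)

# Data from the question's minimal example
df <- tibble(
  City = c("Melbourne", "Sydney", "Adelaide"),
  year_2000 = c(100, 100, 205),
  year_2001 = c(101, 100, 207),
  year_2002 = c(102, 100, 209)
)

growth_wide <- df %>%
  # Long format: one row per City/year, with the year kept as text ("2000", ...)
  pivot_longer(
    cols = starts_with("year_"),
    names_to = "year",
    names_prefix = "year_",
    values_to = "value"
  ) %>%
  group_by(City) %>%
  mutate(
    # Percentage growth relative to the previous year
    growth_pct = value / lag(value) * 100 - 100,
    # Target column name, e.g. "gr_2000_2001"
    name = paste0("gr_", lag(year), "_", year)
  ) %>%
  # The first year of each city has no predecessor, so drop it
  filter(row_number() > 1) %>%
  ungroup() %>%
  select(City, name, growth_pct) %>%
  # Back to wide format: one growth column per pair of consecutive years
  pivot_wider(names_from = name, values_from = growth_pct)

# Attach the growth columns to the original wide table
result <- left_join(df, growth_wide, by = "City")

result
```

The result keeps the original year_* columns and appends gr_2000_2001 and gr_2001_2002, matching the hand-written mutate() call in the question.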
