
R: Add interpolated values in between columns of dataframe?

I have a data frame that looks like this:

Region      2000    2001   2002    2003    2004      2005
Australia   15.6    18.4   19.2    20.2    39.1      50.2
Norway      19.05   20.2   15.3    10      10.1      5.6

Basically, I need a quick way to add extra columns between the existing ones, containing values interpolated from the surrounding columns.

Think of it like this: say you don't want columns for every year, but rather columns for every quarter. Then, for every pair of consecutive years (like 2000 and 2001), we would need to add 3 extra columns in between.

The values of these columns will just be linearly interpolated. For Australia, the value in 2000 is 15.6 and in 2001 it is 18.4. So we calculate (18.4 - 15.6)/4 = 0.7, and the values should then be 15.6, 16.3, 17, 17.7, and finally 18.4.
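
The arithmetic above can be checked in base R. This is only a minimal sketch for one pair of years; the variable names are illustrative and not part of the original question.

```r
# Values for Australia in 2000 and 2001, taken from the table above
v2000 <- 15.6
v2001 <- 18.4

# A year is split into 4 equal steps, so the step per quarter is:
step <- (v2001 - v2000) / 4

# Five evenly spaced points: the two yearly values plus 3 interpolated ones
quarters <- seq(v2000, v2001, length.out = 5)
print(round(quarters, 2))
# [1] 15.6 16.3 17.0 17.7 18.4
```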

I have a working solution that builds up the new dataframe from scratch using a for loop. It is EXTREMELY slow. How can I speed this up?

This is how I did it when I had a similar problem. Not the most sophisticated solution, but it works.

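Australia <- c(15.6, 18.4, 19.2, 20.2, 39.1, 50.2)

library(zoo)

# Interleave a vector with the midpoints of neighbouring values.
# rollmean(x, 2) is one element shorter than x, so pad it with NA to stop
# rbind() from recycling, then drop the padding with head(..., -1).
add_midpoints <- function(x) head(c(rbind(x, c(rollmean(x, 2), NA))), -1)

biyearly  <- add_midpoints(Australia)  # half-year steps
quarterly <- add_midpoints(biyearly)   # quarter-year steps
quarterly
#[1] 15.600 16.300 17.000 17.700 18.400 18.600 18.800 19.000 19.200 19.450 19.700
#[12] 19.950 20.200 24.925 29.650 34.375 39.100 41.875 44.650 47.425 50.200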
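
Repeated midpoints only work when the number of sub-steps is a power of two. As a possible alternative, base R's approx() can interpolate linearly at arbitrary positions. The sketch below assumes the yearly values sit at positions 1 to 6 and asks for a value at every quarter step; the name positions is illustrative.

```r
Australia <- c(15.6, 18.4, 19.2, 20.2, 39.1, 50.2)

# Treat the yearly values as points at positions 1, 2, ..., 6 and request
# linearly interpolated values at every quarter step in between
positions <- seq(1, length(Australia), by = 0.25)
quarterly <- approx(x = seq_along(Australia), y = Australia, xout = positions)$y

length(quarterly)
# [1] 21
```

Changing by = 0.25 to, say, by = 1/12 would give monthly steps without any other change.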

Here is a solution using dplyr. It should be more consistent and much faster than a loop:

library(dplyr)

# dummy data
df <- tibble(Region = LETTERS[1:5],
             `2000` = 1:5,
             `2001` = 3:7,
             `2002` = 10:14)

# split each annual value into four equal parts
# (note: this divides by 4; it does not interpolate between years)
into_quarter <- function(x) x / 4

df %>% 
  # create new variables that contain quarterly values,
  # named <year>_Q1 ... <year>_Q4
  mutate_at(vars(starts_with("200")), 
            .funs = list("Q1" = into_quarter,
                         "Q2" = into_quarter,
                         "Q3" = into_quarter,
                         "Q4" = into_quarter)) %>% 
  # sort them appropriately.
  # can also be done with base R and order(names), depending on desired result
  select(Region, 
         starts_with("2000"),
         starts_with("2001"),
         starts_with("2002"),
         # in case there are also other variables and to not lose any information
         everything())

Here is one way with tidyverse:

library(tidyverse)

df %>%
  #get data in long format
  pivot_longer(cols = -Region) %>%
  #group by Region
  group_by(Region) %>%
  #Create 4 evenly spaced values between each pair of consecutive values (both endpoints included)
  summarise(temp = list(unlist(map2(value[-n()], value[-1], seq, length.out = 4)))) %>%
  #Unnest the list column into one row per value
  unnest(temp) %>%
  group_by(Region) %>%
  #Create column name
  mutate(col = paste0(rep(names(df)[-c(1, ncol(df))], each = 4), "Q", 1:4)) %>%
  #Spread data in wide format
  pivot_wider(names_from = col, values_from = temp)

# A tibble: 2 x 21
# Groups:   Region [2]
#  Region `2000Q1` `2000Q2` `2000Q3` `2000Q4` `2001Q1` `2001Q2` `2001Q3` `2001Q4` `2002Q1`
#  <fct>     <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
#1 Austr…     15.6     16.5     17.5     18.4     18.4     18.7     18.9     19.2     19.2
#2 Norway     19.0     19.4     19.8     20.2     20.2     18.6     16.9     15.3     15.3
# … with 11 more variables: `2002Q2` <dbl>, `2002Q3` <dbl>, `2002Q4` <dbl>,
#   `2003Q1` <dbl>, `2003Q2` <dbl>, `2003Q3` <dbl>, `2003Q4` <dbl>, `2004Q1` <dbl>,
#   `2004Q2` <dbl>, `2004Q3` <dbl>, `2004Q4` <dbl>
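
In the output above, each year's Q4 equals the next year's Q1, because seq(..., length.out = 4) includes both endpoints and so steps by a third of the yearly difference. If steps of a quarter are wanted, as in the question (15.6, 16.3, 17, 17.7, 18.4), one possible approach is to interpolate each row with base R's approx(). This is only a sketch: it assumes each yearly value belongs at Q1 of its year, and the names years, target, vals and quarterly_df are illustrative.

```r
df <- data.frame(Region = c("Australia", "Norway"),
                 `2000` = c(15.6, 19.05), `2001` = c(18.4, 20.2),
                 `2002` = c(19.2, 15.3),  `2003` = c(20.2, 10),
                 `2004` = c(39.1, 10.1),  `2005` = c(50.2, 5.6),
                 check.names = FALSE)

years  <- as.numeric(names(df)[-1])                # 2000, ..., 2005
target <- seq(min(years), max(years), by = 0.25)   # quarter steps

# Interpolate each row at the quarterly positions. apply() returns one
# column per input row, so transpose back to one row per Region.
vals <- t(apply(as.matrix(df[-1]), 1,
                function(v) approx(years, v, xout = target)$y))

# Label columns as <year>Q<quarter>: 2000Q1, 2000Q2, ..., 2005Q1
# (assumption: the yearly value is placed at Q1 of its year)
colnames(vals) <- paste0(floor(target), "Q", (target - floor(target)) * 4 + 1)

quarterly_df <- data.frame(Region = df$Region, vals, check.names = FALSE)
```

This avoids both the explicit loop and the reshaping between long and wide formats, since all rows are handled in a single apply() call over a numeric matrix.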

Data

df <- structure(list(Region = structure(1:2, .Label = c("Australia", 
"Norway"), class = "factor"), `2000` = c(15.6, 19.05), `2001` = c(18.4, 
20.2), `2002` = c(19.2, 15.3), `2003` = c(20.2, 10), `2004` = c(39.1, 
10.1), `2005` = c(50.2, 5.6)), class = "data.frame", row.names = c(NA, -2L))
