简体   繁体   中英

Calculate difference between consecutive rows per group in dataframe Julia

I have the following dataframe called df:

using DataFrames

df = DataFrame(group = ["A", "A", "A", "A", "B", "B", "B", "B"],
               value = [2,1,4,3,3,5,2,1])

8×2 DataFrame
 Row │ group   value 
     │ String  Int64 
─────┼───────────────
   1 │ A           2
   2 │ A           1
   3 │ A           4
   4 │ A           3
   5 │ B           3
   6 │ B           5
   7 │ B           2
   8 │ B           1

I would like to calculate the difference with previous values of consecutive rows in column value per group. The offset should have NaN, 0, or missing. Here is the desired output:

8×3 DataFrame
 Row │ group   value  diff    
     │ String  Int64  Float64 
─────┼────────────────────────
   1 │ A           2    NaN
   2 │ A           1     -1.0
   3 │ A           4      3.0
   4 │ A           3     -1.0
   5 │ B           3    NaN
   6 │ B           5      2.0
   7 │ B           2     -3.0
   8 │ B           1     -2.0

So I was wondering if anyone knows how to calculate the difference between consecutive rows per group in Julia ?

Using DataFrames.jl (you can replace missing by any value you like):

julia> select(groupby(df, :group),
              :value => (x -> [missing; diff(x)]) => :diff)
8×2 DataFrame
 Row │ group   diff
     │ String  Int64?
─────┼─────────────────
   1 │ A       missing
   2 │ A            -1
   3 │ A             3
   4 │ A            -1
   5 │ B       missing
   6 │ B             2
   7 │ B            -3
   8 │ B            -1

Using DataFramesMeta.jl:

julia> @chain df begin
           groupby(:group)
           @select :diff = [missing; diff(:value)]
       end
8×2 DataFrame
 Row │ group   diff
     │ String  Int64?
─────┼─────────────────
   1 │ A       missing
   2 │ A            -1
   3 │ A             3
   4 │ A            -1
   5 │ B       missing
   6 │ B             2
   7 │ B            -3
   8 │ B            -1

Normally diff in Julia like in eg R would produce one less row (and the syntax would be simpler:

julia> combine(groupby(df, :group), :value => diff => :diff)
6×2 DataFrame
 Row │ group   diff
     │ String  Int64
─────┼───────────────
   1 │ A          -1
   2 │ A           3
   3 │ A          -1
   4 │ B           2
   5 │ B          -3
   6 │ B          -1

julia> @chain df begin
           groupby(:group)
           @combine :diff = diff(:value)
       end
6×2 DataFrame
 Row │ group   diff
     │ String  Int64
─────┼───────────────
   1 │ A          -1
   2 │ A           3
   3 │ A          -1
   4 │ B           2
   5 │ B          -3
   6 │ B          -1

Yet another way would be to use lag from ShiftedArrays.jl:

julia> using ShiftedArrays: lag

julia> @chain df begin
           groupby(:group)
           @combine :diff = :value - lag(:value)
       end
8×2 DataFrame
 Row │ group   diff
     │ String  Int64?
─────┼─────────────────
   1 │ A       missing
   2 │ A            -1
   3 │ A             3
   4 │ A            -1
   5 │ B       missing
   6 │ B             2
   7 │ B            -3
   8 │ B            -1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM