
Recursive aggregation of data tree

I'm using the data.tree package for my analysis instead of nested lists, because my actual data is nested so deeply that plain lists become difficult to work with.

I've put together sample data based on this question: Aggregating values on a data tree with R. In my own data I have the to, from and hours columns and need to create actual_hours, which is a node's hours value minus the sum of its children's hours.

library(data.tree)
# Have: one row per edge; `from` holds the child node and `to` holds its parent
to <- c("Team1", "Team1-1","Team1-1", "Team1-1-1", "Team1-1-1", "Team1-1-1", "Team1-1-2")
from <- c("Team1-1", "Team1-1-1","Team1-1-2", "Team1-1-1a", "Team1-1-1b", "Team1-1-1c" ,"Team1-1-2a")
hours <- c(NA,150,200,65,20,30, 30)
df <- data.frame(from,to,hours)

# Create data tree
tree <- FromDataFrameNetwork(df)
###Current Output
print(tree, "hours")
              levelName hours
1 Team1                     NA
2  °--Team1-1               NA
3      ¦--Team1-1-1        150
4      ¦   ¦--Team1-1-1a    65
5      ¦   ¦--Team1-1-1b    20
6      ¦   °--Team1-1-1c    30
7      °--Team1-1-2        200
8          °--Team1-1-2a    30


#Need to create
actual_hours <- c(NA,35,170, 65,20,30, 30)
df <- data.frame(from,to,hours, actual_hours)
# Create data tree
tree <- FromDataFrameNetwork(df)
#Desired output
print(tree, "hours", 'actual_hours')

               levelName hours actual_hours
1 Team1                     NA           NA
2  °--Team1-1               NA           NA
3      ¦--Team1-1-1        150           35
4      ¦   ¦--Team1-1-1a    65           65
5      ¦   ¦--Team1-1-1b    20           20
6      ¦   °--Team1-1-1c    30           30
7      °--Team1-1-2        200          170
8          °--Team1-1-2a    30           30

I'm not sure exactly how to do this, since it involves moving both up and down the tree. I think the height and/or level attributes of the tree are the way to go, but I'm not sure.

Assume that the data.tree looks like this (note that hours for Team1-1 is set to 0 here rather than NA):

               levelName hours
1 Team1                     NA
2  °--Team1-1                0
3      ¦--Team1-1-1        150
4      ¦   ¦--Team1-1-1a    65
5      ¦   ¦--Team1-1-1b    20
6      ¦   °--Team1-1-1c    30
7      °--Team1-1-2        200
8          °--Team1-1-2a    30

Here are two versions to try, since I'm not sure which one you want.


Non-cumulative (subtract the children's hours)

tree$Do(function(node) {
  node$actual_hours <- node$hours - if (node$isLeaf) 0 else Aggregate(node, attribute = "hours", aggFun = sum)
}, traversal = "post-order")
> print(tree, "hours", "actual_hours")
               levelName hours actual_hours
1 Team1                     NA           NA
2  °--Team1-1                0         -350 # -350 = 0 - (150 + 200)
3      ¦--Team1-1-1        150           35
4      ¦   ¦--Team1-1-1a    65           65
5      ¦   ¦--Team1-1-1b    20           20
6      ¦   °--Team1-1-1c    30           30
7      °--Team1-1-2        200          170
8          °--Team1-1-2a    30           30
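
As a cross-check of the non-cumulative rule, here is a minimal self-contained sketch that runs on the question's original data, where hours for Team1-1 is NA rather than 0. It sums the children's hours explicitly instead of calling Aggregate. node_hours is a hypothetical helper (not part of data.tree) that treats a missing hours field as NA. Under those assumptions, Team1 and Team1-1 should come out as NA, which matches the desired output in the question.

```r
library(data.tree)

# Sample data from the question; `from` holds the child, `to` its parent.
to    <- c("Team1", "Team1-1", "Team1-1", "Team1-1-1", "Team1-1-1", "Team1-1-1", "Team1-1-2")
from  <- c("Team1-1", "Team1-1-1", "Team1-1-2", "Team1-1-1a", "Team1-1-1b", "Team1-1-1c", "Team1-1-2a")
hours <- c(NA, 150, 200, 65, 20, 30, 30)
tree  <- FromDataFrameNetwork(data.frame(from, to, hours))

# Hypothetical helper: a node's hours, or NA when the field is absent
# (the root Team1 has no row of its own, so it may carry no hours value).
node_hours <- function(node) {
  h <- node$hours
  if (is.null(h)) NA_real_ else h
}

# Own hours minus the sum of the direct children's hours.
# Only `hours` is read, so the traversal order does not matter here.
tree$Do(function(node) {
  child_sum <- if (node$isLeaf) 0 else sum(sapply(node$children, node_hours))
  node$actual_hours <- node_hours(node) - child_sum
})

print(tree, "hours", "actual_hours")
```

Because NA propagates through the subtraction, no special case is needed for nodes without an hours value.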

Cumulative (subtract the children's actual_hours)

tree$Do(function(node) {
  node$actual_hours <- node$hours - if (node$isLeaf) 0 else Aggregate(node, attribute = "actual_hours", aggFun = sum)
}, traversal = "post-order")
> print(tree, "hours", "actual_hours")
               levelName hours actual_hours
1 Team1                     NA           NA
2  °--Team1-1                0         -205 # -205=0-(35+170)
3      ¦--Team1-1-1        150           35
4      ¦   ¦--Team1-1-1a    65           65
5      ¦   ¦--Team1-1-1b    20           20
6      ¦   °--Team1-1-1c    30           30
7      °--Team1-1-2        200          170
8          °--Team1-1-2a    30           30
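
If the goal is an actual_hours column in the original data frame rather than only a field on the tree, one option is to read the values back with Get and match them on the child name. This is a sketch, assuming the question's original data and the non-cumulative rule. The NULL guard and the filterFun = isNotRoot choice are my own assumptions, made to skip the root, which has no row in df.

```r
library(data.tree)

# Sample data from the question; `from` holds the child, `to` its parent.
to    <- c("Team1", "Team1-1", "Team1-1", "Team1-1-1", "Team1-1-1", "Team1-1-1", "Team1-1-2")
from  <- c("Team1-1", "Team1-1-1", "Team1-1-2", "Team1-1-1a", "Team1-1-1b", "Team1-1-1c", "Team1-1-2a")
hours <- c(NA, 150, 200, 65, 20, 30, 30)
df    <- data.frame(from, to, hours)
tree  <- FromDataFrameNetwork(df)

# Non-cumulative rule, applied to every node except the root.
tree$Do(function(node) {
  own  <- if (is.null(node$hours)) NA_real_ else node$hours  # guard: missing field -> NA
  kids <- if (node$isLeaf) 0 else Aggregate(node, attribute = "hours", aggFun = sum)
  node$actual_hours <- own - kids
}, filterFun = isNotRoot)

# Get returns a vector named by node name; look each row up by its child name.
# as.character() guards against `from` being a factor in older R versions.
actual <- tree$Get("actual_hours", filterFun = isNotRoot)
df$actual_hours <- unname(actual[as.character(df$from)])

print(df)
```

The resulting column should line up with the actual_hours vector the question builds by hand.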
