Reshaping JSON output from a grouped dataframe in R

I have an R dataframe of the form:

Country Region  Year    V1  V2
AAAA    XXXX    2001    12  13
BBBB    YYYY    2001    14  15
AAAA    XXXX    2002    36  56
AAAA    XXXX    1999    45  67

and would like to generate a JSON equivalent of the form:

[
  {"Country": "AAAA",
   "Region":"XXXX",
    "V1": [ [1999,45], [2001,12] , [2002,36] ],
    "V2":[ [1999,67], [2001,13] , [2002,56] ]
  },
  {"Country": "BBBB",
   "Region":"YYYY",
   "V1":[ [2001,14] ],
   "V2":[ [2001,15] ]
  }
]

I imagine this requires:

  1. grouping by Country and Region
  2. sorting by Year within the groups
  3. for each of the remaining columns Vx (here V1 and V2, but there may be an arbitrary number of columns with arbitrary names), generating a list of [Year, Vx] elements, ordered by Year

but I am struggling to find a way to do it.

Here is another way to do this.

dat <- read.table(textConnection("Country Region  Year    V1  V2
AAAA    XXXX    2001    12  13
BBBB    YYYY    2001    14  15
AAAA    XXXX    2002    36  56
AAAA    XXXX    1999    45  67"), header = TRUE, stringsAsFactors = FALSE)

We add two helper functions to zip vectors together (pluck_ and zip_) and a custom sort function (sort_) that sorts a list by the element at a given position.

#' Return a function that extracts the element at position `element`
pluck_ <- function(element){
  function(x) x[[element]]
}

#' Zip vectors of equal length into a list of tuples (one list per position)
zip_ <- function(..., names = FALSE){
  x <- list(...)
  y <- lapply(seq_along(x[[1]]), function(i) lapply(x, pluck_(i)))
  if (names) names(y) <- seq_along(y)
  return(y)
}

#' Sort a list of tuples by the element at position i
sort_ <- function(v, i = 1){
  v[sort(sapply(v, '[[', i), index.return = TRUE)$ix]
}
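To see what the helpers do before they are combined, here is a minimal sketch on a toy input. It repeats the helper definitions so that it runs on its own; the vectors years and vals are made-up illustration data and not part of the original answer.

```r
# Helper definitions repeated so the snippet is self-contained
pluck_ <- function(element) function(x) x[[element]]
zip_ <- function(...) {
  x <- list(...)
  lapply(seq_along(x[[1]]), function(i) lapply(x, pluck_(i)))
}
sort_ <- function(v, i = 1) {
  v[sort(sapply(v, `[[`, i), index.return = TRUE)$ix]
}

# Hypothetical toy data (illustration only)
years <- c(2001, 2002, 1999)
vals  <- c(12, 36, 45)

# One two-element list per position: (2001, 12), (2002, 36), (1999, 45)
pairs <- zip_(years, vals)

# Same tuples, reordered by their first element (the year)
sorted <- sort_(pairs)

str(sorted)
```

Each tuple is an unnamed two-element list, which is what lets it show up as a two-element JSON array such as [1999,45] once the result is serialized.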

Time to put things together and use the split-apply-combine magic to get the output you seek.

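library(plyr)
# One list per (Country, Region) group; each Vx becomes a year-sorted list of [Year, Vx] pairs
dat2 <- dlply(dat, .(Country, Region), function(d){
  list(
    Country = d$Country[1],
    Region = d$Region[1],
    V1 = sort_(zip_(d$Year, d$V1)),
    V2 = sort_(zip_(d$Year, d$V2))
  )
})
# Drop the group names so the result serializes as a JSON array rather than an object
cat(rjson::toJSON(setNames(dat2, NULL)))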

This gives you the following output (line breaks added here for readability):

[
  {"Country":"AAAA",
   "Region":"XXXX",
   "V1":[[1999,45],[2001,12],[2002,36]],
   "V2":[[1999,67],[2001,13],[2002,56]]
  },
  {"Country":"BBBB",
   "Region":"YYYY",
   "V1":[[2001,14]],
   "V2":[[2001,15]]
  }
]
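The call above hard-codes V1 and V2, while the question asks for an arbitrary number of value columns. One possible generalization, sketched here with base R only for the grouping, treats every column other than the grouping columns and Year as a value column. The function name reshape_groups and its arguments group_cols and year_col are hypothetical names introduced for this sketch; they are not part of the original answer.

```r
dat <- read.table(text = "Country Region Year V1 V2
AAAA XXXX 2001 12 13
BBBB YYYY 2001 14 15
AAAA XXXX 2002 36 56
AAAA XXXX 1999 45 67", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: one list per group, one [Year, value] series per value column
reshape_groups <- function(df, group_cols = c("Country", "Region"),
                           year_col = "Year") {
  # Every column that is neither a grouping column nor the year is a value column
  value_cols <- setdiff(names(df), c(group_cols, year_col))

  # Split on the combination of grouping columns, dropping unused combinations
  groups <- split(df, df[group_cols], drop = TRUE)

  out <- lapply(groups, function(d) {
    d <- d[order(d[[year_col]]), ]                   # sort by year within the group
    keys <- as.list(d[1, group_cols, drop = FALSE])  # e.g. Country = "AAAA"
    series <- lapply(value_cols, function(col) {
      # Pair each year with the matching value as an unnamed two-element list
      unname(Map(function(y, v) list(y, v), d[[year_col]], d[[col]]))
    })
    c(keys, setNames(series, value_cols))
  })

  unname(out)  # unnamed outer list, so it serializes as a JSON array
}

result <- reshape_groups(dat)

# Serialize only if rjson is installed (the package used in the answer above)
if (requireNamespace("rjson", quietly = TRUE)) {
  cat(rjson::toJSON(result), "\n")
}
```

With this approach, adding a V3 column to dat should need no change to the function, since value_cols is derived from the column names.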

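Here's a somewhat messy function to do this. Note that it emits each [Year, Vx] pair as a string such as "2001,12" rather than as a nested array, and leaves the pairs in their original row order (you could easily add sorting of the V1 and V2 arrays by year):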

dat <- read.table(textConnection(
'Country Region  Year    V1  V2
AAAA    XXXX    2001    12  13
BBBB    YYYY    2001    14  15
AAAA    XXXX    2002    36  56
AAAA    XXXX    1999    45  67'
), header=TRUE, stringsAsFactors=FALSE)

library(RJSONIO)
myfunc <- function(nn)
{
  # Split on Country and Region together; drop combinations that do not occur
  tt <- split(nn, list(nn$Country, nn$Region), drop = TRUE)
  bar <- function(w){
    # Build "Year,value" strings for each value column
    V1 <- paste(w$Year, w$V1, sep=",")
    V2 <- paste(w$Year, w$V2, sep=",")
    list(Country = unique(w$Country),
         Region = unique(w$Region),
         V1 = V1, V2 = V2)
  }
  datlist <- lapply(tt, bar)
  names(datlist) <- NULL
  RJSONIO::toJSON(datlist)
}

cat(myfunc(dat))

[
 {
   "Country": "AAAA",
   "Region": "XXXX",
   "V1": [ "2001,12", "2002,36", "1999,45" ],
   "V2": [ "2001,13", "2002,56", "1999,67" ] 
 },
 {
   "Country": "BBBB",
   "Region": "YYYY",
   "V1": "2001,14",
   "V2": "2001,15" 
  } 
]
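The sorting by year mentioned above can be added by ordering each group's rows before the strings are built. A minimal sketch in base R, assuming the same dat; the function name year_value_strings is a hypothetical name used only for this illustration.

```r
dat <- read.table(text = "Country Region Year V1 V2
AAAA XXXX 2001 12 13
BBBB YYYY 2001 14 15
AAAA XXXX 2002 36 56
AAAA XXXX 1999 45 67", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: "Year,value" strings for one column, ordered by Year
year_value_strings <- function(w, col) {
  w <- w[order(w$Year), ]               # sort the group's rows by year first
  paste(w$Year, w[[col]], sep = ",")
}

grp <- dat[dat$Country == "AAAA", ]
year_value_strings(grp, "V1")
# "1999,45" "2001,12" "2002,36"
```

The same ordering step could be placed at the top of the inner function of myfunc so that both V1 and V2 come out sorted.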
