
Strategies for formatting JSON output from R

I'm trying to figure out the best way to produce a JSON file from R. I have the following data frame tmp in R.

> tmp
  gender age welcoming proud tidy unique
1      1  30         4     4    4      4
2      2  34         4     2    4      4
3      1  34         5     3    4      5
4      2  33         2     3    2      4
5      2  28         4     3    4      4
6      2  26         3     2    4      3

The output of dput(tmp) is as follows:

tmp <- structure(list(gender = c(1L, 2L, 1L, 2L, 2L, 2L), age = c(30, 
34, 34, 33, 28, 26), welcoming = c(4L, 4L, 5L, 2L, 4L, 3L), proud = c(4L, 
2L, 3L, 3L, 3L, 2L), tidy = c(4L, 4L, 4L, 2L, 4L, 4L), unique = c(4L, 
4L, 5L, 4L, 4L, 3L)), .Names = c("gender", "age", "welcoming", 
"proud", "tidy", "unique"), na.action = structure(c(15L, 39L, 
60L, 77L, 88L, 128L, 132L, 172L, 272L, 304L, 305L, 317L, 328L, 
409L, 447L, 512L, 527L, 605L, 618L, 657L, 665L, 670L, 708L, 709L, 
729L, 746L, 795L, 803L, 826L, 855L, 898L, 911L, 957L, 967L, 983L, 
984L, 988L, 1006L, 1161L, 1162L, 1224L, 1245L, 1256L, 1257L, 
1307L, 1374L, 1379L, 1386L, 1387L, 1394L, 1401L, 1408L, 1434L, 
1446L, 1509L, 1556L, 1650L, 1717L, 1760L, 1782L, 1814L, 1847L, 
1863L, 1909L, 1930L, 1971L, 2004L, 2022L, 2055L, 2060L, 2065L, 
2082L, 2109L, 2121L, 2145L, 2158L, 2159L, 2226L, 2227L, 2281L
), .Names = c("15", "39", "60", "77", "88", "128", "132", "172", 
"272", "304", "305", "317", "328", "409", "447", "512", "527", 
"605", "618", "657", "665", "670", "708", "709", "729", "746", 
"795", "803", "826", "855", "898", "911", "957", "967", "983", 
"984", "988", "1006", "1161", "1162", "1224", "1245", "1256", 
"1257", "1307", "1374", "1379", "1386", "1387", "1394", "1401", 
"1408", "1434", "1446", "1509", "1556", "1650", "1717", "1760", 
"1782", "1814", "1847", "1863", "1909", "1930", "1971", "2004", 
"2022", "2055", "2060", "2065", "2082", "2109", "2121", "2145", 
"2158", "2159", "2226", "2227", "2281"), class = "omit"), row.names = c(NA, 
6L), class = "data.frame")

Using the rjson package, I run the line toJSON(tmp), which produces the following JSON:

 {"gender":[1,2,1,2,2,2],
 "age":[30,34,34,33,28,26],
 "welcoming":[4,4,5,2,4,3],
 "proud":[4,2,3,3,3,2],
  "tidy":[4,4,4,2,4,4],
  "unique":[4,4,5,4,4,3]}

I also experimented with the RJSONIO package; the output of toJSON() was the same. What I would like to produce is the following structure:

  {"traits":["gender","age","welcoming","proud", "tidy", "unique"],
   "values":[   
            {"gender":1,"age":30,"welcoming":4,"proud":4,"tidy":4, "unique":4},
            {"gender":2,"age":34,"welcoming":4,"proud":2,"tidy":4, "unique":4},
            ....
            ]

I'm not sure how best to do this. I realize that I can parse it line by line using Python, but I feel there is probably a better way. I also realize that my data structure in R does not reflect the meta-information desired in my JSON file (specifically the traits line), but I am mainly interested in producing data formatted like the line

{"gender":1,"age":30,"welcoming":4,"proud":4,"tidy":4, "unique":4}

as I can manually add the first line.


EDIT: I found a useful blog post where the author dealt with a similar problem and provided a solution. This function produces a formatted JSON string from a data frame.

toJSONarray <- function(dtf){
  clnms <- colnames(dtf)

  # Build the '"name" : value' fragments for column i, one per row
  name.value <- function(i){
    quote <- ''
    # if(class(dtf[, i])!='numeric'){
    if(class(dtf[, i])!='numeric' && class(dtf[, i])!= 'integer'){ # I modified this line so integers are also not enclosed in quotes
      quote <- '"'
    }

    paste('"', i, '" : ', quote, dtf[,i], quote, sep='')
  }

  # One JSON object per row
  objs <- apply(sapply(clnms, name.value), 1, function(x){paste(x, collapse=', ')})
  objs <- paste('{', objs, '}')

  # res <- paste('[', paste(objs, collapse=', '), ']')
  res <- paste('[', paste(objs, collapse=',\n'), ']') # added newline for formatting output

  return(res)
}

Building upon Andrie's idea with apply, you can get exactly what you want by modifying the tmp variable before calling toJSON.

library(RJSONIO)
modified <- list(
  traits = colnames(tmp),
  values = unname(apply(tmp, 1, function(x) as.data.frame(t(x))))
)
cat(toJSON(modified))

Using the package jsonlite:

> jsonlite::toJSON(list(traits = names(tmp), values = tmp), pretty = TRUE)
{
  "traits": ["gender", "age", "welcoming", "proud", "tidy", "unique"],
  "values": [
    {
      "gender": 1,
      "age": 30,
      "welcoming": 4,
      "proud": 4,
      "tidy": 4,
      "unique": 4
    },
    {
      "gender": 2,
      "age": 34,
      "welcoming": 4,
      "proud": 2,
      "tidy": 4,
      "unique": 4
    },
    {
      "gender": 1,
      "age": 34,
      "welcoming": 5,
      "proud": 3,
      "tidy": 4,
      "unique": 5
    },
    {
      "gender": 2,
      "age": 33,
      "welcoming": 2,
      "proud": 3,
      "tidy": 2,
      "unique": 4
    },
    {
      "gender": 2,
      "age": 28,
      "welcoming": 4,
      "proud": 3,
      "tidy": 4,
      "unique": 4
    },
    {
      "gender": 2,
      "age": 26,
      "welcoming": 3,
      "proud": 2,
      "tidy": 4,
      "unique": 3
    }
  ]
} 
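As a small sketch of why this works: jsonlite's toJSON takes a dataframe argument that controls the layout. The default, "rows", writes one object per row, while "columns" gives the column-wise layout that rjson produced in the question. The two-row data frame df below is a hypothetical stand-in for tmp.

```r
library(jsonlite)

# Hypothetical two-row subset of the question's data.
df <- data.frame(gender = c(1L, 2L), age = c(30, 34))

# "rows" (the default): an array with one object per row.
by_row <- toJSON(df, dataframe = "rows")

# "columns": one array per column, like the rjson output in the question.
by_col <- toJSON(df, dataframe = "columns")

cat(by_row, "\n")
cat(by_col, "\n")
```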

Building further on Andrie and Richie's ideas, use alply instead of apply to avoid converting numbers to characters:

library(RJSONIO)
library(plyr)
modified <- list(
  traits = colnames(tmp),
  values = unname(alply(tmp, 1, identity))
)
cat(toJSON(modified))

plyr's alply is similar to apply but returns a list automatically, whereas apply, without the more complicated function in Richie Cotton's answer, would return a vector or array. Those extra steps, including t, mean that if your dataset has any non-numeric columns, the numbers get converted to strings. Using alply avoids that concern.

For example, take your tmp dataset and add

tmp$grade <- c("A","B","C","D","E","F")

Then compare this code (with alply) against the other example (with apply).
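The comparison can be reproduced in base R alone. This is a minimal sketch: df is a hypothetical two-row stand-in for tmp with the grade column added, and the lapply over row indices is a base-R substitute for alply, not the plyr call itself. apply converts the data frame to a matrix first, and a matrix holds a single type, so the numeric column becomes text.

```r
# Hypothetical stand-in for tmp with a character column added.
df <- data.frame(age = c(30, 34), grade = c("A", "B"),
                 stringsAsFactors = FALSE)

# The matrix that apply works on is entirely character.
m <- as.matrix(df)
is.character(m)

# Row-wise via apply and t(), as in the earlier answer: age is no longer numeric.
via_apply <- apply(df, 1, function(x) as.data.frame(t(x)))
class(via_apply[[1]]$age)

# Row-wise by indexing the data frame directly: column types are preserved.
via_rows <- lapply(seq_len(nrow(df)), function(i) df[i, ])
class(via_rows[[1]]$age)
```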

It seems to me you can do this by sending each row of your data.frame to JSON with the appropriate apply statement.

For a single row:

library(RJSONIO)

> x <- toJSON(tmp[1, ])
> cat(x)
{
 "gender": 1,
"age":     30,
"welcoming": 4,
"proud": 4,
"tidy": 4,
"unique": 4 
}

The entire data.frame:

x <- apply(tmp, 1, toJSON)
cat(x)
{
 "gender": 1,
"age":     30,
"welcoming": 4,
"proud": 4,
"tidy": 4,
"unique": 4 
} {

...

} {
 "gender": 2,
"age":     26,
"welcoming": 3,
"proud": 2,
"tidy": 4,
"unique": 3 
}
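Note that x here is a character vector with one JSON string per row, so cat(x) prints the objects side by side rather than as one JSON array. A minimal sketch of joining them into the structure asked for in the question, using only base R; row_json is a hypothetical stand-in for x, shortened to two traits:

```r
# Hypothetical per-row JSON strings, standing in for the result of apply(tmp, 1, toJSON).
row_json <- c('{"gender":1,"age":30}', '{"gender":2,"age":34}')
traits <- c("gender", "age")

# Join the row objects with commas and wrap them in brackets to form an array.
values_json <- paste0("[", paste(row_json, collapse = ","), "]")

# Quote the trait names and build the traits array the same way.
traits_json <- paste0("[", paste0('"', traits, '"', collapse = ","), "]")

doc <- paste0('{"traits":', traits_json, ',"values":', values_json, "}")
cat(doc, "\n")
```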

Another option is to use split to break your data.frame with N rows into N data.frames with 1 row each.

library(RJSONIO)
modified <- list(
   traits = colnames(tmp),
   values = split(tmp, seq_len(nrow(tmp)))
)
cat(toJSON(modified))
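One thing worth checking with this approach: split names its pieces after the grouping values ("1", "2", ...), and a named list is usually serialized as a JSON object rather than an array. The sketch below only inspects the names in base R; df is a hypothetical two-row stand-in for tmp, and whether unname is needed depends on how your JSON package treats named lists.

```r
# Hypothetical two-row stand-in for tmp.
df <- data.frame(gender = c(1L, 2L), age = c(30, 34))

# split returns a list of one-row data frames named after the row indices.
rows <- split(df, seq_len(nrow(df)))
names(rows)

# Dropping the names leaves a plain list of rows, as in the apply-based answer.
rows_unnamed <- unname(rows)
names(rows_unnamed)
```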
