Nested lists in R to JSON for Power BI

I have created a dataset in R consisting of 800 observations of 20 variables, some of which are vectors of varying length, e.g.

observation1: var1=1, var2="a", vec1=c("a", "b", "c"), vec2=c(1,2,3)
observation2: var1=1, var2="a", vec1=c("a"), vec2=c(1,2,3,4,5)

I tried to create a single data frame but it doesn't like the varying length of the vectors, so currently the data exists as multiple vectors of length 800 (one for var1, one for var2 etc) and multiple lists of length 800 (containing vec1, vec2 etc)

Is the only way of combining this into a single data object to use a nested list?

Ultimately I need to output as a JSON to bring into Power BI, but I don't know how to combine the existing elements to achieve that. I tried creating a nested list and then toJSON(), but this does not resolve to a table with columns (in Power BI), rather each list item appears as a row which needs to be expanded into 800 rows.

Any help much appreciated!

Since your ultimate goal is to import this dataset into Power BI, working with a data frame would be best. Power BI has to convert the data into a tabular structure anyway, so going from data frame to JSON and then from JSON back to a table is unnecessary work.

As for combining vectors of different lengths into a data frame: you get an error because every column of a data frame must have the same length. The trick is to make the vectors the same length by filling the missing positions with NA (or any placeholder such as 'blank').

#function to generate vectors of varying length
temp_vectors = function(n){
  vector_list = vector('list', n)
  max_size = 0
  
  for(i in 1:n){
    vec_size = sample(1:20, 1)
    vector_list[[i]] = sample(1:10, vec_size, replace=TRUE)
    if(max_size < vec_size) max_size = vec_size
  }
  
  return(list('max_size' = max_size, 'vector_list' = vector_list))
}

x = temp_vectors(10)
vectors = x[['vector_list']]
n = x[['max_size']]

#the above code produces the following vector
vectors[1:2]
#> [[1]]
#>  [1] 6 8 5 5 9 2 1 2 4 7 3 8
#> 
#> [[2]]
#> [1] 1 4 8 8 7 9

#for loop to fill the extra places with NA
for(i in seq_along(vectors)){
  if(length(vectors[[i]]) != n){
    length(vectors[[i]]) = n  #assigning a longer length pads with NA
  }
}

vectors[1:2]
#> [[1]]
#>  [1]  6  8  5  5  9  2  1  2  4  7  3  8 NA NA NA NA NA NA NA NA
#> 
#> [[2]]
#>  [1]  1  4  8  8  7  9 NA NA NA NA NA NA NA NA NA NA NA NA NA NA

#combining the vectors into df
df = data.frame(vectors)
colnames(df) = paste0('col.', 1:ncol(df))
df
#>    col.1 col.2 col.3 col.4 col.5 col.6 col.7 col.8 col.9 col.10
#> 1      6     1     3     7     8     6     5     3     6      9
#> 2      8     4    NA     7    10     8     5     1     8     NA
#> 3      5     8    NA    10     5     9     7    NA     1     NA
#> 4      5     8    NA     4     4     2     8    NA     8     NA
#> 5      9     7    NA    NA     6     6     6    NA     9     NA
#> 6      2     9    NA    NA     5     4     7    NA     4     NA
#> 7      1    NA    NA    NA     1     1     2    NA     8     NA
#> 8      2    NA    NA    NA     9     7     6    NA     5     NA
#> 9      4    NA    NA    NA     4     8    10    NA     8     NA
#> 10     7    NA    NA    NA     3     6    NA    NA     1     NA
#> 11     3    NA    NA    NA     1     6    NA    NA     9     NA
#> 12     8    NA    NA    NA     6     7    NA    NA     8     NA
#> 13    NA    NA    NA    NA     6     8    NA    NA    10     NA
#> 14    NA    NA    NA    NA     2     8    NA    NA    10     NA
#> 15    NA    NA    NA    NA     1     9    NA    NA     5     NA
#> 16    NA    NA    NA    NA     2     3    NA    NA     2     NA
#> 17    NA    NA    NA    NA     1     1    NA    NA    NA     NA
#> 18    NA    NA    NA    NA     6     6    NA    NA    NA     NA
#> 19    NA    NA    NA    NA    NA     9    NA    NA    NA     NA
#> 20    NA    NA    NA    NA    NA     9    NA    NA    NA     NA

Created on 2020-07-04 by the reprex package (v0.3.0)

Note that you can do the same with your 800 vectors: iterate through them and set the length of each to that of the longest vector. The extra positions are automatically filled with NA.
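The padding idea can also be applied per observation, so that each observation stays on one row and each element position becomes its own column. The following is only a minimal sketch in base R: the two-observation data stands in for the asker's 800 observations, and `pad_to_columns` is a hypothetical helper name, not something from the answer above.

```r
# Stand-in data: two observations instead of the asker's 800
vector_var1 <- c(1, 1)
vector_var2 <- c("a", "a")
list_vec1 <- list(c("a", "b", "c"), c("a"))
list_vec2 <- list(c(1, 2, 3), c(1, 2, 3, 4, 5))

# pad_to_columns is a hypothetical helper: it pads every vector to the
# length of the longest one and returns one row per observation and one
# column per element position (NA where a vector is shorter)
pad_to_columns <- function(vec_list, prefix) {
  n <- max(lengths(vec_list))
  padded <- lapply(vec_list, function(v) {
    length(v) <- n  # assigning a longer length pads with NA
    v
  })
  out <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
  colnames(out) <- paste0(prefix, "_", seq_len(n))
  out
}

# Combine the scalar variables with the padded vector columns
wide <- cbind(
  data.frame(var1 = vector_var1, var2 = vector_var2,
             stringsAsFactors = FALSE),
  pad_to_columns(list_vec1, "vec1"),
  pad_to_columns(list_vec2, "vec2")
)
```

The result is an ordinary rectangular data frame, so it can be written out with whatever export you prefer. The trade-off is that the number of columns grows with the longest vector in each list.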

As I understand, you currently have:

vector_var1 <- c(1, 1)
vector_var2 <- c("a", "a")
list_vec1 <- list(c("a", "b", "c"), c("a"))
list_vec2 <- list(c(1,2,3), c(1,2,3,4,5))

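You can put that in a data frame with list columns. Wrapping each list in I() keeps it as a single list column instead of letting data.frame() split it into separate columns: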

dat <- data.frame(
  var1 = vector_var1,
  var2 = vector_var2,
  vec1 = I(list_vec1),
  vec2 = I(list_vec2)
)

This gives this JSON:

jsonlite::toJSON(dat, pretty = TRUE)
# [
#   {
#     "var1": 1,
#     "var2": "a",
#     "vec1": ["a", "b", "c"],
#     "vec2": [1, 2, 3]
#   },
#   {
#     "var1": 1,
#     "var2": "a",
#     "vec1": ["a"],
#     "vec2": [1, 2, 3, 4, 5]
#   }
# ]

Is this what you need?

EDIT

Following the discussion in the comments, here is how you can achieve the desired result, using purrr::transpose and rlist::list.zip:

vector_var1 <- c(1, 1)
vector_var2 <- c("a", "a")
list_vec1 <- list(c("a", "b", "c"), c("a", "b", "c", "d", "e"))
list_vec2 <- list(c(1,2,3), c(1,2,3,4,5))

L1 <- purrr::transpose(list(list_vec1, list_vec2))
L2 <- lapply(L1, function(vecs){
  do.call(rlist::list.zip, c(vecs, list(use.names=FALSE)))
})

dat <- data.frame(var1 = vector_var1, var2 = vector_var2, vec = I(L2))

jsonlite::toJSON(dat, auto_unbox = TRUE)
# [{"var1":1,"var2":"a","vec":[["a",1],["b",2],["c",3]]},{"var1":1,"var2":"a","vec":[["a",1],["b",2],["c",3],["d",4],["e",5]]}]
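If you would rather not depend on purrr and rlist, the same element-wise pairing can probably be done with nested mapply calls in base R. This is a sketch rather than a drop-in replacement: `L2_base` is a name introduced here for illustration, and it assumes that vec1 and vec2 have the same length within each observation (mapply recycles the shorter one otherwise).

```r
list_vec1 <- list(c("a", "b", "c"), c("a", "b", "c", "d", "e"))
list_vec2 <- list(c(1, 2, 3), c(1, 2, 3, 4, 5))

# The outer mapply walks over the observations; the inner mapply pairs
# the i-th element of vec1 with the i-th element of vec2.
# USE.NAMES = FALSE keeps the result unnamed, so that it serialises as
# JSON arrays rather than objects.
L2_base <- mapply(
  function(v1, v2) {
    mapply(list, v1, v2, SIMPLIFY = FALSE, USE.NAMES = FALSE)
  },
  list_vec1, list_vec2,
  SIMPLIFY = FALSE, USE.NAMES = FALSE
)

# Inspect the pairs built for the first observation
str(L2_base[[1]])
```

`L2_base` can then be used in place of L2 in the data.frame() call above, again wrapped in I().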
