
R extracting lists within dataframes

What is the best way to parse lists embedded in variables within a dataframe?

When parsing JSON in R (I typically use the jsonlite package), I frequently end up with data frame columns that contain lists (of other lists or of data frames). A trivial example is parsing Twitter stream data, where the coordinates are returned as a list of longitude and latitude. A more complex example (and the one I am currently wrestling with) is a JSON file of doctors whose addresses are parsed into a list of data frames. Here is some example data illustrating the structure (this is public data, by the way):

> str(df)
Classes ‘tbl_df’ and 'data.frame':  2 obs. of  2 variables:
 $ addresses:List of 2
  ..$ :'data.frame':    1 obs. of  6 variables:
  .. ..$ address  : chr "Department of Palliative Care"
  .. ..$ address_2: chr "2525 Cumberland Parkway, SE"
  .. ..$ city     : chr "Atlanta"
  .. ..$ state    : chr "GA"
  .. ..$ zip      : chr "30305"
  .. ..$ phone    : chr "4043650966"
  ..$ :'data.frame':    2 obs. of  6 variables:
  .. ..$ address  : chr  "5445 Meridian Mark Road" "3619 South Fulton Avenue"
  .. ..$ address_2: chr  "Suite 370" ""
  .. ..$ city     : chr  "Atlanta" "Hapeville"
  .. ..$ state    : chr  "GA" "GA"
  .. ..$ zip      : chr  "30342" "30354"
  .. ..$ phone    : chr  "4047652020" "4047652020"
 $ npi      : chr  "1497831390" "1578667986"

jsonlite has a function (flatten) for extracting embedded data frames to individual variables, but it does not work on lists.

In the Twitter example, I can extract the list items to variables in the same dataframe using a for loop:

for (i in seq_len(nrow(df2))) {
  # coordinates is sometimes empty, so check before extracting
  if (length(df2$coordinates.coordinates[[i]]) > 0) {
    df2[i, "coordinates.lon"] <- df2$coordinates.coordinates[[i]][1]
    df2[i, "coordinates.lat"] <- df2$coordinates.coordinates[[i]][2]
  }
}
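
For comparison, here is a minimal base R sketch of the same extraction without an explicit row loop. The sample coordinates and the helper name get_coord are made up for illustration; only the column names come from the loop above.

```r
# Made-up sample: three rows, the second with no coordinates.
df2 <- data.frame(id = 1:3)
df2$coordinates.coordinates <- list(c(-84.39, 33.75), NULL, c(-84.41, 33.65))

# Hypothetical helper: return the k-th element, or NA when it is missing.
get_coord <- function(xy, k) if (length(xy) >= k) xy[[k]] else NA_real_

coords <- df2$coordinates.coordinates
# vapply() guarantees one numeric value per row, so the result is a plain vector.
df2$coordinates.lon <- vapply(coords, get_coord, numeric(1), k = 1)
df2$coordinates.lat <- vapply(coords, get_coord, numeric(1), k = 2)
```

Rows with empty coordinates end up as NA, which matches what the loop leaves behind for rows it skips.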

In the doctor example, each doctor can have multiple addresses, so I need to create a new dataset:

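library(dplyr)
addresses <- data.frame()
for (i in seq_len(nrow(df))) {
  x <- df$addresses[[i]]
  # tag each address row with the doctor's npi as an identifier
  x$id <- df[[i, "npi"]]
  # note: this re-copies the accumulated result on every iteration
  addresses <- bind_rows(addresses, x)
}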
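
One possible way to avoid growing the result inside the loop is to tag every nested data frame first and bind them all in a single call. The sketch below uses base R only and a trimmed copy of the sample data (two address columns instead of six); the helper name tag_with_id is made up for illustration.

```r
# Trimmed copy of the sample data: npi plus a list column of data frames.
df <- data.frame(npi = c("1497831390", "1578667986"), stringsAsFactors = FALSE)
df$addresses <- list(
  data.frame(address = "Department of Palliative Care", city = "Atlanta",
             stringsAsFactors = FALSE),
  data.frame(address = c("5445 Meridian Mark Road", "3619 South Fulton Avenue"),
             city = c("Atlanta", "Hapeville"), stringsAsFactors = FALSE)
)

# Hypothetical helper: add the doctor's npi to every row of one address table.
tag_with_id <- function(a, id) {
  a$id <- rep(id, nrow(a))
  a
}

# Tag each nested data frame, then bind everything once.
addresses <- do.call(rbind, Map(tag_with_id, df$addresses, df$npi))
rownames(addresses) <- NULL
```

Since dplyr is already loaded in the question, passing the list of tagged data frames to bind_rows() in one call should work the same way and also tolerates differing columns.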

While both of these examples work, they are both a) slow and b) not the "R" way of doing things (as I understand it).

So, my question is: what's a better, faster, more "R" way of extracting lists from data frame variables?

Thanks to Richard Scriven: unnest in tidyr gave me exactly what I needed.
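
For readers with the same problem, a minimal sketch of what the unnest call might look like on the doctor data. It assumes tidyr 1.0.0 or later, where the columns to unnest are passed through the cols argument; the data is a trimmed copy of the sample above, and the result name flat is arbitrary.

```r
library(tidyr)

# Trimmed copy of the sample data: npi plus a list column of data frames.
df <- data.frame(npi = c("1497831390", "1578667986"), stringsAsFactors = FALSE)
df$addresses <- list(
  data.frame(address = "Department of Palliative Care", city = "Atlanta",
             stringsAsFactors = FALSE),
  data.frame(address = c("5445 Meridian Mark Road", "3619 South Fulton Avenue"),
             city = c("Atlanta", "Hapeville"), stringsAsFactors = FALSE)
)

# One output row per address; npi is repeated for doctors with several addresses.
flat <- unnest(df, cols = addresses)
```

This replaces both the loop and the manual id column, because unnest keeps the other columns of df (here npi) alongside each unnested row.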
