How do I unnest a list embedded in a data.frame column?

I'm new to working with nested lists, so I'd appreciate a solution that also comments on how it works. I have a nested list that I scraped using jsonlite. How do I take the list data for all teams and bind it together into a single data.frame? The list's structure is shown below; I copied one element of the list (for one team).

Here is the code I used to get to the list pasted below. I'm showing it only to illustrate how the list is set up.

library(jsonlite)
library(magrittr)  # provides the %>% pipe

json <-
  url %>%
  fromJSON(simplifyDataFrame = TRUE)


df <- json$body$rosters

# DF with each team on its own row, but with nested lists in players
df_teams <- df$teams

# One team's worth of data
JSON_list <- df_teams[1, ]

My list content is below.

JSON_list <- structure(list(
  projected_points = NA, long_abbr = "KE", lineup_status = "ok",
  short_name = "Kramerica", total_roster_salary = 22L, division = "",
  players = list(structure(list(
    firstname = c(
      "Jonathan", "Anthony"
    ), wildcards = structure(list(
      contract = c("1", "1"),
      salary = c("1", "21")
    ), class = "data.frame", row.names = c(
      NA,
      2L
    )), on_waivers = c(
      0L, 0L
    ), photo = c(
      "http://sports.cbsimg.net/images/baseball/mlb/players/170x170/1657581.png",
      "http://sports.cbsimg.net/images/baseball/mlb/players/170x170/1670417.png"
    ),
    eligible_for_offense_and_defense = c(0L, 0L),
    opponents = list(
      structure(list(
        game_id = c(
          "", ""
        ), weather_error = c(
          "Weather is not available for this game yet",
          "Weather is not available for this game yet"
        ),
        weather_icon_code = c(
          "", ""
        ), home_team = c("true", "true"),
        abbrev = c("OAK", "OAK"),
        time = c(
          1553803620L,
          1553911620L
        ),
        date = c(
          "20190328",
          "20190329"
        ), weather_icon_url = c(
          "", ""
        ), venue_type = c("", ""), game_abbr = c("", ""),
        weather = c("", ""), temperature = c(
          NA, NA
        )
      ), class = "data.frame", row.names = c(NA, 2L)),
      structure(list(game_id = c("", "", ""), weather_error = c(
        "Weather is not available for this game yet",
        "Weather is not available for this game yet", "Weather is not available for this game yet"
      ), weather_icon_code = c("", "", ""), home_team = c(
        "true",
        "true", "true"
      ), abbrev = c("TEX", "TEX", "TEX"), time = c(
        1553803500L,
        1553990700L, 1554062700L
      ), date = c(
        "20190328", "20190330",
        "20190331"
      ), weather_icon_url = c("", "", ""), venue_type = c(
        "",
        "", ""
      ), game_abbr = c("", "", ""), weather = c(
        "", "",
        ""
      ), temperature = c(NA, NA, NA)), class = "data.frame", row.names = c(
        NA,
        3L
      ))
    ), icons = structure(list(
      headline = c(
        "Angels' Jonathan Lucroy: Inks deal with Angels",
        NA
      ),
      hot = c(NA, 1L),
      cold = c(1L, NA),
      injury = c(
        "Knee: Questionable for start of season",
        NA
      )
    ), class = "data.frame", row.names = c(NA, 2L)), elias_id = c(
      "LUC758619", "RIZ253611"
    ), percentstarted = c(
      "48%", "97%"
    ),
    profile_link = c(
      "<a class='playerLink' aria-label=' Jonathan Lucroy C LAA' href='http://baseball.cbssports.com/players/playerpage/1657581'>Jonathan Lucroy</a> <span class=\"playerPositionAndTeam\">C | LAA</span> ",
      "<a class='playerLink' aria-label=' Anthony Rizzo 1B CHC' href='http://baseball.cbssports.com/players/playerpage/1670417'>Anthony Rizzo</a> <span class=\"playerPositionAndTeam\">1B | CHC</span>"
    ),
    id = c(
      "1657581", "1670417"
    ), pro_status = c(
      "A", "A"
    ), on_waivers_until = c(NA, NA), jersey = c("20", "44"),
    percentowned = c("61%", "99%"),
    pro_team = c(
      "LAA", "CHC"
    ), position = c(
      "C", "1B"
    ), lastname = c(
      "Lucroy", "Rizzo"
    ),
    roster_pos = c("C", "1B"),
    update_type = c("normal", "normal"),
    age = c(
      32L, 29L
    ), eligible = c(
      "C,U", "1B,U"
    ), is_locked = c(
      0L,
      0L
    ), bats = c(
      "R", "L"
    ), owned_by_team_id = c(
      12L, 12L
    ), ytd_points = c(
      0L, 0L
    ), roster_status = c(
      "A", "A"
    ), is_keeper = c(
      0L, 0L
    ), profile_url = c(
      "http://baseball.cbssports.com/players/playerpage/1657581",
      "http://baseball.cbssports.com/players/playerpage/1670417"
    ), fullname = c(
      "Jonathan Lucroy", "Anthony Rizzo"
    ), throws = c(
      "R",
      "L"
    ), headline = c(
      "Angels' Jonathan Lucroy: Inks deal with Angels",
      NA
    ), `starting-pitcher-today` = c(
      NA, "false"
    ), injury = c(NA, "Knee"), return = c(
      "Questionable for start of season",
      NA
    )
  ), class = "data.frame", row.names = c(NA, 2L))),
  name = "Kramerica Enterprises", logo = "http://baseball.cbssports.com/images/team-logo/main-36x36.jpg",
  abbr = "KE", point = "20190328", id = "12", active_roster_salary = 22L,
  warning = structure(list(description = NA_character_), row.names = 1L, class = "data.frame")
), row.names = 1L, class = "data.frame")

# Desired table sample (does not include all columns)
tibble::tribble(
  ~projected_points, ~long_abbr, ~lineup_status, ~short_name, ~total_roster_salary, ~division,               ~name, ~logo, ~abbr,  ~point5, ~active_roster_salary,    ~id2, ~firstname, ~contract, ~salary,
                 NA,       "KE",           "ok", "Kramerica",                   22,        NA, "Biloxi Blackjacks",    NA,  "KE", 20190328,                    22, 1657581, "Jonathan",         1,       1
  )                    

The issue I'm running into is that the players column appears to be a nested data frame that itself contains other nested data frames, specifically "wildcards", "opponents" and "icons". I am looking for a single data frame that contains all of the columns. For the nested data frames, I'd like their contents to show up as columns for that particular player; for example, "wildcards" would become the columns "contract" and "salary". Also, how would I bind the list together if I wanted to choose specific columns: "long_abbr", "lineup_status", etc. from JSON_list, and "firstname", both wildcards columns, "id", and some others from JSON_list$players?

If you have a nested structure, you can isolate list elements with [[ ]] and columns with [ ]. If the numbers of rows are equal, you can build your data frame directly with cbind.
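As a minimal sketch of the difference between the two operators (the object `x` and its contents are made up purely for illustration): single brackets return a smaller container of the same kind, while double brackets return the element itself.

```r
# Hypothetical toy list: one data frame and one character element
x <- list(a = data.frame(v = 1:2), b = "text")

kept      <- x["a"]            # still a list, of length 1
extracted <- x[["a"]]          # the data frame itself
one_col   <- x[["a"]]["v"]     # a one-column data frame
values    <- x[["a"]][["v"]]   # the underlying integer vector

class(kept)       # "list"
class(extracted)  # "data.frame"
```

Because `[` on a data frame keeps the result a data frame, the pieces it returns can be passed straight to cbind.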

Let's make a reproducible example

Create three data frames with the same number of rows

 df1 <- data.frame(var1=c('a', 'b', 'c'), var2=c('d', 'e', 'f'), var3=1:3)
 df2 <- data.frame(var4=c('g', 'h', 'i'), var5=c('j', 'k', 'l'), var6=4:6)
 df3 <- data.frame(var7=c(6:8), var8=c('j', 'k', 'l'), var9=4:6)

Put the data frames in a nested list structure

 # Named inner.list rather than list, to avoid masking base::list
 inner.list <- list(df1, df2)
 nested.list <- list(inner.list, df3)

Build a combined data frame from var2, var6 and var7

binded.df <- cbind(nested.list[[1]][[1]][2], nested.list[[1]][[2]][3], nested.list[[2]][1])
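The same idea can be carried over to a structure shaped like the one in the question. The following is only a sketch under assumptions: `teams` is a small made-up stand-in for df_teams (the second team and its player are invented), and the helpers `make_players` and `flatten_team` are hypothetical names, not part of jsonlite. It uses base R only.

```r
# Hypothetical helper: a players data frame with a nested data-frame
# column "wildcards", mimicking what fromJSON() produced in the question.
make_players <- function(ids, firstnames, salaries) {
  p <- data.frame(id = ids, firstname = firstnames, stringsAsFactors = FALSE)
  p$wildcards <- data.frame(contract = rep("1", length(ids)),
                            salary = salaries, stringsAsFactors = FALSE)
  p
}

# Toy stand-in for df_teams: one row per team, players as a list-column
teams <- data.frame(long_abbr = c("KE", "XX"),
                    lineup_status = c("ok", "ok"),
                    stringsAsFactors = FALSE)
teams$players <- list(
  make_players(c("1657581", "1670417"), c("Jonathan", "Anthony"), c("1", "21")),
  make_players("0000001", "Sample", "5")
)

# Flatten team i: pick player columns, lift the nested "wildcards" columns
# up one level, and repeat the chosen team columns once per player.
flatten_team <- function(i) {
  p <- teams$players[[i]]
  player_cols <- cbind(p[c("id", "firstname")], p$wildcards)
  team_cols <- teams[rep(i, nrow(p)), c("long_abbr", "lineup_status")]
  out <- cbind(team_cols, player_cols)
  rownames(out) <- NULL
  out
}

# One data frame for all teams, one row per player
result <- do.call(rbind, lapply(seq_len(nrow(teams)), flatten_team))
```

A column such as "opponents" is a list holding a data frame with a varying number of rows per player, so it cannot be lifted into per-player columns this way without first deciding how to summarise or expand it.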


 