Trouble with dataframes from nested lists with Knitr

I have a list called ct2 from a web scrape. I can build a data frame from the non-nested fields with the code below. However, I also want to add the coordinates, and when I call head(ct2$businesses) I see ct2$businesses$coordinates$latitude and ct2$businesses$coordinates$longitude. I can't seem to pull these into the data frame without receiving an error, and I am unsure what I am doing wrong. Below is the code that works for the plain columns, followed by the error I receive when I try to add the coordinates.

biz_info <- ct2$businesses %>% 
map_df(`[`, c("name", "id", "phone", "review_count","rating","url")) 
biz_info %>% knitr::kable()

When I add the coordinates, I get the following error when the code runs:

biz_info <- ct2$businesses %>% 
map_df(`[`, c("name", "id", "phone", 
"review_count","rating","url","coordinates")) 
Error in bind_rows_(x, .id) : Argument 7 must be length 1, not 2
biz_info %>% knitr::kable()

Edit: adding a data example.

dput(head(ct2$businesses))
list(structure(list(id = "mcdonalds-hartford-7", name = "McDonald's", 
image_url = "https://s3-media3.fl.yelpcdn.com/bphoto/hgpL9l7A10vRoWy84NPV_g/o.jpg", 
is_closed = FALSE, url = "https://www.yelp.com/biz/mcdonalds-hartford-7?", 
review_count = 4L, categories = list(structure(list(alias = "hotdogs", 
    title = "Fast Food"), .Names = c("alias", "title")), 
    structure(list(alias = "burgers", title = "Burgers"), .Names = c("alias", 
    "title"))), rating = 3.5, coordinates = structure(list(
    latitude = 41.738611, longitude = -72.65921), .Names = c("latitude", 
"longitude")), transactions = list(), price = "$", location = structure(list(
    address1 = "76 Brainard Rd", address2 = "", address3 = "", 
    city = "Hartford", zip_code = "06114", country = "US", 
    state = "CT", display_address = list("76 Brainard Rd", 
        "Hartford, CT 06114")), .Names = c("address1", "address2", 
"address3", "city", "zip_code", "country", "state", "display_address"
)), phone = "+18602477300", display_phone = "(860) 247-7300", 
distance = 3155.923625766), .Names = c("id", "name", "image_url", 
"is_closed", "url", "review_count", "categories", "rating", "coordinates", 
"transactions", "price", "location", "phone", "display_phone", 
"distance")), structure(list(id = "mcdonalds-restaurants-hartford-3", 
name = "McDonalds Restaurants", image_url = "", is_closed = FALSE, 
url = "https://www.yelp.com/biz/mcdonalds-restaurants-hartford-3?", 
review_count = 2L, categories = list(structure(list(alias = "restaurants", 
    title = "Restaurants"), .Names = c("alias", "title"))), 
rating = 2.5, coordinates = structure(list(latitude = 41.75251, 
    longitude = -72.71448), .Names = c("latitude", "longitude"
)), transactions = list(), location = structure(list(address1 = "214 Prospect Ave", 
    address2 = "", address3 = "", city = "Hartford", zip_code = "06106", 
    country = "US", state = "CT", display_address = list(
        "214 Prospect Ave", "Hartford, CT 06106")), .Names = c("address1", 
"address2", "address3", "city", "zip_code", "country", "state", 
"display_address")), phone = "+18605238859", display_phone = "(860) 523-8859", 
distance = 2591.628349648), .Names = c("id", "name", "image_url", 
"is_closed", "url", "review_count", "categories", "rating", "coordinates", 
"transactions", "location", "phone", "display_phone", "distance"
)), structure(list(id = "mcdonalds-hartford-9", name = "McDonald's", 
image_url = "https://s3-media4.fl.yelpcdn.com/bphoto/49EjiRF2Yb91rBV6wbuHZw/o.jpg", 
is_closed = FALSE, url = "https://www.yelp.com/biz/mcdonalds-hartford-9?", 
review_count = 9L, categories = list(structure(list(alias = "burgers", 
    title = "Burgers"), .Names = c("alias", "title")), structure(list(
    alias = "hotdogs", title = "Fast Food"), .Names = c("alias", 
"title"))), rating = 2.5, coordinates = structure(list(latitude = 41.75251, 
    longitude = -72.71448), .Names = c("latitude", "longitude"
)), transactions = list(), price = "$", location = structure(list(
    address1 = "214 Prospect Ave", address2 = "", address3 = "", 
    city = "Hartford", zip_code = "06106", country = "US", 
    state = "CT", display_address = list("214 Prospect Ave", 
        "Hartford, CT 06106")), .Names = c("address1", "address2", 
"address3", "city", "zip_code", "country", "state", "display_address"
)), phone = "+18605235303", display_phone = "(860) 523-5303", 
distance = 2591.628349648), .Names = c("id", "name", "image_url", 
"is_closed", "url", "review_count", "categories", "rating", "coordinates", 
"transactions", "price", "location", "phone", "display_phone", 
"distance")), structure(list(id = "mcdonalds-hartford-10", name = "McDonald's", 
image_url = "https://s3-media4.fl.yelpcdn.com/bphoto/da-sL4n1xX2VkLbqIWr5hw/o.jpg", 
is_closed = FALSE, url = "https://www.yelp.com/biz/mcdonalds-hartford-10?", 
review_count = 9L, categories = list(structure(list(alias = "burgers", 
    title = "Burgers"), .Names = c("alias", "title")), structure(list(
    alias = "hotdogs", title = "Fast Food"), .Names = c("alias", 
"title"))), rating = 1, coordinates = structure(list(latitude = 41.7573503, 
    longitude = -72.68223), .Names = c("latitude", "longitude"
)), transactions = list(), location = structure(list(address1 = "172 Washington St", 
    address2 = "", address3 = "", city = "Hartford", zip_code = "06106", 
    country = "US", state = "CT", display_address = list(
        "172 Washington St", "Hartford, CT 06106")), .Names = c("address1", 
"address2", "address3", "city", "zip_code", "country", "state", 
"display_address")), phone = "+18605602292", display_phone = "(860) 560-2292", 
distance = 374.2938759334), .Names = c("id", "name", "image_url", 
"is_closed", "url", "review_count", "categories", "rating", "coordinates", 
"transactions", "location", "phone", "display_phone", "distance"
)), structure(list(id = "mcdonalds-hartford-12", name = "McDonald's", 
image_url = "https://s3-media4.fl.yelpcdn.com/bphoto/B0SDIM3ylqAN6hOgOkyybQ/o.jpg", 
is_closed = FALSE, url = "https://www.yelp.com/biz/mcdonalds-hartford-12", 
review_count = 4L, categories = list(structure(list(alias = "hotdogs", 
    title = "Fast Food"), .Names = c("alias", "title")), 
    structure(list(alias = "burgers", title = "Burgers"), .Names = c("alias", 
    "title"))), rating = 2, coordinates = structure(list(
    latitude = 41.7828446485687, longitude = -72.6981766090747), .Names = c("latitude", 
"longitude")), transactions = list(), price = "$", location = structure(list(
    address1 = "1303 Albany Ave", address2 = "", address3 = "", 
    city = "Hartford", zip_code = "06112", country = "US", 
    state = "CT", display_address = list("1303 Albany Ave", 
        "Hartford, CT 06112")), .Names = c("address1", "address2", 
"address3", "city", "zip_code", "country", "state", "display_address"
)), phone = "+18602473612", display_phone = "(860) 247-3612", 
distance = 2730.544003504), .Names = c("id", "name", "image_url", 
"is_closed", "url", "review_count", "categories", "rating", "coordinates", 
"transactions", "price", "location", "phone", "display_phone", 
"distance")), structure(list(id = "mcdonalds-hartford", name = "McDonald's", 
image_url = "https://s3-media2.fl.yelpcdn.com/bphoto/rnWHncxwC1qK5T9KvSIVBA/o.jpg", 
is_closed = FALSE, url = "https://www.yelp.com/biz/mcdonalds-hartford?", 
review_count = 8L, categories = list(structure(list(alias = "hotdogs", 
    title = "Fast Food"), .Names = c("alias", "title")), 
    structure(list(alias = "burgers", title = "Burgers"), .Names = c("alias", 
    "title"))), rating = 1.5, coordinates = structure(list(
    latitude = 41.7876, longitude = -72.66214), .Names = c("latitude", 
"longitude")), transactions = list(), price = "$", location = structure(list(
    address1 = "98 Weston St", address2 = "", address3 = "", 
    city = "Hartford", zip_code = "06120", country = "US", 
    state = "CT", display_address = list("98 Weston St", 
        "Hartford, CT 06120")), .Names = c("address1", "address2", 
"address3", "city", "zip_code", "country", "state", "display_address"
)), phone = "+18607240200", display_phone = "(860) 724-0200", 
distance = 3622.578151942), .Names = c("id", "name", "image_url", 
"is_closed", "url", "review_count", "categories", "rating", "coordinates", 
"transactions", "price", "location", "phone", "display_phone", 
"distance")))

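It took a fair amount of wrangling to make this work. The problem is the nested lists, such as $coordinates and $location, which hold more than one value per business and so cannot be bound into a column that expects a single value per row. One way around this is to call glue::glue_collapse(sep = ";") (named glue::collapse in older glue releases) on each element, which flattens every field to a single string. It is a bit of a hack, but you end up with a data structure that is easier to handle. Try this: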

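library(tidyverse)

# Collapse every field of one business into a single ";"-separated string.
# glue_collapse() is the current name of what older glue releases called collapse().
extractor <- function(list_element){
  map(list_element, glue::glue_collapse, sep = ";")
}

nested_list %>% 
  map(extractor) %>%   # flatten the fields of each business
  transpose() %>%      # turn the list of businesses into a list of columns
  as_tibble() %>% 
  View()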

Here nested_list is the portion of the dataset you included in your question, i.e. the output of head(ct2$businesses).
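
If the goal is only to get latitude and longitude as numeric columns, an alternative that avoids collapsing everything to strings is to extract each field by name, passing a character vector to walk into the nested coordinates list. The following is a minimal sketch rather than a tested drop-in: it assumes purrr and tibble are installed, and businesses is a hypothetical two-record stand-in trimmed from the dput() output in the question (replace it with ct2$businesses).

```r
library(purrr)
library(tibble)

# Hypothetical stand-in for ct2$businesses, trimmed from the dput() in the question.
businesses <- list(
  list(id = "mcdonalds-hartford-7", name = "McDonald's", rating = 3.5,
       coordinates = list(latitude = 41.738611, longitude = -72.65921)),
  list(id = "mcdonalds-restaurants-hartford-3", name = "McDonalds Restaurants",
       rating = 2.5,
       coordinates = list(latitude = 41.75251, longitude = -72.71448))
)

# Build one column at a time. A single string extracts a top-level field;
# a character vector such as c("coordinates", "latitude") indexes level by level.
biz_info <- tibble(
  name      = map_chr(businesses, "name"),
  id        = map_chr(businesses, "id"),
  rating    = map_dbl(businesses, "rating"),
  latitude  = map_dbl(businesses, c("coordinates", "latitude")),
  longitude = map_dbl(businesses, c("coordinates", "longitude"))
)

print(biz_info)
```

The result has one row per business, so it can be passed to knitr::kable() the same way as in the question. Fields that are absent from some records (price, in the sample data) would need a fallback, for example map_chr(businesses, "price", .default = NA_character_).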
