简体   繁体   中英

Removing "." from column names in R when using a JSON

I am trying to clean up column names in R. I am working with a JSON dataset that I used a jsonlite function called "stream_in" to import into R.

First off, I tried the "gsub" command and the "paste" command but both didn't work.

The problem seems to me like so: when I use the command head to inspect the data, it reads to me all column names even the ones containing "." and, strangely, "spaces" but if I use the names command, it only reads the ones without "dots" or "spaces". Any suggestions? I have columns with names such as

hours.Monday.open attributes.Alcohol

and I would like to remove the "."

I tried something like this

names(restaurant.data)[3] <- paste("HoursMondayOpen")

but that only removed the word before the first "." and the new column name was "HoursMondayOpen.Monday.Open"

I also tried

names(restaurant.data) <- gsub("\\.", "", names(restaurant.data))

but that simply didn't change anything, neither did it give me an error.

Does that help?

Here's the output from dput()

> dput(head(restaurant.data))
structure(list(business_id = c("5UmKMjUEUNdYWqANhGckJw", "UsFtqoBl7naz8AVUBZMjQQ", 
"3eu6MEFlq2Dg7bQh8QbdOg", "cE27W9VPgO88Qxe4ol6y_g", "HZdLhv6COCleJMo7nPl-RA", 
"mVHrayjG3uZ_RLHkLj-AMg"), FullAddress = c("4734 Lebanon Church Rd\nDravosburg, PA 15034", 
"202 McClure St\nDravosburg, PA 15034", "1 Ravine St\nDravosburg, PA 15034", 
"1530 Hamilton Rd\nBethel Park, PA 15234", "301 South Hills Village\nPittsburgh, PA 15241", 
"414 Hawkins Ave\nrankin, PA 15104"), HoursFridayClose = structure(list(
    Friday = structure(list(close = c("21:00", NA, NA, NA, "17:00", 
    "20:00"), open = c("11:00", NA, NA, NA, "10:00", "10:00")), .Names = c("close", 
    "open"), row.names = c(NA, 6L), class = "data.frame"), Tuesday = structure(list(
        close = c("21:00", NA, NA, NA, "21:00", "19:00"), open = c("11:00", 
        NA, NA, NA, "10:00", "10:00")), .Names = c("close", "open"
    ), row.names = c(NA, 6L), class = "data.frame"), Thursday = structure(list(
        close = c("21:00", NA, NA, NA, "17:00", "19:00"), open = c("11:00", 
        NA, NA, NA, "10:00", "10:00")), .Names = c("close", "open"
    ), row.names = c(NA, 6L), class = "data.frame"), Wednesday = structure(list(
        close = c("21:00", NA, NA, NA, "21:00", "19:00"), open = c("11:00", 
        NA, NA, NA, "10:00", "10:00")), .Names = c("close", "open"
    ), row.names = c(NA, 6L), class = "data.frame"), Monday = structure(list(
        close = c("21:00", NA, NA, NA, "21:00", NA), open = c("11:00", 
        NA, NA, NA, "10:00", NA)), .Names = c("close", "open"
    ), row.names = c(NA, 6L), class = "data.frame"), Sunday = structure(list(
        close = c(NA, NA, NA, NA, "18:00", NA), open = c(NA, 
        NA, NA, NA, "11:00", NA)), .Names = c("close", "open"
    ), row.names = c(NA, 6L), class = "data.frame"), Saturday = structure(list(
        close = c(NA, NA, NA, NA, "21:00", "16:00"), open = c(NA, 
        NA, NA, NA, "10:00", "10:00")), .Names = c("close", "open"
    ), row.names = c(NA, 6L), class = "data.frame")), .Names = c("Friday", 
"Tuesday", "Thursday", "Wednesday", "Monday", "Sunday", "Saturday"
), row.names = c(NA, 6L), class = "data.frame"), open = c(TRUE, 
TRUE, TRUE, FALSE, TRUE, TRUE), categories = list(c("Fast Food", 
"Restaurants"), "Nightlife", c("Auto Repair", "Automotive"), 
    c("Active Life", "Mini Golf", "Golf"), c("Shopping", "Home Services", 
    "Internet Service Providers", "Mobile Phones", "Professional Services", 
    "Electronics"), c("Bars", "American (New)", "Nightlife", 
    "Lounges", "Restaurants")), city = c("Dravosburg", "Dravosburg", 
"Dravosburg", "Bethel Park", "Pittsburgh", "rankin"), review_count = c(4L, 
4L, 3L, 5L, 5L, 20L), name = c("Mr Hoagie", "Clancy's Pub", "Joe Cislo's Auto", 
"Cool Springs Golf Center", "Verizon", "Emil's Lounge"), neighborhoods = list(
    character(0), character(0), character(0), character(0), character(0), 
    character(0)), longitude = c(-79.9007057, -79.8868138, -79.889059, 
-80.0146597, -80.05998, -79.8802474), state = c("PA", "PA", "PA", 
"PA", "PA", "PA"), stars = c(4.5, 3.5, 5, 2.5, 2.5, 5), latitude = c(40.3543266, 
40.3505527, 40.3509559, 40.3541155, 40.35762, 40.4134643), attributes = structure(list(
    `Take-out` = c(TRUE, NA, NA, NA, NA, TRUE), `Drive-Thru` = c(FALSE, 
    NA, NA, NA, NA, NA), `Good For` = structure(list(dessert = c(FALSE, 
    NA, NA, NA, NA, FALSE), latenight = c(FALSE, NA, NA, NA, 
    NA, FALSE), lunch = c(FALSE, NA, NA, NA, NA, TRUE), dinner = c(FALSE, 
    NA, NA, NA, NA, FALSE), brunch = c(FALSE, NA, NA, NA, NA, 
    FALSE), breakfast = c(FALSE, NA, NA, NA, NA, FALSE)), .Names = c("dessert", 
    "latenight", "lunch", "dinner", "brunch", "breakfast"), row.names = c(NA, 
    6L), class = "data.frame"), Caters = c(FALSE, NA, NA, NA, 
    NA, TRUE), `Noise Level` = c("average", NA, NA, NA, NA, "average"
    ), `Takes Reservations` = c(FALSE, NA, NA, NA, NA, FALSE), 
    Delivery = c(FALSE, NA, NA, NA, NA, FALSE), Ambience = structure(list(
        romantic = c(FALSE, NA, NA, NA, NA, FALSE), intimate = c(FALSE, 
        NA, NA, NA, NA, FALSE), classy = c(FALSE, NA, NA, NA, 
        NA, FALSE), hipster = c(FALSE, NA, NA, NA, NA, FALSE), 
        divey = c(FALSE, NA, NA, NA, NA, FALSE), touristy = c(FALSE, 
        NA, NA, NA, NA, FALSE), trendy = c(FALSE, NA, NA, NA, 
        NA, FALSE), upscale = c(FALSE, NA, NA, NA, NA, FALSE), 
        casual = c(FALSE, NA, NA, NA, NA, FALSE)), .Names = c("romantic", 
    "intimate", "classy", "hipster", "divey", "touristy", "trendy", 
    "upscale", "casual"), row.names = c(NA, 6L), class = "data.frame"), 
    Parking = structure(list(garage = c(FALSE, NA, NA, NA, FALSE, 
    FALSE), street = c(FALSE, NA, NA, NA, FALSE, FALSE), validated = c(FALSE, 
    NA, NA, NA, FALSE, FALSE), lot = c(FALSE, NA, NA, NA, FALSE, 
    FALSE), valet = c(FALSE, NA, NA, NA, FALSE, FALSE)), .Names = c("garage", 
    "street", "validated", "lot", "valet"), row.names = c(NA, 
    6L), class = "data.frame"), `Has TV` = c(FALSE, NA, NA, NA, 
    NA, TRUE), `Outdoor Seating` = c(FALSE, FALSE, NA, NA, NA, 
    FALSE), Attire = c("casual", NA, NA, NA, NA, "casual"), Alcohol = c("none", 
    NA, NA, NA, NA, "full_bar"), `Waiter Service` = c(FALSE, 
    NA, NA, NA, NA, TRUE), `Accepts Credit Cards` = c(TRUE, TRUE, 
    NA, NA, FALSE, TRUE), `Good for Kids` = c(TRUE, NA, NA, TRUE, 
    NA, TRUE), `Good For Groups` = c(TRUE, TRUE, NA, NA, NA, 
    TRUE), `Price Range` = c(1L, 1L, NA, NA, 2L, 1L), `Happy Hour` = c(NA, 
    TRUE, NA, NA, NA, FALSE), `Good For Dancing` = c(NA, NA, 
    NA, NA, NA, FALSE), `Coat Check` = c(NA, NA, NA, NA, NA, 
    FALSE), Smoking = c(NA, NA, NA, NA, NA, "no"), `Wi-Fi` = c(NA, 
    NA, NA, NA, NA, "no"), Music = structure(list(dj = c(NA, 
    NA, NA, NA, NA, FALSE), background_music = c(NA, NA, NA, 
    NA, NA, NA), jukebox = c(NA, NA, NA, NA, NA, NA), live = c(NA, 
    NA, NA, NA, NA, NA), video = c(NA, NA, NA, NA, NA, NA), karaoke = c(NA, 
    NA, NA, NA, NA, NA)), .Names = c("dj", "background_music", 
    "jukebox", "live", "video", "karaoke"), row.names = c(NA, 
    6L), class = "data.frame"), `Wheelchair Accessible` = c(NA, 
    NA, NA, NA, NA, NA), `Dogs Allowed` = c(NA, NA, NA, NA, NA, 
    NA), BYOB = c(NA, NA, NA, NA, NA, NA), Corkage = c(NA, NA, 
    NA, NA, NA, NA), `BYOB/Corkage` = c(NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_
    ), `Order at Counter` = c(NA, NA, NA, NA, NA, NA), `By Appointment Only` = c(NA, 
    NA, NA, NA, NA, NA), `Open 24 Hours` = c(NA, NA, NA, NA, 
    NA, NA), `Hair Types Specialized In` = structure(list(coloring = c(NA, 
    NA, NA, NA, NA, NA), africanamerican = c(NA, NA, NA, NA, 
    NA, NA), curly = c(NA, NA, NA, NA, NA, NA), perms = c(NA, 
    NA, NA, NA, NA, NA), kids = c(NA, NA, NA, NA, NA, NA), extensions = c(NA, 
    NA, NA, NA, NA, NA), asian = c(NA, NA, NA, NA, NA, NA), straightperms = c(NA, 
    NA, NA, NA, NA, NA)), .Names = c("coloring", "africanamerican", 
    "curly", "perms", "kids", "extensions", "asian", "straightperms"
    ), row.names = c(NA, 6L), class = "data.frame"), `Accepts Insurance` = c(NA, 
    NA, NA, NA, NA, NA), `Ages Allowed` = c(NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_
    ), `Dietary Restrictions` = structure(list(`dairy-free` = c(NA, 
    NA, NA, NA, NA, NA), `gluten-free` = c(NA, NA, NA, NA, NA, 
    NA), vegan = c(NA, NA, NA, NA, NA, NA), kosher = c(NA, NA, 
    NA, NA, NA, NA), halal = c(NA, NA, NA, NA, NA, NA), `soy-free` = c(NA, 
    NA, NA, NA, NA, NA), vegetarian = c(NA, NA, NA, NA, NA, NA
    )), .Names = c("dairy-free", "gluten-free", "vegan", "kosher", 
    "halal", "soy-free", "vegetarian"), row.names = c(NA, 6L), class = "data.frame")), .Names = c("Take-out", 
"Drive-Thru", "Good For", "Caters", "Noise Level", "Takes Reservations", 
"Delivery", "Ambience", "Parking", "Has TV", "Outdoor Seating", 
"Attire", "Alcohol", "Waiter Service", "Accepts Credit Cards", 
"Good for Kids", "Good For Groups", "Price Range", "Happy Hour", 
"Good For Dancing", "Coat Check", "Smoking", "Wi-Fi", "Music", 
"Wheelchair Accessible", "Dogs Allowed", "BYOB", "Corkage", "BYOB/Corkage", 
"Order at Counter", "By Appointment Only", "Open 24 Hours", "Hair Types Specialized In", 
"Accepts Insurance", "Ages Allowed", "Dietary Restrictions"), row.names = c(NA, 
6L), class = "data.frame"), type = c("business", "business", 
"business", "business", "business", "business")), .Names = c("business_id", 
"FullAddress", "HoursFridayClose", "open", "categories", "city", 
"review_count", "name", "neighborhoods", "longitude", "state", 
"stars", "latitude", "attributes", "type"), row.names = c(NA, 
6L), class = "data.frame")
> 

Here's all of it in, in all of its glory!

Your data is quite convoluted (I would say messy...) since you have a data.frame where some columns are in turn a data.frame whose columns are again data.frame 's... also some columns are just list with elements of different lenght inside (ie columns "neighborhoods" and "categories" )

So, I would flatten where possible with this custom function:

poormansUnnest <- function(nestedDF){
  toBind <- list()
  for(col in names(nestedDF)){
    if(is.data.frame(nestedDF[[col]])){
      df <- poormansUnnest(nestedDF[[col]])
      names(df) <- paste0(col,'.',names(df))
      toBind[[length(toBind)+1]] <- df
    }else{
      toBind[[length(toBind)+1]] <- nestedDF[col]
    }
  }
  final <- do.call(cbind.data.frame,toBind)
  return(final)
}

res <- poormansUnnest(restaurant.data)

# store list columns in separate object (then you would do whatever you need with them...)
categories <- res$categories
neighborhoods<- res$neighborhoods

# remove the list columns from the data.frame
res$categories <- NULL
res$neighborhoods <- NULL

So now, you should be able to rename the columns of res with gsub

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM