
Removing "." from column names in R when working with JSON data

I am trying to clean up column names in R. I am working with a JSON dataset that I imported into R with the jsonlite function stream_in.

First off, I tried the gsub and paste functions, but neither worked.

The problem seems to be this: when I inspect the data with head, it shows all the column names, even the ones containing "." and, strangely, spaces. But when I use names, it only returns the ones without dots or spaces. Any suggestions? I have columns with names such as

hours.Monday.open, attributes.Alcohol

and I would like to remove the ".".
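A minimal sketch that reproduces this behaviour, assuming (as the dput() output below confirms) that stream_in left nested JSON objects as data-frame columns. The data frame `toy` is hypothetical, not the asker's data:

```r
# Hypothetical toy data: 'attributes' is itself a data frame, which is how
# jsonlite represents a nested JSON object after simplification
toy <- data.frame(name = c("Mr Hoagie", "Emil's Lounge"))
toy$attributes <- data.frame(Alcohol = c("none", "full_bar"),
                             Caters  = c(FALSE, TRUE))

names(toy)             # only the top-level names: "name" "attributes"
names(toy$attributes)  # the nested names: "Alcohol" "Caters"
head(toy)              # printed header: name, attributes.Alcohol, attributes.Caters
```

The dotted labels only exist in the printed output; no element of names(toy) actually contains a ".".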

I tried something like this:

names(restaurant.data)[3] <- paste("HoursMondayOpen")

but that only replaced the word before the first ".", and the new column name was "HoursMondayOpen.Monday.Open".

I also tried

names(restaurant.data) <- gsub("\\.", "", names(restaurant.data))

but that simply didn't change anything, and it didn't give me an error either.

Does that help?
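Why gsub() seemed to do nothing: names() only returns the top-level names, none of which contain a dot, while names with spaces or hyphens (such as "Noise Level" or "Take-out") sit inside the nested data frames. If you want to keep the nesting and still clean those names, one option is to apply the substitution at every level. A sketch, where the helper `rename_deep()`, the cleaning function `strip()` and the toy data are all hypothetical:

```r
# Hypothetical helper: apply a renaming function f to the names at every
# level of a nested data frame, leaving the nesting itself intact
rename_deep <- function(df, f) {
  names(df) <- f(names(df))
  for (col in names(df)) {
    if (is.data.frame(df[[col]])) {
      df[[col]] <- rename_deep(df[[col]], f)  # recurse into nested data frame
    }
  }
  df
}

# Hypothetical toy data whose nested names contain a space and a hyphen
toy <- data.frame(name = c("Mr Hoagie", "Emil's Lounge"))
toy$attributes <- data.frame(`Noise Level` = c("average", NA),
                             `Take-out` = c(TRUE, NA),
                             check.names = FALSE)

# At the top level there is nothing to replace, so gsub() is a no-op
identical(gsub("\\.", "", names(toy)), names(toy))  # TRUE

# Applied at every level, the substitution reaches the nested names
strip <- function(x) gsub("[^A-Za-z0-9_]", "", x)
cleaned <- rename_deep(toy, strip)
names(cleaned$attributes)  # "NoiseLevel" "Takeout"
```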

Here's the output from dput():

> dput(head(restaurant.data))
structure(list(business_id = c("5UmKMjUEUNdYWqANhGckJw", "UsFtqoBl7naz8AVUBZMjQQ", 
"3eu6MEFlq2Dg7bQh8QbdOg", "cE27W9VPgO88Qxe4ol6y_g", "HZdLhv6COCleJMo7nPl-RA", 
"mVHrayjG3uZ_RLHkLj-AMg"), FullAddress = c("4734 Lebanon Church Rd\nDravosburg, PA 15034", 
"202 McClure St\nDravosburg, PA 15034", "1 Ravine St\nDravosburg, PA 15034", 
"1530 Hamilton Rd\nBethel Park, PA 15234", "301 South Hills Village\nPittsburgh, PA 15241", 
"414 Hawkins Ave\nrankin, PA 15104"), HoursFridayClose = structure(list(
    Friday = structure(list(close = c("21:00", NA, NA, NA, "17:00", 
    "20:00"), open = c("11:00", NA, NA, NA, "10:00", "10:00")), .Names = c("close", 
    "open"), row.names = c(NA, 6L), class = "data.frame"), Tuesday = structure(list(
        close = c("21:00", NA, NA, NA, "21:00", "19:00"), open = c("11:00", 
        NA, NA, NA, "10:00", "10:00")), .Names = c("close", "open"
    ), row.names = c(NA, 6L), class = "data.frame"), Thursday = structure(list(
        close = c("21:00", NA, NA, NA, "17:00", "19:00"), open = c("11:00", 
        NA, NA, NA, "10:00", "10:00")), .Names = c("close", "open"
    ), row.names = c(NA, 6L), class = "data.frame"), Wednesday = structure(list(
        close = c("21:00", NA, NA, NA, "21:00", "19:00"), open = c("11:00", 
        NA, NA, NA, "10:00", "10:00")), .Names = c("close", "open"
    ), row.names = c(NA, 6L), class = "data.frame"), Monday = structure(list(
        close = c("21:00", NA, NA, NA, "21:00", NA), open = c("11:00", 
        NA, NA, NA, "10:00", NA)), .Names = c("close", "open"
    ), row.names = c(NA, 6L), class = "data.frame"), Sunday = structure(list(
        close = c(NA, NA, NA, NA, "18:00", NA), open = c(NA, 
        NA, NA, NA, "11:00", NA)), .Names = c("close", "open"
    ), row.names = c(NA, 6L), class = "data.frame"), Saturday = structure(list(
        close = c(NA, NA, NA, NA, "21:00", "16:00"), open = c(NA, 
        NA, NA, NA, "10:00", "10:00")), .Names = c("close", "open"
    ), row.names = c(NA, 6L), class = "data.frame")), .Names = c("Friday", 
"Tuesday", "Thursday", "Wednesday", "Monday", "Sunday", "Saturday"
), row.names = c(NA, 6L), class = "data.frame"), open = c(TRUE, 
TRUE, TRUE, FALSE, TRUE, TRUE), categories = list(c("Fast Food", 
"Restaurants"), "Nightlife", c("Auto Repair", "Automotive"), 
    c("Active Life", "Mini Golf", "Golf"), c("Shopping", "Home Services", 
    "Internet Service Providers", "Mobile Phones", "Professional Services", 
    "Electronics"), c("Bars", "American (New)", "Nightlife", 
    "Lounges", "Restaurants")), city = c("Dravosburg", "Dravosburg", 
"Dravosburg", "Bethel Park", "Pittsburgh", "rankin"), review_count = c(4L, 
4L, 3L, 5L, 5L, 20L), name = c("Mr Hoagie", "Clancy's Pub", "Joe Cislo's Auto", 
"Cool Springs Golf Center", "Verizon", "Emil's Lounge"), neighborhoods = list(
    character(0), character(0), character(0), character(0), character(0), 
    character(0)), longitude = c(-79.9007057, -79.8868138, -79.889059, 
-80.0146597, -80.05998, -79.8802474), state = c("PA", "PA", "PA", 
"PA", "PA", "PA"), stars = c(4.5, 3.5, 5, 2.5, 2.5, 5), latitude = c(40.3543266, 
40.3505527, 40.3509559, 40.3541155, 40.35762, 40.4134643), attributes = structure(list(
    `Take-out` = c(TRUE, NA, NA, NA, NA, TRUE), `Drive-Thru` = c(FALSE, 
    NA, NA, NA, NA, NA), `Good For` = structure(list(dessert = c(FALSE, 
    NA, NA, NA, NA, FALSE), latenight = c(FALSE, NA, NA, NA, 
    NA, FALSE), lunch = c(FALSE, NA, NA, NA, NA, TRUE), dinner = c(FALSE, 
    NA, NA, NA, NA, FALSE), brunch = c(FALSE, NA, NA, NA, NA, 
    FALSE), breakfast = c(FALSE, NA, NA, NA, NA, FALSE)), .Names = c("dessert", 
    "latenight", "lunch", "dinner", "brunch", "breakfast"), row.names = c(NA, 
    6L), class = "data.frame"), Caters = c(FALSE, NA, NA, NA, 
    NA, TRUE), `Noise Level` = c("average", NA, NA, NA, NA, "average"
    ), `Takes Reservations` = c(FALSE, NA, NA, NA, NA, FALSE), 
    Delivery = c(FALSE, NA, NA, NA, NA, FALSE), Ambience = structure(list(
        romantic = c(FALSE, NA, NA, NA, NA, FALSE), intimate = c(FALSE, 
        NA, NA, NA, NA, FALSE), classy = c(FALSE, NA, NA, NA, 
        NA, FALSE), hipster = c(FALSE, NA, NA, NA, NA, FALSE), 
        divey = c(FALSE, NA, NA, NA, NA, FALSE), touristy = c(FALSE, 
        NA, NA, NA, NA, FALSE), trendy = c(FALSE, NA, NA, NA, 
        NA, FALSE), upscale = c(FALSE, NA, NA, NA, NA, FALSE), 
        casual = c(FALSE, NA, NA, NA, NA, FALSE)), .Names = c("romantic", 
    "intimate", "classy", "hipster", "divey", "touristy", "trendy", 
    "upscale", "casual"), row.names = c(NA, 6L), class = "data.frame"), 
    Parking = structure(list(garage = c(FALSE, NA, NA, NA, FALSE, 
    FALSE), street = c(FALSE, NA, NA, NA, FALSE, FALSE), validated = c(FALSE, 
    NA, NA, NA, FALSE, FALSE), lot = c(FALSE, NA, NA, NA, FALSE, 
    FALSE), valet = c(FALSE, NA, NA, NA, FALSE, FALSE)), .Names = c("garage", 
    "street", "validated", "lot", "valet"), row.names = c(NA, 
    6L), class = "data.frame"), `Has TV` = c(FALSE, NA, NA, NA, 
    NA, TRUE), `Outdoor Seating` = c(FALSE, FALSE, NA, NA, NA, 
    FALSE), Attire = c("casual", NA, NA, NA, NA, "casual"), Alcohol = c("none", 
    NA, NA, NA, NA, "full_bar"), `Waiter Service` = c(FALSE, 
    NA, NA, NA, NA, TRUE), `Accepts Credit Cards` = c(TRUE, TRUE, 
    NA, NA, FALSE, TRUE), `Good for Kids` = c(TRUE, NA, NA, TRUE, 
    NA, TRUE), `Good For Groups` = c(TRUE, TRUE, NA, NA, NA, 
    TRUE), `Price Range` = c(1L, 1L, NA, NA, 2L, 1L), `Happy Hour` = c(NA, 
    TRUE, NA, NA, NA, FALSE), `Good For Dancing` = c(NA, NA, 
    NA, NA, NA, FALSE), `Coat Check` = c(NA, NA, NA, NA, NA, 
    FALSE), Smoking = c(NA, NA, NA, NA, NA, "no"), `Wi-Fi` = c(NA, 
    NA, NA, NA, NA, "no"), Music = structure(list(dj = c(NA, 
    NA, NA, NA, NA, FALSE), background_music = c(NA, NA, NA, 
    NA, NA, NA), jukebox = c(NA, NA, NA, NA, NA, NA), live = c(NA, 
    NA, NA, NA, NA, NA), video = c(NA, NA, NA, NA, NA, NA), karaoke = c(NA, 
    NA, NA, NA, NA, NA)), .Names = c("dj", "background_music", 
    "jukebox", "live", "video", "karaoke"), row.names = c(NA, 
    6L), class = "data.frame"), `Wheelchair Accessible` = c(NA, 
    NA, NA, NA, NA, NA), `Dogs Allowed` = c(NA, NA, NA, NA, NA, 
    NA), BYOB = c(NA, NA, NA, NA, NA, NA), Corkage = c(NA, NA, 
    NA, NA, NA, NA), `BYOB/Corkage` = c(NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_
    ), `Order at Counter` = c(NA, NA, NA, NA, NA, NA), `By Appointment Only` = c(NA, 
    NA, NA, NA, NA, NA), `Open 24 Hours` = c(NA, NA, NA, NA, 
    NA, NA), `Hair Types Specialized In` = structure(list(coloring = c(NA, 
    NA, NA, NA, NA, NA), africanamerican = c(NA, NA, NA, NA, 
    NA, NA), curly = c(NA, NA, NA, NA, NA, NA), perms = c(NA, 
    NA, NA, NA, NA, NA), kids = c(NA, NA, NA, NA, NA, NA), extensions = c(NA, 
    NA, NA, NA, NA, NA), asian = c(NA, NA, NA, NA, NA, NA), straightperms = c(NA, 
    NA, NA, NA, NA, NA)), .Names = c("coloring", "africanamerican", 
    "curly", "perms", "kids", "extensions", "asian", "straightperms"
    ), row.names = c(NA, 6L), class = "data.frame"), `Accepts Insurance` = c(NA, 
    NA, NA, NA, NA, NA), `Ages Allowed` = c(NA_character_, NA_character_, 
    NA_character_, NA_character_, NA_character_, NA_character_
    ), `Dietary Restrictions` = structure(list(`dairy-free` = c(NA, 
    NA, NA, NA, NA, NA), `gluten-free` = c(NA, NA, NA, NA, NA, 
    NA), vegan = c(NA, NA, NA, NA, NA, NA), kosher = c(NA, NA, 
    NA, NA, NA, NA), halal = c(NA, NA, NA, NA, NA, NA), `soy-free` = c(NA, 
    NA, NA, NA, NA, NA), vegetarian = c(NA, NA, NA, NA, NA, NA
    )), .Names = c("dairy-free", "gluten-free", "vegan", "kosher", 
    "halal", "soy-free", "vegetarian"), row.names = c(NA, 6L), class = "data.frame")), .Names = c("Take-out", 
"Drive-Thru", "Good For", "Caters", "Noise Level", "Takes Reservations", 
"Delivery", "Ambience", "Parking", "Has TV", "Outdoor Seating", 
"Attire", "Alcohol", "Waiter Service", "Accepts Credit Cards", 
"Good for Kids", "Good For Groups", "Price Range", "Happy Hour", 
"Good For Dancing", "Coat Check", "Smoking", "Wi-Fi", "Music", 
"Wheelchair Accessible", "Dogs Allowed", "BYOB", "Corkage", "BYOB/Corkage", 
"Order at Counter", "By Appointment Only", "Open 24 Hours", "Hair Types Specialized In", 
"Accepts Insurance", "Ages Allowed", "Dietary Restrictions"), row.names = c(NA, 
6L), class = "data.frame"), type = c("business", "business", 
"business", "business", "business", "business")), .Names = c("business_id", 
"FullAddress", "HoursFridayClose", "open", "categories", "city", 
"review_count", "name", "neighborhoods", "longitude", "state", 
"stars", "latitude", "attributes", "type"), row.names = c(NA, 
6L), class = "data.frame")
> 

Here it is, in all its glory!

Your data is quite convoluted (I would say messy...), since you have a data.frame where some columns are in turn data.frames, whose columns are again data.frames... Also, some columns are lists whose elements have different lengths (i.e. the columns "neighborhoods" and "categories"). This is also why names() does not show the dotted names: it only returns the top-level column names, while names like hours.Monday.open are just how the printed output labels the nested columns, so gsub() had no dots to remove.
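To see which columns are nested data frames and which are list columns, it is enough to check the class of each top-level column. A sketch on hypothetical toy data (with the real data you would pass restaurant.data instead of `toy`):

```r
# Hypothetical toy data: one plain, one nested data-frame and one list column
toy <- data.frame(stars = c(4.5, 5))
toy$attributes <- data.frame(Alcohol = c("none", "full_bar"),
                             Caters  = c(FALSE, TRUE))
toy$categories <- list(c("Fast Food", "Restaurants"), "Nightlife")

# First class of every top-level column, named by column
col_types <- vapply(toy, function(col) class(col)[1], character(1))
col_types  # stars "numeric", attributes "data.frame", categories "list"

names(col_types)[col_types == "data.frame"]  # columns that need flattening
names(col_types)[col_types == "list"]        # list columns to handle separately
```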

So I would flatten it where possible with this custom function:

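# Recursively flatten data.frame columns, prefixing nested names with "parent."
poormansUnnest <- function(nestedDF){
  toBind <- list()
  for(col in names(nestedDF)){
    if(is.data.frame(nestedDF[[col]])){
      # nested data.frame: flatten it first, then prefix its column names
      df <- poormansUnnest(nestedDF[[col]])
      names(df) <- paste0(col,'.',names(df))
      toBind[[length(toBind)+1]] <- df
    }else{
      # ordinary (or list) column: keep it as a one-column data.frame
      toBind[[length(toBind)+1]] <- nestedDF[col]
    }
  }
  # bind all pieces side by side (cbind.data.frame uses check.names = FALSE,
  # so the names built above are kept as they are)
  final <- do.call(cbind.data.frame,toBind)
  return(final)
}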

res <- poormansUnnest(restaurant.data)

# store list columns in separate object (then you would do whatever you need with them...)
categories <- res$categories
neighborhoods <- res$neighborhoods

# remove the list columns from the data.frame
res$categories <- NULL
res$neighborhoods <- NULL
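A sketch of a more compact variant of the same recursive idea, which builds the dotted prefix on the way down, plus a helper that drops every list column without naming them one by one. `flatten_df()`, `drop_list_cols()` and the toy data are hypothetical, not part of any package:

```r
# Hypothetical variant of poormansUnnest(): pass the dotted prefix down
flatten_df <- function(df, prefix = NULL) {
  parts <- lapply(names(df), function(col) {
    full <- if (is.null(prefix)) col else paste(prefix, col, sep = ".")
    if (is.data.frame(df[[col]])) {
      flatten_df(df[[col]], full)  # recurse into nested data frame
    } else {
      setNames(df[col], full)      # one-column data frame, renamed
    }
  })
  do.call(cbind.data.frame, parts)
}

# Drop every list column (like categories/neighborhoods); meant to be used
# after flattening, since a nested data frame would also count as a list
drop_list_cols <- function(df) {
  df[!vapply(df, is.list, logical(1))]
}

# Hypothetical toy data: two levels of nesting plus a list column
monday <- data.frame(open = c("11:00", "10:00"), close = c("21:00", "19:00"))
hours <- data.frame(row.names = 1:2)  # 2-row data frame with no columns yet
hours$Monday <- monday
toy <- data.frame(name = c("Mr Hoagie", "Emil's Lounge"))
toy$hours <- hours
toy$categories <- list(c("Fast Food", "Restaurants"), "Nightlife")

flat <- flatten_df(toy)
names(flat)                  # "name" "hours.Monday.open" "hours.Monday.close" "categories"
names(drop_list_cols(flat))  # "name" "hours.Monday.open" "hours.Monday.close"
```

If you still need the list columns, store them first, as done above with categories and neighborhoods.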

Now you should be able to rename the columns of res with gsub.
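For example, you can either delete the dots outright or turn each dotted path into a single UpperCamelCase name such as HoursMondayOpen (the form attempted in the question). The vector `flat_names` and the helper `to_camel()` are hypothetical, chosen to mimic names in the data above:

```r
# Hypothetical flattened names, in the form poormansUnnest() produces
flat_names <- c("business_id", "hours.Monday.open",
                "attributes.Alcohol", "attributes.Noise Level",
                "attributes.Take-out")

# Option 1: delete the dots (fixed = TRUE treats "." literally, not as a regex)
gsub(".", "", flat_names, fixed = TRUE)

# Option 2 (hypothetical helper): split on anything that is not a letter,
# digit or underscore, capitalise each piece and glue the pieces together
to_camel <- function(x) {
  parts <- strsplit(x, "[^A-Za-z0-9_]+")
  vapply(parts, function(p) {
    paste0(toupper(substring(p, 1, 1)), substring(p, 2), collapse = "")
  }, character(1))
}
to_camel(flat_names)
# "Business_id" "HoursMondayOpen" "AttributesAlcohol"
# "AttributesNoiseLevel" "AttributesTakeOut"
```

With the real data this would be `names(res) <- to_camel(names(res))`. Underscores are kept on purpose; adjust the regular expression if you want different rules, and check `anyDuplicated(names(res))` afterwards in case two paths collapse to the same name.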

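Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site or link to the original post. For any questions, contact yoyou2525@163.com.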

 
© 2020-2024 STACKOOM.COM