R: parse a dataframe column containing JSON arrays and convert it to one-hot encoding

Question

I have a dataframe with a column that holds JSON arrays in string form. My goal is to parse the column and convert it into a one-hot encoding, but I get an error while parsing the JSON.

> library(jsonlite)
> library(dplyr)
> df <- tibble(Amenities=c("[\"Parking\", \"Lawn\", \"Garage\", \"Frontyard\"]", "[\"Parking\", \"Lawn\", \"Garage\", \"Backyard\"]", "[\"Parking\", \"Lawn\", \"Garage\"]"))
> df
# A tibble: 3 x 1
  Amenities                                           
  <chr>                                               
1 "[\"Parking\", \"Lawn\", \"Garage\", \"Frontyard\"]"
2 "[\"Parking\", \"Lawn\", \"Garage\", \"Backyard\"]" 
3 "[\"Parking\", \"Lawn\", \"Garage\"]"               
> df <- df %>% mutate(Amenities=fromJSON(Amenities))
Error: parse error: trailing garbage
          awn", "Garage", "Frontyard"] ["Parking", "Lawn", "Garage", "
                     (right here) ------^
>

Expected output:

Parking  Lawn  Garage  Frontyard  Backyard
      1     1       1          1         0
      1     1       1          0         1
      1     1       1          0         0

Solution (this also preserves the existing dataframe columns):

library(qdapTools)
library(stringr)
# The pattern keeps multi-word amenities together as a single token
df <- cbind(df, +(mtabulate(str_extract_all(df$Amenities, "\\w+( +\\w+)*"))))

Answer 1

We can do this in a single line with mtabulate:

library(qdapTools)
library(stringr)
mtabulate(str_extract_all(df$Amenities, "\\w+"))

Output:

#  Backyard Frontyard Garage Lawn Parking
#1        0         1      1    1       1
#2        1         0      1    1       1
#3        0         0      1    1       1
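
One caveat worth checking: "\\w+" matches runs of word characters, so an amenity containing a space would be split into separate tokens. The short base R sketch below compares that pattern with the one used in the question's solution, "\\w+( +\\w+)*". The value "Swimming Pool" is a made-up example, not part of the original data, and regmatches/gregexpr stand in for str_extract_all only to keep the sketch dependency-free.

```r
# Hypothetical input with a multi-word amenity (not in the original data)
x <- '["Swimming Pool", "Lawn"]'

# Pattern from this answer: every run of word characters is its own token
single_word <- regmatches(x, gregexpr("\\w+", x))[[1]]

# Pattern from the question's solution: words separated by spaces stay together
multi_word <- regmatches(x, gregexpr("\\w+( +\\w+)*", x))[[1]]

print(single_word)  # "Swimming" "Pool" "Lawn"
print(multi_word)   # "Swimming Pool" "Lawn"
```

With the data in the question every amenity is a single word, so both patterns give the same table.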

Answer 2

You can treat the JSON as plain strings, clean them, and expand the dataset.

library(dplyr)

df %>%
  mutate(Amenities = gsub('\\[|\\]|"', '', Amenities)) %>%
  splitstackshape::cSplit_e("Amenities", sep = ',\\s*', 
                            type = 'character', fill = 0, fixed = FALSE) %>%
  rename_with(~sub('Amenities_', '', .))

#                         Amenities Backyard Frontyard Garage Lawn Parking
#1 Parking, Lawn, Garage, Frontyard        0         1      1    1       1
#2  Parking, Lawn, Garage, Backyard        1         0      1    1       1
#3            Parking, Lawn, Garage        0         0      1    1       1
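
As a complement to the string-based approaches, here is a minimal sketch that parses the JSON itself. As far as I can tell, the error in the question arises because fromJSON() treats a character vector as a single document, so the second array is seen as trailing garbage after the first; parsing one row at a time avoids that. The names parsed, amenities, one_hot, and result are illustrative only, and the encoding step uses plain base R rather than any of the packages above.

```r
library(jsonlite)

df <- data.frame(
  Amenities = c('["Parking", "Lawn", "Garage", "Frontyard"]',
                '["Parking", "Lawn", "Garage", "Backyard"]',
                '["Parking", "Lawn", "Garage"]'),
  stringsAsFactors = FALSE
)

# Parse each row's JSON string on its own instead of passing the whole column
parsed <- lapply(df$Amenities, fromJSON)

# Every amenity seen in any row, in order of first appearance
amenities <- unique(unlist(parsed))

# One row per input row, one 0/1 column per amenity
one_hot <- do.call(rbind, lapply(parsed, function(x) as.integer(amenities %in% x)))
colnames(one_hot) <- amenities

# Keep the original column alongside the indicator columns
result <- cbind(df, one_hot)
print(result)
```

For the sample data this should give the columns Parking, Lawn, Garage, Frontyard, Backyard in that order, matching the expected output in the question. Because the values come from a real JSON parser, amenities containing spaces or punctuation are kept intact.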

Answer 1
1, accepted, 2020-12-08 21:13:08

Answer 2
0, 2020-12-08 03:08:53