Splitting a dataframe string column into multiple different columns in the same table
What I want to do is split one column into multiple columns in the same table.
My data:
eventCategory eventAction eVentLabel
HomePage Click {"Name":"Ariel","number":"aaa"}
HomePage Click {"Name":"Dan","number":"bbb"}
HomePage Click {"Name":"Daf","number":"ccc"}
What I need:
eventCategory eventAction eVentLabel Name number
HomePage Click {"Name":"Ariel","number":"aaa"} Ariel aaa
HomePage Click {"Name":"Dan","number":"bbb"} Dan bbb
HomePage Click {"Name":"Daf","number":"ccc"} Daf ccc
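Another tidyverse answer, this time using jsonlite::fromJSON and purrr. This solution transparently handles additional columns embedded in the JSON and fills in missing values appropriately.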
library(tidyverse)
library(jsonlite)
# Sample data (built with tribble() so the example runs as written)
data <- tribble(
  ~eventCategory, ~eventAction, ~eVentLabel,
  "HomePage", "Click", '{"Name":"Ariel","number":"aaa"}',
  "HomePage", "Click", '{"Name":"Dan","number":"bbb"}',
  "HomePage", "Click", '{"Name":"Daf","number":"ccc"}'
)
# Parse each JSON label into a one-row tibble, then unnest it into columns
data %>%
  mutate(new_cols = map(eVentLabel, fromJSON),
         new_cols = map(new_cols, as_tibble)) %>%
  unnest(new_cols)
#> # A tibble: 3 x 5
#> eventCategory eventAction eVentLabel Name number
#> <chr> <chr> <chr> <chr> <chr>
#> 1 HomePage Click {"Name":"Ariel","number":"aaa"} Ariel aaa
#> 2 HomePage Click {"Name":"Dan","number":"bbb"} Dan bbb
#> 3 HomePage Click {"Name":"Daf","number":"ccc"} Daf ccc
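To see why the two map() calls work, here is a minimal sketch of what fromJSON() returns for a single label, assuming jsonlite's default simplification. A JSON object becomes a named list with one element per key, and as_tibble() turns that list into a one-row tibble whose columns are the keys.

```r
library(jsonlite)
library(tibble)

label <- '{"Name":"Ariel","number":"aaa"}'

# A JSON object parses to a named list: one element per key
parsed <- fromJSON(label)
str(parsed)

# as_tibble() turns the named list into a one-row tibble (columns Name, number)
row <- as_tibble(parsed)
row
```

Because every label becomes a small data frame, unnest() can line the keys up as columns even when different rows carry different keys.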
Note that unnest will drop any row whose parsed JSON is empty. Consider the following example:
data <- tribble(
  ~eventCategory, ~eventAction, ~eVentLabel,
  "HomePage", "Click", '{"Name":"Ariel","number":"aaa"}',
  "HomePage", "Click", '{"Name":"Dan","number":"bbb"}',
  "HomePage", "Click", '{"Name":"Daf","number":"ccc"}',
  "HomePage", "Click", '{}',
  "HomePage", "Click", '{"Account": "010001"}'
)
data %>%
  mutate(new_cols = map(eVentLabel, fromJSON),
         new_cols = map(new_cols, as_tibble)) %>%
  unnest(new_cols)
#> # A tibble: 4 x 6
#> eventCategory eventAction eVentLabel Name number Account
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 HomePage Click {"Name":"Ariel","number":"aaa"} Ariel aaa <NA>
#> 2 HomePage Click {"Name":"Dan","number":"bbb"} Dan bbb <NA>
#> 3 HomePage Click {"Name":"Daf","number":"ccc"} Daf ccc <NA>
#> 4 HomePage Click {"Account": "010001"} <NA> <NA> 010001
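Note that the row with an empty JSON object ({}) in the original data has been dropped. A column for the new variable Account has also been added, with NA filled in where a row has no value for it.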
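If you would rather keep the row with the empty JSON object, a minimal sketch (assuming tidyr 1.0 or later, where unnest() has a keep_empty argument) is to pass keep_empty = TRUE. A size-0 element, such as the 0-row tibble produced from {}, is then replaced by a single row of NA values instead of being dropped.

```r
library(tidyverse)
library(jsonlite)

# Small illustrative data set: a normal label, an empty object, a different key
data <- tibble(
  eventCategory = "HomePage",
  eventAction = "Click",
  eVentLabel = c('{"Name":"Ariel","number":"aaa"}', "{}", '{"Account": "010001"}')
)

result <- data %>%
  mutate(new_cols = map(eVentLabel, ~ as_tibble(fromJSON(.x)))) %>%
  # keep_empty = TRUE keeps the {} row, filling the new columns with NA
  unnest(new_cols, keep_empty = TRUE)
result
```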
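Finally, if the JSON column contains blank entries (such as "" or NA), the call to fromJSON will fail; remove those rows with a filter statement before passing the column to fromJSON. For example: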
data %>%
filter(nchar(eVentLabel) > 0, !is.na(eVentLabel)) %>%
...
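A complete version of that pipeline might look like the sketch below; the three-row data set with one blank and one missing label is hypothetical, made up to exercise the filter. The is.na() check comes first so that missing labels are removed before nzchar() tests for empty strings.

```r
library(tidyverse)
library(jsonlite)

# Hypothetical data: one valid label, one empty string, one NA
data <- tibble(
  eventCategory = "HomePage",
  eventAction = "Click",
  eVentLabel = c('{"Name":"Ariel","number":"aaa"}', "", NA)
)

result <- data %>%
  # drop missing labels, then empty strings, before calling fromJSON()
  filter(!is.na(eVentLabel), nzchar(eVentLabel)) %>%
  mutate(new_cols = map(eVentLabel, ~ as_tibble(fromJSON(.x)))) %>%
  unnest(new_cols)
result
```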
One option is to split the string on ":" to extract the elements:
v1 <- lapply(strsplit(gsub('[{"},]', ':', df1$eVentLabel), ":"),
function(x) {x1 <- trimws(x[nzchar(x)])
setNames(x1[c(FALSE, TRUE)], x1[c(TRUE, FALSE)]) })[[1]]
df1[names(v1)] <- v1
df1
# eventCategory eventAction eVentLabel Name number
#1 HomePage Click {"Name":"Ariel","number":"aaa"} Ariel aaa
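To see what the gsub()/strsplit() step does, here is the same transformation applied to a single label, one step at a time (base R only). Braces, quotes and commas are all turned into ":", the string is split on ":", empty pieces are dropped, and the remaining pieces alternate key, value, key, value.

```r
s <- '{"Name":"Ariel","number":"aaa"}'

step1 <- gsub('[{"},]', ":", s)     # braces, quotes and commas become ":"
step2 <- strsplit(step1, ":")[[1]]  # split on ":" (many empty pieces)
step3 <- step2[nzchar(step2)]       # keep only the non-empty pieces
# odd positions are keys, even positions are values
kv <- setNames(step3[c(FALSE, TRUE)], step3[c(TRUE, FALSE)])
kv
```

This also shows the limitation of the approach: a value that itself contains ":", ",", a quote or a brace would be split incorrectly.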
For the new (multi-row) dataset:
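res <- do.call(rbind, lapply(strsplit(gsub('[{"},]', ':', df2$eVentLabel), ":"),
  function(x) {x1 <- trimws(x[nzchar(x)])
    setNames(x1[c(FALSE, TRUE)], x1[c(TRUE, FALSE)]) }))
# res is a character matrix, so use colnames() rather than names()
df2[colnames(res)] <- as.data.frame(res, stringsAsFactors = FALSE)
df2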
# eventCategory eventAction eVentLabel Name number
#1 HomePage Click {"Name":"Ariel","number":"aaa"} Ariel aaa
#2 HomePage Click {"Name":"Dan","number":"bbb"} Dan bbb
#3 HomePage Click {"Name":"Daf","number":"ccc"} Daf ccc
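The rbind() version assumes every row has the same keys in the same order. A minimal base-R sketch that tolerates differing keys (filling NA where a key is absent) wraps the same split logic in a helper; parse_label() and the sample labels are hypothetical names introduced here for illustration.

```r
# Hypothetical helper: the same gsub/strsplit logic, for one label
parse_label <- function(s) {
  parts <- trimws(strsplit(gsub('[{"},]', ":", s), ":")[[1]])
  parts <- parts[nzchar(parts)]
  setNames(parts[c(FALSE, TRUE)], parts[c(TRUE, FALSE)])
}

labels <- c('{"Name":"Ariel","number":"aaa"}', '{"Account":"010001"}')
parsed <- lapply(labels, parse_label)

# Union of all keys, then look each key up in every row (missing -> NA)
keys <- unique(unlist(lapply(parsed, names)))
res <- t(vapply(parsed, function(v) unname(v[keys]), character(length(keys))))
colnames(res) <- keys
res
```

The resulting matrix can be attached to the data frame exactly as above, with df2[colnames(res)] <- as.data.frame(res, stringsAsFactors = FALSE).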
# Sample data (input columns only)
df1 <- data.frame(
  eventCategory = "HomePage", eventAction = "Click",
  eVentLabel = '{"Name":"Ariel","number":"aaa"}',
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  eventCategory = rep("HomePage", 3), eventAction = rep("Click", 3),
  eVentLabel = c('{"Name":"Ariel","number":"aaa"}',
                 '{"Name":"Dan","number":"bbb"}',
                 '{"Name":"Daf","number":"ccc"}'),
  stringsAsFactors = FALSE
)
A tidyverse approach:
library(tidyverse)
library(stringr)
df <- structure(list(eventCategory = c("HomePage", "HomePage", "HomePage"
), eventAction = c("Click", "Click", "Click"), eventLabel = c("{\"Name\":\"Ariel\",\"number\":\"aaa\"}",
"{\"Name\":\"Dan\",\"number\":\"bbb\"}", "{\"Name\":\"Daf\",\"number\":\"ccc\"}"
)), .Names = c("eventCategory", "eventAction", "eventLabel"), row.names = c(NA,
-3L), class = "data.frame")
#>   eventCategory eventAction                      eventLabel
#> 1      HomePage       Click {"Name":"Ariel","number":"aaa"}
#> 2      HomePage       Click   {"Name":"Dan","number":"bbb"}
#> 3      HomePage       Click   {"Name":"Daf","number":"ccc"}
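vars <- c("name", "number")
df %>%
  # '{"Name":"Ariel"' and '"number":"aaa"}' go into separate columns
  separate(eventLabel, into = vars, sep = ",") %>%
  # split each piece on ":" into a key/value pair (list-column)
  mutate(across(all_of(vars), ~ str_split(.x, ":"))) %>%
  # one row for the keys, one row for the values
  unnest(cols = all_of(vars)) %>%
  # strip braces and quotes
  mutate(across(all_of(vars), ~ str_replace_all(.x, "[[:punct:]]", ""))) %>%
  # drop the rows that hold the keys
  filter(name != "Name")
#>   eventCategory eventAction name  number
#>   <chr>         <chr>       <chr> <chr>
#> 1 HomePage      Click       Ariel aaa
#> 2 HomePage      Click       Dan   bbb
#> 3 HomePage      Click       Daf   ccc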
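When the keys are known in advance, a shorter sketch (an alternative, not the approach above) is tidyr::extract() with one capture group per new column. The regex below assumes the labels always have exactly the form {"Name":"...","number":"..."}; rows that do not match get NA. Setting remove = FALSE keeps the original eventLabel column, as in the desired output.

```r
library(tidyverse)

df <- tibble(
  eventCategory = "HomePage",
  eventAction = "Click",
  eventLabel = c('{"Name":"Ariel","number":"aaa"}',
                 '{"Name":"Dan","number":"bbb"}',
                 '{"Name":"Daf","number":"ccc"}')
)

result <- df %>%
  # each capture group becomes one new column; keep eventLabel
  extract(eventLabel, into = c("name", "number"),
          regex = '"Name":"([^"]*)","number":"([^"]*)"',
          remove = FALSE)
result
```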