
Splitting a dataframe string column into multiple different columns in the same table

What I'm trying to accomplish is to split one column into multiple columns in the same table.

My data:

eventCategory   eventAction  eVentLabel
HomePage        Click        {"Name":"Ariel","number":"aaa"}
HomePage        Click        {"Name":"Dan","number":"bbb"}
HomePage        Click        {"Name":"Daf","number":"ccc"}

What I need:

eventCategory   eventAction eVentLabel                      Name    number
HomePage        Click       {"Name":"Ariel","number":"aaa"} Ariel   aaa
HomePage        Click       {"Name":"Dan","number":"bbb"}   Dan     bbb
HomePage        Click       {"Name":"Daf","number":"ccc"}   Daf     ccc

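Another tidyverse answer, this time using jsonlite::fromJSON and purrr. This solution transparently handles additional columns embedded in the JSON and fills in missing values with NA.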

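library(tidyverse)
library(jsonlite)

# Tab-separated sample data; the \t escapes keep the delimiters explicit
data.raw <- 'eventCategory\teventAction\teVentLabel
HomePage\tClick\t{"Name":"Ariel","number":"aaa"}
HomePage\tClick\t{"Name":"Dan","number":"bbb"}
HomePage\tClick\t{"Name":"Daf","number":"ccc"}'

# readr >= 2.0 expects literal data to be wrapped in I();
# quote = "" stops readr from interpreting the quotes inside the JSON
data <- read_tsv(I(data.raw), quote = "")

data %>%
    # parse each JSON string into a list, then into a one-row tibble
    mutate(new_cols = map(eVentLabel, fromJSON),
           new_cols = map(new_cols, as_tibble)) %>%
    # spread the one-row tibbles out into regular columns
    unnest(new_cols)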

#> # A tibble: 3 x 5
#>   eventCategory eventAction                      eVentLabel  Name number
#>           <chr>       <chr>                           <chr> <chr>  <chr>
#> 1      HomePage       Click {"Name":"Ariel","number":"aaa"} Ariel    aaa
#> 2      HomePage       Click   {"Name":"Dan","number":"bbb"}   Dan    bbb
#> 3      HomePage       Click   {"Name":"Daf","number":"ccc"}   Daf    ccc

Note that unnest drops any row whose parsed JSON is empty. Consider the following example:

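# Same data plus an empty JSON object and a row with a different key
data.raw <- 'eventCategory\teventAction\teVentLabel
HomePage\tClick\t{"Name":"Ariel","number":"aaa"}
HomePage\tClick\t{"Name":"Dan","number":"bbb"}
HomePage\tClick\t{"Name":"Daf","number":"ccc"}
HomePage\tClick\t{}
HomePage\tClick\t{"Account": "010001"}'

# readr >= 2.0 expects literal data to be wrapped in I();
# quote = "" stops readr from interpreting the quotes inside the JSON
data <- read_tsv(I(data.raw), quote = "")

data %>%
    mutate(new_cols = map(eVentLabel, fromJSON),
           new_cols = map(new_cols, as_tibble)) %>%
    unnest(new_cols)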

#> # A tibble: 4 x 6
#>   eventCategory eventAction                      eVentLabel  Name number   Account
#>           <chr>       <chr>                           <chr> <chr>  <chr>     <chr>
#> 1      HomePage       Click {"Name":"Ariel","number":"aaa"} Ariel    aaa      <NA>
#> 2      HomePage       Click   {"Name":"Dan","number":"bbb"}   Dan    bbb      <NA>
#> 3      HomePage       Click   {"Name":"Daf","number":"ccc"}   Daf    ccc      <NA>
#> 4      HomePage       Click           {"Account": "010001"}  <NA>   <NA>      010001

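Note that the row with empty JSON ({}) in the original data has been dropped. A column has also been added for the new variable Account, with NA filled in wherever a row has no value for it.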
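If the row with the empty JSON should be kept rather than dropped, one possible variation is to pass keep_empty = TRUE to unnest. This is a minimal sketch that assumes tidyr 1.0.0 or later (where that argument is available); the events tibble and the kept name are made up here for illustration and are not part of the original answer.

```r
library(tibble)
library(dplyr)
library(purrr)
library(tidyr)
library(jsonlite)

# Illustrative data (an assumption): one normal row, one empty JSON object,
# and one row with a different key
events <- tibble(
  eventCategory = "HomePage",
  eventAction   = "Click",
  eVentLabel    = c('{"Name":"Ariel","number":"aaa"}',
                    '{}',
                    '{"Account": "010001"}')
)

kept <- events %>%
  # parse each JSON string into a (possibly empty) tibble
  mutate(new_cols = map(eVentLabel, ~ as_tibble(fromJSON(.x)))) %>%
  # keep_empty = TRUE replaces size-0 elements with a row of missing values
  unnest(new_cols, keep_empty = TRUE)

nrow(kept)
```

With this option the {} row should survive with NA in every new column, so the number of rows matches the input.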

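Finally, if the JSON column contains blank entries (for example "" or NA), the code above will fail; you need to remove those rows with a filter statement before passing the column to fromJSON. For example: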

data %>%
    filter(nchar(eVentLabel) > 0, !is.na(eVentLabel)) %>%
    ...
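Put together with the parsing step from earlier, the filter might look like the sketch below. The events tibble and the parsed name are assumptions made for illustration; the filter conditions are the ones shown above.

```r
library(tibble)
library(dplyr)
library(purrr)
library(tidyr)
library(jsonlite)

# Illustrative data (an assumption): one valid JSON string, one empty
# string, and one missing value
events <- tibble(
  eventCategory = "HomePage",
  eventAction   = "Click",
  eVentLabel    = c('{"Name":"Ariel","number":"aaa"}', "", NA)
)

parsed <- events %>%
  # drop blank and missing labels before they reach fromJSON
  filter(!is.na(eVentLabel), nchar(eVentLabel) > 0) %>%
  mutate(new_cols = map(eVentLabel, ~ as_tibble(fromJSON(.x)))) %>%
  unnest(new_cols)

nrow(parsed)
```

Only the row with valid JSON reaches fromJSON, so the blank and NA rows do not appear in the result.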

One option is to split the string on : to extract the elements

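# Replace the JSON punctuation with ":" and split on ":"; the non-empty
# tokens then alternate between keys (odd positions) and values (even positions)
v1 <- lapply(strsplit(gsub('[{"},]', ':', df1$eVentLabel), ":"), 
        function(x) {x1 <- trimws(x[nzchar(x)])
             setNames(x1[c(FALSE, TRUE)], x1[c(TRUE, FALSE)]) })[[1]]
# v1 is a named character vector, so its names become the new column names
df1[names(v1)] <- v1
df1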
#  eventCategory eventAction                      eVentLabel  Name number
#1      HomePage       Click {"Name":"Ariel","number":"aaa"} Ariel    aaa

For a new dataset with several rows

# Same parsing as above, with the per-row results bound into a character matrix
res <- do.call(rbind, lapply(strsplit(gsub('[{"},]', ':', df2$eVentLabel), ":"),
              function(x) {x1 <- trimws(x[nzchar(x)])
              setNames(x1[c(FALSE, TRUE)], x1[c(TRUE, FALSE)]) }))
# res is a matrix, so the new column names come from colnames(), not names()
df2[colnames(res)] <- res
df2
#  eventCategory eventAction                      eVentLabel  Name number
#1      HomePage       Click {"Name":"Ariel","number":"aaa"} Ariel    aaa
#2      HomePage       Click   {"Name":"Dan","number":"bbb"}   Dan    bbb
#3      HomePage       Click   {"Name":"Daf","number":"ccc"}   Daf    ccc

Data

df1 <- structure(list(eventCategory = "HomePage", eventAction = "Click", 
eVentLabel = "{\"Name\":\"Ariel\",\"number\":\"aaa\"}"), 
.Names = c("eventCategory", 
"eventAction", "eVentLabel"), class = "data.frame", row.names = c(NA, 
-1L))

# Input only: the Name and number columns are created by the code above
df2 <- structure(list(eventCategory = c("HomePage", "HomePage", "HomePage"
 ), eventAction = c("Click", "Click", "Click"), 
  eVentLabel = c("{\"Name\":\"Ariel\",\"number\":\"aaa\"}", 
 "{\"Name\":\"Dan\",\"number\":\"bbb\"}", "{\"Name\":\"Daf\",\"number\":\"ccc\"}"
 )), .Names = c("eventCategory", "eventAction", "eVentLabel"
 ), class = "data.frame", row.names = c(NA, -3L
 ))
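One caveat with the string-splitting approach: because it splits on ":" and ",", a value that itself contains one of those characters would be broken into extra tokens and the names and values would no longer line up. A possible base R alternative, swapping the regex for jsonlite::fromJSON, is sketched below. It assumes every row carries the same keys (rbind needs matching columns); the df, parsed_cols and result names and the sample values are made up for illustration.

```r
library(jsonlite)

# Illustrative data (an assumption): values that contain ":" and ","
df <- data.frame(
  eventCategory = c("HomePage", "HomePage"),
  eventAction   = c("Click", "Click"),
  eVentLabel    = c('{"Name":"Ariel","number":"a:1"}',
                    '{"Name":"Dan, Jr.","number":"bbb"}'),
  stringsAsFactors = FALSE
)

# Parse each JSON string into a one-row data frame, then stack the rows
parsed_cols <- do.call(rbind, lapply(df$eVentLabel, function(s) {
  as.data.frame(fromJSON(s), stringsAsFactors = FALSE)
}))

# Attach the parsed columns next to the original ones
result <- cbind(df, parsed_cols)
result
```

Because the JSON is parsed rather than split, punctuation inside the values is left intact.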

A tidyverse approach

library(tidyverse)
library(stringr) 

df <- structure(list(eventCategory = c("HomePage", "HomePage", "HomePage"
), eventAction = c("Click", "Click", "Click"), eventLabel = c("{\"Name\":\"Ariel\",\"number\":\"aaa\"}", 
"{\"Name\":\"Dan\",\"number\":\"bbb\"}", "{\"Name\":\"Daf\",\"number\":\"ccc\"}"
)), .Names = c("eventCategory", "eventAction", "eventLabel"), row.names = c(NA, 
-3L), class = "data.frame")

df
#>   eventCategory eventAction                      eventLabel
#> 1      HomePage       Click {"Name":"Ariel","number":"aaa"}
#> 2      HomePage       Click   {"Name":"Dan","number":"bbb"}
#> 3      HomePage       Click   {"Name":"Daf","number":"ccc"}

vars <- c("name", "number")

df %>% 
  # split the JSON string at the comma into two key:value pieces
  separate(eventLabel, into = vars, sep = ",") %>% 
  # split each piece at the colon, giving list-columns of (key, value)
  map_at(vars, ~str_split(., ":")) %>% 
  as_tibble() %>% 
  # one row for the keys and one for the values of each original row
  unnest(cols = all_of(vars)) %>% 
  # strip the braces and quotes
  map_at(vars, ~str_replace_all(., "[[:punct:]]", "")) %>% 
  as_tibble() %>% 
  # drop the rows that hold the keys, keeping only the values
  filter(name != "Name")

#>   eventCategory eventAction  name number
#>           <chr>       <chr> <chr>  <chr>
#> 1      HomePage       Click Ariel    aaa
#> 2      HomePage       Click   Dan    bbb
#> 3      HomePage       Click   Daf    ccc
