Apply readr col_spec to data.frame, independently of reading from file

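I have a data.frame (a tibble) to which I need to apply a number of type updates. I have a readr::col_spec object that describes the desired types, but because the data does not originate from a CSV file, I cannot use read_csv(..., col_types = cspec) to apply the changes to the specified columns.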

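Since col_spec is a data structure designed precisely for specifying the desired data types, I would still like to use it directly as the input to a function that applies the changes for me, rather than writing a long custom script that handles each column separately. See the following example: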

library(tidyverse)

# Subset starwars to get sw (comparable to my input data)
sw <- starwars %>%
  select(name, height, ends_with("_color")) %>%
  slice(c(1,4,5,19))
sw
#> # A tibble: 4 × 5
#>   name           height hair_color skin_color eye_color
#>   <chr>           <int> <chr>      <chr>      <chr>    
#> 1 Luke Skywalker    172 blond      fair       blue     
#> 2 Darth Vader       202 none       white      yellow   
#> 3 Leia Organa       150 brown      light      brown    
#> 4 Yoda               66 white      green      brown

# The col_spec that I have
cspec <- cols(
  hair_color = col_factor(c("brown", "blond", "white", "none")),
  skin_color = col_factor(c( "green", "light", "fair", "white")),
  eye_color = col_factor(c("blue", "brown", "yellow"))
)

# I would like to apply the col_spec directly to sw

# A not-so-great workaround is to round-trip through a tempfile
tf <- tempfile()
sw %>% write_csv(tf)
sw_fct <- read_csv(tf, col_types=cspec)

# This is more or less the result I am after:
# But note how type info on other columns is lost in the roundtrip (height comes back as <dbl> instead of <int>)
sw_fct
#> # A tibble: 4 × 5
#>   name           height hair_color skin_color eye_color
#>   <chr>           <dbl> <fct>      <fct>      <fct>    
#> 1 Luke Skywalker    172 blond      fair       blue     
#> 2 Darth Vader       202 none       white      yellow   
#> 3 Leia Organa       150 brown      light      brown    
#> 4 Yoda               66 white      green      brown

We can do this by looping over cspec$cols and extracting each collector's elements (levels, ordered, include_na):

library(readr)
library(purrr)
sw[names(cspec$cols)] <- imap(cspec$cols, ~ parse_factor(sw[[.y]],
     levels = .x$levels, ordered = .x$ordered, include_na = .x$include_na))

- Check the output

> sw
# A tibble: 4 × 5
  name           height hair_color skin_color eye_color
  <chr>           <int> <fct>      <fct>      <fct>    
1 Luke Skywalker    172 blond      fair       blue     
2 Darth Vader       202 none       white      yellow   
3 Leia Organa       150 brown      light      brown    
4 Yoda               66 white      green      brown    

> str(sw)
tibble [4 × 5] (S3: tbl_df/tbl/data.frame)
 $ name      : chr [1:4] "Luke Skywalker" "Darth Vader" "Leia Organa" "Yoda"
 $ height    : int [1:4] 172 202 150 66
 $ hair_color: Factor w/ 4 levels "brown","blond",..: 2 4 1 3
 $ skin_color: Factor w/ 4 levels "green","light",..: 3 4 2 1
 $ eye_color : Factor w/ 3 levels "blue","brown",..: 1 3 2 2
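The imap() call above works because each element of cspec$cols is a collector object whose fields mirror the arguments passed to col_factor(). The short sketch below, using a one-column spec made up for illustration, shows what the lambda's .x actually holds and how those fields are forwarded to parse_factor().

```r
library(readr)

cspec <- cols(eye_color = col_factor(c("blue", "brown", "yellow")))

# Each element of cspec$cols is a collector: a list storing the arguments
# given to col_factor(), with class c("collector_factor", "collector")
eye <- cspec$cols$eye_color
class(eye)      # "collector_factor" "collector"
eye$levels      # "blue" "brown" "yellow"
eye$ordered     # FALSE (the col_factor() default)
eye$include_na  # FALSE (the col_factor() default)

# These are the same values the imap() lambda passes on as .x$levels etc.
f <- parse_factor(c("blue", "yellow"), levels = eye$levels,
                  ordered = eye$ordered, include_na = eye$include_na)
f
```

The names of cspec$cols play the role of .y, which is why the answer can use them both to select the input column (sw[[.y]]) and as the assignment target (sw[names(cspec$cols)]).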

If we also need the 'spec' attribute, assign it as well:

attr(sw, "spec") <- cspec

- Check the str() output

> str(sw)
tibble [4 × 5] (S3: tbl_df/tbl/data.frame)
 $ name      : chr [1:4] "Luke Skywalker" "Darth Vader" "Leia Organa" "Yoda"
 $ height    : int [1:4] 172 202 150 66
 $ hair_color: Factor w/ 4 levels "brown","blond",..: 2 4 1 3
 $ skin_color: Factor w/ 4 levels "green","light",..: 3 4 2 1
 $ eye_color : Factor w/ 3 levels "blue","brown",..: 1 3 2 2
 - attr(*, "spec")=
  .. cols(
  ..   hair_color = col_factor(levels = c("brown", "blond", "white", "none"), ordered = FALSE, include_na = FALSE),
  ..   skin_color = col_factor(levels = c("green", "light", "fair", "white"), ordered = FALSE, include_na = FALSE),
  ..   eye_color = col_factor(levels = c("blue", "brown", "yellow"), ordered = FALSE, include_na = FALSE)
  .. )
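One reason to set the attribute is that readr's spec() accessor, which read_csv() users typically call to inspect a column specification, simply returns the "spec" attribute of a tibble. A minimal self-contained sketch, with a toy tibble standing in for sw:

```r
library(readr)
library(tibble)

d <- tibble(eye_color = factor(c("blue", "brown"),
                               levels = c("blue", "brown", "yellow")))
cs <- cols(eye_color = col_factor(c("blue", "brown", "yellow")))

# Store the col_spec the same way read_csv() does ...
attr(d, "spec") <- cs
# ... so that readr::spec() returns it
spec(d)
```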

This answer wraps @akrun's solution in a function, for those who may be less familiar with purrr.

apply_col_spec <- function(d, cspec, set_spec_attribute=FALSE) {
  
  # A bit of input checking
  if (!all(inherits(d, "data.frame"), inherits(cspec, "col_spec"), 
           is.logical(set_spec_attribute))) {
    stop("apply_col_spec(): wrong input types")
  }
  if (!all(sapply(cspec$cols, inherits, "collector_factor"))) {
    stop("apply_col_spec(): only implemented for factor columns")
  }
  
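  # Do the actual application of the col_spec: parse each listed column as a
  # factor, using the levels/ordered/include_na stored in its collector
  d[names(cspec$cols)] <- purrr::imap(cspec$cols, ~ readr::parse_factor(d[[.y]],
     levels = .x$levels, ordered = .x$ordered, include_na = .x$include_na))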
  
  # If requested, set col_spec as an attribute, for consistency with readr
  if (set_spec_attribute) {
    attr(d, "spec") <- cspec
  }
  d
}

Running the function on the variables defined in the question produces the expected result:

> apply_col_spec(sw, cspec)
# A tibble: 4 × 5
  name           height hair_color skin_color eye_color
  <chr>           <int> <fct>      <fct>      <fct>    
1 Luke Skywalker    172 blond      fair       blue     
2 Darth Vader       202 none       white      yellow   
3 Leia Organa       150 brown      light      brown    
4 Yoda               66 white      green      brown    
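apply_col_spec() deliberately refuses anything other than col_factor() collectors. A possible generalization, sketched here as a hypothetical apply_col_spec_any() (not part of readr or of the answers above), hands each collector to readr::parse_vector(), which accepts any collector built with col_*(). It assumes that as.character() represents non-character input without loss, and it leaves columns whose collector is col_guess() or col_skip() untouched rather than trying to handle them.

```r
library(readr)

# Hypothetical helper: apply every named collector in a col_spec,
# not only col_factor() collectors
apply_col_spec_any <- function(d, cspec) {
  stopifnot(inherits(d, "data.frame"), inherits(cspec, "col_spec"))
  missing_cols <- setdiff(names(cspec$cols), names(d))
  if (length(missing_cols) > 0) {
    stop("apply_col_spec_any(): columns not found: ",
         paste(missing_cols, collapse = ", "))
  }
  for (nm in names(cspec$cols)) {
    col <- cspec$cols[[nm]]
    # Guess/skip collectors are left as-is in this sketch
    if (inherits(col, c("collector_guess", "collector_skip"))) next
    x <- d[[nm]]
    # parse_vector() only accepts character input, so coerce first
    # (assumption: as.character() keeps the values intact)
    if (!is.character(x)) x <- as.character(x)
    d[[nm]] <- parse_vector(x, col)
  }
  d
}

# Usage on a small, made-up data.frame
df <- data.frame(id   = c("1", "2", "3"),
                 when = c("2024-01-05", "2024-02-10", "2024-03-15"),
                 eye  = c("blue", "brown", "blue"),
                 stringsAsFactors = FALSE)
spec_any <- cols(id   = col_integer(),
                 when = col_date(),
                 eye  = col_factor(c("blue", "brown", "yellow")))
out <- apply_col_spec_any(df, spec_any)
str(out)
```

Parsing failures are reported by parse_vector() itself as warnings, the same way readr reports them when reading a file.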

Another way to achieve this, which in hindsight seems better, is to use the readr::type_convert() function. It behaves almost exactly like the apply_col_spec() function above, and it ships with readr.
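A minimal sketch of the type_convert() route, using a trimmed-down stand-in for the question's sw tibble (the data below is abbreviated for illustration):

```r
library(readr)
library(tibble)

# Small stand-in for the question's sw tibble
sw <- tibble(
  name = c("Luke Skywalker", "Darth Vader", "Yoda"),
  height = c(172L, 202L, 66L),
  eye_color = c("blue", "yellow", "brown")
)
cspec <- cols(eye_color = col_factor(c("blue", "brown", "yellow")))

# type_convert() re-parses only character columns: eye_color follows cspec,
# name is guessed (and stays character), and the integer height column is
# left untouched, so no type information is lost
sw_fct <- type_convert(sw, col_types = cspec)
str(sw_fct)
```

Because type_convert() only considers character columns, a column that is already numeric or a factor is not re-parsed even if cspec names it; apply_col_spec() has the same practical limitation, since parse_factor() expects character input.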
