
How to convert a column with German date strings into a date variable in R?

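I have a variable containing German dates and would like to convert it into a date variable so that I can later filter by the quarter of each year.

Like this: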

      Date         newDate quarter
1  21. Mrz 10       <NA>    <NA>
2  21. Jan 10 2010-01-21 2010 Q1
3  30. Mrz 10       <NA>    <NA>
4  21. Mrz 10       <NA>    <NA>
5  21. Jan 10 2010-01-21 2010 Q1

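Unfortunately, R does not recognize German month abbreviations such as "Mrz" for March. I have already tried changing the language to German, but that did not help.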

Sys.setlocale(category = "LC_TIME", locale="de_DE.UTF-8")
[1] "de_DE.UTF-8"
alldata_LOR_BZ$newErstesAngebot = as.Date(as.character(alldata_LOR_BZ$newErstesAngebot), "%d. %b %y")
library(zoo)
Dateproblem$quarter <- as.yearqtr(Dateproblem$newDate)

Session info

R version 3.5.1 (2018-07-02)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.4

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/de_DE.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] reprex_0.2.0   tidyr_0.8.1    zoo_1.8-3      foreign_0.8-71 car_3.0-0     
 [6] carData_3.0-1  gplots_3.0.1   plm_1.6-6      Formula_1.2-3  dplyr_0.7.6   
[11] ggplot2_3.0.0 

loaded via a namespace (and not attached):
 [1] Rcpp_0.12.18       bdsmatrix_1.3-3    lattice_0.20-35    gtools_3.8.1      
 [5] assertthat_0.2.0   rprojroot_1.3-2    digest_0.6.15      lmtest_0.9-36     
 [9] R6_2.2.2           cellranger_1.1.0   plyr_1.8.4         backports_1.1.2   
[13] evaluate_0.11      pillar_1.3.0       miscTools_0.6-22   rlang_0.2.1       
[17] lazyeval_0.2.1     curl_3.2           readxl_1.1.0       data.table_1.11.4 
[21] gdata_2.18.0       whisker_0.3-2      callr_2.0.4        rmarkdown_1.10    
[25] stringr_1.3.1      munsell_0.5.0      compiler_3.5.1     pkgconfig_2.0.1   
[29] clipr_0.4.1        maxLik_1.3-4       htmltools_0.3.6    tidyselect_0.2.4  
[33] tibble_1.4.2       rio_0.5.10         crayon_1.3.4       withr_2.1.2       
[37] MASS_7.3-50        bitops_1.0-6       grid_3.5.1         nlme_3.1-137      
[41] gtable_0.2.0       magrittr_1.5       scales_0.5.0           KernSmooth_2.23-15
[45] zip_1.0.0          stringi_1.2.4      bindrcpp_0.2.2     sandwich_2.4-0    
[49] openxlsx_4.1.0     tools_3.5.1        forcats_0.3.0      glue_1.3.0        
[53] purrr_0.2.5        hms_0.4.2          processx_3.1.0     abind_1.4-5       
[57] yaml_2.2.0         colorspace_1.3-2   caTools_1.17.1.1   knitr_1.20        
[61] bindr_0.1.1        haven_1.1.2       

I just noticed that the language change does not seem to take effect... Isn't Sys.setlocale(category = "LC_TIME", locale="de_DE.UTF-8") the right command?
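One way to see what is going on is to print the abbreviations that `%b` actually expects under the active locale and compare them with the ones in the data. The session info above appears to list de_DE.UTF-8 in the LC_TIME slot, which suggests the locale was set and the abbreviation itself is the mismatch (many systems may use "Mär" rather than "Mrz"). The following is a minimal base-R sketch; the variable names `locale_abbr` and `data_abbr` are made up for illustration.

```r
# Show the locale currently used for date/time formatting
print(Sys.getlocale("LC_TIME"))

# Abbreviated month names that "%b" produces (and expects) in this locale
locale_abbr <- format(ISOdate(2010, 1:12, 1), "%b")
print(locale_abbr)

# Abbreviations that occur in the data (taken from the example above)
data_abbr <- c("Jan", "Mrz")

# Any value printed here is one that as.Date(..., "%b") cannot match
print(setdiff(data_abbr, locale_abbr))
```

If "Mrz" shows up in the last line of output, no locale setting alone will parse it, and the month names have to be mapped explicitly.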

It's not difficult to write your own vectorized parser based on the official DIN 1355 German 3-letter month abbreviations.

# 3-letter months abbreviations DIN 1355
months <- c(
    "Jan", "Feb", "Mrz", "Apr",
    "Mai", "Jun", "Jul", "Aug",
    "Sep", "Okt", "Nov", "Dez")

# Custom function to parse German dates DD. MMM YY
parse.de.date <- function(x) {
    as.Date(
        sapply(x, function(t) {
            dmy <- unlist(strsplit(gsub("\\.", "", t), "\\s"))
            paste(dmy[1], match(dmy[2], months), dmy[3], sep = "-")
        }),
        format = "%d-%m-%y")
}

library(dplyr)
df %>%
    mutate(Date = parse.de.date(Date))
#    Date    newDate quarter
#1 2010-03-21       <NA>    <NA>
#2 2010-01-21 2010-01-21 2010 Q1
#3 2010-03-30       <NA>    <NA>
#4 2010-03-21       <NA>    <NA>
#5 2010-01-21 2010-01-21 2010 Q1
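The parser above walks over the elements one by one with sapply. As a possible variation, the same mapping can be sketched with vectorized base functions only, which also returns NA for strings that do not fit the "DD. MMM YY" pattern. The names `de_months` and `parse_de_date_vec` are hypothetical and not part of the original answer.

```r
# DIN-style 3-letter abbreviations, in calendar order
de_months <- c("Jan", "Feb", "Mrz", "Apr", "Mai", "Jun",
               "Jul", "Aug", "Sep", "Okt", "Nov", "Dez")

# Hypothetical helper: parse "DD. MMM YY" strings without an explicit loop
parse_de_date_vec <- function(x) {
  x <- trimws(as.character(x))
  # day: leading digits before the dot
  day <- sub("^([0-9]{1,2})\\..*$", "\\1", x)
  # month: letters between the dot and the two-digit year, mapped to 1..12
  mon_txt <- sub("^[0-9]{1,2}\\.[[:space:]]*([[:alpha:]]+)[[:space:]]+[0-9]{2}$", "\\1", x)
  mon <- match(mon_txt, de_months)
  # year: trailing two digits
  yr <- sub("^.*[[:space:]]([0-9]{2})$", "\\1", x)
  # unmatched months become NA, so the whole date becomes NA
  as.Date(paste(day, mon, yr, sep = "-"), format = "%d-%m-%y")
}

parsed <- parse_de_date_vec(c("21. Mrz 10", "21. Jan 10", "30. Mrz 10"))
print(parsed)
```

Because the month is converted to a number before as.Date is called, the result does not depend on the LC_TIME locale.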

Sample data

df <- read.table(text =
    "      Date         newDate quarter
1  '21. Mrz 10'       <NA>    <NA>
2  '21. Jan 10' 2010-01-21 '2010 Q1'
3  '30. Mrz 10'       <NA>    <NA>
4  '21. Mrz 10'       <NA>    <NA>
5  '21. Jan 10' 2010-01-21 '2010 Q1'", header = TRUE)

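There's no need to write anything yourself; the readr package does everything for you. You only need to define the abbreviated month names: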

# example data:
dates <- c("21. Mrz 10",
"21. Jan 10",
"30. Mrz 10",
"21. Mrz 10",
"21. Jan 10")

# load library
library(readr)

# get the default German date names
my_format <- date_names_lang("de")

# change the abbreviated month names
my_format$mon_ab <- c("Jan", "Feb", "Mrz", "Apr", "Mai", "Jun", "Jul", "Aug", "Sep", "Okt", "Nov", "Dez")

# parse using your format
parse_date(dates, format="%d. %b %y", locale=locale(date_names = my_format))
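Once the column is a proper Date, whichever parsing approach is used, the quarter the question asks about can be derived and used for filtering. The sketch below uses only base R (`format` and `quarters`) rather than zoo's `as.yearqtr`, so it runs without extra packages. The data frame name `df_q` and the extra July date are hypothetical, added only so that the filter has something to exclude.

```r
# Already-parsed dates from the example, plus one hypothetical Q3 date
dates_parsed <- as.Date(c("2010-03-21", "2010-01-21", "2010-03-30",
                          "2010-03-21", "2010-01-21",
                          "2010-07-01"))  # hypothetical, not in the question

df_q <- data.frame(Date = dates_parsed)

# Build a "YYYY Qn" label from the year and the calendar quarter
df_q$quarter <- paste(format(df_q$Date, "%Y"), quarters(df_q$Date))

# Keep only the rows that fall in the first quarter of 2010
q1_2010 <- df_q[df_q$quarter == "2010 Q1", ]
print(q1_2010)
```

The label has the same "2010 Q1" shape as the quarter column in the question, so it can be compared against that column directly.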
