How to extract month from a date column in R?

I need to create a data frame column in R that contains the month and year of each observation (in this case, publications from the Web of Science database). I have tried concatenating the existing columns "PD" (publication date) and "PY" (publication year). However, the "PD" column uses two formats: the abbreviated month alone (e.g. "MAR") and day plus abbreviated month (e.g. "12-Mar"). I would like the new "date" column to have a uniform "abbreviated-month year" format (e.g. "MAR 2020") so that I can analyze it statistically.

How do I extract the month from the "PD" column (i.e. "MAR" instead of "12-Mar")?

We can use sub to remove the day digits, spaces, and hyphens, then convert the result to upper case with toupper:

 toupper(sub("[0-9 -]+", "", df1$PD))
 #[1] "MAR"  "MAR"  "JUNE" "JUNE"

Data

df1 <- data.frame(PD = c("MAR", "12-Mar", "JUNE", "24-June"), 
       stringsAsFactors= FALSE)
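
To reach the "MAR 2020" format the question asks for, the extracted month can be shortened to three letters and pasted together with the year. The following is only a sketch: the PY values and the month_full and month_abbr helper names are made up for illustration and do not come from the original data.

```r
# Hypothetical sample data: the PY values are invented for illustration.
df1 <- data.frame(PD = c("MAR", "12-Mar", "JUNE", "24-June"),
                  PY = c(2020, 2020, 2019, 2021),
                  stringsAsFactors = FALSE)

# Strip leading digits, spaces, and hyphens, then upper-case the month.
month_full <- toupper(sub("^[0-9 -]+", "", df1$PD))

# Keep only the first three letters so that "JUNE" becomes "JUN".
month_abbr <- substr(month_full, 1, 3)

# Combine the abbreviated month with the publication year.
df1$date <- paste(month_abbr, df1$PY)
df1$date
#[1] "MAR 2020" "MAR 2020" "JUN 2019" "JUN 2021"
```

Truncating with substr assumes every month name in PD starts with its standard three-letter abbreviation, which holds for English month names.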

Alternatively, we can extract only the alphabetic characters from the PD column with stringr::str_extract:

toupper(stringr::str_extract(df$PD, '[A-Za-z]+'))
#[1] "MAR"   "MAY"   "APRIL" "JUNE" 

Data

df <- data.frame(PD = c("MAR", "13-May", "April", "24-June"), 
                 stringsAsFactors= FALSE)
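
If stringr is not available, roughly the same extraction can be done in base R, and the result can be mapped to a month number for analysis. This is a sketch under the assumption that every PD value contains an English month name; the variable names m, month_abbr, and month_num are illustrative only.

```r
df <- data.frame(PD = c("MAR", "13-May", "April", "24-June"),
                 stringsAsFactors = FALSE)

# Base R counterpart of str_extract: pull the first run of letters.
# regmatches() drops elements with no match, so this assumes that
# every PD value contains at least one letter.
m <- regmatches(df$PD, regexpr("[A-Za-z]+", df$PD))

# Reduce each month name to its upper-case three-letter abbreviation.
month_abbr <- toupper(substr(m, 1, 3))

# Look up the month number using R's built-in month.abb constant.
month_num <- match(month_abbr, toupper(month.abb))
month_num
#[1] 3 5 4 6
```

Matching against month.abb avoids locale-dependent date parsing, since month.abb always holds the English abbreviations.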
