Extracting a pattern from a nested list in R using regex

I have the following sorted list (lst) of time periods. I want to split each period into its individual dates and then extract the maximum time period, without altering the order of the list.

$`1`
[1] "01.12.2015 - 21.12.2015"

$`2`
[1] "22.12.2015 - 05.01.2016"

$`3`
[1] "14.09.2015 - 12.10.2015" "29.09.2015 - 26.10.2015"

After the adjustment, the list should look like this:

$`1`
[1] "01.12.2015" "21.12.2015"

$`2`
[1] "22.12.2015" "05.01.2016"

$`3`
[1] "14.09.2015" "12.10.2015" "29.09.2015" "26.10.2015"

To do so, I began by splitting the list:

library(stringr)
lst_split <- str_split(lst, pattern = " - ")

which gives the following:

[[1]]
[1] "01.12.2015" "21.12.2015"

[[2]]
[1] "22.12.2015" "05.01.2016"

[[3]]
[1] "c(\"14.09.2015"             "12.10.2015\", \"29.09.2015" "26.10.2015\")"  

Then I tried to extract the pattern:

lapply(lst_split, function(x) str_extract(x, pattern = "\\d+\\.\\d+\\.\\d+"))

but my output is missing one date (29.09.2015):

[[1]]
[1] "01.12.2015" "21.12.2015"

[[2]]
[1] "22.12.2015" "05.01.2016"

[[3]]
[1] "14.09.2015" "12.10.2015" "26.10.2015"

Does anyone have an idea how I could make this work, and perhaps suggest a more efficient solution? Thank you in advance.

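Thanks to the comments of @WiktorStribiżew and @akrun: it is enough to use str_extract_all. str_extract returns only the first match in each string, and after the split one string contained two dates (12.10.2015 and 29.09.2015), so the second one was dropped. str_extract_all returns every match, and it can be applied to lst directly, without the str_split step.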

In this example:

> str_extract_all(lst,"\\d+\\.\\d+\\.\\d+")
[[1]]
[1] "01.12.2015" "21.12.2015"

[[2]]
[1] "22.12.2015" "05.01.2016"

[[3]]
[1] "14.09.2015" "12.10.2015" "29.09.2015" "26.10.2015"
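
For readers without stringr, the same idea can be sketched in base R. This is only an illustration under stated assumptions: the names date_pattern, first_only, all_matches and dates are made up for the example, and the pattern uses [0-9] in place of \\d. The first part reproduces the missing date on the problematic string from the split, and the second part extracts every date per list component.

```r
# Input from the question, rebuilt so the sketch is self-contained.
lst <- list(`1` = "01.12.2015 - 21.12.2015",
            `2` = "22.12.2015 - 05.01.2016",
            `3` = c("14.09.2015 - 12.10.2015", "29.09.2015 - 26.10.2015"))

# Hypothetical name; same pattern as above, written with [0-9] instead of \\d.
date_pattern <- "[0-9]+\\.[0-9]+\\.[0-9]+"

# The string that str_split produced for component 3 holds two dates.
s <- "12.10.2015\", \"29.09.2015"
first_only  <- regmatches(s, regexpr(date_pattern, s))        # first match only
all_matches <- regmatches(s, gregexpr(date_pattern, s))[[1]]  # every match

# gregexpr() locates all matches in each string, regmatches() extracts them,
# and unlist() flattens the per-string results within one component.
dates <- lapply(lst, function(x) unlist(regmatches(x, gregexpr(date_pattern, x))))
```

Here regexpr plays the role of str_extract (one match per string) and gregexpr the role of str_extract_all (all matches), which is why only the second form keeps 29.09.2015.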

1) Use strsplit, flatten each component using unlist, convert the dates to "Date" class and then use range to get the maximum time span. No packages are used.

> lapply(lst, function(x) range(as.Date(unlist(strsplit(x, " - ")), "%d.%m.%Y")))
$`1`
[1] "2015-12-01" "2015-12-21"

$`2`
[1] "2015-12-22" "2016-01-05"

$`3`
[1] "2015-09-14" "2015-10-26"
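
If the result is wanted back in the original dd.mm.yyyy form, or the length of each span is of interest, the approach in 1) could be extended roughly as follows. This is a sketch, not part of the original answer: the names spans, period and days are hypothetical.

```r
# Input from the question, rebuilt so the sketch is self-contained.
lst <- list(`1` = "01.12.2015 - 21.12.2015",
            `2` = "22.12.2015 - 05.01.2016",
            `3` = c("14.09.2015 - 12.10.2015", "29.09.2015 - 26.10.2015"))

spans <- lapply(lst, function(x) {
  # Split on the literal separator, flatten, and parse as Date.
  d <- as.Date(unlist(strsplit(x, " - ", fixed = TRUE)), format = "%d.%m.%Y")
  r <- range(d)  # earliest and latest date in this component
  list(
    period = paste(format(r, "%d.%m.%Y"), collapse = " - "),  # original format
    days   = as.integer(diff(r))                              # span length in days
  )
})
```

Comparing "Date" objects rather than strings matters here: sorting the dd.mm.yyyy strings alphabetically would order them by day of month first, not chronologically.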

2) This variation using a magrittr pipeline also works:

library(magrittr)
lapply(lst, function(x) 
   x %>% 
     strsplit(" - ") %>% 
     unlist %>% 
     as.Date("%d.%m.%Y") %>% 
     range
)
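
The same pipeline can presumably be written without magrittr by using the native pipe. This sketch assumes R 4.1 or later, where |> is available; the name ranges is made up for the example. Unlike %>%, the native pipe requires the parentheses after unlist and range.

```r
# Input from the question, rebuilt so the sketch is self-contained.
lst <- list(`1` = "01.12.2015 - 21.12.2015",
            `2` = "22.12.2015 - 05.01.2016",
            `3` = c("14.09.2015 - 12.10.2015", "29.09.2015 - 26.10.2015"))

# Native pipe (assumes R >= 4.1); each step receives the previous result
# as its first argument.
ranges <- lapply(lst, function(x)
  x |>
    strsplit(" - ", fixed = TRUE) |>
    unlist() |>
    as.Date(format = "%d.%m.%Y") |>
    range()
)
```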

Note: The input lst in reproducible form is:

lst <- structure(list(`1` = "01.12.2015 - 21.12.2015", `2` = "22.12.2015 - 05.01.2016", 
`3` = c("14.09.2015 - 12.10.2015", "29.09.2015 - 26.10.2015"
)), .Names = c("1", "2", "3"))
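
The structure(...) call above is dput-style output. As a small sketch, the same list can be written with a plain list() call, since the names are already supplied inside it; lst_plain and lst_dput are hypothetical names used only for the comparison.

```r
# dput-style form, as given in the note above.
lst_dput <- structure(list(`1` = "01.12.2015 - 21.12.2015", `2` = "22.12.2015 - 05.01.2016",
`3` = c("14.09.2015 - 12.10.2015", "29.09.2015 - 26.10.2015"
)), .Names = c("1", "2", "3"))

# Plain form: the backticked names already set the names attribute.
lst_plain <- list(`1` = "01.12.2015 - 21.12.2015",
                  `2` = "22.12.2015 - 05.01.2016",
                  `3` = c("14.09.2015 - 12.10.2015", "29.09.2015 - 26.10.2015"))

same <- identical(lst_dput, lst_plain)  # compares values and attributes
sizes <- lengths(lst_plain)             # number of periods per component
```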
