Pattern match by pattern and position in R

I have a bunch of files with unique IDs that encode state, latitude, longitude, year, month, and day. All filenames/IDs have the same length, e.g. fl294670818202019

I'd like to use pattern matching to subset a list by year. The following code does not work as desired because the 'year' pattern can also be matched by digits that span two fields, such as longitude and year, or year and month (as in the example ID above).

Example:

library(stringr)  # provides str_subset() and re-exports %>%

# unique ID with year 2020
x <- "fl301330850282020"
# unique ID with year 2019 (but also matches the pattern 2020)
y <- "fl294670818202019"

# create a list
(z <- list(x, y))

# subset list by pattern (returns both IDs, which is not what I want)
z %>%
  str_subset(pattern = "2020")
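To make the false match concrete, here is a minimal base R sketch that reports where "2020" first occurs in each of the two example IDs. The names `ids` and `starts` are only illustrative.

```r
x <- "fl301330850282020"
y <- "fl294670818202019"
ids <- c(x, y)

# regexpr() gives the 1-based start position of the first match in each string
starts <- as.integer(regexpr("2020", ids, fixed = TRUE))
starts
```

In the first ID the match starts at character 14, where the year is; in the second it starts at character 12, so it straddles the digits before the year.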

Is it possible to skip the first 13 characters, and then perform the search?

I don't want to subset/remove the first 13 characters from the filename because I need the information contained within the filename.

Is the year always the last four characters? If so, how about:

z[str_ends(z,"2020")]

or:

z[grepl("2020$",z)]

If you want to be explicit about skipping the first 13 characters, you can do this:

z[grepl("2020", str_sub(z,14))]    

or

z[str_detect(str_sub(z,14),"2020")]

or even

z[grepl("(?<=.{13})2020", z, perl = TRUE)]
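Since the IDs are fixed-width, another option is to skip regex entirely and compare the year field directly. This is only a sketch, and it assumes the year always occupies characters 14 to 17, as in the two example IDs. The names `ids`, `year`, and `hits` are illustrative.

```r
z <- list("fl301330850282020", "fl294670818202019")

# flatten the list to a character vector
ids <- unlist(z)

# assumption: the year is stored in characters 14 through 17
year <- substr(ids, 14, 17)

# keep only the IDs whose year field is exactly "2020"
hits <- ids[year == "2020"]
hits
```

The full ID is kept in the result, so nothing is stripped from the filename; only the comparison looks at the year slice.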
