Extract the string between the nth occurrence of a character and another character

Question

filelist <- c(
  "http://content.caiso.com/green/renewrpt/20171015_DailyRenewablesWatch.txt",
  "http://content.caiso.com/green/renewrpt/20171016_DailyRenewablesWatch.txt",
  "http://content.caiso.com/green/renewrpt/20171017_DailyRenewablesWatch.txt",
  "http://content.caiso.com/green/renewrpt/20171018_DailyRenewablesWatch.txt",
  "http://content.caiso.com/green/renewrpt/20171019_DailyRenewablesWatch.txt",
  "http://content.caiso.com/green/renewrpt/20171020_DailyRenewablesWatch.txt",
  "http://content.caiso.com/green/renewrpt/20171021_DailyRenewablesWatch.txt",
  "http://content.caiso.com/green/renewrpt/20171022_DailyRenewablesWatch.txt"
)

I want to extract the string between the 5th occurrence of / and the _ that follows it.

For example, from "http://content.caiso.com/green/renewrpt/20171015_DailyRenewablesWatch.txt" I want 20171015. I tried

regmatches(filelist, regexpr("/{4}([^_]+)", filelist))

but it returns nothing (an empty character vector).

Answer 1

This should work:

gsub("(?:.*/){4}([^_]+)_.*", "\\1", filelist)
# [1] "20171015" "20171016" "20171017" "20171018" "20171019" "20171020" "20171021"
# [8] "20171022"

The pattern also has to match whatever precedes each slash, hence the repeated group (?:.*/) instead of a bare /.
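To make the difference concrete, here is a minimal sketch on a single URL from the question. The variable names (url, failed, fixed, strict) are only for illustration, and the stricter pattern at the end is an assumption of mine, not part of the original answer: it counts exactly five slash-terminated segments instead of relying on the greedy .* to run up to the last slash.

```r
url <- "http://content.caiso.com/green/renewrpt/20171015_DailyRenewablesWatch.txt"

# "/{4}" asks for four slashes in a row ("////"), which never occurs here,
# so regmatches() returns an empty character vector
failed <- regmatches(url, regexpr("/{4}([^_]+)", url))
length(failed)

# Repeating a group that ends in a slash consumes the path segments instead
fixed <- sub("(?:.*/){4}([^_]+)_.*", "\\1", url)
fixed

# Stricter variant (illustrative): exactly five "segment/" pieces from the
# start of the string, then everything up to the first underscore
strict <- sub("^(?:[^/]*/){5}([^_]*)_.*", "\\1", url, perl = TRUE)
strict
```

The greedy version works for these URLs because the date is the last path component; the stricter one is probably the safer choice if the paths can have a different depth after the 5th slash.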

Answer 2

Here are some approaches that use regular expressions:

sub(".*(\\d{8}).*", "\\1", filelist)

sub(".*/", "", sub("_.*", "", filelist))

sub("_.*", "", basename(filelist))

sapply(strsplit(filelist, "[/_]"), "[", 6)

gsub("\\D", "", filelist)

m <- gregexpr("\\d{8}", filelist)
unlist(regmatches(filelist, m))

strcapture("(\\d{8})", filelist, data.frame(character()))[[1]]

library(gsubfn)
strapplyc(filelist, "\\d{8}", simplify = TRUE)

These solutions do not use regular expressions at all:

substring(filelist, 41, 48)

substring(basename(filelist), 1, 8)

read.table(text = filelist, comment.char = "_", sep = "/")[[6]]

as.Date(basename(filelist), "%Y%m%d")  # returns Date class object

Update: added more approaches.
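Since the approaches above rest on different assumptions about the input, it can help to confirm that they agree on your data before picking one. A rough sketch, using two of the URLs from the question and a hypothetical results list of my own naming:

```r
filelist <- c(
  "http://content.caiso.com/green/renewrpt/20171015_DailyRenewablesWatch.txt",
  "http://content.caiso.com/green/renewrpt/20171016_DailyRenewablesWatch.txt"
)

# A few of the approaches from this answer, collected side by side
results <- list(
  sub_digits = sub(".*(\\d{8}).*", "\\1", filelist),
  basename   = sub("_.*", "", basename(filelist)),
  strsplit   = sapply(strsplit(filelist, "[/_]"), "[", 6),
  substring  = substring(filelist, 41, 48)
)

# TRUE when every approach returns exactly the same vector as the first one
all_agree <- all(vapply(results, identical, logical(1), results[[1]]))
all_agree
```

If the results differ on real data, the cause is usually an assumption that no longer holds: substring(filelist, 41, 48) needs a fixed-length prefix, gsub("\\D", "", filelist) needs the date to be the only digits in the string, and the strsplit version needs the date to be the 6th piece.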

Answer 3

substr(x = filelist,
       start = sapply(gregexpr(pattern = "/", filelist), function(x) x[5])+1,
       stop = sapply(gregexpr(pattern = "_", filelist), function(x) x[1])-1)
#[1] "20171015" "20171016" "20171017" "20171018" "20171019" "20171020" "20171021"
#[8] "20171022"
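The same position-based idea can be wrapped in a reusable function. The sketch below is my own illustration, not part of the original answer: between_nth is a hypothetical helper name, it treats both characters as literal text (fixed = TRUE), and unlike the code above it looks for the end character only after the n-th start character and returns NA when either one is missing.

```r
# Hypothetical helper: text between the n-th occurrence of `start_char`
# and the first `end_char` that follows it; NA if either is not found
between_nth <- function(x, start_char, n, end_char) {
  starts <- gregexpr(start_char, x, fixed = TRUE)
  vapply(seq_along(x), function(i) {
    pos <- as.integer(starts[[i]])
    # gregexpr() reports -1 when there is no match at all
    if (pos[1] == -1 || length(pos) < n) return(NA_character_)
    rest <- substring(x[i], pos[n] + 1)
    end <- as.integer(regexpr(end_char, rest, fixed = TRUE))
    if (end == -1) return(NA_character_)
    substr(rest, 1, end - 1)
  }, character(1))
}

url <- "http://content.caiso.com/green/renewrpt/20171015_DailyRenewablesWatch.txt"
between_nth(url, "/", 5, "_")
```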

Answer 4

The basename function strips the leading path from each URL first:

filelist <- basename(filelist)

Then remove everything from the "_" onward, using str_remove from the stringr package:

library(stringr)
str_remove(filelist, "_.*")

Output:

[1] "20171015" "20171016" "20171017" "20171018" "20171019" "20171020" "20171021" "20171022"

If you want to convert the result to a date, have a look at the ymd function in the lubridate package.
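As a small sketch of that last step, assuming the extracted strings are stored in a vector I call ids here: base R's as.Date does the conversion without extra packages, and the lubridate call is shown as a comment so the example runs even if the package is not installed.

```r
ids <- c("20171015", "20171016")

# Base R: parse the eight digits as year, month, day
dates <- as.Date(ids, format = "%Y%m%d")
dates

# With lubridate, if it is installed, this should give the same dates:
# lubridate::ymd(ids)
```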


Answer 1: 4, 2017-10-27 16:07:55
Answer 2: 1, accepted, 2017-10-27 16:43:12
Answer 3: 0, 2017-10-27 16:27:41
Answer 4: 0, 2021-02-02 16:11:45
