Using readLines in R to parse the last comment line before the data

Question

I have a long data file:

# Comment line 1
# Comment line 2
# ... many more lines
# values intensities
5.556667e+00    4.008450e+02
5.581000e+00    4.008770e+02
... many more values
# End comments

I want to create a function that, applied to this file, returns:

[1] "values" "intensities"

What would you suggest?

Answer 1 (2021-05-09 08:54:36)

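readLines reads the file and grep then locates the comment lines. The header is the last comment line before the first gap in their line numbers. In the function below, the comment character defaults to "#", as in the question.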

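fun <- function(file, char = "#"){
  x <- readLines(con = file)
  i <- grep(paste0("^", char), x)  # line numbers of the comment lines
  # the header ends where the comment line numbers first stop being consecutive
  gap <- which(diff(i) != 1)
  y <- x[if (length(gap)) i[gap[1]] else max(i)]
  # split on runs of whitespace and drop the leading comment character
  unlist(strsplit(y, "\\s+"))[-1]
}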

fun("filename.txt")
#[1] "values"      "intensities"
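A variant that does not depend on the comment block starting at line 1 is to find the first data line and take the closest comment line above it. This is only a sketch: `header_names()` is a hypothetical name, and it assumes the header fields are separated by whitespace and that blank lines count as neither data nor header.

```r
# header_names() (hypothetical helper): fields of the last comment line
# that appears before the first data line
header_names <- function(path, char = "#") {
  x <- readLines(path)
  is_comment <- startsWith(x, char)
  # data lines are non-comment lines that are not blank
  is_data <- !is_comment & nzchar(trimws(x))
  first_data <- which(is_data)[1]
  if (is.na(first_data)) stop("no data line found")
  # comment lines that come before the first data line
  before <- which(is_comment & seq_along(x) < first_data)
  if (length(before) == 0) stop("no comment line before the data")
  # strip the comment character, then split on runs of whitespace
  header <- sub(paste0("^", char, "\\s*"), "", x[max(before)])
  strsplit(header, "\\s+")[[1]]
}

# Example with the sample data from the question, written to a temporary file
tmp <- tempfile(fileext = ".txt")
writeLines(c("# Comment line 1", "# Comment line 2", "# values intensities",
             "5.556667e+00    4.008450e+02", "5.581000e+00    4.008770e+02",
             "# End comments"), tmp)
header_names(tmp)
#[1] "values"      "intensities"
```

Looking for the first data line rather than the first gap in the comment numbering also tolerates a file whose first line is not a comment.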

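If the file is too large to fit in memory and awk is available, the following solution finds the header without loading the file into R: awk stops at the first data line and reports how many lines precede it, and scan() then reads only the header line.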

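read_awk <- function(file, char = "#"){
  cmd <- "awk"
  # awk program: at the first line that does not start with `char`,
  # print the number of lines before it (the header length) and stop
  pattern <- paste0("/^[^", char, "]/")
  awkcmd <- paste0("'", pattern, " {print NR - 1; exit 0}'")
  args <- c(awkcmd, shQuote(file))  # quote the path in case it contains spaces
  out <- system2(command = cmd, args = args, stdout = TRUE)
  as.integer(out)
}
fun_awk <- function(file, char = "#"){
  n <- read_awk(file, char = char)
  # read only line n (the last header line), without loading the rest of the file
  x <- scan(file = file, what = character(), sep = "\n", quote = "",
            skip = n - 1, nlines = 1)
  # split on runs of whitespace and drop the leading comment character
  unlist(strsplit(x, "\\s+"))[-1]
}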

fun_awk("filename.txt")
#Read 1 item
#[1] "values"      "intensities"
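If awk is not available, a pure-R sketch with similar memory behaviour is to read through an open connection one line at a time and stop at the first data line, so only the header block is ever read. `header_names_stream()` is a hypothetical name; like the awk version, it assumes the header block sits at the top of the file.

```r
# header_names_stream() (hypothetical helper): read line by line from an open
# connection, remember the latest comment line, stop at the first data line
header_names_stream <- function(path, char = "#") {
  con <- file(path, open = "r")
  on.exit(close(con))
  last_comment <- NA_character_
  repeat {
    line <- readLines(con, n = 1)
    if (length(line) == 0) break          # end of file reached
    if (startsWith(line, char)) {
      last_comment <- line                # still inside the header block
    } else if (nzchar(trimws(line))) {
      break                               # first data line: stop reading here
    }
  }
  if (is.na(last_comment)) stop("no comment line before the data")
  # strip the comment character, then split on runs of whitespace
  strsplit(sub(paste0("^", char, "\\s*"), "", last_comment), "\\s+")[[1]]
}

tmp <- tempfile(fileext = ".txt")
writeLines(c("# Comment line 1", "# values intensities",
             "5.556667e+00    4.008450e+02", "# End comments"), tmp)
header_names_stream(tmp)
#[1] "values"      "intensities"
```

Reading one line per call is slow for bulk data, but here the loop only runs over the header lines, so the cost is negligible.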

Data

"filename.txt" contains the following:

# Comment line 1
# Comment line 2
# ... many more lines
# values intensities
5.556667e+00    4.008450e+02
5.581000e+00    4.008770e+02
# End comments
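To reproduce the examples above, the sample file can be written from R itself. A minimal sketch, assuming the file should be created as filename.txt in the current working directory (the name the examples use):

```r
# Write the sample data shown above to filename.txt in the working directory
writeLines(c(
  "# Comment line 1",
  "# Comment line 2",
  "# ... many more lines",
  "# values intensities",
  "5.556667e+00    4.008450e+02",
  "5.581000e+00    4.008770e+02",
  "# End comments"
), "filename.txt")
```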

Answer 2 (2021-05-09 09:14:03)

Depending on how many spaces separate the columns, you may want to split on a regular expression here:

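library(tibble)  # as_tibble()
library(tidyr)   # separate() and the %>% pipe
data <- as_tibble(read.delim('test.txt', header = FALSE, stringsAsFactors = FALSE))
# drop comment lines, then split the single column V1 on runs of whitespace
data <- data[!startsWith(data$V1, '#'), ] %>%
    separate(V1, into = c('values', 'intensities'), sep = '\\s+')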
data

# A tibble: 2 x 2
  values       intensities 
  <chr>        <chr>       
1 5.556667e+00 4.008450e+02
2 5.581000e+00 4.008770e+02
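The answer above names the columns by hand and leaves them as character. A base-R sketch that instead takes the names from the last header comment and keeps the values numeric: read.table() splits on runs of whitespace with sep = "" and, with comment.char = "#", skips the comment lines. The temporary file and the variable names are assumptions made for the example.

```r
# Sample data from the question, written to a temporary file
tmp <- tempfile(fileext = ".txt")
writeLines(c("# Comment line 1", "# Comment line 2", "# values intensities",
             "5.556667e+00    4.008450e+02", "5.581000e+00    4.008770e+02",
             "# End comments"), tmp)

x <- readLines(tmp)
# header = the line just before the first non-comment line (assumes no blank lines)
first_data <- which(!startsWith(x, "#"))[1]
header <- strsplit(sub("^#\\s*", "", x[first_data - 1]), "\\s+")[[1]]

# sep = "" splits on whitespace; comment.char = "#" drops the comment lines
data <- read.table(tmp, sep = "", comment.char = "#", col.names = header)
str(data)
```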
