How to properly match a TAB in regex using Perl?

Question

I have the following test.txt file:

# Example:
# Comments
# Comments

MC

Attribute 1
Attribute 2
Attribute 3


---

MC

Attribute 1
Attribute 2
Attribute 3

---

MC 

Attribute 1
Attribute 2
Attribute 3

I want to:

remove comments
remove empty lines
replace \n with \t
replace \t---\t with \n

so that I get the following:

MC <TAB> Attribute 1 <TAB> Attribute 2 <TAB> Attribute 3
MC <TAB> Attribute 1 <TAB> Attribute 2 <TAB> Attribute 3
MC <TAB> Attribute 1 <TAB> Attribute 2 <TAB> Attribute 3

For some reason, the following does not work:

perl -pe "s/#.*//g; s/^\n//g; s/\n/\t/g; s/\t---\t/\n/g" test.txt

It produces this output:

MC  Description --- MC  Description --- MC  Description

If I run only the following:

perl -pe "s/#.*//g; s/^\n//g; s/\n/\t/g;" test.txt

I get the same output:

MC  Description --- MC  Description --- MC  Description

It seems that s/\t---\t/\n/g does not work.

Answer 1

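You say you want to replace \t---\t, but that never appears in the input Perl is looking at. With -p, Perl reads and processes the file one line at a time, so each substitution only ever sees a single line. The separator line is just "---\n"; after s/\n/\t/g it becomes "---\t", with no tab before it, so \t---\t can never match.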
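A minimal sketch of what -p is doing here: the loop below mimics the implicit per-line loop and runs the same four substitutions on each line. The @lines array is a hypothetical excerpt of test.txt around one separator; each result is printed with tabs shown as \t.

```perl
use strict;
use warnings;

# Hypothetical excerpt of test.txt, one element per line as -p would read it.
my @lines = ("Attribute 3\n", "\n", "---\n", "\n", "MC\n");

my @after;     # each line after the substitutions
my $hits = 0;  # total replacements made by s/\t---\t/\n/g
for my $line (@lines) {
    local $_ = $line;              # -p puts the current line in $_
    s/#.*//g;
    s/^\n//g;
    s/\n/\t/g;                     # "---\n" becomes "---\t": no tab before it
    $hits += s/\t---\t/\n/g || 0;  # never matches within a single line
    push @after, $_;
    (my $shown = $_) =~ s/\t/\\t/g;
    print "'$shown'\n";
}
print "replacements: $hits\n";
```

The separator line comes out as '---\t' and the replacement count stays at 0.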

If you want to match a line that contains only whitespace and ---, use ^\s*---\s*$.

perl -pe "s/#.*//g; s/^\n//g; s/\n/\t/g; s/^\s*---\s*$/\n/g" test.txt

Note that if the file does not end with a ---, this leaves you without a newline at the end of the output.
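A minimal sketch that emulates perl -pe with the fixed last substitution: it splits a hypothetical, shortened reconstruction of test.txt (the $input heredoc) into lines, runs the four substitutions on each, and concatenates the results the way -p would print them.

```perl
use strict;
use warnings;

# Hypothetical reconstruction of test.txt (shortened to two records).
my $input = <<'END';
# Example:
# Comments

MC

Attribute 1
Attribute 2
Attribute 3

---

MC

Attribute 1
Attribute 2
Attribute 3
END

# Emulate perl -pe: run the substitutions on each line, then print it.
my @lines = split /^/, $input;   # split /^/ keeps the "\n" on each line
my $output = '';
for (@lines) {
    s/#.*//g;
    s/^\n//g;
    s/\n/\t/g;
    s/^\s*---\s*$/\n/g;          # \s also matches the tab the previous step left
    $output .= $_;
}

(my $shown = $output) =~ s/\t/\\t/g;
print $shown;
```

Each row keeps a trailing tab (from the last attribute's newline), and because this sample does not end with ---, the last row has no newline.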

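If you want to process the whole file at once, use -0. -0 controls the "input record separator" that Perl uses to decide what a line is. On its own, -0 sets it to the null character, so (assuming the file contains no null bytes) Perl reads the entire file as a single record.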
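A minimal sketch of the effect of -0, using a hypothetical string read through an in-memory filehandle: with the default $/ each read returns one line, while with $/ set to "\0" (what -0 does) the whole text comes back as one record.

```perl
use strict;
use warnings;

# Hypothetical data containing no NUL bytes.
my $text = "MC\nAttribute 1\n---\nMC\nAttribute 2\n";

# Default $/ is "\n": each readline returns one line.
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my @by_line = <$fh>;
close $fh;

# -0 sets $/ to "\0"; with no NUL in the data, one readline returns everything.
my @whole;
{
    local $/ = "\0";
    open my $fh0, '<', \$text or die "cannot open in-memory file: $!";
    @whole = <$fh0>;
    close $fh0;
}

print scalar(@by_line), " records with the default \$/\n";
print scalar(@whole), " record with \$/ set to NUL\n";
```

perlrun also documents -0777, which slurps whole files even if they do contain null bytes.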

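Then your original command almost works. You need to add /m so that ^ matches at the start of every line, not just at the start of the string.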
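A minimal sketch of what /m changes, applied to a hypothetical slurped string with blank lines in the middle (the helper show is made up for display only):

```perl
use strict;
use warnings;

# Hypothetical helper: show newlines as \n so the difference is visible.
sub show { (my $s = shift) =~ s/\n/\\n/g; return $s }

# Hypothetical slurped text with blank lines in the middle.
my $text = "MC\n\nAttribute 1\n\n---\n";

(my $without_m = $text) =~ s/^\n//g;   # ^ = start of string only: "M" is there, no match
(my $with_m    = $text) =~ s/^\n//mg;  # ^ = start of any line: empty lines removed

print "without /m: ", show($without_m), "\n";
print "with /m:    ", show($with_m), "\n";
```

Without /m the string is unchanged; with /m it becomes MC\nAttribute 1\n---\n.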

perl -0pe "s/#.*//g; s/^\n//mg; s/\n/\t/g; s/\t---\t/\n/g" test.txt
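As a check, a minimal sketch that applies the same four substitutions to the whole file held in one string, as -0p does. The $input heredoc is a hypothetical, shortened reconstruction of test.txt.

```perl
use strict;
use warnings;

# Hypothetical reconstruction of test.txt (shortened to two records).
my $input = <<'END';
# Example:
# Comments

MC

Attribute 1
Attribute 2
Attribute 3

---

MC

Attribute 1
Attribute 2
Attribute 3
END

# Emulate perl -0pe: the whole file is one string in $_.
my $output = $input;
for ($output) {       # alias $_ to $output
    s/#.*//g;         # comment text (the "\n" after it stays)
    s/^\n//mg;        # empty lines, thanks to /m
    s/\n/\t/g;        # every remaining newline becomes a tab
    s/\t---\t/\n/g;   # now "\t---\t" exists, so separators become newlines
}

(my $shown = $output) =~ s/\t/\\t/g;
print "$shown\n";
```

Only the last row keeps a trailing tab (from the file's final newline), and the result itself has no final newline.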

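But we can make this simpler! The input record separator separates records, and your records are separated by ---\n, so we can set the separator to that and process each record on its own.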

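To set the input record separator to a string, we assign to $/. In a one-liner we put that assignment in a BEGIN block, so it runs once when the program starts rather than for every line.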
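A minimal sketch of reading with $/ set to "---\n", using hypothetical data through an in-memory filehandle. In a standalone script, local inside a block scopes the change; the one-liner's BEGIN block sets it for the whole program instead.

```perl
use strict;
use warnings;

# Hypothetical data: two records separated by "---\n".
my $text = "MC\nAttribute 1\n---\nMC\nAttribute 2\n";

my @records;
{
    local $/ = "---\n";   # scoped version of BEGIN { $/ = "---\n" }
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    @records = <$fh>;
    close $fh;
}

for my $r (@records) {
    (my $shown = $r) =~ s/\n/\\n/g;
    print "record: $shown\n";
}
```

Each record still ends with the separator (the last one only if the file does), which is what -l's chomp removes.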

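Finally, we use -l to strip the record separator ---\n automatically and to end each output line with a newline. That is, it adds a chomp at the start of each iteration, which removes whatever $/ currently holds, and sets the output record separator $\ so that every print appends a newline. (-l copies $/ into $\ when the switch is processed, before the BEGIN block runs, so $\ is "\n", not "---\n".)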
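A minimal sketch of the two things -l does here, written out by hand: chomp removes the current $/ ("---\n") from each record, and print appends $\ ("\n"). The data and the $printed buffer are hypothetical; output goes to an in-memory filehandle so it can be inspected.

```perl
use strict;
use warnings;

my $text = "MC\nAttribute 1\n---\nMC\nAttribute 2\n";   # hypothetical data
my $printed = '';

{
    local $/ = "---\n";   # what the BEGIN block sets
    local $\ = "\n";      # what -l sets (a copy of $/ when the switch was read)
    open my $in,  '<', \$text    or die "cannot open input: $!";
    open my $out, '>', \$printed or die "cannot open output: $!";
    while (<$in>) {
        chomp;            # -l adds this: strips a trailing "---\n", if present
        s/\n/\t/g;
        print {$out} $_;  # print appends $\ automatically
    }
    close $in;
    close $out;
}
print $printed;
```

The last record has no "---\n" to strip, and both rows keep a trailing tab from the last attribute's newline.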

# Set the input record separator to ---\n.
# -l turns on autochomp to strip the separator.
# -l also adds a newline to each line.
# Strip comments.
# Strip blank lines (again, using /m so ^ works)
# Turn newlines into tabs.
perl -lpe 'BEGIN { $/ = "---\n" } s/#.*//mg; s/^\s*\n//mg; s/\n/\t/g;' test.txt

As a bonus, every line of output ends with a newline, including the last one.

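Finally, we can handle this with arrays. The basic idea is the same as before, but we split each record back into lines and use grep to filter out the lines we don't want. Then all that's left is a simple join.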

I'll write this one out as a script so it's easier to read.

#!/usr/bin/perl -lp

BEGIN { $/ = "---\n" }

# Split into lines.
# Strip comment lines.
# Strip blank lines.
# Join back together with tabs.
$_ = join "\t",
  grep /\S/,
  grep !/^#/,
  split /\n/, $_;

I find this approach easier to maintain: working with a list of lines is easier than doing everything inside a multi-line string.
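A minimal sketch of the same split/grep/join steps wrapped in a sub so they can be tried on a string. The name record_to_row and the sample records (as they would look after chomp with $/ = "---\n") are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: the script's split/grep/join steps for one record
# that has already been chomped (the trailing "---\n" is gone).
sub record_to_row {
    my ($record) = @_;
    return join "\t",
        grep { /\S/ }     # keep only lines with a non-space character
        grep { !/^#/ }    # drop comment lines
        split /\n/, $record;
}

# Hypothetical records, as <> would return them with $/ = "---\n" after chomp.
my @records = (
    "# Example:\n# Comments\n\nMC\n\nAttribute 1\nAttribute 2\nAttribute 3\n\n",
    "\nMC\n\nAttribute 1\nAttribute 2\nAttribute 3\n",
);

my @rows = map { record_to_row($_) } @records;
for my $row (@rows) {
    (my $shown = $row) =~ s/\t/\\t/g;
    print "$shown\n";
}
```

Because join puts tabs only between fields, this version leaves no trailing tab, unlike the one-liners.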

Solution 1
4, accepted, 2020-06-23 18:16:43
