How to properly match TAB in a regular expression using Perl?

Question

I have the following test.txt file:

# Example:
# Comments
# Comments

MC

Attribute 1
Attribute 2
Attribute 3


---

MC

Attribute 1
Attribute 2
Attribute 3

---

MC 

Attribute 1
Attribute 2
Attribute 3

I want to:

Remove the comments
Remove the empty lines
Replace \n with \t
Replace \t---\t with \n

so that I end up with the following:

MC <TAB> Attribute 1 <TAB> Attribute 2 <TAB> Attribute 3
MC <TAB> Attribute 1 <TAB> Attribute 2 <TAB> Attribute 3
MC <TAB> Attribute 1 <TAB> Attribute 2 <TAB> Attribute 3

For some reason, the following does not work:

perl -pe "s/#.*//g; s/^\n//g; s/\n/\t/g; s/\t---\t/\n/g" test.txt

It produces this output:

MC  Description --- MC  Description --- MC  Description

If I run just the following command

perl -pe "s/#.*//g; s/^\n//g; s/\n/\t/g;" test.txt

I also get

MC  Description --- MC  Description --- MC  Description

It seems that s/#.*//g; s/^\n//g; s/\n/\t/g; s/\t---\t/\n/g is not working.

Answer 1

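You say you want to remove \t---\t, but that doesn't appear in the input. With -p the code runs on one line at a time, so no single line ever contains a tab on both sides of ---.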

If you want to match a line containing only whitespace and ---, use ^\s*---\s*$.

perl -pe "s/#.*//g; s/^\n//g; s/\n/\t/g; s/^\s*---\s*$/\n/g" test.txt

Note that if there is no final ---, this leaves you with no newline at the end of the file.

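If you want to process the whole file at once, use -0. -0 controls the "input record separator", which Perl uses to decide what a line is. -0 on its own sets it to the null byte, so (assuming the file contains no null bytes) the whole file is read as a single record.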

Then your original almost works. You need to add /m so that ^ matches at the start of every line as well as at the start of the string.

perl -0pe "s/#.*//g; s/^\n//mg; s/\n/\t/g; s/\t---\t/\n/g" test.txt

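But we can make this even simpler! The input record separator separates records. Your record separator is ---\n, so we can set it to that and process each record individually.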

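To set the input record separator to a string, we use $/. To do this in a one-liner, we put it in a BEGIN block so it runs only once, when the program starts, rather than for every line.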

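Finally, we use -l to automatically strip the record separator ---\n and add a newline to the end of each line. In effect, it adds a chomp at the start and a $_ .= "\n" at the end.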

# Set the input record separator to ---\n.
# -l turns on autochomp to strip the separator.
# -l also adds a newline to each line.
# Strip comments.
# Strip blank lines (again, using /m so ^ works)
# Turn newlines into tabs.
perl -lpe 'BEGIN { $/ = "---\n" } s/#.*//mg; s/^\s*\n//mg; s/\n/\t/g;' test.txt

As a bonus, every line gets a newline, including the last.

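Finally, we can handle this with arrays. Same basic idea as before, but we split each record back into lines and use grep to filter out the unwanted lines. Then all that's left is a simple join.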

I'll write this one out in full so it's easier to read.

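#!/usr/bin/env perl -lp
# On Linux, env receives "perl -lp" as a single argument, so run this as
# `perl script.pl test.txt` (script.pl is a placeholder name); perl reads
# the -lp switches from the #! line itself.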

BEGIN { $/ = "---\n" }

# Split into lines.
# Strip comment lines.
# Strip blank lines.
# Join back together with tabs.
$_ = join "\t",
  grep /\S/,
  grep !/^#.*/,
  split /\n/, $_;

I find this approach more maintainable; working with an array of lines is easier than doing everything in one multi-line string.

Solution 1
4, accepted, 2020-06-23 18:16:43
