How do I know which delimiter has occurred first using awk in bash?

How can I tell which delimiter occurs first in each line, using a single awk command?

Assume I have a file with the following contents:

AB BC DE
BC DE AB
DE BC AB

I want to know which of the three, DE, AB or BC, occurs first in each line.

I thought I could split on AB and take the first field, then split that on BC and take its first field, and finally split on DE and take the first field.

This can be done as follows:

$ awk -F'AB' '{print $1}' file \
  | awk -F'BC' '{print $1}' \
  | awk -F'DE' '{print $1}'

However, is there a way to change the delimiter dynamically inside awk, so that the above can be done with a single awk invocation?

Edit: Corrected the mistakes made earlier.

If this isn't what you want:

awk 'match($0,/AB|BC|DE/){print substr($0,RSTART,RLENGTH)}' file

then edit your question to clarify your requirements and provide concise, testable sample input and expected output.
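A minimal sketch for trying the match() one-liner on the sample from the question. The helper name first_delim, the line-number prefix and the "none" fallback for lines without any delimiter are illustrative additions, not part of the answer above. match() sets RSTART and RLENGTH to the position and length of the leftmost match, and substr() extracts it.

```shell
#!/bin/sh
# first_delim is a hypothetical helper name wrapping the match() one-liner.
# For each input line it prints "<line number> <first delimiter>", or
# "<line number> none" if the line contains none of AB, BC, DE.
first_delim() {
    awk '{
        # match() returns 0 when there is no match; otherwise it sets
        # RSTART/RLENGTH to the leftmost match.
        if (match($0, /AB|BC|DE/))
            print NR, substr($0, RSTART, RLENGTH)
        else
            print NR, "none"
    }' "$@"
}

# Sample input from the question, plus one line without any delimiter.
printf 'AB BC DE\nBC DE AB\nDE BC AB\nxyz\n' | first_delim
```

With no file arguments the function reads standard input; `first_delim file` works as well.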

First of all, if every line of your file consists only of the tokens AB, BC and DE separated by whitespace, then the answer is straightforward, because the first field is the first delimiter:

awk '{print $1}' file

This matches your example. Nonetheless, I do not believe that this is really the case. Ed Morton's solution is clearly the way forward! It is clean, simple and, on top of that, a one-liner.
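To see why the $1 shortcut depends on the input format, here is a small comparison on a made-up line (`xx BC AB`, an assumption for illustration) whose first field is not one of the delimiters.

```shell
#!/bin/sh
# Hypothetical input: the first field is not one of the delimiters.
line='xx BC AB'

# $1 is simply the first whitespace-separated field, so this prints "xx".
printf '%s\n' "$line" | awk '{print $1}'

# match() finds the earliest of AB, BC, DE anywhere in the line: prints "BC".
printf '%s\n' "$line" | awk 'match($0,/AB|BC|DE/){print substr($0,RSTART,RLENGTH)}'
```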

However, purely for educational purposes, here is a different awk approach.

If you want to find the "first" separator in a line, you can attack the problem from a different angle: instead of interpreting the line as a set of columns, treat it as a set of records. The question then becomes "which record separator was found first?", and gawk stores exactly that in RT:

RT (gawk extension): The input text that matched the text denoted by RS, the record separator. It is set every time a record is read.

For a single line of characters, you could do something like this:

$ echo "AB BC DE BC DE AB DE BC AB" \
   | awk 'BEGIN{RS="DE|AB|BC"}{print RT;exit }' 
AB
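To watch RT change from record to record, drop the `exit`. This sketch assumes gawk is installed: RT exists only in gawk, and POSIX leaves a multi-character RS unspecified. The input string is made up; only records that ended at a delimiter are printed, because the last record ends at end of input and leaves RT empty.

```shell
#!/bin/sh
# Requires gawk (RT and a regular-expression RS).
# Each record ends at one of AB, BC, DE; RT holds the one that matched.
# The final record (" zz") ends at end of input, so RT is "" and is skipped.
printf 'xx BC yy AB zz' | gawk 'BEGIN { RS = "AB|BC|DE" } RT != "" { print NR, RT }'
```

This prints `1 BC` and `2 AB`: one line per delimiter found, in input order.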

Now we can take the idea a bit further by constantly toggling RS between a newline and the requested set: once the first delimiter of a line has been found, RS is switched to "\n" to skip the rest of that line, and then switched back. This is mainly meant to show how flexible awk is.

$ awk 'BEGIN{RSSET="DE|AB|BC";RS=RSSET}
       (RS=="\n"){RS=RSSET;next}
       {print RT; RS="\n"; next}' file

If file contains

AB BC DE BC DE AB DE BC AB
BC DE AB DE BC AB
DE AB DE BC AB

the output is

AB
BC
DE
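A variation on the same idea, sketched under the assumption that gawk is available: instead of toggling, put the newline into RS itself. Every record then ends either at a delimiter or at the end of a line, and RT says which. The helper name first_delim_rt and the "none" output are hypothetical additions. Unlike the toggling version, this one also copes with a line that contains no delimiter at all (the toggling version would report the next line's first delimiter for it).

```shell
#!/bin/sh
# first_delim_rt is a hypothetical helper name; requires gawk (RT, regex RS).
# Prints "<line number> <first delimiter>" or "<line number> none".
first_delim_rt() {
    gawk 'BEGIN { RS = "AB|BC|DE|\n"; lineno = 1; found = 0 }
    {
        if (RT == "\n" || RT == "") {      # record ended at end of line/input
            if (!found) print lineno, "none"
            lineno++; found = 0
        } else if (!found) {              # first delimiter seen on this line
            print lineno, RT
            found = 1
        }
    }' "$@"
}

# The sample file from above, with an extra delimiter-free line inserted.
printf 'AB BC DE BC DE AB DE BC AB\nBC DE AB DE BC AB\nxyz\nDE AB DE BC AB\n' | first_delim_rt
```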

A sed solution, since the question was tagged sed. The greedy nature of sed made this a bit more confusing, but I think the following works.

#!/usr/bin/sed -rnf

# This presumes you only want to print matching rows.
/(AB|BC|DE)/ {
    # print the line number
    =;
    # find first match, then remove rest of line
    s/(AB|BC|DE).*$/\1/;
    # only one match is left now, so the greedy ".*" can no longer
    # skip past the one we want; strip whatever precedes it.
    s/^.*(AB|BC|DE)/\1/; 
    # so print.
    p 
}
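To try this without saving a script file, the same sequence of commands can be passed inline, here applied to the question's delimiters AB, BC and DE. This sketch uses `sed -E` (accepted by GNU and BSD sed) rather than the GNU-specific `-r`; the function name first_delim_sed is hypothetical.

```shell
#!/bin/sh
# first_delim_sed is a hypothetical helper name.
# -n suppresses automatic printing, so only "=" (line number) and "p"
# produce output; -E selects extended regular expressions.
first_delim_sed() {
    sed -E -n '/(AB|BC|DE)/ {
        =
        s/(AB|BC|DE).*$/\1/
        s/^.*(AB|BC|DE)/\1/
        p
    }' "$@"
}

printf 'AB BC DE\nBC DE AB\nDE BC AB\n' | first_delim_sed
```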

As an example, I changed the 'codes' to check that the first one is the one being matched:

~$> printf "%b\n" "$letters"
ABa BBa ABb BBb ABc BBc
BBc ABc BBb ABb BBa ABa
ABb ABc BBa BBc
not right

~$> echo "$letters" | sed -rn '/(AB.|BB.)/ {=; s/(AB.|BB.).*$/\1/; s/^.*(AB.|BB.)/ \1/; p }'
1
 ABa
2
 BBc
3
 ABb

Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site or the original source. For any questions, contact: yoyou2525@163.com.