Extracting partially repeating patterns in lines of a text file
Given a text file of the form:
firstword<number1>,<string1>:<number2>,<string2>:<number3>,<string3>:<number4>...
firstword<number1>,<string1>:<number2>,<string2>:<number3>,<string3>:<number4>...
firstword<number1>,<string1>:<number2>,<string2>:<number3>,<string3>:<number4>...
...
where the lines can differ from one another, and each can have any number of string:number pairs. "firstword" is always the same. The contents of the strings and numbers can change, e.g. a number could be "12345" and a string could be "abc" (without the quotes).
In addition, a line can contain the same string multiple times (how many times is unknown and differs per line), each with a different associated number. For example:
firstword123,abc:123,cde:234,abc:345,def:456
If one now wants to extract only the first word and number (in this case firstword123), together with all string:number pairs in a line for a specific string, how can one do this? In the above example, if one chooses "abc" as the string, the extracted line should look like:
firstword123,abc:123,abc:345
I am looking for a solution which works with Bash (and possibly other commands).
You can use Perl for this:
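#!/usr/bin/perl
use strict;
use warnings;
my $str = 'abc';                        # string whose pairs should be kept
while (<DATA>) {
    next unless /^(firstword\d+)/;      # leading word and its number
    my @out = ($1);
    push @out, /,(\Q$str\E:\d+)/g;      # every "<str>:<number>" pair on the line
    print join(',', @out), "\n";
}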
__DATA__
firstword123,abc:123,cde:234,abc:345,def:456
Output:
firstword123,abc:123,abc:345
Not a one-liner, but an all-Bash solution. If you need faster code, we can write something in awk or perl.
$: cat keyscan
#!/usr/bin/env bash
key="$1"
while IFS= read -r line
do  start=${line%%,*}                      # leading word and its number
    line=${line#"$start"}
    line=${line#,}
    while [[ -n "$line" ]]
    do  case "$line" in
        "$key":[0-9]*) lead=${line%%,*}    # pair matches the key: keep it
                       start="$start,$lead"
                       line=${line#"$lead"}
                       line=${line#,} ;;
        *,*)           line=${line#*,} ;;  # drop a non-matching pair
        *)             line='' ;;          # last pair, no match
        esac
    done
    printf '%s\n' "$start"
done
$: cat data
firstword123,abc:123,cde:234,abc:345,def:456
$: ./keyscan abc < data
firstword123,abc:123,abc:345
$: ./keyscan def < data
firstword123,def:456
$: ./keyscan cde < data
firstword123,cde:234
It will not be fast, because it runs a processing loop over every line of input, but it works on the sample line of data you gave.
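For larger files, the awk route mentioned earlier might look roughly like the sketch below. The function name extract_pairs is made up for illustration. It assumes that fields are separated by commas, and that a pair counts only when the text before the colon equals the key exactly and the text after it is all digits.

```shell
# extract_pairs KEY [FILE...]   (hypothetical helper name)
# Prints the first field plus every "KEY:<digits>" field of each input line.
# Reads standard input when no FILE is given.
extract_pairs() {
    local key=$1
    shift
    awk -F, -v key="$key" '
        {
            out = $1                          # leading word and its number
            for (i = 2; i <= NF; i++) {
                n = index($i, ":")            # position of the colon, 0 if none
                # keep the field only if the part before ":" is exactly the key
                # and the part after it consists of digits only
                if (n > 0 && substr($i, 1, n - 1) == key && substr($i, n + 1) ~ /^[0-9]+$/)
                    out = out "," $i
            }
            print out
        }
    ' "$@"
}
```

It would be called as `extract_pairs abc data` or `extract_pairs abc < data`. Like the Bash loop, it prints the first word even for lines that contain no matching pair.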