Extracting partially repeated patterns from lines of a text file

Question

Given a text file of the form:

firstword<number1>,<string1>:<number2>,<string2>:<number3>,<string3>:<number4>...
firstword<number1>,<string1>:<number2>,<string2>:<number3>,<string3>:<number4>...
firstword<number1>,<string1>:<number2>,<string2>:<number3>,<string3>:<number4>...
...

where the lines can differ from one another, and each can have any number of string:number pairs. "firstword" is always the same. The contents of the strings and numbers can change, e.g. a number could be "12345" and a string could be "abc" (without the quotes).

In addition, the same string can appear multiple times in a line (how many times is unknown and differs per line), each time with a different associated number. For example:

firstword123,abc:123,cde:234,abc:345,def:456

Suppose one wants to extract only the first word and number (in this case firstword123), together with all string:number pairs in the line for one specific string. How can this be done? In the above example, if one chooses the value "abc" for the string, the extracted line should look like:

firstword123,abc:123,abc:345

I am looking for a solution that works with Bash (possibly together with other commands).

Answer 1

You can use Perl for this:

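#!/usr/bin/perl
use strict;
use warnings;

my $first = 'firstword123';   # first word and number of the lines to keep
my $str   = 'abc';            # string whose string:number pairs are extracted

while (<DATA>) {
    # Skip lines that do not start with exactly $first.
    next unless /^\Q$first\E(?:,|$)/;
    # Collect every "$str:<number>" pair that directly follows a comma.
    my @pairs = /(?<=,)\Q$str\E:\d+/g;
    print join(',', $first, @pairs), "\n";
}

__DATA__
firstword123,abc:123,cde:234,abc:345,def:456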

Output:

firstword123,abc:123,abc:345

Answer 2

Not a one-liner, but an all-Bash solution. If you need faster code, we can write something in awk or Perl ...

$: cat keyscan
#!/usr/bin/env bash

key="$1"                                  # string whose pairs are kept
while IFS= read -r line
do start=${line%%,*}                      # first word and number
   line=${line#"$start"}
   line=${line#,}
   while [[ -n "$line" ]]
   do case "$line" in
      "$key":[0-9]*) lead="${line%%,*}"   # next pair matches: keep it
                     start="$start,$lead"
                     line="${line#"$lead"}"
                     line="${line#,}"  ;;
                *,*) line="${line#*,}" ;; # other pair: drop it
                  *) line='' ;;           # last pair, no match
      esac
   done
   printf '%s\n' "$start"
done

$: cat data
firstword123,abc:123,cde:234,abc:345,def:456

$: ./keyscan abc < data
firstword123,abc:123,abc:345

$: ./keyscan def < data
firstword123,def:456

$: ./keyscan cde < data
firstword123,cde:234

It will not be fast, because it runs a processing loop over every line of input, but it works on the sample line of data you gave.
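
For reference, here is a minimal sketch of what the awk variant mentioned above might look like. It is not benchmarked, the function name keyscan_awk is made up for this example, and it assumes a POSIX awk and a search string without backslashes (awk interprets escape sequences in -v assignments).

```shell
# keyscan_awk is a hypothetical name for this sketch.
# Usage: keyscan_awk STRING < file
keyscan_awk() {
    awk -F, -v key="$1" '
    {
        out = $1                             # first word and number
        for (i = 2; i <= NF; i++) {
            n = split($i, pair, ":")         # pair[1] = string, pair[2] = number
            if (n == 2 && pair[1] == key && pair[2] ~ /^[0-9]+$/)
                out = out "," $i             # keep the matching pair
        }
        print out
    }'
}

printf '%s\n' 'firstword123,abc:123,cde:234,abc:345,def:456' | keyscan_awk abc
```

Splitting on commas with -F, lets awk do the per-pair loop internally instead of in the shell, which is the usual reason it is faster on large files.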

Answer 1
2 2018-11-09 21:52:22

Answer 2
1 accepted 2018-11-09 18:38:27