
Regex: keep same pattern found multiple times in same line and replace line by appending single pattern in front

Is it possible with Notepad++ (or maybe from a Linux bash shell) to create multiple lines from a pattern, one line for each time the pattern is found, and to prepend a single found pattern to each newly created line?

The repeated pattern is val=[0-9]+ and the single pattern is id=[a-zA-Z0-9]+

Example:

Input lines:

id=af2477,val=333,val=777
id=af3456,val=222,val=444,val=678
id=af3327,val=3234,val=123,val=701

Output lines:

id=af2477,val=333
id=af2477,val=777
id=af3456,val=222
id=af3456,val=444
id=af3456,val=678
id=af3327,val=3234
id=af3327,val=123
id=af3327,val=701

I have tried with 2 subgroups but it won't work. It only replaces the second group once:

Find what: (id=[a-zA-Z0-9]+,)(val=[0-9]+,)*
Replace with: \n\1,\2

UPDATE: Both answers from Toto and Wiktor Stribiżew seem to do the job. I haven't tested them yet. I would still like to see how this can work with Notepad++ (even if multiple steps are needed).

Since you are also considering Linux tools for this, an awk solution looks much more viable:

awk 'BEGIN{FS=OFS=","} /^id=[a-zA-Z0-9]+(,val=[0-9]+)*$/{
    for(i=2; i<=NF; i++) {
        print $1,$i
    }; next;
}{print $0}' file > outfile

See the online demo.

Here, any line that matches ^id=[a-zA-Z0-9]+(,val=[0-9]+)*$ (i.e. has the format of the lines you need to expand) is split the way you need with for(i=2; i<=NF; i++) {print $1,$i}; next; and any other line is written as is ( print $0 ).

The BEGIN{FS=OFS=","} part sets both the input and the output field separator to a comma.
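For readers who do not have awk at hand, the same line-by-line logic can be sketched in Python. This is only an illustration of the approach described above, not part of the original answer; the names LINE_RE and expand_line are made up for the example.

```python
import re

# Same shape check as the awk condition: an id followed by zero or more vals.
LINE_RE = re.compile(r"^id=[a-zA-Z0-9]+(,val=[0-9]+)*$")

def expand_line(line):
    """Return the output lines for one input line (hypothetical helper)."""
    if not LINE_RE.match(line):
        return [line]                      # like {print $0}: pass through unchanged
    fields = line.split(",")               # FS=","
    # Pair the first field (the id) with every later field, like print $1,$i.
    # A matching line with no val fields yields nothing, as in the awk script.
    return [f"{fields[0]},{value}" for value in fields[1:]]

if __name__ == "__main__":
    for row in ["id=af2477,val=333,val=777", "some other line"]:
        print("\n".join(expand_line(row)))
```

As in the awk version, a line that does not have the id/val format is left alone rather than dropped.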

This perl one-liner does the job (output on STDOUT):

perl -anE '($id,$vals)=/(id=\w+),(.+)$/;say "$id,$_" for split/,/,$vals' file
id=af2477,val=333
id=af2477,val=777
id=af3456,val=222
id=af3456,val=444
id=af3456,val=678
id=af3327,val=3234
id=af3327,val=123
id=af3327,val=701

Explanation:

($id,$vals)=/(id=\w+),(.+)$/;       # explode id and values for each line in input file
say "$id,$_" for split/,/,$vals     # print id and each value

You can redirect the output to another file:

perl -anE '($id,$vals)=/(id=\w+),(.+)$/;say "$id,$_" for split/,/,$vals' file > outputfile

Or make the change in place:

perl -i -anE '($id,$vals)=/(id=\w+),(.+)$/;say "$id,$_" for split/,/,$vals' file

It is possible, yet very complex, to do that with one regular expression, for which you would have to use (?R) and conditional statements.
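Part of what makes a single expression awkward is that a repeated capturing group keeps only its last repetition, which is what the attempt in the question runs into. The sketch below shows this with Python's re module. That is an assumption for illustration: Notepad++ uses a different regex engine, although most engines behave the same way on this point.

```python
import re

line = "id=af3456,val=222,val=444,val=678"

# The second group is repeated, but only one capture slot exists for it.
m = re.match(r"(id=[a-zA-Z0-9]+)(,val=[0-9]+)*", line)
last_val = m.group(2)          # holds the final repetition only

# Collecting every occurrence needs a separate scan over the line.
all_vals = re.findall(r"val=[0-9]+", line)

if __name__ == "__main__":
    print(last_val)
    print(all_vals)
```

So a replacement that refers to the second group can emit at most one val per match, no matter how many were matched.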


With multiple steps it is pretty simple. You can, for instance, do a find and replace based on the maximum number of val fields in the longest line. Say 4 is the largest number of val fields; then the initial expression has four (,val=[^\r\n,]*) groups:

^(id=[^\r\n,]*)(,val=[^\r\n,]*)(,val=[^\r\n,]*)(,val=[^\r\n,]*)(,val=[^\r\n,]*)$

and replace that with four lines,

$1$2\n$1$3\n$1$4\n$1$5
----  ----  ----  ----

Demo for Step 1

For each additional step, we simply remove one val group from the end of the expression and one line from the end of the replacement. For example, our expression would look like

^(id=[^\r\n,]*)(,val=[^\r\n,]*)(,val=[^\r\n,]*)(,val=[^\r\n,]*)$

in the second step, for which we'd replace it with:

$1$2\n$1$3\n$1$4
----  ----  ----

Demo for Step 2

In the third and final step, our expression has two val groups,

^(id=[^\r\n,]*)(,val=[^\r\n,]*)(,val=[^\r\n,]*)$

and our replacement will have two lines:

$1$2\n$1$3
----  ----

Demo for Step 3


For the example in the question, only two steps are required: the second and third expressions should be enough.
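The step-by-step replacement can also be tried outside Notepad++. The following is a minimal Python sketch that builds each step's expression and replacement and applies them from the longest down to two val groups. The helper names build_step and expand and the max_vals parameter are made up for this example, and Python writes backreferences as \g<1> where Notepad++ writes $1.

```python
import re

def build_step(n):
    """Return (pattern, replacement) for lines with exactly n val fields."""
    pattern = r"^(id=[^\r\n,]*)" + r"(,val=[^\r\n,]*)" * n + "$"
    # One output line per val group: group 1 (the id) followed by group k.
    replacement = "\n".join(rf"\g<1>\g<{k}>" for k in range(2, n + 2))
    return pattern, replacement

def expand(text, max_vals=4):
    """Apply the steps for max_vals, max_vals - 1, ..., 2 val fields."""
    for n in range(max_vals, 1, -1):
        pattern, replacement = build_step(n)
        # MULTILINE makes ^ and $ match at every line, as in the editor.
        text = re.sub(pattern, replacement, text, flags=re.MULTILINE)
    return text

if __name__ == "__main__":
    sample = "id=af2477,val=333,val=777\nid=af3456,val=222,val=444,val=678"
    print(expand(sample))
```

Because each expression is anchored with ^ and $ and the groups cannot cross a comma, a step only touches lines with exactly that many val fields, so lines already expanded by an earlier step are not matched again.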
