I have a file with 3 lines as follows. Using Linux, how can I get the split variables of a line and append each to the same line?
Using Linux, how can I get the desired output below for the given input?
Input file:
Line1: StringA1, stringB1| stringC1, stringD1, stringE1
Line2: StringA2, stringB2| stringC2, stringD2
Line3: StringA3, stringB3| stringC3, stringD3, stringE3, stringF3
My output should be:
StringA1, stringB1| stringC1
StringA1, stringB1| stringD1
StringA1, stringB1| stringE1
StringA2, stringB2| stringC2
StringA2, stringB2| stringD2
StringA3, stringB3| stringC3
StringA3, stringB3| stringD3
StringA3, stringB3| stringE3
StringA3, stringB3| stringF3
Assumptions:
the real input file does not contain the Line#: prefix (otherwise we just need to modify the proposed script)
Sample data:
$ cat strings.dat
StringA1, stringB1| stringC1, stringD1, stringE1
StringA2, stringB2| stringC2, stringD2
StringA3, stringB3| stringC3, stringD3, stringE3, stringF3
One awk solution:
awk -F"[,|]" '
{ for ( i=3;i<=NF;i++ )
{ printf "%s,%s|%s\n", $1, $2, $i }
}' strings.dat
Where:
-F"[,|]" - use comma and pipe (,|) as input delimiters
for ( i=3;i<=NF;i++ ) - loop over fields 3 to the end of the line (NF == number of fields == index of the last field)
{ printf ... } - print the 1st, 2nd and i-th fields
Results of running the above:
StringA1, stringB1| stringC1
StringA1, stringB1| stringD1
StringA1, stringB1| stringE1
StringA2, stringB2| stringC2
StringA2, stringB2| stringD2
StringA3, stringB3| stringC3
StringA3, stringB3| stringD3
StringA3, stringB3| stringE3
StringA3, stringB3| stringF3
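A variation on the loop above, offered only as a sketch: instead of hard-coding field 3 as the first item, split on the pipe alone and then break the right-hand side on commas with awk's split(). This assumes exactly one | per line; the function name expand_after_pipe is made up for the example.

```shell
# Sketch: everything left of the pipe is the prefix, everything right of it
# is a comma-separated list. Assumes exactly one "|" per input line.
# expand_after_pipe is a hypothetical name, not part of the answer above.
expand_after_pipe() {
    awk -F '|' '{
        n = split($2, items, ",")          # n = number of items after the pipe
        for (i = 1; i <= n; i++)
            printf "%s|%s\n", $1, items[i] # prefix, pipe, one item per line
    }' "$@"
}

# Usage: expand_after_pipe strings.dat
```

The prefix may then contain any number of commas, since only the text after the pipe is split on them.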
When you make a solution in sed, it will become hard to read and hard to maintain:
sed -E 's/,/\v/; :a; s/(.*\|)(.*),(.*)$/\1\2\r\1\3/;ta; s/\v/,/g;s/\r/\n/g' inputfile
Explanation:
s/,/\v/ - Most , should be replaced, but not the first one, which belongs to the part that gets repeated; hide it as a vertical tab (\v) for now.
:a - Label for the loop: the next command is repeated (by ta) for as long as a replacement is made.
(.*\|)(.*),(.*)$ - Match 3 substrings: the starter (up to the |), the middle part until the last , and the end part.
\r - Use the Windows CR as a marker where we want a newline when finished.
\1 - Insert the first remembered string (in the example, StringA1, stringB1 up to the |).
/\1\2\r\1\3/ - Replace the last , with a newline marker and the starter.
ta; - Repeat until all replacements are done.
s/\v/,/g; - Restore the , characters.
s/\r/\n/g' - Replace the newline marker with a real newline.
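The same commands can be laid out one per line with comments, which makes the loop easier to follow. This is a sketch that assumes GNU sed (the \v, \r and \n escapes are GNU extensions); the wrapper name unfold_sed is made up for the example.

```shell
# Sketch: the one-liner above, one command per line. Assumes GNU sed.
# unfold_sed is a hypothetical wrapper name.
unfold_sed() {
    sed -E '
        # Hide the first comma (it is part of the repeated prefix) as a vertical tab.
        s/,/\v/
        :a
        # Split at the last remaining comma; \r marks where a newline will go.
        s/(.*\|)(.*),(.*)$/\1\2\r\1\3/
        # Jump back to :a as long as the substitution above succeeded.
        ta
        # Restore the hidden commas, then turn the markers into real newlines.
        s/\v/,/g
        s/\r/\n/g
    ' "$@"
}

# Usage: unfold_sed inputfile
```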
Other ways are using awk and a while loop. For a large file I recommend awk; perhaps you want to try this yourself before someone posts an answer.
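For the while loop route, a minimal sketch in plain POSIX shell could look like the following. It assumes one | per line and no Line#: prefix in the data; the function name split_lines is made up for the example. A shell loop like this is fine for small files but will be much slower than awk on large ones.

```shell
# Sketch: read each line, split it at the pipe, then peel off one
# comma-separated item at a time. split_lines is a hypothetical name.
split_lines() {
    while IFS='|' read -r prefix rest; do
        while [ -n "$rest" ]; do
            case $rest in
                *,*) item=${rest%%,*}; rest=${rest#*,} ;;  # take text up to the first comma
                *)   item=$rest; rest= ;;                  # last item on the line
            esac
            printf '%s|%s\n' "$prefix" "$item"
        done
    done
}

# Demo with two of the sample lines:
split_lines <<'EOF'
StringA1, stringB1| stringC1, stringD1, stringE1
StringA2, stringB2| stringC2, stringD2
EOF
```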
In order to produce your desired output, if you are splitting on [,|], you must also remove the beginning of field 1 (the Line#: prefix) before outputting the results. There are two ways I see to do that. The first simply splits field 1 into an array with a field separator of ' '; the second uses a combination of substr, match & length. The first is the simple way of doing it, using the split() function, e.g.
awk -F '[,|]' '{
split ($1, arr, / /)
for (i=3; i<=NF; i++) {
printf "%s,%s|%s\n", arr[2], $2, $i
}
}' file
For the second, you can remove split() above and replace arr[2] with:
substr($1,match($1,/ /)+1,length($1)-match($1,/ /))
If your data file does not include "Line[0-9]: " as the prefix for each line, you can use the following as your printf to handle either case:
printf "%s,%s|%s\n", arr[2]=="" ? arr[1] : arr[2], $2, $i
The results are the same either way, but using split() would be the recommended way.
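Another way to cope with both kinds of input, shown here only as a sketch, is to strip the optional prefix from the whole record with sub() before looping. Changing $0 makes awk split the fields again, so $1 no longer carries the prefix. The function name expand_fields is made up for the example, and the pattern assumes the prefix is always "Line", digits, a colon and one space.

```shell
# Sketch: remove an optional "Line<N>: " prefix, then print as before.
# expand_fields is a hypothetical name.
expand_fields() {
    awk -F '[,|]' '{
        sub(/^Line[0-9]+: /, "")   # no-op when the prefix is absent; re-splits $0 otherwise
        for (i = 3; i <= NF; i++)
            printf "%s,%s|%s\n", $1, $2, $i
    }' "$@"
}

# Usage: expand_fields file
```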
Example Use/Output
Using the proposed awk solution with your data file (named file here; adjust as needed), you can just select-copy/middle-mouse-paste into an xterm with the file in the current directory to obtain the results, e.g.
$ awk -F '[,|]' '{
> split ($1, arr, / /)
> for (i=3; i<=NF; i++) {
> printf "%s,%s|%s\n", arr[2], $2, $i
> }
> }' file
StringA1, stringB1| stringC1
StringA1, stringB1| stringD1
StringA1, stringB1| stringE1
StringA2, stringB2| stringC2
StringA2, stringB2| stringD2
StringA3, stringB3| stringC3
StringA3, stringB3| stringD3
StringA3, stringB3| stringE3
StringA3, stringB3| stringF3
Look things over and let me know if you have further questions.