Merging lines that do not match a regular expression

Question

I have a file containing logs from the web; a simplified version of it looks like this:

en-GB,en-US;q=0.8,en    jsdjpksdkskd;lkskd;
en-GB,en-US;q=0.8,en    jsdjpksdkskd;lkskd;
en-GB,en-US;q=0.8,en    jsdjpksdkskd;lkskd;
Unix
Linux
en-GB,en-US;q=0.8,en    jsdjpksdkskd;lkskd;
START
Solaris
en-GB,en-US;q=0.8,en    jsdjpksdkskd;lkskd;
Aix
SCO

I have tried a couple of regex combinations with awk/sed to identify the Accept-Language value, which begins every record line, using the following:

/^[a-z]{2}(-[A-Z]{2})?/
/\*|[A-Z]{1,8}(-[A-Z0-9]{1,8})*/i  
/([^-;]*)(?:-([^;]*))?(?:;q=([0-9]\.[0-9]))?/

So far I haven't managed to get either awk or sed to give me the following result:

en-GB,en-US;q=0.8,en    jsdjpksdkskd;lkskd;
en-GB,en-US;q=0.8,en    jsdjpksdkskd;lkskd;
en-GB,en-US;q=0.8,en    jsdjpksdkskd;lkskd;
en-GB,en-US;q=0.8,en    jsdjpksdkskd;lkskd;    Unix    Linux
en-GB,en-US;q=0.8,en    jsdjpksdkskd;lkskd;    START    Solaris
en-GB,en-US;q=0.8,en    jsdjpksdkskd;lkskd;    Aix    SCO

Any help is appreciated. The file contains over a million records, so I'm happy to go down a route that doesn't use sed/awk if it improves performance.

Answer 1

Based on the observation that the two types of lines can be told apart by the presence of an = , you can use this awk script:

file.awk

$0 ~ /=/ { printf("%s%s", v,$0)
           v="\n"
           next
         } 
         { printf("\t%s", $0) } 
END      { printf("\n") }

You use it like this: awk -f file.awk yourfile

v is empty for the first line; afterwards it contains the line break
for lines with an = , we print $0 preceded by v
for the other lines (note the next in the first action), we print $0 without a newline but with a \t as separator
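
For anyone who would rather avoid awk, the same logic can be sketched in Python. This is only an illustration of the idea described above, not part of the original answer: the function name merge_on_equals and its sep parameter are made up for this example, and it relies on the same assumption as the awk script, namely that only record lines contain an = .

```python
def merge_on_equals(lines, sep="\t"):
    """Join every line without '=' onto the preceding line that has one.

    Hypothetical helper mirroring the awk script: `v` is the separator
    printed *before* each record line, so it is empty for the first one.
    """
    out = []
    v = ""  # empty before the first record, a newline afterwards
    for line in lines:
        line = line.rstrip("\n")
        if "=" in line:
            out.append(v + line)    # start a new output line
            v = "\n"
        else:
            out.append(sep + line)  # continue the current output line
    out.append("\n")                # same job as the END block
    return "".join(out)


if __name__ == "__main__":
    import sys
    # Reads the log from standard input, e.g.  python merge.py < yourfile
    sys.stdout.write(merge_on_equals(sys.stdin))
```

Because the function accepts any iterable of lines, an open file object can be passed in directly instead of a list.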

Answer 2

Just for fun, here's a sed solution:

sed -ne 1bgo \
   -e '/^[a-z][a-z]-[A-Z][A-Z]/ { x;p;s/.*//;x; };:go' \
   -e 'H;x;s/^\n//;s/\n/  /;x;${ x;p; }' < input

It works like this:

Read each line, but instead of printing it right away, save it by appending it to the hold space ( H ), removing any newline that separates it from whatever was already there ( x;s/^\n//;s/\n/  /;x ). (If you want tabs in your output, put them here where I've put a couple of spaces.)
If you come across a line that matches your Accept-Language pattern, flush the hold space before you append anything to it: print it and clear it ( x;p;s/.*//;x ). Then proceed as usual with the appending and whatnot.
Treat the first and last lines differently from all others: never flush the hold space after reading just the first line ( 1bgo skips past that, down to the position labeled :go ), and always flush the hold space after reading the last line ( ${ x;p; } ).
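
The three steps above translate fairly directly into a small Python generator, shown here only as a sketch of the control flow and not as a replacement for the sed command. The names merge_records and START_OF_RECORD are invented for this example; the pattern is the one from the sed script, and the default separator is the same two spaces.

```python
import re

# Same pattern as in the sed script: two lowercase letters, a hyphen,
# two uppercase letters, anchored at the start of the line.
START_OF_RECORD = re.compile(r"^[a-z][a-z]-[A-Z][A-Z]")


def merge_records(lines, sep="  "):
    """Yield merged records; `hold` plays the role of sed's hold space."""
    hold = None  # nothing saved yet, so the first line never triggers a flush
    for line in lines:
        line = line.rstrip("\n")
        if hold is None:
            hold = line              # first line: just save it
        elif START_OF_RECORD.match(line):
            yield hold               # flush the previous record
            hold = line              # and start a new one
        else:
            hold += sep + line       # append to the saved record
    if hold is not None:
        yield hold                   # always flush after the last line
```

Since it is a generator, it can be fed an open file and consumed line by line without holding the whole log in memory.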

Answer 3

$ awk '/[a-z]{2}-[A-Z]{2}/ { print b; b=$0; next }  # @xx-XX empty buffer, refill
                           { b=b OFS $0 }           # otherwise append to buffer
                       END { print b }' file        # dump the buffer in the end

en-GB,en-US;q=0.8,en    jsdjpksdkskd;lkskd;
en-GB,en-US;q=0.8,en    jsdjpksdkskd;lkskd;
en-GB,en-US;q=0.8,en    jsdjpksdkskd;lkskd; Unix Linux
en-GB,en-US;q=0.8,en    jsdjpksdkskd;lkskd; START Solaris
en-GB,en-US;q=0.8,en    jsdjpksdkskd;lkskd; Aix SCO

Note that the output starts with an empty line. Also, use a tab delimiter in the output if so desired: awk -v OFS="\t" ... .
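
To see where that leading empty line comes from, here is a rough Python rendering of the same buffer logic. It is a sketch under assumptions, not the answer's code: merge_buffered, ofs and drop_leading_blank are hypothetical names, and the pattern is unanchored like the awk one.

```python
import re

# Unanchored, like the awk pattern /[a-z]{2}-[A-Z]{2}/
RECORD = re.compile(r"[a-z]{2}-[A-Z]{2}")


def merge_buffered(lines, ofs=" ", drop_leading_blank=False):
    """Return the list of output lines the awk one-liner would print."""
    out = []
    buf = ""  # awk's uninitialised b starts out as the empty string
    for line in lines:
        line = line.rstrip("\n")
        if RECORD.search(line):
            out.append(buf)          # print b (empty the very first time)
            buf = line               # b = $0
        else:
            buf = buf + ofs + line   # b = b OFS $0
    out.append(buf)                  # END { print b }
    if drop_leading_blank and out[0] == "":
        out = out[1:]                # optional: hide the initial empty line
    return out
```

The first matching line prints the still-empty buffer, which is exactly the blank line mentioned above; passing drop_leading_blank=True is one way to suppress it, and ofs="\t" corresponds to awk -v OFS="\t".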

Answer 1
3 2016-12-23 17:48:37

Answer 2
0 2016-12-23 17:34:34

Answer 3
0 accepted 2016-12-25 10:59:53
