
Single-quote part of a line using sed or awk

Convert input text as follows, using sed or awk:

Input file:

       113259740 QA Test in progress
       219919630 UAT Test in progress

Expected output:

       113259740 'QA Test in progress'
       219919630 'UAT Test in progress'

Using GNU sed or BSD (OSX) sed:

sed -E "s/^( *)([^ ]+)( +)(.*)$/\1\2\3'\4'/" file
  • ^( *) captures all leading spaces, if any
  • ([^ ]+) captures the 1st field (a run of at least one non-space character)
  • ( +) captures the space(s) after the 1st field
  • (.*)$ captures the rest of the line, whatever it may be
  • \1\2\3'\4' replaces each matching input line with the captured leading spaces, followed by the 1st field, followed by the captured space(s) after it, followed by the single-quoted remainder of the line. To discard the leading spaces, simply omit \1.
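The command above can be wrapped in functions for a quick check. The names quote_rest_keep and quote_rest_trim are hypothetical; the second one simply leaves \1 out of the replacement, as the last bullet describes, so the leading spaces are dropped:

```shell
# Hypothetical helpers: each reads lines on stdin and writes the result to stdout.

# Keeps the captured leading spaces (\1) in front of the first field.
quote_rest_keep() {
  sed -E "s/^( *)([^ ]+)( +)(.*)$/\1\2\3'\4'/"
}

# Same regex, but \1 is omitted from the replacement, so leading spaces are dropped.
quote_rest_trim() {
  sed -E "s/^( *)([^ ]+)( +)(.*)$/\2\3'\4'/"
}

printf '       113259740 QA Test in progress\n' | quote_rest_keep
# ->        113259740 'QA Test in progress'
printf '       113259740 QA Test in progress\n' | quote_rest_trim
# -> 113259740 'QA Test in progress'
```

A line without a second field (for example a lone word) does not match the regex, so sed prints it unchanged.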

Note:

  • Matching the 1st field is more permissive than strictly required, in that it matches any sequence of non-space characters, not just digits (as in the sample input data).
  • A generalized solution supporting other forms of whitespace (such as tabs), including after the 1st field, would look like this:

     sed -E "s/^([[:space:]]*)([^[:space:]]+)([[:space:]]+)(.*)$/\\1\\2\\3'\\4'/" file 
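A line that uses tabs shows why the [[:space:]] version matters. The space-only regex treats a tab as part of the first field, so the opening quote lands after the first space instead. A minimal comparison, with hypothetical function names:

```shell
# Hypothetical helpers wrapping the two regex variants; both filter stdin to stdout.
quote_rest_spaces() {   # spaces only, as in the first command
  sed -E "s/^( *)([^ ]+)( +)(.*)$/\1\2\3'\4'/"
}
quote_rest_ws() {       # any [[:space:]], including tabs
  sed -E "s/^([[:space:]]*)([^[:space:]]+)([[:space:]]+)(.*)$/\1\2\3'\4'/"
}

line=$(printf '\t113259740\tQA Test in progress')
printf '%s\n' "$line" | quote_rest_spaces   # <TAB>113259740<TAB>QA 'Test in progress'
printf '%s\n' "$line" | quote_rest_ws       # <TAB>113259740<TAB>'QA Test in progress'
```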

If your sed version doesn't support -E (or -r) to enable extended regexes, try the following POSIX-compliant variant, which uses a basic regex:

 sed "s/^\( *\)\([^ ]\{1,\}\)\( \{1,\}\)\(.*\)$/\1\2\3'\4'/" file

And in awk:

awk '{ printf "%s '"'"'", $1; for (i=2; i<NF; ++i) printf "%s ", $i; print $NF "'"'"'" }' file

Explanation:

  • printf "%s '"'"'", $1; prints the first field, followed by a space and a single quote (').
  • for (i=2; i<NF; ++i) printf "%s ", $i; prints all of the following fields except the last one, each followed by a space.
  • print $NF "'"'"'" prints the last field, followed by a single quote (').

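Note that '"'"' is the usual shell idiom for getting a single quote into a single-quoted string: the first ' ends the single-quoted awk program, "'" supplies a literal single quote, and the last ' resumes single quoting. An alternative is to pass the quote character to awk as a variable on the command line: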

awk -v qt="'" '{ printf "%s %s", $1, qt; for (i=2; i<NF; ++i) printf "%s ", $i; print $NF qt }' file
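Unlike the sed commands, both awk versions rebuild the line from its fields, so leading spaces are not preserved and runs of interior spaces collapse to one. A line with only one field also comes out wrong (abc becomes abc 'abc'), because the first and last field are the same field. A quick check, wrapping the -v variant in a hypothetical function:

```shell
# Hypothetical wrapper around the -v qt="'" variant; filters stdin to stdout.
quote_rest_awk() {
  awk -v qt="'" '{ printf "%s %s", $1, qt; for (i=2; i<NF; ++i) printf "%s ", $i; print $NF qt }'
}

printf '       113259740 QA Test in progress\n' | quote_rest_awk
# -> 113259740 'QA Test in progress'   (leading spaces gone)
printf 'abc\n' | quote_rest_awk
# -> abc 'abc'   (single-field line)
```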

You could also try this GNU sed command:

sed -r "s/^( +) ([0-9]+) (.*)$/\1 \2 '\3'/g" file
  • ^( +) matches the leading spaces; because a literal space must follow it, group 1 captures all of them except the last.

  • ([0-9]+) captures, after that literal space, the run of digits in group 2.

  • (.*)$ captures, after the single space following the digits, the rest of the line in group 3.

  • The replacement \1 \2 '\3' puts the groups back together, restoring the two matched spaces and wrapping group 3 in single quotes.

Example:

$ cat ccc
       113259740 QA Test in progress
       219919630 UAT Test in progress

$ sed -r "s/^( +) ([0-9]+) (.*)$/\1 \2 '\3'/g" ccc
       113259740 'QA Test in progress'
       219919630 'UAT Test in progress'
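Because ^( +) must be followed by a literal space, this pattern only matches lines with at least two leading spaces followed by a run of digits; any other line passes through unchanged. A quick check of that behavior, assuming GNU sed (for -r) and a hypothetical wrapper name:

```shell
# Hypothetical wrapper around the GNU sed command above; filters stdin to stdout.
quote_digits() {
  sed -r "s/^( +) ([0-9]+) (.*)$/\1 \2 '\3'/g"
}

printf '  1 QA Test\n 2 QA Test\nabc QA Test\n' | quote_digits
# ->   1 'QA Test'
# ->  2 QA Test
# -> abc QA Test
```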

You can do this by taking advantage of the word splitting that the read builtin performs in most shells, such as bash. To avoid ending up with extra single quotes in the final result, any single quotes already in the input are first removed with sed. Note also that read trims any whitespace before i, between i and j, and after j.

sed "s/'//g" file.txt | while read -r i j; do echo "$i '$j'"; done

Here, read assigns the first word to variable i and the rest of the line to j.
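To see exactly what read assigns, the loop body can print the two variables between brackets. The function name show_split is hypothetical; the output shows that leading whitespace is stripped while the spacing inside j is kept:

```shell
# Hypothetical helper: shows how `read -r i j` splits each input line.
show_split() {
  while read -r i j; do
    printf 'i=[%s] j=[%s]\n' "$i" "$j"
  done
}

printf '       113259740 QA  Test in progress\n' | show_split
# -> i=[113259740] j=[QA  Test in progress]
```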

An awk solution:

awk -v q="'" '{ f1=$1; $1=""; print f1, q substr($0,2) q }' file
  • awk splits each input line into fields by whitespace (the default behavior).
  • -v q="'" defines an awk variable q containing a single quote, which makes it easier to use a single quote inside the awk program, since the program as a whole is single-quoted.
  • f1=$1 saves the 1st field for later use.
  • $1="" effectively removes the first field from the line: assigning to a field makes awk rebuild $0 from all fields joined by the output field separator OFS (a space by default). Since the 1st field is now empty, the rebuilt $0 starts with a single space, followed by the remaining fields separated by one space each.
  • print f1, q substr($0,2) q then prints the saved 1st field, followed by a space (the OFS inserted because of the ,), followed by the rest of the line (with its initial space stripped via substr()) enclosed in single quotes (q).

Note that this solution normalizes whitespace:

  • leading and trailing whitespace is removed
  • each run of interior whitespace is compressed to a single space.
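The effect of $1="" on $0, and the resulting whitespace normalization, can be observed by printing $0 between brackets. A minimal sketch; the helper names are hypothetical, and the second one is the awk solution above:

```shell
# Hypothetical helpers; both filter stdin to stdout.
show_rebuild() {          # prints $0 after the first field is cleared
  awk '{ $1 = ""; print "[" $0 "]" }'
}
quote_rest_awk2() {       # the awk solution from above
  awk -v q="'" '{ f1=$1; $1=""; print f1, q substr($0,2) q }'
}

printf '   113259740   QA  Test   \n' | show_rebuild
# -> [ QA Test]
printf '   113259740   QA  Test   \n' | quote_rest_awk2
# -> 113259740 'QA Test'
```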

Since the post is tagged bash, here is an all-Bash solution that preserves leading whitespace.

while IFS= read -r line; do
    read -r f1 f2 <<<"$line"
    echo "${line/$f1 $f2/$f1 $'\''$f2$'\''}"
done < file

Output:

       113259740 'QA Test in progress'   
       219919630 'UAT Test in progress'

Here is a simple way to do it with awk:

awk '{sub($2,v"&");sub($NF,"&"v)}1' v=\' file
       113259740 'QA Test in progress'
       219919630 'UAT Test in progress'

It does not change the formatting of the file, because sub() edits $0 in place instead of rebuilding it from fields.
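One caveat: sub() treats $2 and $NF as regular expressions and replaces their first match anywhere in the line, so the quotes can land in the wrong place when the last word also appears earlier in the line (or a field contains regex metacharacters). If that matters, the following sketch is one alternative: it uses match() to find the end of the first field and its trailing blanks, then quotes everything after that. It is not part of the original answer, and the function names are hypothetical:

```shell
# Hypothetical names; both filter stdin to stdout.
quote_rest_sub() {      # the sub() one-liner from above
  awk '{sub($2,v"&");sub($NF,"&"v)}1' v=\'
}

# Alternative sketch: match() finds leading blanks + first field + following blanks,
# and everything after that is wrapped in quotes. Spacing is left untouched.
quote_rest_match() {
  awk -v q="'" '
    match($0, /^[ \t]*[^ \t]+[ \t]+/) {
      print substr($0, 1, RLENGTH) q substr($0, RLENGTH + 1) q
      next
    }
    { print }   # lines without a second field are passed through unchanged
  '
}

printf '  1 go to go\n' | quote_rest_sub     # ->   1 'go' to go
printf '  1 go to go\n' | quote_rest_match   # ->   1 'go to go'
```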

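Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.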

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM