Extract & store strings with uneven spaces using AWK

I have a file containing data like the sample below. I want to cut the first and last columns and store them in variables. I am able to print them using the command awk -F" {2,}" '{print $1,$NF}' filename.txt but I am unable to store them in variables using the awk -v option.

The main problem is that the first column contains spaces between words, and awk treats it as 3 columns when I use the awk -v option.

Please suggest how I can achieve this.


XML                  2144   11270    2846  3385074
Java                 7356  272651  242949  1350596
C++                   671   46497   42702   179366
C/C++ Header          671   16932   57837    44248
XSD                   216    3131     807    27634
Korn Shell            129    3686    4279    12431
IDL                    90    1098       0     8697
Perl                   17     717     795     5698
Python                 37    1102     786     4640
Ant                    62     596     154     4015
XSLT                   18     117      13     2153
make                   14     414    1659     1833
Bourne Again Shell     32     532     469     1830
JavaScript             10     204      35     1160
CSS                     5      95      45      735
SKILL                   2      77       0      523
HTML                   11      70      49      494
SQL                     9      39      89       71
C Shell                 3      13      25       31
D                       1       5      15       10
SUM:                11498  359246  355554  5031239

The -v VAR=value parameter is evaluated before the awk code executes. It's not actually part of the code, so you can't reference fields in it because they don't exist yet. Instead, set the variable in code:

awk '{ Lang=$1; Last=$NF; print Lang, Last; }'
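To make the distinction concrete, here is a minimal sketch. The two sample records and the variable name prefix are assumptions chosen for illustration. It passes a shell value in with -v, which is fixed before any input is read, and assigns the per-record fields inside the program body.

```shell
# Two sample records (assumed for illustration).
sample='XML  2144  11270  2846  3385074
Java  7356  272651  242949  1350596'

# -v hands a shell value to awk before any input is read;
# $1 and $NF only exist once a record is being processed,
# so they are assigned inside the program body.
result=$(printf '%s\n' "$sample" |
  awk -v prefix="lang:" '{ Lang = $1; Last = $NF; print prefix Lang, Last }')

printf '%s\n' "$result"
```

The -v option is still the right tool for the opposite direction, that is, for handing a value from the shell to awk.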

Also, setting those variables within awk won't affect bash's variables. Environments are hierarchical: each child environment inherits some state from the parent environment, but state never flows back upwards. The only way to get state from a child is for the child to print it in a format that the parent can handle. For example, you can pipe the above command to while read LANG LAST; do ...; done to read the awk output into variables.
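As a rough sketch of that idea, the parent shell below captures what awk prints and then splits it into its own variables. The sample record is an assumption for illustration. Lowercase names lang and last are used because LANG is also the name of a locale environment variable, and a here-document is used instead of a pipe so that read runs in the current shell rather than in a pipeline subshell.

```shell
# One sample record with a single-word name (assumed for illustration).
line='Java  7356  272651  242949  1350596'

# awk runs as a child process: variables it sets vanish when it exits.
# What it prints, however, can be captured by the parent shell.
out=$(printf '%s\n' "$line" | awk '{ print $1, $NF }')

# Split the captured text into two shell variables in the current shell.
read -r lang last <<EOF
$out
EOF

echo "lang=$lang last=$last"
```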

It seems from your comment that you're trying to mix awk and shell in a way that doesn't quite make sense. The correct full code (for getting the variables in a bash loop) would be:

cat loc.txt | awk '{ Lang=$1; Last=$NF; print Lang, Last; }' | while read LANG LAST; do ...; done

Or if it's a fixed number of fields, you can skip awk entirely:

cat loc.txt | while read LANG _ _ _ LAST; do ...; done

where the "_" is just a throwaway variable that is assigned and then ignored. Using an underscore as a placeholder is a convention in several programming languages. In the shell it is technically a real variable name, but bash treats $_ as a special parameter and resets it after every command, so don't rely on reading the value back. If you cared about the middle values, you'd give each field its own real name instead.

Neither of these solutions cares about how much whitespace there is. Awk doesn't care unless you tell it to, and neither does the shell.
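One caveat: by default both awk and read split on every run of whitespace, so a multi-word name such as Korn Shell would be cut at its first space. The sketch below is one possible way to keep such names intact, not the only one. It splits on runs of two or more spaces, as in the question, and has awk emit a tab between the two values so the shell can split on something unambiguous. The separator is written as '  +' (a space followed by one or more spaces), which matches the same text as " {2,}" but does not depend on interval-expression support in the awk implementation. The sample data and the variable names name, code and summary are assumptions for illustration.

```shell
# Sample records with multi-word names; columns separated by 2+ spaces.
data='Korn Shell          129    3686    4279    12431
Bourne Again Shell   32     532     469     1830
D                     1       5      15       10'

tab=$(printf '\t')
summary=''

# awk splits on runs of two or more spaces and prints the first and
# last column separated by a tab. The here-document keeps the loop in
# the current shell, so variables set inside it survive the loop.
while IFS="$tab" read -r name code; do
  summary="${summary}${name}=${code};"
done <<EOF
$(printf '%s\n' "$data" | awk -F'  +' -v OFS='\t' '{ print $1, $NF }')
EOF

echo "$summary"
```

Inside the loop, name holds the complete first column, embedded single spaces included, and code holds the last column.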
