
Capture sub-string at different positions from command's output

I have a requirement to capture a string from a command's output and store it for further processing. The problem is that the layout of the command's output sometimes changes, which leads to erroneous results.


The requested dataset looks like this:

application_1532934978357_3376 app_name job_type user any_name_2 RUNNING 
UNDEFINED 10% hostname
application_1532934978357_3375 app_name job_type user any_name_2 RUNNING 
UNDEFINED 10% hostname
application_1532934978357_3374 app_name job_type user any_name_2 RUNNING 
UNDEFINED 10% hostname
application_1532934978357_249069 some_information_etc job_type any_name_2 
RUNNING UNDEFINED 95% hostname
application_1532934978357_239728 app_name job_type any_name_2 RUNNING 
UNDEFINED 10% hostname
application_1532934978357_89483 some_info job_type user any_name RUNNING 
UNDEFINED 10% hostname
application_1532934978357_248180 with prog_vrsn as
(se...select cast(Stage-27) job_type user any_name RUNNING UNDEFINED 36.1% 
hostname
application_15329349783879_657880 select cast
value ..(stage35) with table
where value=5; job_type user any_name RUNNING UNDEFINED 10% hostname

and I use:

cat in | grep "RUNNING" | grep "any_name" | awk '{print $1}'

which generates this output:

application_1532934978357_89483 
(se...select cast(Stage-27)
where

whereas I want the output to be:

application_1532934978357_89483 
application_1532934978357_248180 
application_15329349783879_657880 

Here is a GNU awk script that captures only the application_XXXX ID associated with the word any_name:

awk -v RS='[ \n]' '/application_[0-9_]+/{a=$0}/\<any_name\>/{print a}' file

It relies on the record separator RS, which is set to a space or a newline so that each word becomes its own record. The application_XXXX string is stored in the variable a and printed when the word any_name is found.
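The same idea (remember the most recent application ID, print it when any_name turns up) can be sketched without GNU-specific features by walking the fields of each line instead of changing RS. This is a swapped-in variant, not the script above: it uses an exact string comparison in place of the \< \> word-boundary operators. The sample file and the variable names sample and ids are assumptions made for illustration.

```shell
# Hypothetical sample input, modeled on the records in the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
application_1532934978357_3376 app_name job_type user any_name_2 RUNNING UNDEFINED 10% hostname
application_1532934978357_89483 some_info job_type user any_name RUNNING UNDEFINED 10% hostname
application_1532934978357_248180 with prog_vrsn as
(se...select cast(Stage-27) job_type user any_name RUNNING UNDEFINED 36.1% hostname
application_15329349783879_657880 select cast
value ..(stage35) with table
where value=5; job_type user any_name RUNNING UNDEFINED 10% hostname
EOF

# Scan every word of every line. Remember the last application ID seen
# in the variable a, and print it when a word is exactly "any_name".
ids=$(awk '{
  for (i = 1; i <= NF; i++) {
    if ($i ~ /^application_[0-9_]+$/) a = $i
    else if ($i == "any_name") print a
  }
}' "$sample")

printf '%s\n' "$ids"
rm -f "$sample"
```

Because the ID is carried across lines in a, a record whose text wraps over several lines is still attributed to the application ID that started it, and any_name_2 is skipped because the comparison is exact.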

You just need to add one more grep to your command:

command | grep "status_run" | grep -e "id_tag1" -e "id_tag2" | grep "app_id" | awk '{print $1}'

OR

awk '/status_run/ && /app_id/ && /id_tag[12]/ {print $1}' filename

This prints only the app_id values from lines that contain "status_run" and either id_tag1 or id_tag2.
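A minimal sketch of that combined filter on made-up input is shown below. The file contents and the variable names input and matches are assumptions for illustration only. Keep in mind that awk and grep patterns are regular expressions, not shell globs: a trailing * applies to the preceding character, so the pattern here is written as plain app_id.

```shell
# Hypothetical input: one record per line, ID in the first column.
input=$(mktemp)
cat > "$input" <<'EOF'
app_id_001 job status_run id_tag1
app_id_002 job status_done id_tag1
app_id_003 job status_run id_tag3
app_id_004 job status_run id_tag2
other_005 job status_run id_tag2
EOF

# A line is printed only if all three patterns match somewhere on it.
matches=$(awk '/status_run/ && /app_id/ && /id_tag[12]/ {print $1}' "$input")

printf '%s\n' "$matches"
rm -f "$input"
```

Only the first and fourth lines satisfy all three conditions. This style of filter works line by line, so it assumes that each record sits on a single line.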


Solution after your question was updated:

grep "RUNNING" filename | grep "any_name" | grep "application" | awk '{print $1}'

If you want to print all the application IDs, use the command below:

awk '/application/{print $1}' filename
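As a slightly stricter sketch, the test can be anchored to the first field so that the word "application" appearing elsewhere on a line (for example inside a query string) does not trigger a match. The sample data and the variable names data and all_ids are assumptions for illustration.

```shell
# Hypothetical input, including one record whose text spans two lines.
data=$(mktemp)
cat > "$data" <<'EOF'
application_1532934978357_3376 app_name job_type user any_name_2 RUNNING UNDEFINED 10% hostname
application_1532934978357_248180 with prog_vrsn as
(se...select cast(Stage-27) job_type user any_name RUNNING UNDEFINED 36.1% hostname
EOF

# Print the first field only when it has the shape of an application ID.
all_ids=$(awk '$1 ~ /^application_[0-9_]+$/ {print $1}' "$data")

printf '%s\n' "$all_ids"
rm -f "$data"
```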
