Capture sub-string at different positions from command's output
I need to capture a string from a command's output and store it for further processing. The problem is that the layout of the command's output sometimes changes, which leads to erroneous results.
The dataset looks like this:
application_1532934978357_3376 app_name job_type user any_name_2 RUNNING
UNDEFINED 10% hostname
application_1532934978357_3375 app_name job_type user any_name_2 RUNNING
UNDEFINED 10% hostname
application_1532934978357_3374 app_name job_type user any_name_2 RUNNING
UNDEFINED 10% hostname
application_1532934978357_249069 some_information_etc job_type any_name_2
RUNNING UNDEFINED 95% hostname
application_1532934978357_239728 app_name job_type any_name_2 RUNNING
UNDEFINED 10% hostname
application_1532934978357_89483 some_info job_type user any_name RUNNING
UNDEFINED 10% hostname
application_1532934978357_248180 with prog_vrsn as
(se...select cast(Stage-27) job_type user any_name RUNNING UNDEFINED 36.1%
hostname
application_15329349783879_657880 select cast
value ..(stage35) with table
where value=5; job_type user any_name RUNNING UNDEFINED 10% hostname
and I use:
cat in | grep "RUNNING" | grep "any_name" | awk '{print $1}'
which generates this output:
application_1532934978357_89483
(se...select cast(Stage-27)
where
whereas I want to produce this output:
application_1532934978357_89483
application_1532934978357_248180
application_15329349783879_657880
Here is a GNU awk script that captures only the application_XXXX associated with the word any_name:
awk -v RS='[ \n]' '/application_[0-9_]+/{a=$0}/\<any_name\>/{print a}' file
It relies on the record separator RS, which is set to a space or a newline so that each word becomes its own record. The application_XXXX string is stored in the variable a and printed when the word any_name is found. The \< and \> word-boundary operators are GNU awk extensions; they keep any_name from matching inside any_name_2.
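If GNU awk is not available, the same idea can probably be sketched in portable awk by walking over the fields of every line instead of changing RS. This is only a sketch under a few assumptions: the function name extract_app_ids and the awk variable name are my own, the id is assumed to appear before the matching any_name field, and the match on any_name is an exact field comparison.

```shell
# Hypothetical helper (not from the original answer): remember the most
# recent application id and print it when a field equals the wanted name.
extract_app_ids() {
  awk -v name="any_name" '
    {
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^application_[0-9_]+$/) {
          app = $i                      # latest id seen so far
        } else if ($i == name && app != "") {
          print app                     # exact match, so any_name_2 is skipped
          app = ""                      # avoid printing the same id twice
        }
      }
    }' "$@"
}
```

It can be called as `extract_app_ids file`, or at the end of a pipeline, in which case it reads standard input. Because the id is carried across lines, records that wrap onto a second or third line are still handled.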
You just need to add one more grep to your command:
command's output | grep "status_run" | grep -e "id_tag1" -e "id_tag2" | grep "app_id" | awk '{print $1}'
OR
awk '(/status_run/) && (/app_id*/) && (/id_tag[12]/) {print $1;}' filename
This prints the first field of every line that matches app_id, contains "status_run", and contains id_tag1 or id_tag2.
Solution after you updated your question:
cat filename | grep "RUNNING" | grep "any_name" | grep "application*" | awk '{print $1}'
If you want to print all the application IDs, use the command below:
awk '/application*/{print $1}' filename
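One caveat: in a regular expression, `application*` means `applicatio` followed by zero or more `n`, so it matches more loosely than it looks. A stricter sketch follows, assuming every id has the form application_<digits>_<digits> and sits in the first field of its line (the function name list_app_ids is my own):

```shell
# Hypothetical helper: print the first field only when the whole field
# looks like an application id.
list_app_ids() {
  awk '$1 ~ /^application_[0-9]+_[0-9]+$/ { print $1 }' "$@"
}
```

Usage would be `list_app_ids filename`. Continuation lines such as `UNDEFINED 10% hostname` are skipped because their first field does not match the anchored pattern.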