Combining numbers from multiple text files using bash

I'm struggling to combine some data from the txt files generated in my Jenkins job.

Each file contains a single line, which looks like this:

testsuite name="mytest" cars="201" users="0" bus="0" bike="0" time="116.103016"

So far I have managed to extract the numbers from each txt file:

awk '/<testsuite name=/{print $3, $4, $5, $6}' my-output*.txt 

The result is:

cars="193" users="2" bus="0" bike="0"
cars="23" users="2" bus="10" bike="7"
cars="124" users="2" bus="5" bike="0"
cars="124" users="2" bus="0" bike="123"

I have an arbitrary number of files named like this:

my-output1.txt
my-output2.txt
my-output7.txt
my-output*.txt

I would like a single command, like the one above, that sums the values across all files and echoes a result like this:

cars=544 users=32 bus=12 bike=44

Is there a way to do that with a single command line?

1st solution: With your shown samples, please try the following awk code, which uses the match function. Since awk can read multiple files in a single invocation and your files all have the .txt extension, you can pass *.txt directly to the awk program.

Written and tested in GNU awk, using the capturing-group capability of its match function to store the matched values in an array for later use in the program.

awk -v s1="\"" '
match($0,/[[:space:]]+(cars)="([^"]*)" (users)="([^"]*)" (bus)="([^"]*)" (bike)="([^"]*)"/,tempArr){
   temp=""
   for(i=2;i<=8;i+=2){
     temp=tempArr[i-1]
     values[i]+=tempArr[i]
     indexes[i-1]=temp
   }
}
END{
   for(i in values){
     val=(val?val OFS:"") (indexes[i-1]"=" s1 values[i] s1)
   }
   print val
}
' *.txt

Explanation:

  • At the start of the GNU awk program, a variable named s1 is set to " for use later in the program.
  • The match function is used in the main block of the awk program.
  • The regex [[:space:]]+(cars)="([^"]*)" (users)="([^"]*)" (bus)="([^"]*)" (bike)="([^"]*)" (explained at the end of this post) creates 8 capture groups for later use.
  • Once the condition matches, a for loop runs over the even indexes only (2, 4, 6, 8), which are the groups holding the values.
  • The values array, indexed by i, accumulates the corresponding tempArr values, where tempArr is filled by the match function.
  • Similarly, the indexes array stores only the key names, taken from the odd-numbered groups.
  • Finally, the END block traverses the values array and prints each key from indexes together with its total from values.

Explanation of regex:

[[:space:]]+       ##Matching spaces 1 or more occurrences here.
(cars)="([^"]*)"   ##Matching cars=" till next occurrence of " here.
 (users)="([^"]*)" ##Matching a space followed by users=" till next occurrence of " here.
 (bus)="([^"]*)"   ##Matching a space followed by bus=" till next occurrence of " here.
 (bike)="([^"]*)"  ##Matching a space followed by bike=" till next occurrence of " here.
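To see which part of a line the pattern covers, here is a minimal sketch using grep -E with the capture groups dropped. It assumes the real line begins with <testsuite (as the question's awk pattern suggests) and that your grep supports the -o flag, which GNU and BSD grep do; the variable names are illustrative only.

```shell
# Sample line; the leading "<" and trailing ">" are assumptions about the real files.
line='<testsuite name="mytest" cars="201" users="0" bus="0" bike="0" time="116.103016">'

# Same pattern as above without the capture groups; -o prints only the matched text.
matched=$(printf '%s\n' "$line" |
    grep -Eo '[[:space:]]+cars="[^"]*" users="[^"]*" bus="[^"]*" bike="[^"]*"')

# Brackets make the leading space captured by [[:space:]]+ visible.
printf '[%s]\n' "$matched"
```

The match starts at the space before cars and stops at the closing quote of bike, so name and time are never part of it.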


2nd solution: GNU awk only, using the RS and RT variables. This ensures that the values appear in the output in the same order in which they appear in the input.

awk -v s1="\"" -v RS='[[:space:]][^=]*="[^"]*"' '
RT{
  gsub(/^ +|"/,"",RT)
  num=split(RT,arr,"=")
  if(arr[1]!="time" && arr[1]!="name"){
    if(!(arr[1] in values)){
      indexes[++count]=arr[1]
    }
    values[arr[1]]+=arr[2]
  }
}
END{
  for(i=1;i<=count;i++){
     val=(val?val OFS:"") (indexes[i]"=" s1 values[indexes[i]] s1)
  }
  print val
}
' *.txt

Using awk

$ cat script.awk
BEGIN {
    FS="[= ]"
} {
    gsub(/"/,"")
    for (i=1;i<NF;i++)
        if ($i=="cars") cars+=$(i+1)
        else if ($i=="users") users+=$(i+1)
        else if ($i=="bus") bus+=$(i+1)
        else if ($i=="bike") bike+=$(i+1)
} END {
    print "cars="cars,"users="users,"bus="bus,"bike="bike
}

To run the script, you can use:

$ awk -f script.awk my-output*.txt

Or, as an ugly one-liner:

$ awk -F"[= ]" '{gsub(/"/,"");for (i=1;i<NF;i++) if ($i=="cars") cars+=$(i+1); else if($i=="users") users+=$(i+1); else if($i=="bus") bus+=$(i+1); else if ($i=="bike")bike+=$(i+1)}END{print"cars="cars,"users="users,"bus="bus,"bike="bike}' my-output*.txt 

I found a way to do it, though it is a bit long:

awk '/<testsuite name=/{print $3, $4, $5, $6}' my-output*.xml | sed -e 's/[^0-9]/ /g' -e 's/^ *//g' -e 's/ *$//g' | tr -s ' ' | awk '{cars+=$1;users+=$2;bus+=$3;bike+=$4 }END{print "cars=" cars " users=" users " bus=" bus " bike=" bike}'
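The middle of that pipeline can be checked on its own. The sketch below feeds one line of the extracted output from the question through the same sed and tr stages; the variable names are illustrative only.

```shell
# One line as printed by the first awk stage (fields $3..$6 of the original line).
extracted='cars="193" users="2" bus="0" bike="0"'

# Replace every non-digit with a space, trim both ends, then squeeze repeated spaces.
normalized=$(printf '%s\n' "$extracted" |
    sed -e 's/[^0-9]/ /g' -e 's/^ *//g' -e 's/ *$//g' |
    tr -s ' ')

printf '%s\n' "$normalized"
```

What reaches the final awk is four bare numbers, in the order the first awk printed them: cars, users, bus, bike.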

M. Nejat Aydin's answer was a good fit:

awk -F '[ "=]+' '/testsuite name=/{ cars+=$5; users+=$7; buses+=$9; bikes+=$11 } END{ print "cars="cars, "users="users, "buses="buses, "bikes="bikes }' my-output*.xml
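The field numbers in that command follow from the separator '[ "=]+', which treats any run of spaces, quotes and equals signs as one delimiter. A small sketch that prints each field with its index makes this visible; it assumes the line starts with <testsuite, as the pattern in the command implies, and the variable names are illustrative only.

```shell
# Sample line; the leading "<" is an assumption based on the /testsuite name=/ pattern.
line='<testsuite name="mytest" cars="201" users="0" bus="0" bike="0" time="116.103016"'

# Print every field together with its index, using the same separator as the answer.
fields=$(printf '%s\n' "$line" |
    awk -F '[ "=]+' '{ for (i = 1; i <= NF; i++) printf "%d:%s\n", i, $i }')

printf '%s\n' "$fields"
```

Keys land on even field numbers and their values on the odd number that follows, which is why the command sums $5, $7, $9 and $11. If a line had leading whitespace, awk would see an empty first field and every index would shift by one.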

You can try rquery for this kind of query.

[ rquery]$ echo 'testsuite name="mytest" cars="201" users="0" bus="0" bike="0" time="116.103016"' > files/output1.txt
[ rquery]$ echo 'testsuite name="mytest" cars="201" users="1" bus="1" bike="2" time="116.103016"' > files/output2.txt
[ rquery]$ echo 'testsuite name="mytest" cars="301" users="10" bus="21" bike="23" time="116.103016"' > files/output3.txt
[ rquery]$ ./rq -q "p d/ /|s 'cars='+sum(trim(substr(@3,strlen('cars=')),'\"')),'users='+sum(trim(substr(@4,strlen('users=')),'\"')),'bus='+sum(trim(substr(@5,strlen('bus=')),'\"')),'bikes='+sum(trim(substr(@6,strlen('bikes=')),'\"'))" files/
cars=703        users=11        bus=22  bikes=25

Check out the latest rquery here: https://github.com/fuyuncat/rquery/releases

You may use this awk solution:

awk '{
   for (i=1; i<=NF; ++i)
      if (split($i, a, /=/) == 2) {
         gsub(/"/, "", a[2])
         sums[a[1]] +=a[2]
      }
}
END {
   for (i in sums) print i "=" sums[i]
}' file*

bus=15
cars=464
users=8
bike=130
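Because this approach sums every key=value pair it finds, and for (i in sums) visits keys in no particular order, a variant can restrict the output to chosen keys and print them on one line in a fixed order. The following is only a sketch under assumptions: the keys variable, the temporary files and their contents (taken from the extracted values in the question, with an assumed <testsuite ... > wrapper) are made up for illustration.

```shell
# Build four throwaway sample files mirroring the values shown in the question.
tmpdir=$(mktemp -d)
printf '%s\n' '<testsuite name="mytest" cars="193" users="2" bus="0" bike="0" time="116.103016">' > "$tmpdir/my-output1.txt"
printf '%s\n' '<testsuite name="mytest" cars="23" users="2" bus="10" bike="7" time="116.103016">' > "$tmpdir/my-output2.txt"
printf '%s\n' '<testsuite name="mytest" cars="124" users="2" bus="5" bike="0" time="116.103016">' > "$tmpdir/my-output3.txt"
printf '%s\n' '<testsuite name="mytest" cars="124" users="2" bus="0" bike="123" time="116.103016">' > "$tmpdir/my-output7.txt"

# keys (hypothetical variable) lists which attributes to report, in output order.
totals=$(awk -v keys="cars users bus bike" '
{
    for (i = 1; i <= NF; ++i)
        if (split($i, a, /=/) == 2) {
            gsub(/[">]/, "", a[2])      # drop quotes and a trailing ">"
            sums[a[1]] += a[2]          # every key is summed, only some are printed
        }
}
END {
    n = split(keys, k, " ")
    for (j = 1; j <= n; ++j)
        printf "%s%s=%d", (j > 1 ? " " : ""), k[j], sums[k[j]]
    print ""
}' "$tmpdir"/my-output*.txt)

printf '%s\n' "$totals"
rm -rf "$tmpdir"
```

Iterating over the split key list rather than over the sums array is what keeps the order stable, and attributes such as name and time are simply never printed.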
