Bash: read a CSV text file and find the average of rows

Question

This is the sample input (the data has user IDs and the number of hours spent by each user):

Computer ID,User ID,M,T,W,T,F
Computer1,User3,5,7,3,5,2
Computer2,User5,8,8,8,8,8
Computer3,User4,0,8,0,8,4
Computer4,User1,5,4,5,5,8
Computer5,User2,9,8,10,0,0

I need to read the data, find all user IDs ending in an even number (2, 4, 6, 8, ...) and find the average number of hours spent over the five days.

I wrote the following script:

hoursarray=(0,0,0,0,0)
while IFS=, read -r col1 col2 col3 col4 col5 col6 col7 || [[ -n $col1 ]]
do
    if [[ $col2 == *"2" ]]; then
        #echo "$col2"
        ((hoursarray[0] = col3 + col4 + col5 + col6 + col7))
    elif  [[ $col2 == *"4" ]]; then 
        #echo "$col2"
        ((hoursarray[1] = hoursarray[1] + col3 + col4 + col5 + col6 + col7))
    elif [[ $col2 == *"6" ]]; then
        #echo "$col2"
        ((hoursarray[2] = hoursarray[2] + col3 + col4 + col5 + col6 + col7))
    elif [[ $col2 == *"8" ]]; then
        #echo "$col2"
        ((hoursarray[3] = hoursarray[3] + col3 + col4 + col5 + col6 + col7))
    elif [[ $col2 == *"10" ]]; then
        #echo "$col2"
        ((hoursarray[4] = hoursarray[4] + col3 + col4 + col5 + col6 + col7))
    fi
done < <(tail -n+2 user-list.txt)
echo ${hoursarray[0]}
echo "$((hoursarray[0]/5))"

This is not a very good way of doing this. Also, the numbers aren't adding up correctly.

I am getting the following output (for the first one, User2):

27
5

I am expecting the following output:

27
5.4

What would be a better way to do it? Any help would be appreciated.

TIA

Answer 1

Your issue is echo "$((hoursarray[0]/5))". Bash has no floating-point arithmetic, so the division returns only the integer portion.

Easy to demonstrate:

$ hours=27
$ echo "$((hours/5))"
5

If you want to stick to Bash, you could use bc for the floating-point result:

$ echo "$hours / 5.0" | bc -l
5.40000000000000000000
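
If you would rather not call an external tool at all, scaled integer arithmetic gives a fixed number of decimals in plain Bash. This is only a sketch: avg1 is a made-up helper name, it assumes non-negative integer totals, and it truncates rather than rounds.

```shell
#!/usr/bin/env bash
# avg1 TOTAL DAYS: print TOTAL/DAYS with one decimal digit using only
# Bash integer arithmetic. "avg1" is an illustrative name, not part of
# the original script. The result is truncated, not rounded.
avg1() {
    local total=$1 days=$2
    local tenths=$(( total * 10 / days ))   # e.g. 27 * 10 / 5 = 54
    printf '%d.%d\n' $(( tenths / 10 )) $(( tenths % 10 ))
}

avg1 27 5   # prints 5.4
```

For more decimals, or for totals that are not whole numbers, bc or awk remain the simpler choice.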

Or use awk, perl, python, ruby, etc.

Here is an awk script you can pick apart. It is easily modified for your use (which is a little unclear to me):

awk -F, 'FNR==1{print $2; next} 
     {arr[$2]+=($3+$4+$5+$6+$7) }   
     END{ for (e in arr) print e "\t\t" arr[e] "\t" arr[e]/5 }' file

Prints:

User ID
User1       27  5.4
User2       27  5.4
User3       22  4.4
User4       20  4
User5       40  8

If you only want even users, filter for user IDs that end in any of 0, 2, 4, 6, 8:

awk -F, 'FNR==1{print $2; next} 
         $2~/[24680]$/ {arr[$2]+=($3+$4+$5+$6+$7) } 
         END{ for (e in arr) print e "\t\t" arr[e] "\t" arr[e]/5 }' file

Prints:

User ID
User2       27  5.4
User4       20  4

Answer 2

Your description is fairly imprecise, but here's an attempt primarily based on the sample output:

awk -F, '$2~/[24680]$/{for(i=3;i<=7;i++){a+=$i};print a;printf "%.2g\n",a/5; a=0}' file 
20
4
27
5.4

$2~/[24680]$/ makes sure we only look at "even" user IDs.

for(i=3;i<=7;i++){} iterates over the day columns and adds them up.

Edit 1: Accommodating new requirement:

awk -F, '$2~/[24680]$/{for(i=3;i<=7;i++){a+=$i};printf "%s\t%.2g\n",$2,a/5;a=0}' saad 
User4   4
User2   5.4
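
One caveat with the "%.2g" format: it keeps two significant digits, so an average of 10 or more loses its fractional part. A small sketch comparing it with "%.1f", which always keeps one digit after the decimal point. The function name show_formats and the total of 53 are made up for illustration and do not come from the sample data.

```shell
#!/usr/bin/env bash
# show_formats TOTAL: print TOTAL/5 with two different printf formats.
# "show_formats" and the value 53 below are illustrative assumptions.
show_formats() {
    awk -v total="$1" 'BEGIN {
        avg = total / 5
        printf "%.2g\n", avg   # two significant digits
        printf "%.1f\n", avg   # one digit after the decimal point
    }'
}

show_formats 53   # 53/5 = 10.6: prints 11, then 10.6
```

For the sample data every average is below 10, so both formats give the same visible result there except that "%.1f" prints 4.0 instead of 4.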

Answer 3

Sample data showing userIDs with even and odd endings, a userID showing up more than once (e.g., User2), and some non-integer values:

$ cat user-list.txt
Computer ID,User ID,M,T,W,T,F
Computer1,User3,5,7,3,5,2
Computer2,User5,8,8,8,8,8
Computer3,User4,0,8,0,8,4
Computer4,User1,5,4,5,5,8
Computer5,User2,9,8,10,0,0
Computer5,User120,9,8,10,0,0
Computer5,User2,4,7,12,3.5,1.5

One awk solution to find total hours plus averages across 5 days, with duplicate userIDs rolled into a single set of numbers, but limited to userIDs that end in an even number:

$ awk -F',' 'FNR==1 { next } $2 ~ /[02468]$/ { tot[$2]+=($3+$4+$5+$6+$7) } END { for ( i in tot ) { print i, tot[i], tot[i]/5 } }' user-list.txt

Where:

-F',' - use a comma as the input field delimiter
FNR==1 { next } - skip the first line (the header)
$2 ~ /[02468]$/ - match only if field 2 ends in an even digit
tot[$2]+=($3+$4+$5+$6+$7) - add the current line's hours to an array indexed by userID; this sums hours from multiple input lines (for the same userID) into a single array cell
for (...) { print ...} - loop through the array indices, printing the index, total hours and average hours (total divided by 5)

The above generates:

User120 27 5.4
User2 55 11
User4 20 4

Depending on the OP's desired output, the print can be replaced with printf and the desired format string.
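
As a rough sketch of that printf variant: the function name even_user_report and the column widths are arbitrary choices for illustration, not something the OP asked for. Row order is whatever awk's for-in loop produces, so pipe through sort if you need a stable order.

```shell
#!/usr/bin/env bash
# even_user_report [FILE...]: same logic as the one-liner above, but
# using printf so the columns line up. The function name and the
# widths (%-10s, %6.1f, %6.2f) are illustrative assumptions.
even_user_report() {
    awk -F',' '
        FNR == 1 { next }    # skip the header row
        $2 ~ /[02468]$/ { tot[$2] += $3 + $4 + $5 + $6 + $7 }
        END {
            for (i in tot)
                printf "%-10s %6.1f %6.2f\n", i, tot[i], tot[i] / 5
        }' "$@"
}

# With no FILE argument awk reads standard input.
even_user_report <<'EOF'
Computer ID,User ID,M,T,W,T,F
Computer1,User3,5,7,3,5,2
Computer2,User5,8,8,8,8,8
Computer3,User4,0,8,0,8,4
Computer4,User1,5,4,5,5,8
Computer5,User2,9,8,10,0,0
Computer5,User120,9,8,10,0,0
Computer5,User2,4,7,12,3.5,1.5
EOF
```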

Answer 4

Here is your script, modified a little bit:
  
 while IFS=, read -r col1 col2 col3 || [[ -n $col1 ]]
 do
       (( $(sed 's/[^[:digit:]]*//' <<<$col2) % 2 )) || ( echo -n "For $col1 $col2 average is: " && echo "($(tr , + <<<$col3))/5" | bc -l )
 done < <(tail -n+2 list.txt)

Prints:

 For Computer3 User4 average is: 4.00000000000000000000
 For Computer5 User2 average is: 5.40000000000000000000
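
The sed call in the loop extracts the numeric part of the user ID so it can be tested for parity. The same test can be sketched in pure Bash with parameter expansion. The helper name is_even_id is hypothetical, and the sketch assumes the ID ends in a run of decimal digits.

```shell
#!/usr/bin/env bash
# is_even_id ID: succeed if ID ends in digits that form an even number.
# "is_even_id" is an illustrative name, not part of the answer above.
is_even_id() {
    local digits=${1##*[!0-9]}   # drop everything up to the last non-digit
    # An ID with no trailing digits is treated as "not even".
    # 10# forces base 10 so a value such as 08 is not parsed as octal.
    [[ -n $digits ]] && (( 10#$digits % 2 == 0 ))
}

for id in User3 User4 User120; do
    if is_even_id "$id"; then
        echo "$id is even"
    else
        echo "$id is odd"
    fi
done
```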

4 answers

Answer 1: 0, 2020-11-20 16:46:57
Answer 2: 0, accepted, 2020-11-20 16:48:54
Answer 3: 0, 2020-11-20 16:59:45
Answer 4: 0, 2020-11-20 18:04:51