
Bash / Shell: Parsing CSV file in bash script and skipping first line

I am trying to parse a CSV file that contains user IDs and working hours for each user. I have written the following script:

#save weekly average to a file
while IFS=, read -r col1 col2 col3 col4 col5 col6 col7
do
    echo "$col2  ($col3+$col4+$col5+$col6+$col7)/5"
done < user-list.txt

I am facing the following two problems:

  1. I want to skip the first line of the CSV file, since it contains the headers.
  2. I am trying to calculate the average value, but the echo command does not evaluate the expression.

Some sample data from the input file is:

Computer ID,User ID,M,T,W,T,F
Computer1,User3,5,7,3,5,2

Any help would be appreciated. TIA

Try

awk -F, 'NR > 1 { map[$2]=($3+$4+$5+$6+$7)/5 } END { PROCINFO["sorted_in"]="@val_num_asc";for (i in map) { printf "%s %.2f\n",i,map[i] } }' user-list.txt

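Use a comma as the field delimiter (-F,). Ignore the header line with NR > 1. For every other line, add the third through seventh fields, divide by 5, and store the result in an array called map, indexed by the user ($2); if a user appears on more than one line, the last line wins. At the end, set the array traversal order to ascending numeric value and loop through the array, printing the index (user) and the value to 2 decimal places. Note that PROCINFO["sorted_in"] is a GNU awk (gawk) extension; other awks ignore it and print the array in an unspecified order.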
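
Since PROCINFO["sorted_in"] only has an effect in GNU awk, here is a minimal portable sketch, assuming only a POSIX awk and sort are available: awk builds the same per-user map and sort -k2,2n does the ascending numeric ordering. The function name user_averages and the temporary demo file are hypothetical, added for illustration.

```shell
#!/usr/bin/env bash
# user_averages FILE  (hypothetical helper name, for illustration only)
# Same logic as the gawk one-liner, but the ordering is done by sort,
# so it also works with mawk or BSD awk.
user_averages() {
  awk -F, '
    NR > 1 { map[$2] = ($3 + $4 + $5 + $6 + $7) / 5 }       # skip header; a repeated user keeps its last value
    END    { for (u in map) printf "%s %.2f\n", u, map[u] }  # traversal order is unspecified here
  ' "$1" | LC_ALL=C sort -k2,2n  # numeric sort on the average; C locale so "." is the decimal point
}

# Demo on a temporary copy of the sample data used further down the page
sample=$(mktemp)
cat > "$sample" <<'EOF'
a,b,c,d,e,f,g,h
1,id1,3,4,5,6,7
2,id2,13,14.233,15,16,17
3,id2,3.2,4.3,5.9233,6.0,7.32
4,id4,-3.2,4.3,-15.3,96.0,7.32
EOF
user_averages "$sample"
# id1 5.00
# id2 5.35
# id4 17.82
rm -f "$sample"
```

For descending order, sort -k2,2nr would be the only change.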

For your first problem, this should work:

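#save weekly average to a file
# tail -n +2 prints from line 2 onward (skipping the header); <(...) is bash process substitution
while IFS=, read -r col1 col2 col3 col4 col5 col6 col7
do
    echo "$col2  ($col3+$col4+$col5+$col6+$col7)/5"
done < <(tail -n +2 user-list.txt)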
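
Another way to skip the header, without starting a separate tail process, is to read and discard the first line before the loop starts. A minimal sketch, assuming the same seven-column layout; print_rows is a hypothetical name, and the extra || [ -n "$col1" ] test is an optional addition that keeps a last line with no trailing newline from being dropped.

```shell
#!/usr/bin/env bash
# print_rows FILE  (hypothetical name): skip the header, then print the
# user ID and the five daily values of every remaining row.
print_rows() {
  {
    IFS= read -r _header  # consume line 1 (the column headers) and ignore it
    # the "|| [ -n ... ]" part also handles a final line with no trailing newline
    while IFS=, read -r col1 col2 col3 col4 col5 col6 col7 || [ -n "$col1" ]; do
      echo "$col2 $col3 $col4 $col5 $col6 $col7"
    done
  } < "$1"  # the whole group shares one input, so the loop starts at line 2
}

# Demo with the sample rows from the question (temporary file)
sample=$(mktemp)
printf '%s\n' 'Computer ID,User ID,M,T,W,T,F' 'Computer1,User3,5,7,3,5,2' > "$sample"
print_rows "$sample"  # prints: User3 5 7 3 5 2
rm -f "$sample"
```

Unlike the <(...) form, this version is plain POSIX shell syntax.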

The second is a bit more complex: echo only displays variable contents or other output; it does not evaluate arithmetic expressions. Use shell arithmetic expansion, $(( ... )), like this (note that bash arithmetic is integer-only, so fractional parts are truncated):

myvar=$((1 + 2)); result=$((myvar / 3)); echo "$result"

Something like this, with a little adaptation to your problem, will solve it.
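
Combining both fixes in pure bash could look like the sketch below. It assumes whole, non-negative hour values, because $(( )) only does integer math: (5+7+3+5+2)/5 is 22/5, which bash evaluates to 4. To still show two decimals, the sketch scales the sum by 100 before dividing (the result is truncated, not rounded). weekly_averages is a hypothetical name; for fractional hours, awk (as in the other answers) is the simpler tool.

```shell
#!/usr/bin/env bash
# weekly_averages FILE  (hypothetical name)
# Assumption: 7 comma-separated columns and non-negative integer hours.
weekly_averages() {
  tail -n +2 "$1" | while IFS=, read -r col1 col2 col3 col4 col5 col6 col7; do
    sum=$(( col3 + col4 + col5 + col6 + col7 ))
    avg100=$(( sum * 100 / 5 ))  # average x 100; integer division truncates
    # whole part, then the two digits after the decimal point
    printf '%s %d.%02d\n' "$col2" $(( avg100 / 100 )) $(( avg100 % 100 ))
  done
}

# Demo with the sample rows from the question (temporary file)
sample=$(mktemp)
printf '%s\n' 'Computer ID,User ID,M,T,W,T,F' 'Computer1,User3,5,7,3,5,2' > "$sample"
weekly_averages "$sample"  # prints: User3 4.40
rm -f "$sample"
```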

OP hasn't (yet) provided any sample input data or the desired output, so I'm making some assumptions:

  • data values could be integers or reals, positive or negative
  • the user wants the average for each line (no need to calculate an average for the entire file)

Some sample data:

$ cat user-list.txt
a,b,c,d,e,f,g,h
1,id1,3,4,5,6,7
2,id2,13,14.233,15,16,17
3,id2,3.2,4.3,5.9233,6.0,7.32
4,id4,-3.2,4.3,-15.3,96.0,7.32

One awk solution:

$ awk -F"," 'FNR>=2 { printf "%s %10.3f\n", $2, ($3+$4+$5+$6+$7)/5.0 }' user-list.txt

Where:

  • -F"," - use comma as input field separator
  • FNR>=2 - skip the first line of the file
  • printf "%s %10.3f\n" - print field 2 using %s format; print the average using %10.3f format (a minimum width of 10: up to 6 characters to the left of the decimal point, the decimal point itself, and 3 digits to the right); append a linefeed (\n) at the end
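
The first answer filters with NR > 1 and this one with FNR>=2. For a single input file they behave the same; they only differ when awk is given several files, because NR keeps counting across all input while FNR restarts at 1 for each file. A small demonstration with two hypothetical temporary files, each starting with its own header line:

```shell
#!/usr/bin/env bash
# Two temporary CSV files with the same header (hypothetical demo data)
a=$(mktemp); b=$(mktemp)
trap 'rm -f "$a" "$b"' EXIT
printf '%s\n' 'Computer ID,User ID,M,T,W,T,F' 'Computer1,User1,1,1,1,1,1' > "$a"
printf '%s\n' 'Computer ID,User ID,M,T,W,T,F' 'Computer2,User2,2,2,2,2,2' > "$b"

# NR counts across both files, so the second header is treated as data
awk -F, 'NR > 1 { print $2 }' "$a" "$b"
# User1
# User ID
# User2

# FNR restarts for each file, so every header is skipped
awk -F, 'FNR >= 2 { print $2 }' "$a" "$b"
# User1
# User2
```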

The above generates:

id1      5.000
id2     15.047
id2      5.349
id4     17.824

OP has added a new requirement: sort the output by the calculated averages. However, there are a few potential issues that need further input from the OP:

  • Can a userID show up more than once in the data file?
  • If a userID can show up more than once then do we need to generate a single line of output for each unique userID or do we generate separate lines for each occurrence of a userID?
  • Is the data to be sorted in ascending or descending order?

For now I'm going to assume:

  • A userID may show up more than once in the source data (e.g., as with id2 in my sample data set above).
  • We will not combine multiple lines for a given userID (i.e., each line will stand on its own).
  • We'll show sorting in both ascending and descending order.
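
If the OP instead wants a single line per unique userID, one possible sketch under that assumption sums each user's daily values across all of their lines and divides by the number of values (five per line), which is the same as averaging that user's per-line averages. per_user_averages is a hypothetical name; the ascending sort mirrors the one used below.

```shell
#!/usr/bin/env bash
# per_user_averages FILE  (hypothetical name)
# Assumption: all lines for the same userID ($2) are merged into one result.
per_user_averages() {
  awk -F"," '
    FNR >= 2 {
      sum[$2]  += $3 + $4 + $5 + $6 + $7  # running total of daily values per user
      days[$2] += 5                       # each line contributes five values
    }
    END { for (u in sum) printf "%s %10.3f\n", u, sum[u] / days[u] }
  ' "$1" | LC_ALL=C sort -nk2  # C locale so "." is the decimal point
}

# Demo on a temporary copy of the sample data above
sample=$(mktemp)
cat > "$sample" <<'EOF'
a,b,c,d,e,f,g,h
1,id1,3,4,5,6,7
2,id2,13,14.233,15,16,17
3,id2,3.2,4.3,5.9233,6.0,7.32
4,id4,-3.2,4.3,-15.3,96.0,7.32
EOF
per_user_averages "$sample"
# id1      5.000
# id2     10.198
# id4     17.824
rm -f "$sample"
```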

While the sorting can be done within awk, I'm going to pipe the awk output to sort instead, as this requires a bit less code and is (imo) a bit easier to understand.

Ascending sort:

$ awk -F"," 'FNR>=2 { printf "%s %10.3f\n", $2, ($3+$4+$5+$6+$7)/5.0 }' user-list.txt | sort -nk2
id1      5.000
id2      5.349
id2     15.047
id4     17.824

Where sort -nk2 says to sort on column #2 using a numeric (-n) sort.

Descending sort:

$ awk -F"," 'FNR>=2 { printf "%s %10.3f\n", $2, ($3+$4+$5+$6+$7)/5.0 }' user-list.txt | sort -rnk2
id4     17.824
id2     15.047
id2      5.349
id1      5.000

Where sort -rnk2 says to sort on column #2 using a numeric (-n) sort, but to reverse (-r) the order.
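
A note on the key: -k2 means "from field 2 to the end of the line". With only two columns that is the same as -k2,2, but writing -k2,2 keeps the key from silently growing if more columns are added later. A possible refinement, sketched under those assumptions: restrict the key to column 2, run sort in the C locale so "." is always read as the decimal point (in some locales -n expects a comma), and add the userID as a secondary key so equal averages come out in a predictable order. sorted_averages and its asc/desc argument are hypothetical.

```shell
#!/usr/bin/env bash
# sorted_averages FILE [asc|desc]  (hypothetical name and argument)
sorted_averages() {
  local flags=n               # numeric, ascending
  if [ "${2:-asc}" = desc ]; then
    flags=nr                  # numeric, reversed
  fi
  awk -F"," 'FNR>=2 { printf "%s %10.3f\n", $2, ($3+$4+$5+$6+$7)/5.0 }' "$1" |
    LC_ALL=C sort -k2,2"$flags" -k1,1  # key 1: the average only; key 2: userID breaks ties (always ascending)
}

# Demo on a temporary copy of the sample data above
sample=$(mktemp)
cat > "$sample" <<'EOF'
a,b,c,d,e,f,g,h
1,id1,3,4,5,6,7
2,id2,13,14.233,15,16,17
3,id2,3.2,4.3,5.9233,6.0,7.32
4,id4,-3.2,4.3,-15.3,96.0,7.32
EOF
sorted_averages "$sample" desc
# id4     17.824
# id2     15.047
# id2      5.349
# id1      5.000
rm -f "$sample"
```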

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 