
Convert flat file to a different format using shell or python

I have a file in the below format:

User: user1
Count:3
Sum:80
  departmentId: dept1
  Amount by departmentId: 20
  departmentId: dept1
  Amount by departmentId: 35
  departmentId: dept2
  Amount by departmentId: 25
User: user2
Count:3
Sum:7.199999999999999
  departmentId: dept1
  Amount by departmentId: 2.4
  departmentId: dept2
  Amount by departmentId: 2.4
  departmentId: dept3
  Amount by departmentId: 2.4
User: user3
Count:1
Sum:0.2
  departmentId: dept2
  Amount by departmentId: 0.2
User: user4
Count:2
Sum:2
  departmentId: dept3
  Amount by departmentId: 1
  departmentId: dept3
  Amount by departmentId: 1

The file lists each user's dues per department. If the same user owes the same department more than once, those entries need to be merged into one row. The output file needs to be in the format below.

EDIT: For example, user1 has 2 dues for dept1 and 1 due for dept2. In the output file the 2 dues for dept1 need to be merged into 1, so the count on the first line will be 2, because the count has to be per user per department.

Format:
count total_sum
userId+deptId sum for that dept

Example:

2 80
user1dept1 55
user1dept2 25
3 7.2
user2dept1 2.4
user2dept2 2.4
user2dept3 2.4
1 0.2
user3dept2 0.2
1 2
user4dept3 2

Please advise on which scripting language to use, bash or python? And how do I loop through the input file? Thanks

Don't use a shell loop to manipulate text (see https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice for some of the reasons why). Awk is the tool that the people who invented shell also invented for shell to call to manipulate text, so just use that:

$ cat tst.awk
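BEGIN { FS=": *" }                  # split each line into name and value
{
    gsub(/^ +| +$/,"")              # trim leading/trailing blanks
    f[$1] = $2                      # remember the latest value seen for each name
}
/Amount/ {
    dept = f["departmentId"]        # department from the preceding line
    subTot[dept] += $2              # merge repeated dues for the same dept
    tot += $2
}
$1 == "User" {
    if (NR>1) {
        prt()                       # a new user starts: print the previous one
    }
    user = $2
}
END { prt() }                       # print the last user

function prt() {
    print length(subTot), tot
    # note: the order of "for (dept in subTot)" is unspecified in awk
    for (dept in subTot) {
        print user dept, subTot[dept]
    }
    delete subTot
    tot = 0
}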


$ awk -f tst.awk file
2 80
user1dept1 55
user1dept2 25
3 7.2
user2dept1 2.4
user2dept2 2.4
user2dept3 2.4
1 0.2
user3dept2 0.2
1 2
user4dept3 2

The above assumes you have an awk where length(array) gives you the number of elements in an array. If you don't, then just count every time you see a new dept for the current user (e.g. by using if (!(dept in subTot)) numDepts++ just before you populate subTot[dept]), print that value instead, and reset it to 0 in prt() along with tot.
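
Since the question also asks about Python, here is a rough sketch of the same approach in that language. It is not part of the awk answer above: the names convert and format_block are made up for illustration, and it assumes the input always looks like the sample (a "User:" line starts each block, and every "Amount by departmentId:" line follows its "departmentId:" line). As in the awk script, the total is recomputed from the amounts rather than read from the "Sum:" line, and numbers are printed with 6 significant digits.

```python
def format_block(user, sub_tot):
    """Return the output lines for one user: 'count total', then one line per dept."""
    total = sum(sub_tot.values())
    lines = [f"{len(sub_tot)} {total:g}"]          # :g keeps 6 significant digits
    for dept, amount in sub_tot.items():           # dicts keep insertion order
        lines.append(f"{user}{dept} {amount:g}")
    return lines


def convert(lines):
    """Convert the input lines (any iterable of str) to a list of output lines."""
    out = []
    user = None
    dept = None
    sub_tot = {}                                   # departmentId -> summed amount
    for raw in lines:
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue                               # skip blank or malformed lines
        key, value = key.strip(), value.strip()
        if key == "User":
            if user is not None:
                out.extend(format_block(user, sub_tot))
            user, sub_tot = value, {}
        elif key == "departmentId":
            dept = value
        elif key.startswith("Amount"):
            # merge repeated dues for the same department
            sub_tot[dept] = sub_tot.get(dept, 0.0) + float(value)
    if user is not None:
        out.extend(format_block(user, sub_tot))    # last user in the file
    return out


SAMPLE = """\
User: user1
Count:3
Sum:80
  departmentId: dept1
  Amount by departmentId: 20
  departmentId: dept1
  Amount by departmentId: 35
  departmentId: dept2
  Amount by departmentId: 25
"""

print("\n".join(convert(SAMPLE.splitlines())))
```

To process a real file, something like convert(open("file")) should work, since an open file iterates line by line. Unlike awk's for (dept in subTot), the Python dict keeps departments in the order they were first seen.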


 