I have a file in the below format:
User: user1
Count:3
Sum:80
departmentId: dept1
Amount by departmentId: 20
departmentId: dept1
Amount by departmentId: 35
departmentId: dept2
Amount by departmentId: 25
User: user2
Count:3
Sum:7.199999999999999
departmentId: dept1
Amount by departmentId: 2.4
departmentId: dept2
Amount by departmentId: 2.4
departmentId: dept3
Amount by departmentId: 2.4
User: user3
Count:1
Sum:0.2
departmentId: dept2
Amount by departmentId: 0.2
User: user4
Count:2
Sum:2
departmentId: dept3
Amount by departmentId: 1
departmentId: dept3
Amount by departmentId: 1
The file basically lists the user dues for each department. If the same user owes a department multiple times, those dues need to be merged into one row. The output file needs to be in the format below.
EDIT: For user1, there are 2 dues for dept1 and 1 due for dept2. So in the output file the 2 dues for dept1 need to be merged into 1, and the count on the header line will be 2, because the count needs to be the number of departments per user.
Format:
count total_sum
userId+deptId sum for that dept
Example:
2 80
user1dept1 55
user1dept2 25
3 7.2
user2dept1 2.4
user2dept2 2.4
user2dept3 2.4
1 0.2
user3dept2 0.2
1 2
user4dept3 2
Please advise on which scripting language to use, bash or python? And how do I loop through the input file? Thanks
You don't use a shell to manipulate text (see https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice for some of the reasons why). Awk is the tool that the people who invented the shell also invented for the shell to call to manipulate text, so just use that.
$ cat tst.awk
BEGIN { FS=": *" }
{
    gsub(/^ +| +$/,"")
    f[$1] = $2
}
/Amount/ {
    dept = f["departmentId"]
    subTot[dept] += $2
    tot += $2
}
$1 == "User" {
    if (NR>1) {
        prt()
    }
    user = $2
}
END { prt() }
function prt() {
    print length(subTot), tot
    for (dept in subTot) {
        print user dept, subTot[dept]
    }
    delete subTot
    tot = 0
}
$ awk -f tst.awk file
2 80
user1dept1 55
user1dept2 25
3 7.2
user2dept1 2.4
user2dept2 2.4
user2dept3 2.4
1 0.2
user3dept2 0.2
1 2
user4dept3 2
The above assumes you have an awk where length(array) gives you the number of elements in an array. If you don't, then just count every time you see a new dept for the current user (e.g. by using if (!(dept in subTot)) numDepts++ just before you populate subTot[dept]), print that value instead, and reset numDepts to 0 in prt() along with tot.
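The fallback described above can be sketched as a portable variant of tst.awk, assuming an awk without length(array). numDepts counts distinct departments per user. The depts array is an addition of this sketch, not part of the original answer: it records the order in which each department was first seen, because for (dept in subTot) visits keys in an unspecified order. The wrapper name merge_dues is hypothetical, and split("", arr) is used to empty arrays because whole-array delete may also be missing from some older awks.

```shell
#!/bin/sh
# merge_dues (hypothetical name): reads the dues file from the named files,
# or stdin if none are given, and prints one "count total" header per user
# followed by one merged "userdept subtotal" line per department.
merge_dues() {
    awk '
    BEGIN { FS = ": *" }
    {
        gsub(/^ +| +$/, "")
        f[$1] = $2
    }
    /Amount/ {
        dept = f["departmentId"]
        # First due for this dept for the current user: count it and
        # remember its position so output keeps input order.
        if (!(dept in subTot))
            depts[++numDepts] = dept
        subTot[dept] += $2
        tot += $2
    }
    $1 == "User" {
        if (NR > 1)
            prt()
        user = $2
    }
    END { prt() }
    function prt(    i) {
        print numDepts, tot
        for (i = 1; i <= numDepts; i++)
            print user depts[i], subTot[depts[i]]
        # split("", arr) empties an array without needing "delete arr"
        split("", subTot)
        split("", depts)
        numDepts = 0
        tot = 0
    }
    ' "$@"
}

# Example run on the first two users from the question's input.
merge_dues <<'EOF'
User: user1
Count:3
Sum:80
departmentId: dept1
Amount by departmentId: 20
departmentId: dept1
Amount by departmentId: 35
departmentId: dept2
Amount by departmentId: 25
User: user2
Count:3
Sum:7.199999999999999
departmentId: dept1
Amount by departmentId: 2.4
departmentId: dept2
Amount by departmentId: 2.4
departmentId: dept3
Amount by departmentId: 2.4
EOF
```

On that sample it should print the same lines as the user1 and user2 part of the expected output, with departments always listed in input order. Like the original, it ignores the Count and Sum lines and recomputes both values from the Amount lines.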