Convert a flat file to another format using shell or Python

Question

I have a file in the following format:

User: user1
Count:3
Sum:80
  departmentId: dept1
  Amount by departmentId: 20
  departmentId: dept1
  Amount by departmentId: 35
  departmentId: dept2
  Amount by departmentId: 25
User: user2
Count:3
Sum:7.199999999999999
  departmentId: dept1
  Amount by departmentId: 2.4
  departmentId: dept2
  Amount by departmentId: 2.4
  departmentId: dept3
  Amount by departmentId: 2.4
User: user3
Count:1
Sum:0.2
  departmentId: dept2
  Amount by departmentId: 0.2
User: user4
Count:2
Sum:2
  departmentId: dept3
  Amount by departmentId: 1
  departmentId: dept3
  Amount by departmentId: 1

The file basically lists each user's dues per department. If the same user has multiple dues for the same department, they need to be merged into one row. The output file needs to be in the format below.

EDIT: For user1, there are 2 dues for dept1 and 1 due for dept2. So in the output file the 2 dues for dept1 need to be merged into 1, and the count on the summary line will be 2, because the count needs to be the number of departments for that user.
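The merge rule from the edit can be sketched in Python as a small helper. This is an illustration, not part of the original question: `merge_dues` is a hypothetical name, and it assumes one user's dues have already been read into (department, amount) pairs. Repeated departments collapse into one entry, and the count is the number of distinct departments.

```python
from collections import defaultdict


def merge_dues(dues):
    """Merge one user's (dept, amount) pairs into per-department totals.

    Returns (count, total, per_dept): count is the number of distinct
    departments, not the number of input rows.
    """
    per_dept = defaultdict(float)
    for dept, amount in dues:
        per_dept[dept] += amount  # repeated departments collapse into one entry
    return len(per_dept), sum(per_dept.values()), dict(per_dept)


# user1 from the sample: two dues for dept1 and one for dept2
count, total, per_dept = merge_dues([("dept1", 20), ("dept1", 35), ("dept2", 25)])
print(count, total, per_dept)  # → 2 80.0 {'dept1': 55.0, 'dept2': 25.0}
```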

Format:
count total_sum
userId+deptId sum for that dept

Example:

2 80
user1dept1 55
user1dept2 25
3 7.2
user2dept1 2.4
user2dept2 2.4
user2dept3 2.4
1 0.2
user3dept2 0.2
1 2
user4dept3 2
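In the example, user2's input Sum of 7.199999999999999 appears as 7.2. A minimal Python sketch of one way to get that, assuming the goal is to mimic awk's default number output: awk prints integral values as integers and formats other numbers with OFMT, whose default is "%.6g". `fmt_num` is a hypothetical helper, and whether 6 significant digits is the precision the asker actually wants is an assumption.

```python
def fmt_num(x):
    """Format a number roughly the way awk's print does by default.

    Integral values print as integers; others use OFMT's default "%.6g".
    """
    if float(x).is_integer():
        return str(int(x))
    return "%.6g" % x


# user2's Sum from the input, then a few other sample values
print(fmt_num(7.199999999999999), fmt_num(80.0), fmt_num(0.2))  # → 7.2 80 0.2
```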

Please advise which scripting language to use, bash or Python, and how to loop through the input file. Thanks

Answer 1

You don't use a shell to manipulate text (see https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice for some of the reasons why). Awk is the tool that the guys who invented the shell invented for the shell to call to manipulate text, so just use that.

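$ cat tst.awk
BEGIN { FS=": *" }              # split "key: value" on the colon plus any following spaces
{
    gsub(/^ +| +$/,"")          # trim leading/trailing blanks (this re-splits the fields)
    f[$1] = $2                  # remember the latest value seen for each key
}
/Amount/ {
    dept = f["departmentId"]    # relies on the departmentId line preceding its Amount line
    subTot[dept] += $2          # merge repeated dues for the same dept
    tot += $2
}
$1 == "User" {
    if (NR>1) {
        prt()                   # flush the previous user before starting a new one
    }
    user = $2
}
END { prt() }                   # flush the last user

# print one user's block; the order of "for (k in arr)" is unspecified in awk
function prt() {
    print length(subTot), tot
    for (dept in subTot) {
        print user dept, subTot[dept]
    }
    delete subTot
    tot = 0
}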


$ awk -f tst.awk file
2 80
user1dept1 55
user1dept2 25
3 7.2
user2dept1 2.4
user2dept2 2.4
user2dept3 2.4
1 0.2
user3dept2 0.2
1 2
user4dept3 2
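Since the question also asked about Python, here is a rough Python counterpart of tst.awk, offered as a sketch rather than a drop-in replacement. It assumes the input looks like the sample (one "key: value" pair per line, each "Amount by departmentId" line following its "departmentId" line) and reads the file named as the first command-line argument; the helper names `fmt_num`, `block` and `convert` are hypothetical. Unlike awk's `for (dept in subTot)`, a Python dict keeps insertion order, so departments print in first-seen order.

```python
import sys


def fmt_num(x):
    # mimic awk's default print: integral values as integers, others via "%.6g"
    return str(int(x)) if float(x).is_integer() else "%.6g" % x


def block(user, sub_tot):
    """Return the output lines for one user: 'count total' then 'userdept sum'."""
    out = ["%d %s" % (len(sub_tot), fmt_num(sum(sub_tot.values())))]
    out += ["%s%s %s" % (user, dept, fmt_num(amt)) for dept, amt in sub_tot.items()]
    return out


def convert(lines):
    """Yield converted output lines from an iterable of input lines."""
    user, dept, sub_tot = None, None, {}
    for line in lines:
        key, sep, value = line.partition(":")
        if not sep:
            continue  # skip blank lines or lines without a "key:" prefix
        key, value = key.strip(), value.strip()
        if key == "User":
            if user is not None:
                yield from block(user, sub_tot)  # flush the previous user
            user, sub_tot = value, {}
        elif key == "departmentId":
            dept = value  # assumes this line precedes its "Amount" line
        elif key.startswith("Amount"):
            sub_tot[dept] = sub_tot.get(dept, 0) + float(value)  # merge repeat depts
        # "Count" and "Sum" lines are ignored: both are recomputed from the amounts
    if user is not None:
        yield from block(user, sub_tot)  # flush the last user


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as f:
        for out_line in convert(f):
            print(out_line)
```

Run as, for example, `python3 tst.py file` (tst.py being a hypothetical file name); on the sample input it should print the same lines as the awk run above.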

The above assumes you have an awk where length(array) gives you the number of elements in an array. If you don't, then just count every time you see a new dept for the current user (e.g. by using if (!(dept in subTot)) numDepts++ just before you populate subTot[dept]), print that value instead, and reset numDepts to 0 in prt() along with tot.

Answer 1 (score 4, accepted, 2018-05-01 15:25:43)