
Convert flat file to a different format using shell or python

I have a file in the following format:

User: user1
Count:3
Sum:80
  departmentId: dept1
  Amount by departmentId: 20
  departmentId: dept1
  Amount by departmentId: 35
  departmentId: dept2
  Amount by departmentId: 25
User: user2
Count:3
Sum:7.199999999999999
  departmentId: dept1
  Amount by departmentId: 2.4
  departmentId: dept2
  Amount by departmentId: 2.4
  departmentId: dept3
  Amount by departmentId: 2.4
User: user3
Count:1
Sum:0.2
  departmentId: dept2
  Amount by departmentId: 0.2
User: user4
Count:2
Sum:2
  departmentId: dept3
  Amount by departmentId: 1
  departmentId: dept3
  Amount by departmentId: 1

The file basically lists the dues each user owes to each department. If the same user owes dues to a department multiple times, those entries need to be merged into one row. The output file needs to be in the format below.

EDIT: user1 has 2 dues for dept1 and 1 due for dept2. So in the output file the 2 dues for dept1 need to be merged into 1, and the count on the header line will be 2, because the count needs to be the number of distinct departments for that user.

Format:
count total_sum
userId+deptId sum for that dept

Example:

2 80
user1dept1 55
user1dept2 25
3 7.2
user2dept1 2.4
user2dept2 2.4
user2dept3 2.4
1 0.2
user3dept2 0.2
1 2
user4dept3 2

Please advise on which scripting language to use, bash or python, and how to loop through the input file. Thanks

Don't use a shell loop to manipulate text (see https://unix.stackexchange.com/questions/169716/why-is-using-a-shell-loop-to-process-text-considered-bad-practice for some of the reasons why). Awk is the tool that the people who invented the shell also invented for the shell to call to manipulate text, so just use that.

$ cat tst.awk
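# Split each line into a name and a value at the colon (plus any spaces after it).
BEGIN { FS=": *" }

# Every line: trim surrounding spaces and remember the latest value seen for each name.
{
    gsub(/^ +| +$/,"")
    f[$1] = $2
}

# Amount line: add it to the running totals for the most recently seen department.
/Amount/ {
    dept = f["departmentId"]
    subTot[dept] += $2
    tot += $2
}

# New user: print the previous user's block first (there is none before line 1).
$1 == "User" {
    if (NR>1) {
        prt()
    }
    user = $2
}

# Print the block for the last user in the file.
END { prt() }

# Print "count total", then one "user+dept subtotal" line per department, then reset.
function prt() {
    print length(subTot), tot
    for (dept in subTot) {
        print user dept, subTot[dept]
    }
    delete subTot
    tot = 0
}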


$ awk -f tst.awk file
2 80
user1dept1 55
user1dept2 25
3 7.2
user2dept1 2.4
user2dept2 2.4
user2dept3 2.4
1 0.2
user3dept2 0.2
1 2
user4dept3 2

The above assumes you have an awk where length(array) gives you the number of elements in an array. If you don't, then just count every time you see a new dept for the current user (e.g. by using if (!(dept in subTot)) numDepts++ just before you populate subTot[dept] ), print that value instead of length(subTot), and reset it to 0 in prt(). Also note that the order in which for (dept in subTot) visits the departments is unspecified in awk, so the department lines within a user's block may come out in a different order than shown above.
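
Since the question also asks about Python, here is a minimal sketch of the same approach in Python. It is not part of the original answer: the function name convert and the ".6g" number formatting are assumptions, the latter chosen to mimic awk's default numeric output so that 7.199999999999999 prints as 7.2. Like the awk script, it ignores the Count and Sum lines in the input and recomputes both values.

```python
import sys


def convert(lines):
    """Return the output lines: a 'count total' header per user, then one line per department."""
    out = []
    user = None
    dept = None
    sub_totals = {}  # department -> summed amount for the current user (keeps insertion order)

    def flush():
        # Emit the block for the current user, if one has been seen.
        if user is None:
            return
        total = sum(sub_totals.values())
        out.append(f"{len(sub_totals)} {total:.6g}")
        for d, amount in sub_totals.items():
            out.append(f"{user}{d} {amount:.6g}")

    for raw in lines:
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # blank line or a line without a colon
        key, value = key.strip(), value.strip()
        if key == "User":
            flush()  # finish the previous user before starting a new one
            user, sub_totals = value, {}
        elif key == "departmentId":
            dept = value
        elif key.startswith("Amount"):
            sub_totals[dept] = sub_totals.get(dept, 0.0) + float(value)
    flush()  # last user in the file
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python convert.py file
    with open(sys.argv[1]) as fh:
        print("\n".join(convert(fh)))
```

Because a Python dict preserves insertion order, the departments in this sketch are printed in the order they first appear for each user.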
