合并多个csv文件

Question

我正在使用q转换我的csv文件： log.csv （链接的文件）。 格式为：

datapath,port,rxpkts,rxbytes,rxerror,txpkts,txbytes,txerror
4,1,178,25159,0,40,3148,0
4,2,3,230,0,213,27897,0
4,3,3,230,0,212,27807,0
4,4,4,320,0,211,27717,0
4,5,3,230,0,212,27807,0
4,6,3,230,0,212,27807,0
4,7,4,320,0,211,27717,0
4,8,4,320,0,211,27717,0
4,9,4,320,0,211,27717,0
4,a,4,320,0,211,27717,0
4,b,3,230,0,212,27807,0
4,fffffffe,7,578,0,209,27549,0
3,1,197,26863,0,21,1638,0
3,2,3,230,0,215,28271,0
3,3,5,390,0,215,28271,0
3,4,2,140,0,216,28361,0
3,5,4,320,0,214,28181,0
3,6,3,230,0,215,28271,0
3,fffffffe,7,578,0,212,28013,0
5,1,208,27401,0,6,488,0
5,fffffffe,7,578,0,208,27401,0
2,1,180,24228,0,18,1368,0
2,2,2,140,0,195,25366,0
2,3,2,140,0,195,25366,0
2,4,3,230,0,194,25276,0
2,5,3,230,0,194,25276,0
2,6,2,140,0,195,25366,0
2,fffffffe,7,578,0,191,25018,0
1,1,38,5096,0,182,23602,0
1,2,42,5419,0,179,23369,0
1,3,61,7152,0,159,21546,0
1,4,28,4611,0,192,24087,0
1,5,46,6022,0,174,22676,0
1,fffffffe,7,578,0,214,28210,0

我想将其转换为以下格式： 在此处输入图片说明

端口数量可以变化。

当前代码：

python q -H -d "," "select rxpkts, txpkts from ./log.csv where datapath = i and port = j" > i_j.csv;

因此，我制作了i*j个文件，然后手动将它们组合在一起。 有没有一种方法可以通过修改上面的sql查询或使用Python或注释中建议的pandas 合并文件来一次性完成此操作？

import subprocess

def printit():
    for i in range(1,6):
        for j in range(1,6):
            query = "select rxpkts, txpkts from ./log.csv where datapath = "+str(i)+" and port = "+str(j)
            fileName = str(i)+"_"+str(j)+".csv"
            with open(fileName, "w+") as f:
                p = subprocess.Popen(["python", "q", "-H", "-d", ",", query], stdout=f)

printit()

Answer 1

您可以将set_index与stack一起使用。

import pandas as pd

# your data
# ======================================
print(df)

    datapath      port  rxpkts   ...     txpkts  txbytes  txerror
0          4         1     178   ...         40     3148        0
1          4         2       3   ...        213    27897        0
2          4         3       3   ...        212    27807        0
3          4         4       4   ...        211    27717        0
4          4         5       3   ...        212    27807        0
5          4         6       3   ...        212    27807        0
6          4         7       4   ...        211    27717        0
7          4         8       4   ...        211    27717        0
8          4         9       4   ...        211    27717        0
9          4         a       4   ...        211    27717        0
..       ...       ...     ...   ...        ...      ...      ...
24         2         4       3   ...        194    25276        0
25         2         5       3   ...        194    25276        0
26         2         6       2   ...        195    25366        0
27         2  fffffffe       7   ...        191    25018        0
28         1         1      38   ...        182    23602        0
29         1         2      42   ...        179    23369        0
30         1         3      61   ...        159    21546        0
31         1         4      28   ...        192    24087        0
32         1         5      46   ...        174    22676        0
33         1  fffffffe       7   ...        214    28210        0

[34 rows x 8 columns]


# reshaping
# ======================================
series_res = df[df.columns[:4]].set_index(['datapath', 'port']).stack()
series_res.name = 'value'



datapath  port             
4         1         rxpkts       178
                    rxbytes    25159
          2         rxpkts         3
                    rxbytes      230
          3         rxpkts         3
                    rxbytes      230
          4         rxpkts         4
                    rxbytes      320
          5         rxpkts         3
                    rxbytes      230
                               ...  
1         2         rxpkts        42
                    rxbytes     5419
          3         rxpkts        61
                    rxbytes     7152
          4         rxpkts        28
                    rxbytes     4611
          5         rxpkts        46
                    rxbytes     6022
          fffffffe  rxpkts         7
                    rxbytes      578
Name: value, dtype: int64



df_res = pd.DataFrame(series_res)
df_res.T

datapath      4                                         ...        1                                        
port          1              2              3           ...        4              5         fffffffe        
         rxpkts rxbytes rxpkts rxbytes rxpkts rxbytes   ...   rxpkts rxbytes rxpkts rxbytes   rxpkts rxbytes
value       178   25159      3     230      3     230   ...       28    4611     46    6022        7     578

[1 rows x 68 columns]

合并多个csv文件

问题描述

1 个解决方案

解决方案1
1 已采纳 2015-07-18 08:55:19

合并多个csv文件

问题描述

1 个解决方案

解决方案1 1 已采纳 2015-07-18 08:55:19

解决方案1
1 已采纳 2015-07-18 08:55:19