Combine multiple csv files
I am using q to transform my csv file, log.csv (linked file). Its format is:
datapath,port,rxpkts,rxbytes,rxerror,txpkts,txbytes,txerror
4,1,178,25159,0,40,3148,0
4,2,3,230,0,213,27897,0
4,3,3,230,0,212,27807,0
4,4,4,320,0,211,27717,0
4,5,3,230,0,212,27807,0
4,6,3,230,0,212,27807,0
4,7,4,320,0,211,27717,0
4,8,4,320,0,211,27717,0
4,9,4,320,0,211,27717,0
4,a,4,320,0,211,27717,0
4,b,3,230,0,212,27807,0
4,fffffffe,7,578,0,209,27549,0
3,1,197,26863,0,21,1638,0
3,2,3,230,0,215,28271,0
3,3,5,390,0,215,28271,0
3,4,2,140,0,216,28361,0
3,5,4,320,0,214,28181,0
3,6,3,230,0,215,28271,0
3,fffffffe,7,578,0,212,28013,0
5,1,208,27401,0,6,488,0
5,fffffffe,7,578,0,208,27401,0
2,1,180,24228,0,18,1368,0
2,2,2,140,0,195,25366,0
2,3,2,140,0,195,25366,0
2,4,3,230,0,194,25276,0
2,5,3,230,0,194,25276,0
2,6,2,140,0,195,25366,0
2,fffffffe,7,578,0,191,25018,0
1,1,38,5096,0,182,23602,0
1,2,42,5419,0,179,23369,0
1,3,61,7152,0,159,21546,0
1,4,28,4611,0,192,24087,0
1,5,46,6022,0,174,22676,0
1,fffffffe,7,578,0,214,28210,0
I want to convert it to the following format:
The number of ports can vary.
Current code:
python q -H -d "," "select rxpkts, txpkts from ./log.csv where datapath = i and port = j" > i_j.csv;
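So I end up with i*j files, which I then combine by hand. Is there a way to do this in one go, either by modifying the SQL query above, by using Python, or by merging the files with pandas as suggested in the comments?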
import subprocess

def printit():
    # Write one CSV per (datapath, port) pair by running q once for each pair.
    for i in range(1, 6):
        for j in range(1, 6):
            query = ("select rxpkts, txpkts from ./log.csv where datapath = "
                     + str(i) + " and port = " + str(j))
            fileName = str(i) + "_" + str(j) + ".csv"
            with open(fileName, "w+") as f:
                # run() waits for q to finish before the file is closed.
                subprocess.run(["python", "q", "-H", "-d", ",", query], stdout=f)

printit()
You can use set_index together with stack.
import pandas as pd
# your data
# ======================================
print(df)
datapath port rxpkts ... txpkts txbytes txerror
0 4 1 178 ... 40 3148 0
1 4 2 3 ... 213 27897 0
2 4 3 3 ... 212 27807 0
3 4 4 4 ... 211 27717 0
4 4 5 3 ... 212 27807 0
5 4 6 3 ... 212 27807 0
6 4 7 4 ... 211 27717 0
7 4 8 4 ... 211 27717 0
8 4 9 4 ... 211 27717 0
9 4 a 4 ... 211 27717 0
.. ... ... ... ... ... ... ...
24 2 4 3 ... 194 25276 0
25 2 5 3 ... 194 25276 0
26 2 6 2 ... 195 25366 0
27 2 fffffffe 7 ... 191 25018 0
28 1 1 38 ... 182 23602 0
29 1 2 42 ... 179 23369 0
30 1 3 61 ... 159 21546 0
31 1 4 28 ... 192 24087 0
32 1 5 46 ... 174 22676 0
33 1 fffffffe 7 ... 214 28210 0
[34 rows x 8 columns]
# reshaping
# ======================================
series_res = df[df.columns[:4]].set_index(['datapath', 'port']).stack()
series_res.name = 'value'
datapath port
4 1 rxpkts 178
rxbytes 25159
2 rxpkts 3
rxbytes 230
3 rxpkts 3
rxbytes 230
4 rxpkts 4
rxbytes 320
5 rxpkts 3
rxbytes 230
...
1 2 rxpkts 42
rxbytes 5419
3 rxpkts 61
rxbytes 7152
4 rxpkts 28
rxbytes 4611
5 rxpkts 46
rxbytes 6022
fffffffe rxpkts 7
rxbytes 578
Name: value, dtype: int64
df_res = pd.DataFrame(series_res)
df_res.T
datapath 4 ... 1
port 1 2 3 ... 4 5 fffffffe
rxpkts rxbytes rxpkts rxbytes rxpkts rxbytes ... rxpkts rxbytes rxpkts rxbytes rxpkts rxbytes
value 178 25159 3 230 3 230 ... 28 4611 46 6022 7 578
[1 rows x 68 columns]
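For reference, here is a minimal end-to-end sketch of the same set_index/stack reshaping, including the loading step that the snippet above leaves as "# your data". It rests on a few assumptions: the target layout is not shown in the question, so it simply produces the single wide row from the answer; it keeps rxpkts and txpkts (the two columns the q query selects) rather than the first four columns; and the helper name `to_single_row` and the inline sample rows are illustrative only. The port column is read as text because values such as `a` and `fffffffe` are not decimal integers.

```python
import io
import pandas as pd

# A few rows copied from log.csv in the question; the full file works the same way.
CSV_TEXT = """datapath,port,rxpkts,rxbytes,rxerror,txpkts,txbytes,txerror
4,1,178,25159,0,40,3148,0
4,2,3,230,0,213,27897,0
4,a,4,320,0,211,27717,0
1,1,38,5096,0,182,23602,0
1,fffffffe,7,578,0,214,28210,0
"""

def to_single_row(source):
    """Reshape per-port counters into one wide row (hypothetical helper)."""
    # Ports are hex strings ('a', 'fffffffe'), so read that column as text.
    df = pd.read_csv(source, dtype={"port": str})
    # Keep only the two counters that the question's query selects.
    stacked = df.set_index(["datapath", "port"])[["rxpkts", "txpkts"]].stack()
    stacked.name = "value"
    # One row, with (datapath, port, counter) as a three-level column index.
    return stacked.to_frame().T

wide = to_single_row(io.StringIO(CSV_TEXT))

# With no path argument, to_csv returns the CSV text; pass a file name
# (for example "combined.csv", an assumed name) to write a single file instead.
csv_text = wide.to_csv()
```

Because the columns are built from whatever (datapath, port) pairs appear in the file, a varying number of ports per datapath needs no special handling. Passing "./log.csv" instead of the StringIO object should work the same way, assuming the file matches the format shown in the question.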