How to add CSV file names as Column Headers in a dataframe using pandas?
Pandas: how to add column names to a DataFrame read from a CSV file
I'm new to Python but excited about it, and I need your advice. I came up with the following code to compare two CSV files based on nmap scans:
import pandas as pd
from pandas import DataFrame
import os
file = raw_input('\nEnter the Old CSV file: ')
file1 = raw_input('\nEnter the New CSV file: ')
A=set(pd.read_csv(file, index_col=False, header=None)[0])
B=set(pd.read_csv(file1, index_col=False, header=None)[0])
final=list(A-B)
df = pd.DataFrame(final, columns=["host"])
df.to_csv('DIFF_'+file)
print "Completed!"
When I run it, I get the following result:
host
0,82.214.228.71;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;
1,82.214.228.70;dsl-radius-01.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;
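My question is how to add labels to columns 2, 3, and so on, for example: hostname, port, port name, state, etc. I tried df['hostname'] = range(1, len(df) + 1), but when I open the file in Excel, this puts hostname in the first column together with host.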
I think you need read_csv with the parameters sep=';' and names, so that the column names are defined first:
file = raw_input('\nEnter the Old CSV file: ')
file1 = raw_input('\nEnter the New CSV file: ')
cols = ['hostname','port','portname', ...]
A= pd.read_csv(file, index_col=False, header=None, sep=';', names=cols)
B= pd.read_csv(file1, index_col=False, header=None, sep=';', names=cols)
Then, if you need to compare all columns, use merge and filter the result with boolean indexing:
df = pd.merge(A, B, how='outer', indicator=True)
df = df[df['_merge']=='left_only'].drop('_merge',axis=1)
df.to_csv('DIFF_'+file)
print "Completed!"
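The rows in the question's output start with 0, 1, ... because to_csv writes the index by default. Below is a minimal sketch of writing the result with a header row, no index column, and the same ';' separator as the input. The column names and the two rows are hypothetical, chosen only for illustration, and the text is written to an in-memory buffer instead of a file so it can be inspected directly.

```python
import io
import pandas as pd

# Hypothetical column names and rows, for illustration only.
cols = ["host", "hostname", "state"]
df = pd.DataFrame(
    [["82.214.228.71", "dsl-radius-02.direcpceu.com", "open"],
     ["82.214.228.74", "dsl-radius-02.direcpceu.com", "open"]],
    columns=cols,
)

# index=False drops the leading 0, 1, ... column;
# sep=';' keeps the delimiter used by the input files.
buf = io.StringIO()
df.to_csv(buf, sep=";", index=False)
csv_text = buf.getvalue()
print(csv_text)
```

Passing a file name such as 'DIFF_' + file instead of buf writes the same text to disk. Whether Excel then splits the fields into separate columns depends on its regional list-separator setting.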
Sample:
import pandas as pd
from io import StringIO  # pandas.compat.StringIO was removed in newer pandas releases
temp=u"""82.214.228.71;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;
82.214.228.70;dsl-radius-01.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;
82.214.228.74;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;
82.214.228.75;dsl-radius-01.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
cols = ['hostname','port','portname', 'a','b','c','d','e','f','g','h','i', 'j']
A = pd.read_csv(StringIO(temp), sep=";", names=cols)
print (A)
hostname port portname a b c \
0 82.214.228.71 dsl-radius-02.direcpceu.com PTR tcp 111 rpcbind
1 82.214.228.70 dsl-radius-01.direcpceu.com PTR tcp 111 rpcbind
2 82.214.228.74 dsl-radius-02.direcpceu.com PTR tcp 111 rpcbind
3 82.214.228.75 dsl-radius-01.direcpceu.com PTR tcp 111 rpcbind
d e f g h i j
0 open NaN NaN syn-ack NaN 3 NaN
1 open NaN NaN syn-ack NaN 3 NaN
2 open NaN NaN syn-ack NaN 3 NaN
3 open NaN NaN syn-ack NaN 3 NaN
temp=u"""82.214.228.75;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;
82.214.228.70;dsl-radius-01.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;
82.214.228.77;dsl-radius-02.direcpceu.com;PTR;tcp;111;rpcbind;open;;;syn-ack;;3;
"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
cols = ['hostname','port','portname', 'a','b','c','d','e','f','g','h','i', 'j']
B = pd.read_csv(StringIO(temp), sep=";", names=cols)
print (B)
hostname port portname a b c \
0 82.214.228.75 dsl-radius-02.direcpceu.com PTR tcp 111 rpcbind
1 82.214.228.70 dsl-radius-01.direcpceu.com PTR tcp 111 rpcbind
2 82.214.228.77 dsl-radius-02.direcpceu.com PTR tcp 111 rpcbind
d e f g h i j
0 open NaN NaN syn-ack NaN 3 NaN
1 open NaN NaN syn-ack NaN 3 NaN
2 open NaN NaN syn-ack NaN 3 NaN
df1 = pd.merge(A, B, how='outer', indicator=True)
print (df1)
hostname port portname a b c \
0 82.214.228.71 dsl-radius-02.direcpceu.com PTR tcp 111 rpcbind
1 82.214.228.70 dsl-radius-01.direcpceu.com PTR tcp 111 rpcbind
2 82.214.228.74 dsl-radius-02.direcpceu.com PTR tcp 111 rpcbind
3 82.214.228.75 dsl-radius-01.direcpceu.com PTR tcp 111 rpcbind
4 82.214.228.75 dsl-radius-02.direcpceu.com PTR tcp 111 rpcbind
5 82.214.228.77 dsl-radius-02.direcpceu.com PTR tcp 111 rpcbind
d e f g h i j _merge
0 open NaN NaN syn-ack NaN 3 NaN left_only
1 open NaN NaN syn-ack NaN 3 NaN both
2 open NaN NaN syn-ack NaN 3 NaN left_only
3 open NaN NaN syn-ack NaN 3 NaN left_only
4 open NaN NaN syn-ack NaN 3 NaN right_only
5 open NaN NaN syn-ack NaN 3 NaN right_only
#only values in A
df1 = df1[df1['_merge']=='left_only'].drop('_merge',axis=1)
print (df1)
hostname port portname a b c \
0 82.214.228.71 dsl-radius-02.direcpceu.com PTR tcp 111 rpcbind
2 82.214.228.74 dsl-radius-02.direcpceu.com PTR tcp 111 rpcbind
3 82.214.228.75 dsl-radius-01.direcpceu.com PTR tcp 111 rpcbind
d e f g h i j
0 open NaN NaN syn-ack NaN 3 NaN
2 open NaN NaN syn-ack NaN 3 NaN
3 open NaN NaN syn-ack NaN 3 NaN
#only values in B
df1 = pd.merge(A, B, how='outer', indicator=True)
df11 = df1[df1['_merge']=='right_only'].drop('_merge',axis=1)
print (df11)
hostname port portname a b c \
4 82.214.228.75 dsl-radius-02.direcpceu.com PTR tcp 111 rpcbind
5 82.214.228.77 dsl-radius-02.direcpceu.com PTR tcp 111 rpcbind
d e f g h i j
4 open NaN NaN syn-ack NaN 3 NaN
5 open NaN NaN syn-ack NaN 3 NaN
#same values in both dataframes
df12 = df1[df1['_merge']=='both'].drop('_merge',axis=1)
print (df12)
hostname port portname a b c \
1 82.214.228.70 dsl-radius-01.direcpceu.com PTR tcp 111 rpcbind
d e f g h i j
1 open NaN NaN syn-ack NaN 3 NaN
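The three filters above can be wrapped in one helper. This is only a sketch: split_by_origin is a hypothetical name, and the two small frames are made-up stand-ins for A and B.

```python
import pandas as pd

def split_by_origin(left, right):
    """Split rows into those only in left, only in right, and in both.

    Rows are compared on all columns the two frames share.
    """
    merged = pd.merge(left, right, how="outer", indicator=True)
    parts = {}
    for label in ("left_only", "right_only", "both"):
        # Keep the rows with this origin and drop the helper column.
        parts[label] = merged[merged["_merge"] == label].drop("_merge", axis=1)
    return parts

# Hypothetical miniature versions of A and B.
A = pd.DataFrame({"host": ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
                  "port": [111, 111, 22]})
B = pd.DataFrame({"host": ["10.0.0.2", "10.0.0.4"],
                  "port": [111, 80]})

parts = split_by_origin(A, B)
print(parts["left_only"])
print(parts["right_only"])
print(parts["both"])
```

Merging once and filtering three times avoids repeating the merge for each subset, as the walkthrough above does.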
But if you only need to compare the first column, hostname, use isin as a mask, with ~ to invert it, and filter with boolean indexing:
df2 = A[~A['hostname'].isin(B['hostname'])]
print (df2)
hostname port portname a b c \
0 82.214.228.71 dsl-radius-02.direcpceu.com PTR tcp 111 rpcbind
2 82.214.228.74 dsl-radius-02.direcpceu.com PTR tcp 111 rpcbind
d e f g h i j
0 open NaN NaN syn-ack NaN 3 NaN
2 open NaN NaN syn-ack NaN 3 NaN
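The two approaches answer slightly different questions, which is why the row for 82.214.228.75 appears in the merge result but not in the isin result. Here is a small sketch of the difference, using hypothetical frames and column names:

```python
import pandas as pd

# Hypothetical data: host 10.0.0.5 is in both frames, but its name differs.
A = pd.DataFrame({"host": ["10.0.0.1", "10.0.0.5"], "name": ["alpha", "beta"]})
B = pd.DataFrame({"host": ["10.0.0.5"], "name": ["gamma"]})

# isin on a single column: keep rows of A whose host never appears in B.
only_new_hosts = A[~A["host"].isin(B["host"])]

# merge on all columns: keep rows of A that have no identical row in B.
merged = pd.merge(A, B, how="outer", indicator=True)
changed_or_new = merged[merged["_merge"] == "left_only"].drop("_merge", axis=1)

print(only_new_hosts)
print(changed_or_new)
```

The isin filter ignores changes in the other columns, while the merge filter also reports hosts whose remaining fields changed.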
You can add the labels where you define the DataFrame. For example, the following should work:
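rows = [line.split(';') for line in final]  # one field per column
df = pd.DataFrame(rows, columns=["host"] + list(range(1, len(rows[0]))))  # list.append returns None, so concatenate with +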