How can I best remove lines with different values grouped on one, but keep the same value even if it appears multiple times grouped on one value
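To help those trying to help me: here is my code, the error I get, and, at the end, a sample of the real data.
The desired output eliminates every interface that has more than one distinct MAC value. Those rows are simply dropped from the output, which I want to write to an Excel file. The vlan and type columns don't matter.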
nf_file = filedialog.askopenfilename() # here we grab a filename to parse
timestr = time.strftime("%m%d")
df = pd.read_csv(nf_file)
#Drop 'interface' with more than one different 'mac'
df['mac_count'] = (df.groupby(['interface'])['mac']).transform('nunique')
df = df.loc[df['mac_count'] == 1]
df = df.drop(['mac_count'], axis=1)
print(df)
The error I get:
Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 16:07:46) [MSC v.1900 32 bit (Intel)]
wdir='C:/Users/pythonuser/Desktop/companyinfo - Work File/automate1')
Traceback (most recent call last):
  File "C:\Users\pythonuser\AppData\Local\Programs\Python\Python36-32\lib\site-packages\IPython\core\interactiveshell.py", line 3296, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in
    runfile('C:/Users/pythonuser/Desktop/companyinfo - Work File/automate1/Python/automate1-v1.py', wdir='C:/Users/pythonuser/Desktop/companyinfo - Work File/automate1')
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.3.1\plugins\python-ce\helpers\pydev_pydev_bundle\pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.3.1\plugins\python-ce\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/pythonuser/Desktop/companyinfo - Work File/automate1/Python/automate1-v1.py", line 16, in
    df['mac_count'] = (df.groupby(['interface'])['mac']).transform('nunique')
  File "C:\Users\pythonuser\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\frame.py", line 5810, in groupby
    observed=observed,
  File "C:\Users\pythonuser\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\groupby\groupby.py", line 410, in __init__
    mutated=self.mutated,
  File "C:\Users\pythonuser\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\groupby\grouper.py", line 600, in get_grouper
    raise KeyError(gpr)
The real CSV that I am reading into the dataframe. There are thousands of rows.
vlan mac type interface
1 0000.005e.5344 DYNAMIC Gi1/0/1
1 0010.5f8f.d6e1 DYNAMIC Gi1/0/2
1 0010.5f92.0066 DYNAMIC Gi1/0/2
1 0010.5f64.241f DYNAMIC Gi1/0/2
1 0010.5f64.dd4e DYNAMIC Gi1/0/3
1 0010.5f65.1814 DYNAMIC Gi1/0/3
1 0012.5f18.4425 DYNAMIC Gi1/0/3
1 0012.5f18.61dd DYNAMIC Gi1/0/2
1 0012.5f18.61de DYNAMIC Gi1/0/3
1 0016.2155.18fd DYNAMIC Gi1/0/5
1 0026.5342.5668 DYNAMIC Gi1/0/2
1 0026.5343.1048 DYNAMIC Gi1/0/3
1 0042.680f.1282 DYNAMIC Gi1/0/1
1 0050.600d.5f19 DYNAMIC Gi1/0/3
1 0061.e351.14c5 STATIC Vl1
1 00c0.6558.5d5a DYNAMIC Gi1/0/1
1 00c0.65e1.455a DYNAMIC Gi1/0/1
1 00c0.65fe.1e5a DYNAMIC Gi1/0/1
1 00c0.65fe.1e6e DYNAMIC Gi1/0/1
1 3086.6288.1acf DYNAMIC Gi1/0/5
1 3086.6288.1ad0 DYNAMIC Gi1/0/5
1 546f.6495.fd93 DYNAMIC Gi1/0/4
1 5c5a.c536.689c DYNAMIC Gi1/0/1
1 5c5a.c536.686a DYNAMIC Gi1/0/2
1 5c5a.c536.686e DYNAMIC Gi1/0/4
1 5c5a.c599.3fd4 DYNAMIC Gi1/0/3
1 5c5a.c599.40a6 DYNAMIC Gi1/0/1
1 5c5a.c599.4066 DYNAMIC Gi1/0/3
1 5c5a.c599.40c8 DYNAMIC Gi1/0/3
1 5c5a.c599.40cc DYNAMIC Gi1/0/2
1 5c5a.c568.5118 DYNAMIC Gi1/0/1
1 5c5a.c568.561e DYNAMIC Gi1/0/1
1 8426.2642.5e6d DYNAMIC Gi1/0/6
1 8ce5.4851.2046 DYNAMIC Gi1/0/1
1 6c26.c530.ad32 DYNAMIC Gi1/0/2
1 6c26.c530.ad45 DYNAMIC Gi1/0/2
1 6c26.c530.6c61 DYNAMIC Gi1/0/3
1 6c26.c530.6cc5 DYNAMIC Gi1/0/3
1 6c26.c546.a361 DYNAMIC Gi1/0/1
1 6c26.c546.a3c5 DYNAMIC Gi1/0/1
1 6c26.c563.6331 DYNAMIC Gi1/0/4
1 6c26.c563.6345 DYNAMIC Gi1/0/4
1 e00e.da82.58f2 DYNAMIC Gi1/0/1
501 0000.0c9f.f006 DYNAMIC Gi1/0/1
501 0005.36c4.a5f3 DYNAMIC Gi1/0/4
501 0016.4f51.1a89 DYNAMIC Gi1/0/2
501 0016.4f51.1af4 DYNAMIC Gi1/0/1
501 0016.4f51.1614 DYNAMIC Gi1/0/4
501 0016.4f51.1615 DYNAMIC Gi1/0/4
501 0016.4f51.1625 DYNAMIC Gi1/0/1
501 0016.4f51.1635 DYNAMIC Gi1/0/2
501 0016.4f51.1639 DYNAMIC Gi1/0/1
501 0016.4f51.163c DYNAMIC Gi1/0/4
501 0016.4f51.1641 DYNAMIC Gi1/0/3
501 0016.4f51.1645 DYNAMIC Gi1/0/2
501 0016.4f51.1665 DYNAMIC Gi1/0/4
501 0016.4f51.1666 DYNAMIC Gi1/0/4
501 0016.4f51.1650 DYNAMIC Gi1/0/2
501 0016.4f51.1651 DYNAMIC Gi1/0/3
501 0016.4f51.1654 DYNAMIC Gi1/0/2
501 0016.4f51.1656 DYNAMIC Gi1/0/1
501 0016.4f51.1655 DYNAMIC Gi1/0/4
501 0016.4f51.1656 DYNAMIC Gi1/0/4
501 0016.4f51.2ac0 DYNAMIC Gi1/0/1
501 0016.4f51.2ac8 DYNAMIC Gi1/0/4
501 0016.4f51.2ac9 DYNAMIC Gi1/0/4
501 0016.4f51.2acc DYNAMIC Gi1/0/1
501 0016.4f51.2ae9 DYNAMIC Gi1/0/1
501 0016.4f51.2aec DYNAMIC Gi1/0/2
501 0016.4f51.2af1 DYNAMIC Gi1/0/2
501 0016.4f51.2af6 DYNAMIC Gi1/0/3
501 0016.4f51.2afc DYNAMIC Gi1/0/4
501 0016.4f51.2606 DYNAMIC Gi1/0/3
501 0016.4f51.2608 DYNAMIC Gi1/0/1
501 0016.4f51.2618 DYNAMIC Gi1/0/3
501 0016.4f6f.fda8 DYNAMIC Gi1/0/4
501 0016.4f6f.fd60 DYNAMIC Gi1/0/4
501 0016.4f6f.fd66 DYNAMIC Gi1/0/4
501 0016.4f6f.fdd3 DYNAMIC Gi1/0/4
501 0016.4f6f.fdd5 DYNAMIC Gi1/0/1
501 0016.4f6f.fddc DYNAMIC Gi1/0/4
501 0016.4f6f.fde4 DYNAMIC Gi1/0/4
501 0016.4f6f.fe14 DYNAMIC Gi1/0/4
501 0016.4f6f.fe15 DYNAMIC Gi1/0/1
501 0016.4f6f.fe18 DYNAMIC Gi1/0/1
501 0016.4f6f.fe26 DYNAMIC Gi1/0/3
501 0016.4f6f.fe43 DYNAMIC Gi1/0/2
501 0042.680f.1282 DYNAMIC Gi1/0/1
501 0061.e351.14f6 STATIC Vl501
501 5038.eec8.63fc DYNAMIC Gi1/0/3
501 5038.eec8.6401 DYNAMIC Gi1/0/4
501 5038.eec8.862e DYNAMIC Gi1/0/3
501 5038.eec8.8ca4 DYNAMIC Gi1/0/3
501 5038.eec8.8c61 DYNAMIC Gi1/0/4
501 5038.eec8.8c6e DYNAMIC Gi1/0/3
501 5038.eec8.8c6f DYNAMIC Gi1/0/3
501 5038.eec8.8e40 DYNAMIC Gi1/0/2
501 5038.eec8.8e4f DYNAMIC Gi1/0/4
501 5038.eec8.8f00 DYNAMIC Gi1/0/2
501 5038.eec8.8f03 DYNAMIC Gi1/0/2
501 5038.eec8.8f1a DYNAMIC Gi1/0/2
501 5038.eec8.8f1c DYNAMIC Gi1/0/4
501 5038.eec8.8f21 DYNAMIC Gi1/0/2
501 5038.eec8.8f25 DYNAMIC Gi1/0/4
501 5038.eec8.8f29 DYNAMIC Gi1/0/2
501 5038.eec8.8f38 DYNAMIC Gi1/0/2
501 5038.eec8.8f39 DYNAMIC Gi1/0/2
501 5038.eec8.8f41 DYNAMIC Gi1/0/2
501 5038.eec8.8f44 DYNAMIC Gi1/0/3
501 5038.eec8.8f46 DYNAMIC Gi1/0/3
501 5038.eec8.8f45 DYNAMIC Gi1/0/2
501 5038.eec8.8f4f DYNAMIC Gi1/0/4
501 5038.eec8.8f61 DYNAMIC Gi1/0/2
501 5038.eec8.8f68 DYNAMIC Gi1/0/1
501 5038.eec8.8f54 DYNAMIC Gi1/0/4
501 5038.eec8.8f56 DYNAMIC Gi1/0/2
501 5038.eec8.8f5e DYNAMIC Gi1/0/2
501 5038.eec8.8f82 DYNAMIC Gi1/0/1
501 5038.eec8.8f86 DYNAMIC Gi1/0/2
501 5038.eec8.8f8e DYNAMIC Gi1/0/4
501 e00e.da82.58f2 DYNAMIC Gi1/0/1
You can use Pandas for this. Assuming you have loaded the data into a dataframe like this:
0 1 2 3 4
0 MYSWITCH 1 0000.007e.7344 DYNAMIC Gi1/0/1
1 MYSWITCH 1 00c0.b778.7d5a DYNAMIC Gi1/0/1
2 MYSWITCH 1 00c0.b7e1.455a DYNAMIC Gi1/0/1
3 MYSWITCH 1 00c0.b7fe.1e5a DYNAMIC Gi1/0/1
4 MYSWITCH 1 00c0.b7fe.1e6e DYNAMIC Gi1/0/1
5 MYSWITCH 1 5c5a.c73b.689c DYNAMIC Gi1/0/1
6 MYSWITCH 1 5c5a.c799.40a6 DYNAMIC Gi1/0/1
7 MYSWITCH 1 5c5a.c7b8.7118 DYNAMIC Gi1/0/1
8 MYSWITCH 1 5c5a.c7b8.761e DYNAMIC Gi1/0/1
9 MYSWITCH 1 8ce7.4871.204b DYNAMIC Gi1/0/1
10 MYSWITCH 1 bc26.c74b.a3b1 DYNAMIC Gi1/0/1
11 MYSWITCH 1 bc26.c74b.a3c7 DYNAMIC Gi1/0/1
12 MYSWITCH 1 001b.2175.18fd DYNAMIC Gi1/0/5
13 MYSWITCH 1 e00e.da82.78f2 DYNAMIC Gi1/0/11
14 MYSWITCH 3 e00e.da82.78f2 DYNAMIC Gi1/0/11
15 MYSWITCH 1 0042.680f.1282 DYNAMIC Gi1/0/12
16 MYSWITCH 3 0042.680f.1282 DYNAMIC Gi1/0/12
You can select only the rows whose interface (column 4) has exactly 1 unique MAC (column 2).
import pandas as pd
df.loc[df.groupby(4)[2].transform('nunique')==1]
Output
0 1 2 3 4
12 MYSWITCH 1 001b.2175.18fd DYNAMIC Gi1/0/5
13 MYSWITCH 1 e00e.da82.78f2 DYNAMIC Gi1/0/11
14 MYSWITCH 3 e00e.da82.78f2 DYNAMIC Gi1/0/11
15 MYSWITCH 1 0042.680f.1282 DYNAMIC Gi1/0/12
16 MYSWITCH 3 0042.680f.1282 DYNAMIC Gi1/0/12
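To see why rows are kept or dropped, it can help to look at the per-interface count before filtering. The following is a minimal sketch, assuming the columns are named vlan, mac, type and interface as in the question (the answer above uses integer column labels instead). The small sample frame is built by hand from a few rows of the question's data.

```python
import pandas as pd

# Small hand-built sample using rows from the question's data
# (the column names are an assumption taken from the question's header).
df = pd.DataFrame({
    'vlan': [1, 1, 1, 1, 501],
    'mac': ['0000.005e.5344', '0042.680f.1282', '8426.2642.5e6d',
            '0061.e351.14c5', '0061.e351.14f6'],
    'type': ['DYNAMIC', 'DYNAMIC', 'DYNAMIC', 'STATIC', 'STATIC'],
    'interface': ['Gi1/0/1', 'Gi1/0/1', 'Gi1/0/6', 'Vl1', 'Vl501'],
})

# Number of distinct MACs seen on each interface (one entry per interface).
macs_per_interface = df.groupby('interface')['mac'].nunique()
print(macs_per_interface)

# transform('nunique') broadcasts that count back onto every row,
# so the comparison can be used directly as a boolean row mask.
single_mac = df[df.groupby('interface')['mac'].transform('nunique') == 1]
print(single_mac)
```

Here Gi1/0/1 carries two distinct MACs, so both of its rows are dropped, while the interfaces with a single MAC are kept.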
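Edit - 3:
The code below removes all rows for any interface that has more than one distinct mac. The result is shown in Intermediate output - 1 below. Next, an optional step removes all duplicate records.
Note: some mac values in the sample DataFrame have been changed and may differ from Edit-1 or the question.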
# Import libraries
import numpy as np
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({
'myswitch': ['MYSWITCH']*20,
'num': [1]*20,
'mac':[np.nan,'0000.007e.7344','0000.007e.7344',
'00c0.b7fe.1e5a','00c0.b7fe.1e5a','00c0.b7fe.1e5a',
'5c5a.c799.40a6','5c5a.c799.40a6','5c5a.c799.40a6',
'5c5a.c799.40a6','5c5a.c799.40a6','bc26.c74b.a3c7',
'0042.680f.9999','1111.680f.1282','1111.680f.1282',
'0042.680f.1282','0042.680f.1282', np.nan,None, ""
],
'dynamic': ['DYNAMIC']*20,
'interface':['Gi1/0/1']*12 + ['Gi1/0/5'] + ['Gi1/0/11']*2 + ['Gi1/0/12']*2 +
['Gi99/99/99'] + ['Gi88/88/88'] + ['Gi77/77/77']
})
# Drop 'interface' with more than one different 'mac'
df['mac_count'] = (df.groupby(['interface'])['mac']).transform('nunique')
df = df.loc[df['mac_count']==1]
df = df.drop(['mac_count'], axis=1)
print(df)
Intermediate output - 1:
myswitch num mac dynamic interface
12 MYSWITCH 1 0042.680f.9999 DYNAMIC Gi1/0/5
13 MYSWITCH 1 1111.680f.1282 DYNAMIC Gi1/0/11
14 MYSWITCH 1 1111.680f.1282 DYNAMIC Gi1/0/11
15 MYSWITCH 1 0042.680f.1282 DYNAMIC Gi1/0/12
16 MYSWITCH 1 0042.680f.1282 DYNAMIC Gi1/0/12
19 MYSWITCH 1 DYNAMIC Gi77/77/77
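Row 19 survives this step while rows 17 and 18 do not, because nunique skips missing values by default but counts an empty string as a value. A minimal sketch of that behaviour follows; the interface names and the one real-looking mac are made up for illustration.

```python
import numpy as np
import pandas as pd

# One row per interface, each with a different kind of "blank" mac.
df = pd.DataFrame({
    'mac': [np.nan, None, '', 'aaaa.bbbb.cccc'],
    'interface': ['Gi9/9/1', 'Gi9/9/2', 'Gi9/9/3', 'Gi9/9/4'],
})

# NaN and None are ignored by nunique (dropna=True is the default),
# while the empty string '' is counted as one distinct value.
counts = df.groupby('interface')['mac'].nunique()
print(counts)
```

Interfaces whose only mac is NaN or None get a count of 0 and fail the == 1 test, whereas an interface whose only mac is '' gets a count of 1 and passes it.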
Drop the rows with a blank mac:
# Turn empty or whitespace-only 'mac' values into NaN, then drop those rows
df['mac'] = df['mac'].replace(r'^\s*$', np.nan, regex=True)
df = df.dropna(subset=['mac'])
print(df)
Intermediate output - 2:
myswitch num mac dynamic interface
12 MYSWITCH 1 0042.680f.9999 DYNAMIC Gi1/0/5
13 MYSWITCH 1 1111.680f.1282 DYNAMIC Gi1/0/11
14 MYSWITCH 1 1111.680f.1282 DYNAMIC Gi1/0/11
15 MYSWITCH 1 0042.680f.1282 DYNAMIC Gi1/0/12
16 MYSWITCH 1 0042.680f.1282 DYNAMIC Gi1/0/12
Next, if desired, drop all duplicate rows:
# Drop duplicate rows
df = df.drop_duplicates()
print(df)
Output:
myswitch num mac dynamic interface
12 MYSWITCH 1 0042.680f.9999 DYNAMIC Gi1/0/5
13 MYSWITCH 1 1111.680f.1282 DYNAMIC Gi1/0/11
15 MYSWITCH 1 0042.680f.1282 DYNAMIC Gi1/0/12
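The three steps can also be wrapped in one helper. This is only a sketch: the function name single_mac_interfaces and its parameters are hypothetical, and it removes blank values before counting, which differs from the order used above (an interface with one real mac plus one empty-string mac is kept here, whereas the code above would count two distinct values and drop it).

```python
import numpy as np
import pandas as pd

def single_mac_interfaces(df, key='interface', value='mac'):
    """Keep rows whose `key` group has exactly one distinct non-blank `value`."""
    out = df.copy()
    # Treat empty or whitespace-only strings as missing, then drop them.
    out[value] = out[value].replace(r'^\s*$', np.nan, regex=True)
    out = out.dropna(subset=[value])
    # Keep only groups with a single distinct value, then drop exact duplicates.
    mask = out.groupby(key)[value].transform('nunique') == 1
    return out[mask].drop_duplicates()

# Hand-built sample: one clean interface, one with a repeated mac,
# one with two different macs, and two with blank macs.
df = pd.DataFrame({
    'mac': ['0042.680f.9999', '1111.680f.1282', '1111.680f.1282',
            '0000.007e.7344', '00c0.b7fe.1e5a', '', np.nan],
    'interface': ['Gi1/0/5', 'Gi1/0/11', 'Gi1/0/11',
                  'Gi1/0/1', 'Gi1/0/1', 'Gi77/77/77', 'Gi99/99/99'],
})
result = single_mac_interfaces(df)
print(result)
```

Only Gi1/0/5 and Gi1/0/11 remain, each as a single row.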
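Edit - 1
Alternative code using .drop_duplicates():
Note 1: The sample data below is not exactly the same as in the question. More duplicates were added.
Note 2: Duplicates can be removed based on one or more columns by naming those columns in subset=, as in df = df.drop_duplicates(subset=['mac', 'interface']). For example:
subset='mac': only unique mac values remain in the output.
subset=['mac','interface']: only unique combinations of mac and interface remain in the output, so the same mac may still appear more than once if it belongs to different interfaces.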
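The difference between the two subset= choices can be seen on a tiny frame. This is a minimal sketch with hand-picked rows, not the full sample used below.

```python
import pandas as pd

# The same mac appears on two interfaces, and twice on one of them.
df = pd.DataFrame({
    'mac': ['0042.680f.1282', '0042.680f.1282', '0042.680f.1282', 'bc26.c74b.a3c7'],
    'interface': ['Gi1/0/5', 'Gi1/0/11', 'Gi1/0/11', 'Gi1/0/1'],
})

# Unique on 'mac' alone: the first row for each mac is kept.
by_mac = df.drop_duplicates(subset='mac')
print(by_mac)

# Unique on the (mac, interface) pair: the mac survives once per interface.
by_pair = df.drop_duplicates(subset=['mac', 'interface'])
print(by_pair)
```

With subset='mac' rows 0 and 3 remain; with subset=['mac', 'interface'] rows 0, 1 and 3 remain.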
Option #1:
Using data from a .txt file
### Import libraries
import pandas as pd
### Create DataFrame
# Read data from *.txt file
path = "<input path here>"
df = pd.read_csv(path+'data.txt', header=None)
# Split the single text column (labelled 0 because header=None) on whitespace and rename the columns
df = df[0].str.split(expand=True)
df.columns = ['myswitch','num','mac','dynamic','interface']
# Remove duplicates based on columns 'mac' and 'interface'
df = df.drop_duplicates(subset=['mac', 'interface'])
# Output
print(df)
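If the file is whitespace-delimited with a header row, it may be simpler to let read_csv do the splitting. This is a sketch under that assumption: the question's sample is shown with spaces, but the real file's delimiter is not confirmed. It also checks the column names first, since groupby raises a KeyError like the one in the question when the name passed to it is not a column. An in-memory buffer stands in for the real file.

```python
import io
import pandas as pd

# Stand-in for the real file: a few whitespace-separated lines from the question.
raw = io.StringIO(
    "vlan mac type interface\n"
    "1 0000.005e.5344 DYNAMIC Gi1/0/1\n"
    "1 0042.680f.1282 DYNAMIC Gi1/0/1\n"
    "1 8426.2642.5e6d DYNAMIC Gi1/0/6\n"
)

# sep=r'\s+' splits on any run of whitespace; dtype keeps 'mac' as text.
df = pd.read_csv(raw, sep=r'\s+', dtype={'mac': str})

# Fail early with a readable message if the expected column is missing.
if 'interface' not in df.columns:
    raise KeyError(f"'interface' not found; columns are {list(df.columns)}")

# Same de-duplication as above, on the (mac, interface) pair.
df = df.drop_duplicates(subset=['mac', 'interface'])
print(df)
```

Printing list(df.columns) right after reading the real file is a quick way to confirm whether the delimiter assumption holds.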
Option #2:
Using a sample DataFrame
# Option-2: Manually create a sample DataFrame
df = pd.DataFrame({
'myswitch': ['MYSWITCH']*17,
'num': [1]*17,
'mac':['0000.007e.7344','0000.007e.7344','0000.007e.7344',
'00c0.b7fe.1e5a','00c0.b7fe.1e5a','00c0.b7fe.1e5a',
'5c5a.c799.40a6','5c5a.c799.40a6','5c5a.c799.40a6',
'5c5a.c799.40a6','5c5a.c799.40a6','bc26.c74b.a3c7',
'0042.680f.1282','0042.680f.1282','0042.680f.1282',
'0042.680f.1282','0042.680f.1282'
],
'dynamic': ['DYNAMIC']*17,
'interface':['Gi1/0/1']*12 + ['Gi1/0/5'] + ['Gi1/0/11']*2 + ['Gi1/0/12']*2
})
# Remove duplicates based on columns 'mac' and 'interface'
df = df.drop_duplicates(subset=['mac', 'interface'])
print(df)
Output
myswitch num mac dynamic interface
0 MYSWITCH 1 0000.007e.7344 DYNAMIC Gi1/0/1
3 MYSWITCH 1 00c0.b7fe.1e5a DYNAMIC Gi1/0/1
6 MYSWITCH 1 5c5a.c799.40a6 DYNAMIC Gi1/0/1
11 MYSWITCH 1 bc26.c74b.a3c7 DYNAMIC Gi1/0/1
12 MYSWITCH 1 0042.680f.1282 DYNAMIC Gi1/0/5
13 MYSWITCH 1 0042.680f.1282 DYNAMIC Gi1/0/11
15 MYSWITCH 1 0042.680f.1282 DYNAMIC Gi1/0/12