How can I best remove lines with different values grouped on one, but keep the same value even if it appears multiple times grouped on one value

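To help those who are trying to help me: here is my code, the error I get, and a sample of the real data at the end.

The desired output eliminates every interface that has more than one distinct MAC value. Those rows are simply dropped from the output, which I want to write to an Excel file. The vlan and type columns don't matter.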

nf_file = filedialog.askopenfilename()  # here we grab a filename to parse
timestr = time.strftime("%m%d")
df = pd.read_csv(nf_file)
#Drop 'interface' with more than one different 'mac'
df['mac_count'] = (df.groupby(['interface'])['mac']).transform('nunique')
df = df.loc[df['mac_count'] == 1]
df = df.drop(['mac_count'], axis=1)
print(df)

The error I get:

Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 16:07:46) [MSC v.1900 32 bit (Intel)]
wdir='C:/Users/pythonuser/Desktop/companyinfo - Work File/automate1')
Traceback (most recent call last):
  File "C:\Users\pythonuser\AppData\Local\Programs\Python\Python36-32\lib\site-packages\IPython\core\interactiveshell.py", line 3296, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "", line 1, in <module>
    runfile('C:/Users/pythonuser/Desktop/companyinfo - Work File/automate1/Python/automate1-v1.py', wdir='C:/Users/pythonuser/Desktop/companyinfo - Work File/automate1')
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.3.1\plugins\python-ce\helpers\pydev_pydev_bundle\pydev_umd.py", line 197, in runfile
    pydev_imports.execfile(filename, global_vars, local_vars)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2019.3.1\plugins\python-ce\helpers\pydev_pydev_imps_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/Users/pythonuser/Desktop/companyinfo - Work File/automate1/Python/automate1-v1.py", line 16, in <module>
    df['mac_count'] = (df.groupby(['interface'])['mac']).transform('nunique')
  File "C:\Users\pythonuser\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\frame.py", line 5810, in groupby
    observed=observed,
  File "C:\Users\pythonuser\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\groupby\groupby.py", line 410, in __init__
    mutated=self.mutated,
  File "C:\Users\pythonuser\AppData\Local\Programs\Python\Python36-32\lib\site-packages\pandas\core\groupby\grouper.py", line 600, in get_grouper
    raise KeyError(gpr)

This is the real CSV I am reading into the DataFrame. There are thousands of rows.

vlan     mac       type      interface    
   1    0000.005e.5344    DYNAMIC     Gi1/0/1  
   1    0010.5f8f.d6e1    DYNAMIC     Gi1/0/2  
   1    0010.5f92.0066    DYNAMIC     Gi1/0/2  
   1    0010.5f64.241f    DYNAMIC     Gi1/0/2  
   1    0010.5f64.dd4e    DYNAMIC     Gi1/0/3  
   1    0010.5f65.1814    DYNAMIC     Gi1/0/3  
   1    0012.5f18.4425    DYNAMIC     Gi1/0/3  
   1    0012.5f18.61dd    DYNAMIC     Gi1/0/2  
   1    0012.5f18.61de    DYNAMIC     Gi1/0/3  
   1    0016.2155.18fd    DYNAMIC     Gi1/0/5  
   1    0026.5342.5668    DYNAMIC     Gi1/0/2  
   1    0026.5343.1048    DYNAMIC     Gi1/0/3  
   1    0042.680f.1282    DYNAMIC     Gi1/0/1  
   1    0050.600d.5f19    DYNAMIC     Gi1/0/3  
   1    0061.e351.14c5    STATIC      Vl1  
   1    00c0.6558.5d5a    DYNAMIC     Gi1/0/1  
   1    00c0.65e1.455a    DYNAMIC     Gi1/0/1  
   1    00c0.65fe.1e5a    DYNAMIC     Gi1/0/1  
   1    00c0.65fe.1e6e    DYNAMIC     Gi1/0/1  
   1    3086.6288.1acf    DYNAMIC     Gi1/0/5  
   1    3086.6288.1ad0    DYNAMIC     Gi1/0/5  
   1    546f.6495.fd93    DYNAMIC     Gi1/0/4  
   1    5c5a.c536.689c    DYNAMIC     Gi1/0/1  
   1    5c5a.c536.686a    DYNAMIC     Gi1/0/2  
   1    5c5a.c536.686e    DYNAMIC     Gi1/0/4  
   1    5c5a.c599.3fd4    DYNAMIC     Gi1/0/3  
   1    5c5a.c599.40a6    DYNAMIC     Gi1/0/1  
   1    5c5a.c599.4066    DYNAMIC     Gi1/0/3  
   1    5c5a.c599.40c8    DYNAMIC     Gi1/0/3  
   1    5c5a.c599.40cc    DYNAMIC     Gi1/0/2  
   1    5c5a.c568.5118    DYNAMIC     Gi1/0/1  
   1    5c5a.c568.561e    DYNAMIC     Gi1/0/1  
   1    8426.2642.5e6d    DYNAMIC     Gi1/0/6  
   1    8ce5.4851.2046    DYNAMIC     Gi1/0/1  
   1    6c26.c530.ad32    DYNAMIC     Gi1/0/2  
   1    6c26.c530.ad45    DYNAMIC     Gi1/0/2  
   1    6c26.c530.6c61    DYNAMIC     Gi1/0/3  
   1    6c26.c530.6cc5    DYNAMIC     Gi1/0/3  
   1    6c26.c546.a361    DYNAMIC     Gi1/0/1  
   1    6c26.c546.a3c5    DYNAMIC     Gi1/0/1  
   1    6c26.c563.6331    DYNAMIC     Gi1/0/4  
   1    6c26.c563.6345    DYNAMIC     Gi1/0/4  
   1    e00e.da82.58f2    DYNAMIC     Gi1/0/1  
 501    0000.0c9f.f006    DYNAMIC     Gi1/0/1  
 501    0005.36c4.a5f3    DYNAMIC     Gi1/0/4  
 501    0016.4f51.1a89    DYNAMIC     Gi1/0/2  
 501    0016.4f51.1af4    DYNAMIC     Gi1/0/1  
 501    0016.4f51.1614    DYNAMIC     Gi1/0/4  
 501    0016.4f51.1615    DYNAMIC     Gi1/0/4  
 501    0016.4f51.1625    DYNAMIC     Gi1/0/1  
 501    0016.4f51.1635    DYNAMIC     Gi1/0/2  
 501    0016.4f51.1639    DYNAMIC     Gi1/0/1  
 501    0016.4f51.163c    DYNAMIC     Gi1/0/4  
 501    0016.4f51.1641    DYNAMIC     Gi1/0/3  
 501    0016.4f51.1645    DYNAMIC     Gi1/0/2  
 501    0016.4f51.1665    DYNAMIC     Gi1/0/4  
 501    0016.4f51.1666    DYNAMIC     Gi1/0/4  
 501    0016.4f51.1650    DYNAMIC     Gi1/0/2  
 501    0016.4f51.1651    DYNAMIC     Gi1/0/3  
 501    0016.4f51.1654    DYNAMIC     Gi1/0/2  
 501    0016.4f51.1656    DYNAMIC     Gi1/0/1  
 501    0016.4f51.1655    DYNAMIC     Gi1/0/4  
 501    0016.4f51.1656    DYNAMIC     Gi1/0/4  
 501    0016.4f51.2ac0    DYNAMIC     Gi1/0/1  
 501    0016.4f51.2ac8    DYNAMIC     Gi1/0/4  
 501    0016.4f51.2ac9    DYNAMIC     Gi1/0/4  
 501    0016.4f51.2acc    DYNAMIC     Gi1/0/1  
 501    0016.4f51.2ae9    DYNAMIC     Gi1/0/1  
 501    0016.4f51.2aec    DYNAMIC     Gi1/0/2  
 501    0016.4f51.2af1    DYNAMIC     Gi1/0/2  
 501    0016.4f51.2af6    DYNAMIC     Gi1/0/3  
 501    0016.4f51.2afc    DYNAMIC     Gi1/0/4  
 501    0016.4f51.2606    DYNAMIC     Gi1/0/3  
 501    0016.4f51.2608    DYNAMIC     Gi1/0/1  
 501    0016.4f51.2618    DYNAMIC     Gi1/0/3  
 501    0016.4f6f.fda8    DYNAMIC     Gi1/0/4  
 501    0016.4f6f.fd60    DYNAMIC     Gi1/0/4  
 501    0016.4f6f.fd66    DYNAMIC     Gi1/0/4  
 501    0016.4f6f.fdd3    DYNAMIC     Gi1/0/4  
 501    0016.4f6f.fdd5    DYNAMIC     Gi1/0/1  
 501    0016.4f6f.fddc    DYNAMIC     Gi1/0/4  
 501    0016.4f6f.fde4    DYNAMIC     Gi1/0/4  
 501    0016.4f6f.fe14    DYNAMIC     Gi1/0/4  
 501    0016.4f6f.fe15    DYNAMIC     Gi1/0/1  
 501    0016.4f6f.fe18    DYNAMIC     Gi1/0/1  
 501    0016.4f6f.fe26    DYNAMIC     Gi1/0/3  
 501    0016.4f6f.fe43    DYNAMIC     Gi1/0/2  
 501    0042.680f.1282    DYNAMIC     Gi1/0/1  
 501    0061.e351.14f6    STATIC      Vl501  
 501    5038.eec8.63fc    DYNAMIC     Gi1/0/3  
 501    5038.eec8.6401    DYNAMIC     Gi1/0/4  
 501    5038.eec8.862e    DYNAMIC     Gi1/0/3  
 501    5038.eec8.8ca4    DYNAMIC     Gi1/0/3  
 501    5038.eec8.8c61    DYNAMIC     Gi1/0/4  
 501    5038.eec8.8c6e    DYNAMIC     Gi1/0/3  
 501    5038.eec8.8c6f    DYNAMIC     Gi1/0/3  
 501    5038.eec8.8e40    DYNAMIC     Gi1/0/2  
 501    5038.eec8.8e4f    DYNAMIC     Gi1/0/4  
 501    5038.eec8.8f00    DYNAMIC     Gi1/0/2  
 501    5038.eec8.8f03    DYNAMIC     Gi1/0/2  
 501    5038.eec8.8f1a    DYNAMIC     Gi1/0/2  
 501    5038.eec8.8f1c    DYNAMIC     Gi1/0/4  
 501    5038.eec8.8f21    DYNAMIC     Gi1/0/2  
 501    5038.eec8.8f25    DYNAMIC     Gi1/0/4  
 501    5038.eec8.8f29    DYNAMIC     Gi1/0/2  
 501    5038.eec8.8f38    DYNAMIC     Gi1/0/2  
 501    5038.eec8.8f39    DYNAMIC     Gi1/0/2  
 501    5038.eec8.8f41    DYNAMIC     Gi1/0/2  
 501    5038.eec8.8f44    DYNAMIC     Gi1/0/3  
 501    5038.eec8.8f46    DYNAMIC     Gi1/0/3  
 501    5038.eec8.8f45    DYNAMIC     Gi1/0/2  
 501    5038.eec8.8f4f    DYNAMIC     Gi1/0/4  
 501    5038.eec8.8f61    DYNAMIC     Gi1/0/2  
 501    5038.eec8.8f68    DYNAMIC     Gi1/0/1  
 501    5038.eec8.8f54    DYNAMIC     Gi1/0/4  
 501    5038.eec8.8f56    DYNAMIC     Gi1/0/2  
 501    5038.eec8.8f5e    DYNAMIC     Gi1/0/2  
 501    5038.eec8.8f82    DYNAMIC     Gi1/0/1  
 501    5038.eec8.8f86    DYNAMIC     Gi1/0/2  
 501    5038.eec8.8f8e    DYNAMIC     Gi1/0/4  
 501    e00e.da82.58f2    DYNAMIC     Gi1/0/1  

You can use Pandas for this. Assuming you have loaded the data into a DataFrame like this:

    0           1   2               3       4
0   MYSWITCH    1   0000.007e.7344  DYNAMIC Gi1/0/1
1   MYSWITCH    1   00c0.b778.7d5a  DYNAMIC Gi1/0/1
2   MYSWITCH    1   00c0.b7e1.455a  DYNAMIC Gi1/0/1
3   MYSWITCH    1   00c0.b7fe.1e5a  DYNAMIC Gi1/0/1
4   MYSWITCH    1   00c0.b7fe.1e6e  DYNAMIC Gi1/0/1
5   MYSWITCH    1   5c5a.c73b.689c  DYNAMIC Gi1/0/1
6   MYSWITCH    1   5c5a.c799.40a6  DYNAMIC Gi1/0/1
7   MYSWITCH    1   5c5a.c7b8.7118  DYNAMIC Gi1/0/1
8   MYSWITCH    1   5c5a.c7b8.761e  DYNAMIC Gi1/0/1
9   MYSWITCH    1   8ce7.4871.204b  DYNAMIC Gi1/0/1
10  MYSWITCH    1   bc26.c74b.a3b1  DYNAMIC Gi1/0/1
11  MYSWITCH    1   bc26.c74b.a3c7  DYNAMIC Gi1/0/1
12  MYSWITCH    1   001b.2175.18fd  DYNAMIC Gi1/0/5
13  MYSWITCH    1   e00e.da82.78f2  DYNAMIC Gi1/0/11
14  MYSWITCH    3   e00e.da82.78f2  DYNAMIC Gi1/0/11
15  MYSWITCH    1   0042.680f.1282  DYNAMIC Gi1/0/12
16  MYSWITCH    3   0042.680f.1282  DYNAMIC Gi1/0/12

You can select only the rows whose interface (column 4) has exactly 1 unique MAC (column 2).

import pandas as pd
df.loc[df.groupby(4)[2].transform('nunique')==1]

Output

    0           1   2               3       4
12  MYSWITCH    1   001b.2175.18fd  DYNAMIC Gi1/0/5
13  MYSWITCH    1   e00e.da82.78f2  DYNAMIC Gi1/0/11
14  MYSWITCH    3   e00e.da82.78f2  DYNAMIC Gi1/0/11
15  MYSWITCH    1   0042.680f.1282  DYNAMIC Gi1/0/12
16  MYSWITCH    3   0042.680f.1282  DYNAMIC Gi1/0/12
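
The KeyError in the question's traceback is raised inside the groupby call, which usually means the label passed to groupby was not found among the column names. One possible cause, assuming the file is whitespace-aligned like the sample in the question rather than comma-separated, is that read_csv placed the whole header line into a single column name. The sketch below works under that assumption. The inline sample text and the choice of `sep` are illustrative and not taken from the asker's actual file.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file: a few whitespace-aligned rows
# copied from the sample in the question.
raw = """vlan     mac       type      interface
   1    0000.005e.5344    DYNAMIC     Gi1/0/1
   1    00c0.6558.5d5a    DYNAMIC     Gi1/0/1
   1    8426.2642.5e6d    DYNAMIC     Gi1/0/6
   1    0061.e351.14c5    STATIC      Vl1
 501    0005.36c4.a5f3    DYNAMIC     Gi1/0/4
 501    0016.4f51.1614    DYNAMIC     Gi1/0/4
"""

# Assumption: columns are separated by runs of whitespace, not commas.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# Strip stray spaces from the header so 'interface' and 'mac' match exactly.
df.columns = df.columns.str.strip()
print(df.columns.tolist())

# Keep only the interfaces that carry exactly one distinct MAC.
single_mac = df[df.groupby("interface")["mac"].transform("nunique") == 1]
print(single_mac)
```

Printing `df.columns.tolist()` right after loading is a quick way to confirm whether the expected labels exist before grouping. Once the result looks right, `single_mac.to_excel(...)` could write it out, provided an Excel writer engine such as openpyxl is installed.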

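Edit - 3

The code below drops every interface row that has more than one distinct mac. The result is shown in Intermediate output - 1 below. After that, optional lines drop all duplicate records.

Note: some mac values in the sample DataFrame have been changed and may differ from Edit - 1 or the question.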

# Import libraries
import numpy as np  # needed for np.nan in the sample data
import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({
    'myswitch': ['MYSWITCH']*20,
    'num': [1]*20,
    'mac':[np.nan,'0000.007e.7344','0000.007e.7344',
           '00c0.b7fe.1e5a','00c0.b7fe.1e5a','00c0.b7fe.1e5a',
           '5c5a.c799.40a6','5c5a.c799.40a6','5c5a.c799.40a6',
           '5c5a.c799.40a6','5c5a.c799.40a6','bc26.c74b.a3c7',
           '0042.680f.9999','1111.680f.1282','1111.680f.1282',
           '0042.680f.1282','0042.680f.1282', np.nan,None, ""
          ],
    'dynamic': ['DYNAMIC']*20,
    'interface':['Gi1/0/1']*12 + ['Gi1/0/5'] + ['Gi1/0/11']*2 + ['Gi1/0/12']*2 +
                 ['Gi99/99/99'] + ['Gi88/88/88'] + ['Gi77/77/77']
    
    
})

# Drop 'interface' with more than one different 'mac'
df['mac_count'] = (df.groupby(['interface'])['mac']).transform('nunique')
df = df.loc[df['mac_count']==1]
df = df.drop(['mac_count'], axis=1)
print(df)

Intermediate output - 1:

    myswitch  num             mac  dynamic   interface
12  MYSWITCH    1  0042.680f.9999  DYNAMIC     Gi1/0/5
13  MYSWITCH    1  1111.680f.1282  DYNAMIC    Gi1/0/11
14  MYSWITCH    1  1111.680f.1282  DYNAMIC    Gi1/0/11
15  MYSWITCH    1  0042.680f.1282  DYNAMIC    Gi1/0/12
16  MYSWITCH    1  0042.680f.1282  DYNAMIC    Gi1/0/12
19  MYSWITCH    1                  DYNAMIC  Gi77/77/77
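
Row 19 survives the filter because nunique skips missing values (NaN and None) but counts an empty string as an ordinary value. A small sketch with made-up interface names and one made-up mac illustrates the difference:

```python
import numpy as np
import pandas as pd

# Hypothetical mini-frame: one interface per kind of "blank" mac,
# plus interface D with a single repeated real value.
demo = pd.DataFrame({
    "interface": ["A", "B", "C", "D", "D"],
    "mac": [np.nan, None, "", "aaaa.bbbb.cccc", "aaaa.bbbb.cccc"],
})

# nunique() ignores NaN/None by default, so A and B report 0 distinct macs,
# while the empty string on C counts as one distinct value.
counts = demo.groupby("interface")["mac"].nunique()
print(counts)
```

This is why interfaces whose only mac is NaN or None are already gone after the `== 1` filter, while the empty-string row needs a separate cleanup step.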

Drop rows with a blank mac:

# Replace and drop blank 'mac'
# Turn empty or whitespace-only strings into NaN; the anchored pattern
# leaves real mac values untouched even if they contain a space
df['mac'] = df['mac'].replace(r'^\s*$', np.nan, regex=True)
df = df[df['mac'].notna()]
print(df)

Intermediate output - 2:

    myswitch  num             mac  dynamic interface
12  MYSWITCH    1  0042.680f.9999  DYNAMIC   Gi1/0/5
13  MYSWITCH    1  1111.680f.1282  DYNAMIC  Gi1/0/11
14  MYSWITCH    1  1111.680f.1282  DYNAMIC  Gi1/0/11
15  MYSWITCH    1  0042.680f.1282  DYNAMIC  Gi1/0/12
16  MYSWITCH    1  0042.680f.1282  DYNAMIC  Gi1/0/12

Next, if needed, drop all duplicate rows:

# Drop duplicate rows
df = df.drop_duplicates()
print(df)

Output:

    myswitch  num             mac  dynamic interface
12  MYSWITCH    1  0042.680f.9999  DYNAMIC   Gi1/0/5
13  MYSWITCH    1  1111.680f.1282  DYNAMIC  Gi1/0/11
15  MYSWITCH    1  0042.680f.1282  DYNAMIC  Gi1/0/12


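Edit - 1

Alternative code using .drop_duplicates():

Note 1: The sample data below is not exactly the same as in the question. More duplicates have been added.

Note 2: Duplicates can be dropped based on one or more columns by naming those columns in subset=, as in df = df.drop_duplicates(subset=['mac', 'interface']). For example:

  • If subset='mac', the output contains only unique mac values
  • If subset=['mac','interface'], the output contains unique combinations of mac and interface, so the same mac may appear more than once if it belongs to different interfaces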
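
The difference between the two subset choices can be seen on a tiny frame. This is only an illustrative sketch, and all values in it are made up:

```python
import pandas as pd

# Hypothetical data: the same mac seen twice on Gi1/0/1 and once on Gi1/0/2.
demo = pd.DataFrame({
    "mac": ["aaaa.0000.0001", "aaaa.0000.0001", "aaaa.0000.0001", "bbbb.0000.0002"],
    "interface": ["Gi1/0/1", "Gi1/0/1", "Gi1/0/2", "Gi1/0/2"],
})

# One row per mac, whichever interface it was first seen on.
by_mac = demo.drop_duplicates(subset="mac")

# One row per (mac, interface) pair, so the first mac is kept twice.
by_pair = demo.drop_duplicates(subset=["mac", "interface"])

print(by_mac)
print(by_pair)
```

Note that drop_duplicates only collapses repeated rows. It does not remove interfaces that carry several different macs, which is what the groupby filter in Edit - 3 does.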

Option #1

Using data from a .txt file

### Import libraries
import pandas as pd

### Create DataFrame
# Read data from *.txt file 
path = "<input path here>"
df = pd.read_csv(path+'data.txt', header=None)

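# Split and rename columns
# With header=None the single raw column is labelled 0, not 'mac';
# splitting on any run of whitespace tolerates aligned columns
df = df[0].str.split(expand=True)
df.columns = ['myswitch','num','mac','dynamic','interface']

# Remove duplicates based on the combination of 'mac' and 'interface'
df = df.drop_duplicates(subset=['mac', 'interface'])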

# Output
print(df)

Option #2

Using a sample DataFrame

# Option-2: Manually create a sample DataFrame
df = pd.DataFrame({
    'myswitch': ['MYSWITCH']*17,
    'num': [1]*17,
    'mac':['0000.007e.7344','0000.007e.7344','0000.007e.7344',
           '00c0.b7fe.1e5a','00c0.b7fe.1e5a','00c0.b7fe.1e5a',
           '5c5a.c799.40a6','5c5a.c799.40a6','5c5a.c799.40a6',
           '5c5a.c799.40a6','5c5a.c799.40a6','bc26.c74b.a3c7',
           '0042.680f.1282','0042.680f.1282','0042.680f.1282',
           '0042.680f.1282','0042.680f.1282'
          ],
    'dynamic': ['DYNAMIC']*17,
    'interface':['Gi1/0/1']*12 + ['Gi1/0/5'] + ['Gi1/0/11']*2 + ['Gi1/0/12']*2
    
    
})
# Remove duplicates based on the combination of 'mac' and 'interface'
df = df.drop_duplicates(subset=['mac', 'interface'])

Output

print(df)

    myswitch  num             mac  dynamic interface
0   MYSWITCH    1  0000.007e.7344  DYNAMIC   Gi1/0/1
3   MYSWITCH    1  00c0.b7fe.1e5a  DYNAMIC   Gi1/0/1
6   MYSWITCH    1  5c5a.c799.40a6  DYNAMIC   Gi1/0/1
11  MYSWITCH    1  bc26.c74b.a3c7  DYNAMIC   Gi1/0/1
12  MYSWITCH    1  0042.680f.1282  DYNAMIC   Gi1/0/5
13  MYSWITCH    1  0042.680f.1282  DYNAMIC  Gi1/0/11
15  MYSWITCH    1  0042.680f.1282  DYNAMIC  Gi1/0/12
