
Merge two rows in a pandas DataFrame


I have this data, and I need to merge two selected columns into another row, because my code produces duplicate rows.

So, how can I do that?

Here is one way to solve your problem:

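# Shift the values produced by your code down one row, onto the duplicate row
df[['State_new', 'Solution_new']] = df[['Power State', 'Recommended Solution']].shift()
# True where a shifted value is present
mask = df['State_new'].notna()
df.loc[mask, 'State'] = df.loc[mask, 'State_new']
df.loc[mask, 'Recommended Solutuin'] = df.loc[mask, 'Solution_new']
# Drop the helper and source columns, then the rows left without a 'State' value
df = df.drop(columns=['State_new', 'Solution_new', 'Power State', 'Recommended Solution'])
df = df[df['State'].notna()].reset_index(drop=True)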

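Explanation:

  • create versions of the important data from your code, shifted down by one row
  • create a boolean mask indicating which of these shifted rows are not null
  • use this mask to overwrite the contents of the State and Recommended Solutuin columns (note: these are the original column labels, copied verbatim from the OP's question) with the updated data from your code held in the shifted columns
  • delete the columns that were only needed to perform the update, and keep only the rows whose State is not null
  • use reset_index to create a new integer range index without gaps.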
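
The steps above can be tried end to end on a small made-up frame. The sketch below assumes a cut-down version of the input (only the MAC column plus the four label columns, with invented rows), so treat it as an illustration rather than the exact data from the question.

```python
import numpy as np
import pandas as pd

# Toy data (made up for illustration): row 1 comes from "your code",
# row 2 is its duplicate carrying the old labels.
df = pd.DataFrame({
    'MAC': [121.89, 135.02, 135.02],
    'Recommended Solutuin': ['uptilt antenna', np.nan, 'uptilt antenna'],
    'State': ['bad coverage', np.nan, 'bad coverage'],
    'Power State': [np.nan, 'Bad Power', np.nan],
    'Recommended Solution': [np.nan, 'Check hardware etc.', np.nan],
})

# Step 1: shift the new values down one row, onto the duplicate row.
shifted = df[['Power State', 'Recommended Solution']].shift()
# Step 2: mask of rows that received a shifted value.
mask = shifted['Power State'].notna()
# Step 3: overwrite the old labels with the shifted values.
df.loc[mask, 'State'] = shifted.loc[mask, 'Power State']
df.loc[mask, 'Recommended Solutuin'] = shifted.loc[mask, 'Recommended Solution']
# Step 4: drop the source columns and the rows left without a State.
result = df.drop(columns=['Power State', 'Recommended Solution'])
result = result[result['State'].notna()].reset_index(drop=True)
print(result)
```

Here the shifted values are kept in a separate frame instead of temporary columns, which is only a stylistic choice; the logic is the same as in the answer.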

In case it helps, here is sample code to pull the dataframe from Excel:

import pandas as pd
df = pd.read_excel('TestBook.xlsx', sheet_name='TestSheet', usecols='AD:AM')

Here is the input dataframe:

         MAC       RLC     RLC 2  PDCCH Down  PDCCH Uplink  Unnamed: 34 Recommended Solutuin                         State Power State Recommended Solution
0   122.9822  7119.503  125.7017    1186.507      784.9464          NaN    Downtitlt antenna  serving cell is overshooting         NaN                  NaN
1     4.1000  7119.503   24.0000      11.000       51.0000          NaN    Downtitlt antenna  serving cell is overshooting         NaN                  NaN
2   121.8900  2127.740  101.3300    1621.000      822.0000          NaN       uptilt antenna                  bad coverage         NaN                  NaN
3    86.5800  2085.250   94.6400    1650.000      880.0000          NaN       uptilt antenna                  bad coverage         NaN                  NaN
4    64.7500  1873.540   63.8600    1259.000      841.0000          NaN       uptilt antenna                  bad coverage         NaN                  NaN
5    84.8700  1735.070   60.3800    1423.000      474.0000          NaN       uptilt antenna                  bad coverage         NaN                  NaN
6    49.3400  1276.190   59.9600    1372.000      450.0000          NaN       uptilt antenna                  bad coverage         NaN                  NaN
7   135.0200  2359.840  164.1300    1224.000      704.0000          NaN                  NaN                           NaN   Bad Power  Check hardware etc.
8   135.0200  2359.840  164.1300    1224.000      704.0000          NaN       uptilt antenna                  bad coverage         NaN                  NaN
9   163.7200  1893.940   90.0300    1244.000      753.0000          NaN                  NaN                           NaN   Bad Power  Check hardware etc.
10  163.7200  1893.940   90.0300    1244.000      753.0000          NaN       uptilt antenna                  bad coverage         NaN                  NaN
11  129.6400  1163.140  154.3200     663.000      798.0000          NaN                  NaN                           NaN   Bad Power  Check hardware etc.
12  129.6400  1163.140  154.3200     663.000      798.0000          NaN       uptilt antenna                  bad coverage         NaN                  NaN

Here is the sample output:

        MAC       RLC     RLC 2  PDCCH Down  PDCCH Uplink  Unnamed: 34 Recommended Solutuin                         State
0  122.9822  7119.503  125.7017    1186.507      784.9464          NaN    Downtitlt antenna  serving cell is overshooting
1    4.1000  7119.503   24.0000      11.000       51.0000          NaN    Downtitlt antenna  serving cell is overshooting
2  121.8900  2127.740  101.3300    1621.000      822.0000          NaN       uptilt antenna                  bad coverage
3   86.5800  2085.250   94.6400    1650.000      880.0000          NaN       uptilt antenna                  bad coverage
4   64.7500  1873.540   63.8600    1259.000      841.0000          NaN       uptilt antenna                  bad coverage
5   84.8700  1735.070   60.3800    1423.000      474.0000          NaN       uptilt antenna                  bad coverage
6   49.3400  1276.190   59.9600    1372.000      450.0000          NaN       uptilt antenna                  bad coverage
7  135.0200  2359.840  164.1300    1224.000      704.0000          NaN  Check hardware etc.                     Bad Power
8  163.7200  1893.940   90.0300    1244.000      753.0000          NaN  Check hardware etc.                     Bad Power
9  129.6400  1163.140  154.3200     663.000      798.0000          NaN  Check hardware etc.                     Bad Power

You can use groupby to combine rows by column:

df = pd.DataFrame(data)  # data: your original records
# Adjust the column names to match your dataframe
new_df = df.groupby(['MAC', 'RLC1', 'RLC2', 'POCCH', 'POCCH Up']).sum().reset_index()
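
One caveat: sum() is meant for numbers, and on text columns it can end up concatenating strings. As a hedged alternative (not part of the answer above), first() keeps the first non-null value of each column within a group. The sketch below uses made-up data and only two key columns, MAC and RLC, both of which are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy data (made up): the first two rows are duplicates on the key columns,
# each holding a value the other one lacks.
df = pd.DataFrame({
    'MAC': [135.02, 135.02, 49.34],
    'RLC': [2359.84, 2359.84, 1276.19],
    'State': [np.nan, 'bad coverage', 'bad coverage'],
    'Power State': ['Bad Power', np.nan, np.nan],
})

# as_index=False keeps the key columns as regular columns.
# first() returns the first non-null value of each column per group.
merged = df.groupby(['MAC', 'RLC'], as_index=False).first()
print(merged)
```

This collapses each group of duplicates into one row, but unlike the shift-based answer it does not overwrite State with Power State; both columns are kept side by side.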

You can do the following:

    import numpy as np

    fill_cols = ['Power State', 'Recommended Solution 2']
    dup_cols = ['MAC_UL','RLC_Through_1','RLC_Through_2','PDCCH Down', 'PDCCH Up']
    # Mark every row that shares its key columns with another row
    m = df.duplicated(subset=dup_cols, keep=False)
    # Treat empty strings as missing so that ffill can fill them
    df_fill = df.loc[m, fill_cols].replace('', np.nan)

    df.loc[m, fill_cols] = df_fill.ffill()

  1. Use duplicated to get the duplicated rows
  2. Fill the empty values with NaN
  3. Then use ffill
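
The three steps can be sketched on a small made-up frame. The data and the reduced set of key columns (MAC_UL and PDCCH Down only) are assumptions for illustration, not the OP's actual data.

```python
import numpy as np
import pandas as pd

# Toy data (made up): rows 0 and 1 are duplicates on the key columns,
# and only row 0 carries a 'Power State' value.
df = pd.DataFrame({
    'MAC_UL': [135.02, 135.02, 49.34],
    'PDCCH Down': [1224.0, 1224.0, 1372.0],
    'Power State': ['Bad Power', '', ''],
})

dup_cols = ['MAC_UL', 'PDCCH Down']
fill_cols = ['Power State']

# Step 1: flag every row whose key columns occur more than once.
m = df.duplicated(subset=dup_cols, keep=False)
# Step 2: turn empty strings into NaN, only within the flagged rows.
df_fill = df.loc[m, fill_cols].replace('', np.nan)
# Step 3: forward-fill so the duplicate inherits the value from the row above.
df.loc[m, fill_cols] = df_fill.ffill()
print(df)
```

Note that ffill runs over all flagged rows in order, so this relies on each duplicate sitting directly below the row that holds its value.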


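Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.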

 