
Merge two rows in a pandas DataFrame


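I have this data, and I need to merge the two selected columns into the other row, because my code produces these duplicate rows.

So, how can I do this?

Here is one way to solve your problem: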

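# Copy the two source columns, shifted down one row, into helper columns
df[['State_new', 'Solution_new']] = df[['Power State', 'Recommended Solution']].shift()
mask = df['State_new'].notna()
# Overwrite the target columns wherever a shifted value exists
df.loc[mask, 'State'] = df.loc[mask, 'State_new']
df.loc[mask, 'Recommended Solutuin'] = df.loc[mask, 'Solution_new']
# Keep rows that have a State, drop the helper and source columns, renumber
df = df[df['State'].notna()]
df = df.drop(columns=['State_new', 'Solution_new', 'Power State', 'Recommended Solution']).reset_index(drop=True)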

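Explanation:

  • Create a copy of the important data from your code, shifted down by one row
  • Create a boolean mask indicating which of these shifted rows are not null
  • Use this mask to overwrite the contents of the State and Recommended Solutuin columns (note: the column labels are used verbatim from the OP's question) with the updated data from your code held in the shifted columns
  • Keep only the rows where State is not null, and drop the columns used to perform the update, which are no longer needed
  • Use reset_index to create a new integer range index without gaps.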
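
The steps above can be tried on a small toy frame. This is only a minimal sketch: the three rows and the reduced column set are made up for illustration, while the column labels (including the misspelled Recommended Solutuin) follow the question. The shifted values are kept in a separate frame here instead of helper columns, which is a small variation on the answer's code.

```python
import numpy as np
import pandas as pd

# Toy data (illustrative values): row 1 carries the power diagnosis,
# row 2 is its duplicate and carries the coverage diagnosis.
df = pd.DataFrame({
    'MAC': [122.98, 135.02, 135.02],
    'Recommended Solutuin': ['Downtitlt antenna', np.nan, 'uptilt antenna'],
    'State': ['serving cell is overshooting', np.nan, 'bad coverage'],
    'Power State': [np.nan, 'Bad Power', np.nan],
    'Recommended Solution': [np.nan, 'Check hardware etc.', np.nan],
})

# Shift the two source columns down one row so each value lines up
# with the duplicate row that follows it.
shifted = df[['Power State', 'Recommended Solution']].shift()
mask = shifted['Power State'].notna()

# Overwrite the target columns only where a shifted value exists.
df.loc[mask, 'State'] = shifted.loc[mask, 'Power State']
df.loc[mask, 'Recommended Solutuin'] = shifted.loc[mask, 'Recommended Solution']

# Keep rows that have a State, drop the source columns, renumber the index.
result = (
    df[df['State'].notna()]
    .drop(columns=['Power State', 'Recommended Solution'])
    .reset_index(drop=True)
)
print(result)
```

Note that this approach relies on each row to be merged sitting directly above its duplicate, as in the question's data.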

In case it helps, here is sample code for pulling the dataframe from Excel:

import pandas as pd
df = pd.read_excel('TestBook.xlsx', sheet_name='TestSheet', usecols='AD:AM')

Here is the input dataframe:

         MAC       RLC     RLC 2  PDCCH Down  PDCCH Uplink  Unnamed: 34 Recommended Solutuin                         State Power State Recommended Solution
0   122.9822  7119.503  125.7017    1186.507      784.9464          NaN    Downtitlt antenna  serving cell is overshooting         NaN                  NaN
1     4.1000  7119.503   24.0000      11.000       51.0000          NaN    Downtitlt antenna  serving cell is overshooting         NaN                  NaN
2   121.8900  2127.740  101.3300    1621.000      822.0000          NaN       uptilt antenna                  bad coverage         NaN                  NaN
3    86.5800  2085.250   94.6400    1650.000      880.0000          NaN       uptilt antenna                  bad coverage         NaN                  NaN
4    64.7500  1873.540   63.8600    1259.000      841.0000          NaN       uptilt antenna                  bad coverage         NaN                  NaN
5    84.8700  1735.070   60.3800    1423.000      474.0000          NaN       uptilt antenna                  bad coverage         NaN                  NaN
6    49.3400  1276.190   59.9600    1372.000      450.0000          NaN       uptilt antenna                  bad coverage         NaN                  NaN
7   135.0200  2359.840  164.1300    1224.000      704.0000          NaN                  NaN                           NaN   Bad Power  Check hardware etc.
8   135.0200  2359.840  164.1300    1224.000      704.0000          NaN       uptilt antenna                  bad coverage         NaN                  NaN
9   163.7200  1893.940   90.0300    1244.000      753.0000          NaN                  NaN                           NaN   Bad Power  Check hardware etc.
10  163.7200  1893.940   90.0300    1244.000      753.0000          NaN       uptilt antenna                  bad coverage         NaN                  NaN
11  129.6400  1163.140  154.3200     663.000      798.0000          NaN                  NaN                           NaN   Bad Power  Check hardware etc.
12  129.6400  1163.140  154.3200     663.000      798.0000          NaN       uptilt antenna                  bad coverage         NaN                  NaN

Here is the sample output:

        MAC       RLC     RLC 2  PDCCH Down  PDCCH Uplink  Unnamed: 34 Recommended Solutuin                         State
0  122.9822  7119.503  125.7017    1186.507      784.9464          NaN    Downtitlt antenna  serving cell is overshooting
1    4.1000  7119.503   24.0000      11.000       51.0000          NaN    Downtitlt antenna  serving cell is overshooting
2  121.8900  2127.740  101.3300    1621.000      822.0000          NaN       uptilt antenna                  bad coverage
3   86.5800  2085.250   94.6400    1650.000      880.0000          NaN       uptilt antenna                  bad coverage
4   64.7500  1873.540   63.8600    1259.000      841.0000          NaN       uptilt antenna                  bad coverage
5   84.8700  1735.070   60.3800    1423.000      474.0000          NaN       uptilt antenna                  bad coverage
6   49.3400  1276.190   59.9600    1372.000      450.0000          NaN       uptilt antenna                  bad coverage
7  135.0200  2359.840  164.1300    1224.000      704.0000          NaN  Check hardware etc.                     Bad Power
8  163.7200  1893.940   90.0300    1244.000      753.0000          NaN  Check hardware etc.                     Bad Power
9  129.6400  1163.140  154.3200     663.000      798.0000          NaN  Check hardware etc.                     Bad Power

You can use groupby to combine rows by column:

df = pd.DataFrame(data)
new_df = df.groupby(['MAC', 'RLC1', 'RLC2', 'POCCH', 'POCCH Up']).sum()
# reset_index returns a new frame, so assign the result
new_df = new_df.reset_index()
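
As a variation on the groupby idea (an assumption on my part, not part of the answer above): GroupBy.first() returns the first non-null value of each column per group, so a duplicated pair collapses into one row without summing or concatenating the text columns. The sketch below uses made-up rows and only two key columns.

```python
import numpy as np
import pandas as pd

# Toy data (illustrative values): each measurement appears twice,
# once with the power diagnosis and once with the coverage diagnosis.
df = pd.DataFrame({
    'MAC': [135.02, 135.02, 163.72, 163.72],
    'RLC': [2359.84, 2359.84, 1893.94, 1893.94],
    'State': [np.nan, 'bad coverage', np.nan, 'bad coverage'],
    'Power State': ['Bad Power', np.nan, 'Bad Power', np.nan],
})

key_cols = ['MAC', 'RLC']

# first() skips nulls, so each group keeps the one non-null value per column;
# sort=False preserves the order in which the groups first appear.
merged = df.groupby(key_cols, sort=False).first().reset_index()
print(merged)
```

Unlike the shift-based approach, this keeps State and Power State as separate columns, and it does not depend on the duplicate rows being adjacent.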

You can do the following:

    import numpy as np

    fill_cols = ['Power State', 'Recommended Solution 2']
    dup_cols = ['MAC_UL', 'RLC_Through_1', 'RLC_Through_2', 'PDCCH Down', 'PDCCH Up']
    # Flag every row that shares its measurement values with another row
    m = df.duplicated(subset=dup_cols, keep=False)
    # Treat empty strings as missing so that ffill can fill them
    df_fill = df.loc[m, fill_cols].replace('', np.nan)
    # Forward-fill within the flagged rows and write the result back
    df.loc[m, fill_cols] = df_fill.ffill()

  1. Use duplicated to get the duplicated rows
  2. Replace empty values with NaN
  3. Then use ffill
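
The three steps can be checked on a toy frame. This is a minimal sketch with made-up rows and a single key column; the column labels follow the answer above, and empty strings stand in for the missing values.

```python
import numpy as np
import pandas as pd

# Toy data (illustrative values): rows 1 and 2 share the same MAC_UL value,
# and only row 1 has the power diagnosis filled in.
df = pd.DataFrame({
    'MAC_UL': [4.10, 135.02, 135.02],
    'Power State': ['', 'Bad Power', ''],
    'Recommended Solution 2': ['', 'Check hardware etc.', ''],
})

fill_cols = ['Power State', 'Recommended Solution 2']
dup_cols = ['MAC_UL']

# Step 1: flag all rows that belong to a duplicated group.
m = df.duplicated(subset=dup_cols, keep=False)
# Step 2: turn empty strings into NaN in the flagged rows.
df_fill = df.loc[m, fill_cols].replace('', np.nan)
# Step 3: forward-fill and write the filled values back.
df.loc[m, fill_cols] = df_fill.ffill()
print(df)
```

The forward fill runs over all flagged rows in order, so it assumes that each row to be filled comes directly after the row holding its values.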

