How to merge multiple rows of the same category into one row in pandas?

Question

I'm trying to get from table 1 to table 2 in the image, but I can't seem to get it right. I tried a pivot table to turn columns A to D from rows into columns. Then I tried groupby, but instead of giving me one row it messes up my DataFrame.

Answer 1

You can fill the null values with the values found elsewhere in each column and then drop the duplicate rows:

With:

import numpy as np
import pandas as pd

df = pd.DataFrame([["A", np.nan, np.nan, "Y", "Z"],
                   [np.nan, "B", np.nan, "Y", "Z"],
                   [np.nan, np.nan, "C", "Y", "Z"]], columns=list("ABCDE"))
df
     A    B    C  D  E
0    A  NaN  NaN  Y  Z
1  NaN    B  NaN  Y  Z
2  NaN  NaN    C  Y  Z

df.ffill().bfill().drop_duplicates()
   A  B  C  D  E
0  A  B  C  Y  Z

df.ffill().bfill() gives:

   A  B  C  D  E
0  A  B  C  Y  Z
1  A  B  C  Y  Z
2  A  B  C  Y  Z
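A self-contained version of the fill-and-deduplicate approach, assuming the blanks are NaN. Keep in mind that ffill() and bfill() copy values from neighbouring rows, so drop_duplicates() collapses everything to a single row only when each column holds one distinct non-null value; with conflicting values in a column you would get more than one row back.

```python
import numpy as np
import pandas as pd

# Three rows of the same category (D="Y", E="Z"), each with one of A/B/C filled
df = pd.DataFrame([["A", np.nan, np.nan, "Y", "Z"],
                   [np.nan, "B", np.nan, "Y", "Z"],
                   [np.nan, np.nan, "C", "Y", "Z"]], columns=list("ABCDE"))

# ffill() copies values down, bfill() copies them up,
# so every row ends up with A, B and C populated
filled = df.ffill().bfill()

# All rows are now identical, so dropping duplicates keeps only the first one
merged = filled.drop_duplicates()
print(merged)
```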

As per your comment, you could define a function that fills the missing values of the first row with the unique non-null value found elsewhere in the same column.

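def fillna_uniq(df, col):
    # Accept either a single column name or a list of names
    cols = col if isinstance(col, list) else [col]
    for c in cols:
        # Put the column's first non-null value into the first row
        df.loc[df.index[0], c] = df[c].dropna().iloc[0]
    # Return the first row as a one-row DataFrame
    return df.iloc[[0]]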

You could then do the following (on the DataFrame from the question, as built in Answer 2):

fillna_uniq(df.copy(), ["B", "C", "D"])
       A  B   C     D       E     F
0  Hello  I  am  lost  Pandas  Data

I think this is a bit faster than the ffill()/bfill() approach. You can also modify your df in place by passing the DataFrame itself instead of a copy.
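A runnable sketch of the copy versus in-place difference, using the same data as Answer 2. The helper mirrors fillna_uniq from this answer (with comments added) so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

def fillna_uniq(df, col):
    # Accept a single column name or a list of names
    cols = col if isinstance(col, list) else [col]
    for c in cols:
        # Put the column's first non-null value into the first row
        df.loc[df.index[0], c] = df[c].dropna().iloc[0]
    # Return the first row as a one-row DataFrame
    return df.iloc[[0]]

df = pd.DataFrame({'A': ['Hello', np.nan, np.nan, np.nan],
                   'B': [np.nan, 'I', np.nan, np.nan],
                   'C': [np.nan, np.nan, 'am', np.nan],
                   'D': [np.nan, np.nan, np.nan, 'lost'],
                   'E': ['Pandas'] * 4,
                   'F': ['Data'] * 4})

# Working on a copy leaves df itself untouched
row = fillna_uniq(df.copy(), ["B", "C", "D"])
print(row)
print(df.loc[0, "B"])  # still NaN

# Passing df itself fills its first row in place
fillna_uniq(df, ["B", "C", "D"])
print(df.loc[0].tolist())
```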

HTH

Answer 2

One way to do this is with apply and dropna:

Assuming the blanks in your table above are really nulls:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['Hello', np.nan, np.nan, np.nan],
                   'B': [np.nan, 'I', np.nan, np.nan],
                   'C': [np.nan, np.nan, 'am', np.nan],
                   'D': [np.nan, np.nan, np.nan, 'lost'],
                   'E': ['Pandas'] * 4,
                   'F': ['Data'] * 4})

print(df)
       A    B    C     D       E     F
0  Hello  NaN  NaN   NaN  Pandas  Data
1    NaN    I  NaN   NaN  Pandas  Data
2    NaN  NaN   am   NaN  Pandas  Data
3    NaN  NaN  NaN  lost  Pandas  Data

Using apply, you can run a lambda function on each column of the DataFrame that first drops the null values and then takes the max. Because each column has only one distinct non-null value, the max is simply that value:

df.apply(lambda x: x.dropna().max()).to_frame().T

       A  B   C     D       E     F
0  Hello  I  am  lost  Pandas  Data
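Taking max only works because no column holds two different non-null values; if one did, max would silently keep the lexicographically largest string. A minimal sketch of a guard, assuming you would rather get an error in that case (collapse_rows is a hypothetical helper name, not part of pandas):

```python
import numpy as np
import pandas as pd

def collapse_rows(df):
    # Hypothetical helper: count distinct non-null values per column
    counts = df.nunique(dropna=True)
    conflicting = counts[counts > 1].index.tolist()
    if conflicting:
        raise ValueError(f"columns with more than one value: {conflicting}")
    # Safe to take max now: each column has at most one non-null value
    return df.apply(lambda x: x.dropna().max()).to_frame().T

df = pd.DataFrame({'A': ['Hello', np.nan, np.nan, np.nan],
                   'B': [np.nan, 'I', np.nan, np.nan],
                   'C': [np.nan, np.nan, 'am', np.nan],
                   'D': [np.nan, np.nan, np.nan, 'lost'],
                   'E': ['Pandas'] * 4,
                   'F': ['Data'] * 4})
result = collapse_rows(df)
print(result)

# A column with two different values is rejected instead of silently maxed
bad = pd.DataFrame({'A': ['x', 'y']})
message = None
try:
    collapse_rows(bad)
except ValueError as err:
    message = str(err)
print(message)
```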

Or, if your blanks are really empty strings, you can do this:

df1 = df.replace(np.nan,'')
df1
       A  B   C     D       E     F
0  Hello               Pandas  Data
1         I            Pandas  Data
2            am        Pandas  Data
3                lost  Pandas  Data

df1.apply(lambda x: x[x!=''].max()).to_frame().T

       A  B   C     D       E     F
0  Hello  I  am  lost  Pandas  Data
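Since the title asks about merging rows of the same category, another option (not from either answer) is to group by the columns shared across the rows, here E and F, and take GroupBy.first(), which returns the first non-null value of each remaining column per group. A sketch assuming E and F identify the category; with empty strings, they are converted back to NaN first so first() can skip them:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['Hello', '', '', ''],
                   'B': ['', 'I', '', ''],
                   'C': ['', '', 'am', ''],
                   'D': ['', '', '', 'lost'],
                   'E': ['Pandas'] * 4,
                   'F': ['Data'] * 4})

# Treat empty strings as missing so first() skips them
clean = df.replace('', np.nan)

# One row per (E, F) category, keeping the first non-null value of each column
merged = clean.groupby(['E', 'F'], as_index=False).first()

# Restore the original column order (the group keys come first otherwise)
merged = merged[df.columns]
print(merged)
```

Unlike the apply/max version, this also works when the DataFrame contains several categories: you get one merged row per distinct (E, F) pair.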

2 answers

Answer 1
Score 2, accepted, 2017-12-19 13:44:32

Answer 2
Score 1, 2017-12-19 14:03:27
