How to combine multiple rows of the same category into one in pandas?
I'm trying to get from table 1 to table 2 in the image, but I can't seem to get it right. I tried a pivot table to turn columns A-D from rows into columns. Then I tried groupby, but instead of giving me one row it messed up my dataframe.
You can fill the null values with the values found elsewhere in the same column, then drop the duplicates.
With:
import numpy as np
import pandas as pd

# pd.np was removed in pandas 2.0, so use numpy directly
df = pd.DataFrame([["A", np.nan, np.nan, "Y", "Z"],
                   [np.nan, "B", np.nan, "Y", "Z"],
                   [np.nan, np.nan, "C", "Y", "Z"]], columns=list("ABCDE"))
df
A B C D E
0 A NaN NaN Y Z
1 NaN B NaN Y Z
2 NaN NaN C Y Z
df.ffill().bfill().drop_duplicates()
A B C D E
0 A B C Y Z
On its own, df.ffill().bfill() gives:
A B C D E
0 A B C Y Z
1 A B C Y Z
2 A B C Y Z
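If the frame holds more than one category, filling across the whole frame would leak values from one category into the next. Here is a minimal sketch of a per-category version, assuming the rows to merge share the same values in D and E. The second category (the "Q"/"R" rows) is made-up data for illustration, not from the question. It relies on GroupBy.first(), which takes the first non-null value of each column within each group.

```python
import numpy as np
import pandas as pd

# Two categories, identified by the (D, E) pair; the Q/R rows are invented for this sketch
df = pd.DataFrame(
    [["A", np.nan, np.nan, "Y", "Z"],
     [np.nan, "B", np.nan, "Y", "Z"],
     [np.nan, np.nan, "C", "Y", "Z"],
     ["P", np.nan, np.nan, "Q", "R"],
     [np.nan, "S", np.nan, "Q", "R"],
     [np.nan, np.nan, "T", "Q", "R"]],
    columns=list("ABCDE"),
)

# first() skips nulls, so each group collapses to one row of its non-null values
combined = df.groupby(["D", "E"], as_index=False).first()
print(combined)
```

The grouping columns come first in the result, so reorder with combined[list("ABCDE")] if the original column order matters.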
As per your comment, you could define a function that fills the missing values in the first row with the unique value that lies somewhere else in the same column.
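def fillna_uniq(df, col):
    # col may be a single column name or a list of column names
    if isinstance(col, list):
        for c in col:
            df.loc[df.index[0], c] = df[c].dropna().iloc[0]
    else:
        df.loc[df.index[0], col] = df[col].dropna().iloc[0]
    # return only the first row, which is now filled in
    return df.iloc[[0]]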
You could then do:
fillna_uniq(df.copy(), ["B", "C", "D"])
A B C D E F
0 Hello I am lost Pandas Data
I think this is a bit faster. You can also modify your df in place by passing the dataframe itself rather than a copy.
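To make the copy versus in-place point concrete, here is a small sketch. It assumes a four-row frame shaped like the one in the question (reconstructed here, since the image is not available) and uses a lightly tidied variant of fillna_uniq that accepts either one column name or a list.

```python
import numpy as np
import pandas as pd

def fillna_uniq(df, cols):
    """Fill the first row of `cols` with the first non-null value of each column."""
    cols = [cols] if isinstance(cols, str) else list(cols)
    for c in cols:
        df.loc[df.index[0], c] = df[c].dropna().iloc[0]
    return df.iloc[[0]]

# Assumed input: one non-null value per column in A-D, constants in E and F
df = pd.DataFrame({
    "A": ["Hello", np.nan, np.nan, np.nan],
    "B": [np.nan, "I", np.nan, np.nan],
    "C": [np.nan, np.nan, "am", np.nan],
    "D": [np.nan, np.nan, np.nan, "lost"],
    "E": ["Pandas"] * 4,
    "F": ["Data"] * 4,
})

# Passing a copy leaves the original frame untouched
result = fillna_uniq(df.copy(), ["B", "C", "D"])
untouched = pd.isna(df.loc[0, "B"])

# Passing the frame itself also fills its first row in place
fillna_uniq(df, ["B", "C", "D"])
filled_in_place = df.loc[0, "B"]
```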
HTH
One way you can do this is using apply and dropna:
Assuming those blanks in your table above are really nulls:
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['Hello', np.nan, np.nan, np.nan],
                   'B': [np.nan, 'I', np.nan, np.nan],
                   'C': [np.nan, np.nan, 'am', np.nan],
                   'D': [np.nan, np.nan, np.nan, 'lost'],
                   'E': ['Pandas'] * 4,
                   'F': ['Data'] * 4})
print(df)
A B C D E F
0 Hello NaN NaN NaN Pandas Data
1 NaN I NaN NaN Pandas Data
2 NaN NaN am NaN Pandas Data
3 NaN NaN NaN lost Pandas Data
Using apply, you can apply a lambda function to each column of the dataframe, first dropping the null values and then taking the max (each column here holds a single distinct non-null value, so max simply returns it):
df.apply(lambda x: x.dropna().max()).to_frame().T
A B C D E F
0 Hello I am lost Pandas Data
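One caveat worth sketching: max() on strings compares them lexicographically, so it only returns "the" value when a column has a single distinct non-null entry. If a column could hold several, taking the first non-null value may be closer to what you want. The frame below is a made-up example with two different values in column A, purely to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column A has two different non-null values
df = pd.DataFrame({
    "A": ["Hello", np.nan, "World"],
    "B": [np.nan, "I", np.nan],
})

# max() picks the lexicographically largest string in each column
by_max = df.apply(lambda s: s.dropna().max()).to_frame().T

# iloc[0] after dropna() picks the first non-null value in row order
by_first = df.apply(lambda s: s.dropna().iloc[0]).to_frame().T

print(by_max)
print(by_first)
```

Note that iloc[0] raises an IndexError for a column that is entirely null, whereas max() returns NaN there, so pick whichever failure mode suits your data.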
Or, if your blanks are really empty strings, you can do this:
df1 = df.replace(np.nan,'')
df1
       A  B   C     D       E     F
0  Hello                Pandas  Data
1         I             Pandas  Data
2            am         Pandas  Data
3                 lost  Pandas  Data
df1.apply(lambda x: x[x!=''].max()).to_frame().T
A B C D E F
0 Hello I am lost Pandas Data
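Another option for the empty-string case, sketched here under the assumption that every blank is exactly '' (no stray whitespace): convert the blanks to NaN first, after which any of the null-aware approaches works unchanged. The three-column frame is a cut-down stand-in for the one above.

```python
import numpy as np
import pandas as pd

# Cut-down stand-in frame where blanks are empty strings rather than nulls
df1 = pd.DataFrame({
    "A": ["Hello", "", ""],
    "B": ["", "I", ""],
    "E": ["Pandas"] * 3,
})

# Turn empty strings into real nulls so null-aware methods can see them
cleaned = df1.replace("", np.nan)

# Back-fill pulls each column's first non-null value up into row 0
row = cleaned.bfill().iloc[[0]]
print(row)
```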