Error in merging pandas data frame columns
I'm trying to merge three columns from the same data frame into one.
Here is my data frame selected_vals:
                           label_1               label_2  label_3
0                              NaN                   NaN      NaN
1  ('__label__Religione_e_Magia',)                   NaN      NaN
2                              NaN  ('__label__Storia',)      NaN
3                              NaN  ('__label__Storia',)      NaN
4  ('__label__Religione_e_Magia',)                   NaN      NaN
The data frame has at most one value per row, so the columns where no value is specified contain NaN.
Following the solution proposed here, I used this code:
selected_vals['selected_vals'] = selected_vals.loc[:,selected_vals.columns.tolist()[1:]].apply(lambda x: x.dropna().tolist(), 1)
However, by doing so, only the values from column label_2 end up in column selected_vals.
Here is the output:
                           label_1               label_2  label_3         selected_vals
0                              NaN                   NaN      NaN                    []
1  ('__label__Religione_e_Magia',)                   NaN      NaN                    []
2                              NaN  ('__label__Storia',)      NaN  ('__label__Storia',)
3                              NaN  ('__label__Storia',)      NaN  ('__label__Storia',)
4  ('__label__Religione_e_Magia',)                   NaN      NaN                    []
As the desired output, I would like to have all the values stored in the same column, i.e.:
selected_vals
0 NaN
1 ('__label__Religione_e_Magia',)
2 ('__label__Storia',)
3 ('__label__Storia',)
4 ('__label__Religione_e_Magia',)
Any suggestions on how to deal with this problem?
Thanks
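A likely cause, judging from the code in the question: selected_vals.columns.tolist()[1:] slices off the first column, so only label_2 and label_3 are ever passed to the lambda and the values in label_1 are never looked at. The minimal sketch below rebuilds the sample frame from the printout (assuming the cells hold 1-tuples, as the display suggests) and passes all three label columns explicitly; label_cols is a name introduced here for illustration. Each row then gets a list of its non-NaN values.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the printout in the question (cells assumed to be 1-tuples)
relig = ('__label__Religione_e_Magia',)
storia = ('__label__Storia',)
selected_vals = pd.DataFrame({
    'label_1': [np.nan, relig, np.nan, np.nan, relig],
    'label_2': [np.nan, np.nan, storia, storia, np.nan],
    'label_3': [np.nan] * 5,
})

# The slice used in the question skips label_1 entirely
print(selected_vals.columns.tolist()[1:])  # -> ['label_2', 'label_3']

# Passing all label columns keeps every non-NaN value (one list per row)
label_cols = ['label_1', 'label_2', 'label_3']
selected_vals['selected_vals'] = selected_vals[label_cols].apply(
    lambda x: x.dropna().tolist(), axis=1)
print(selected_vals['selected_vals'])
```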
Use DataFrame.iloc to select all columns except the first, then forward-fill the missing values along each row, and finally select the last column:
import numpy as np

# replace 'NaN' strings with np.nan if necessary
selected_vals = selected_vals.replace('NaN', np.nan)
selected_vals['selected_vals'] = selected_vals.iloc[:,1:].ffill(axis=1).iloc[:, -1]
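Note that iloc[:,1:] skips the first column; if selected_vals contains only label_1, label_2 and label_3 as printed in the question, the label_1 values (rows 1 and 4) would be dropped. A minimal sketch of the same forward-fill idea applied to all three label columns, on sample data rebuilt from the question (the 1-tuple cells and the label_cols name are assumptions):

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the printout in the question (cells assumed to be 1-tuples)
relig = ('__label__Religione_e_Magia',)
storia = ('__label__Storia',)
selected_vals = pd.DataFrame({
    'label_1': [np.nan, relig, np.nan, np.nan, relig],
    'label_2': [np.nan, np.nan, storia, storia, np.nan],
    'label_3': [np.nan] * 5,
})

# Forward-fill across each row over the label columns; the last column then holds
# the row's only value, or NaN when the whole row is empty
label_cols = ['label_1', 'label_2', 'label_3']
selected_vals['selected_vals'] = selected_vals[label_cols].ffill(axis=1).iloc[:, -1]
print(selected_vals['selected_vals'])
```

Since each row holds at most one value, bfill(axis=1).iloc[:, 0] over the same columns would give the same result.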
You can apply a function to each row and keep only the desired value (the one in the column that is not NaN):
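# return the value of the single non-NaN column in each row (NaN if the whole row is NaN, where .item() would raise)
selected_vals['selected_vals'] = selected_vals.apply(lambda row: row[row[pd.notnull(row)].index.item()] if pd.notnull(row).any() else np.nan, axis=1)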
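The same row-wise idea can also be written with Series.first_valid_index, which returns the label of the first non-NaN entry, or None when the whole row is NaN (as in row 0), so empty rows need no special exception handling. A minimal sketch on sample data rebuilt from the question (1-tuple cells assumed); pick_value is a hypothetical helper name:

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the printout in the question (cells assumed to be 1-tuples)
relig = ('__label__Religione_e_Magia',)
storia = ('__label__Storia',)
selected_vals = pd.DataFrame({
    'label_1': [np.nan, relig, np.nan, np.nan, relig],
    'label_2': [np.nan, np.nan, storia, storia, np.nan],
    'label_3': [np.nan] * 5,
})


def pick_value(row):
    # Label of the first non-NaN cell in this row, or None if every cell is NaN
    col = row.first_valid_index()
    return row[col] if col is not None else np.nan


selected_vals['selected_vals'] = selected_vals.apply(pick_value, axis=1)
print(selected_vals['selected_vals'])
```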
Thanks for your suggestions.
I think the problem was related to the data types in the data frame.
I solved the issue as follows:
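import numpy as np

# replace NaN with empty strings so the columns can be concatenated
selected_vals = selected_vals.replace(np.nan, '')
# convert every cell (e.g. the tuples) to its string representation
selected_vals = selected_vals.astype(str)
df['suggested_label'] = selected_vals["label_1"] + selected_vals["label_2"] + selected_vals["label_3"]
print(df)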
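A minimal sketch of what this workaround produces, on sample data rebuilt from the question (1-tuple cells assumed; astype(str) plays the role of the string-conversion step). The result holds strings such as "('__label__Storia',)" rather than tuples, and the all-NaN row becomes an empty string rather than the NaN shown in the desired output; the last step maps '' back to NaN in case that matters downstream:

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the printout in the question (cells assumed to be 1-tuples)
relig = ('__label__Religione_e_Magia',)
storia = ('__label__Storia',)
selected_vals = pd.DataFrame({
    'label_1': [np.nan, relig, np.nan, np.nan, relig],
    'label_2': [np.nan, np.nan, storia, storia, np.nan],
    'label_3': [np.nan] * 5,
})

# NaN -> '' and every cell -> str, then concatenate the three columns
as_text = selected_vals.replace(np.nan, '').astype(str)
suggested = as_text['label_1'] + as_text['label_2'] + as_text['label_3']

# The all-NaN row concatenates to ''; turn it back into NaN
suggested = suggested.replace('', np.nan)
print(suggested)
```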
I don't know whether it's correct, but at least it works for me.