Error when merging pandas data frame columns

Question

I'm trying to merge three columns from the same data frame into one.

Here is my data frame, selected_vals:

   label_1                          label_2               label_3
0  NaN                              NaN                   NaN
1  ('__label__Religione_e_Magia',)  NaN                   NaN
2  NaN                              ('__label__Storia',)  NaN
3  NaN                              ('__label__Storia',)  NaN
4  ('__label__Religione_e_Magia',)  NaN                   NaN

The data frame has only one value per row, so the columns where no value is specified contain NaN. Following the solution proposed here, I used this code:

selected_vals['selected_vals'] =  selected_vals.loc[:,selected_vals.columns.tolist()[1:]].apply(lambda x: x.dropna().tolist(), 1)

However, with this approach only the values from the column label_2 end up in the column selected_vals.

Here is the output:

   label_1                          label_2               label_3  selected_vals
0  NaN                              NaN                   NaN      []
1  ('__label__Religione_e_Magia',)  NaN                   NaN      []
2  NaN                              ('__label__Storia',)  NaN      ('__label__Storia',)
3  NaN                              ('__label__Storia',)  NaN      ('__label__Storia',)
4  ('__label__Religione_e_Magia',)  NaN
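One possible reading of this output, sketched under the assumption that the frame holds exactly the three label columns shown (the post does not confirm this): `columns.tolist()[1:]` drops the first column name, so the `label_1` values never reach `apply`. The data below is a hypothetical reconstruction of the sample.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction, assuming the frame has exactly these three columns
selected_vals = pd.DataFrame({
    "label_1": [np.nan, ("__label__Religione_e_Magia",), np.nan],
    "label_2": [np.nan, np.nan, ("__label__Storia",)],
    "label_3": [np.nan, np.nan, np.nan],
})

# [1:] skips the first column name, so only label_2 and label_3 are selected
used_columns = selected_vals.columns.tolist()[1:]

# Applying the same lambda to every column keeps the label_1 values too
collected = selected_vals.apply(lambda x: x.dropna().tolist(), axis=1)
```

If the real frame has an extra leading column that is not shown, this explanation does not apply and the `[1:]` slice is correct.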

As the desired output, I would like to have all the values stored in the same column, i.e.:

   selected_vals
0  NaN
1  ('__label__Religione_e_Magia',)
2  ('__label__Storia',)
3  ('__label__Storia',)
4  ('__label__Religione_e_Magia',)

Any suggestions on how to deal with this problem?

Thanks

Answer 1

Use DataFrame.iloc to select all columns except the first, then forward-fill the missing values along each row, and finally select the last column:

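import numpy as np

# Replace the string 'NaN' with np.nan if necessary
selected_vals = selected_vals.replace('NaN', np.nan)

# Forward-fill along each row (skipping the first column), then take the last column
selected_vals['selected_vals'] = selected_vals.iloc[:, 1:].ffill(axis=1).iloc[:, -1]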
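A minimal, self-contained sketch of the forward-fill idea on a small frame. The `doc_id` column is an assumption added here so that skipping the first column with `iloc[:, 1:]` makes sense; it does not appear in the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "doc_id" is an assumed leading column, not from the post
selected_vals = pd.DataFrame({
    "doc_id": [10, 11, 12],
    "label_1": [np.nan, ("__label__Religione_e_Magia",), np.nan],
    "label_2": [np.nan, np.nan, ("__label__Storia",)],
    "label_3": [np.nan, np.nan, np.nan],
})

# Skip the first column, carry each row's value to the right, keep the last column
merged = selected_vals.iloc[:, 1:].ffill(axis=1).iloc[:, -1]
selected_vals["selected_vals"] = merged
```

Rows with no label at all stay NaN. If a row held more than one label, this would keep the rightmost one.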

Answer 2

You can apply a function to each row and keep only the desired value (the one whose column is not NaN):

selected_vals['selected_vals'] = selected_vals.apply(lambda row: row[row[pd.notnull(row)].index.item()], axis=1)
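Note that `Index.item()` raises a `ValueError` unless exactly one label remains, so a row that is entirely NaN (such as row 0 in the sample) would fail with the one-liner above. The following is a sketch of a more tolerant variant; `first_valid` and `label_cols` are hypothetical names introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of part of the sample data
selected_vals = pd.DataFrame({
    "label_1": [np.nan, ("__label__Religione_e_Magia",), np.nan],
    "label_2": [np.nan, np.nan, ("__label__Storia",)],
    "label_3": [np.nan, np.nan, np.nan],
})
label_cols = ["label_1", "label_2", "label_3"]

def first_valid(row):
    """Return the first non-missing value in the row, or NaN if there is none."""
    valid = row.dropna()
    return valid.iloc[0] if len(valid) else np.nan

result = selected_vals[label_cols].apply(first_valid, axis=1)
selected_vals["selected_vals"] = result
```

Restricting the call to `label_cols` also keeps the function from reading any other columns the frame may contain.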

Answer 3

Thanks for your suggestions.

I think the problem was related to the type of the data frame.

I solved the issue as follows:

selected_vals = selected_vals.replace(np.nan, '', regex=True)
selected_vals = selected_vals.applymap(str)
df['suggested_label'] = selected_vals["label_1"].astype(str) + selected_vals["label_2"]+ selected_vals["label_3"]

print(df)
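The same string-concatenation idea can be sketched in a self-contained form. This assumes the three label columns are the ones to combine and uses `fillna` and `astype(str)` rather than `applymap`, which is deprecated in favor of `DataFrame.map` since pandas 2.1. The data is a hypothetical reconstruction of the sample.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of part of the sample data
selected_vals = pd.DataFrame({
    "label_1": [np.nan, ("__label__Religione_e_Magia",), np.nan],
    "label_2": [np.nan, np.nan, ("__label__Storia",)],
    "label_3": [np.nan, np.nan, np.nan],
})

# Turn missing values into empty strings, then convert every cell to text
labels = selected_vals[["label_1", "label_2", "label_3"]].fillna("").astype(str)

# Concatenating the text columns leaves the single non-empty value per row
combined = labels["label_1"] + labels["label_2"] + labels["label_3"]
```

The result holds strings such as "('__label__Storia',)" rather than tuples, and rows without any label become empty strings instead of NaN.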

I don't know whether it's correct, but at least it works for me.
