
pandas: adding a series to a dataframe causes NaN values to appear

I have a dataframe that looks something like this:

import numpy as np
import pandas as pd

d = {'Col_1' : pd.Series(['A', 'A', 'A', 'B']),
     'Col_2' : pd.Series(['B', 'C', 'B', 'D']),
     'Col_3' : pd.Series([np.nan, 'D', 'C', np.nan]),
     'Col_4' : pd.Series([np.nan, np.nan, 'D', np.nan]),
     'Col_5' : pd.Series([np.nan, np.nan, 'E', np.nan])}
df = pd.DataFrame(d)

Col_1  Col_2  Col_3  Col_4  Col_5
  A      B      NaN    NaN    NaN
  A      C      D      NaN    NaN
  A      B      C      D      E
  B      D      NaN    NaN    NaN

My goal is to end up with something along the lines of:

Col_1  Col_2  Col_3  Col_4  Col_5  ConCat
  A      B      NaN    NaN    NaN    A:B
  A      C      D      NaN    NaN    A:C:D
  A      B      C      D      E      A:B:C:D:E
  B      D      NaN    NaN    NaN    B:D

I've successfully created a dataframe that looks like the desired output with:

rows = df.values
# Join the non-NaN entries of each row with ':'
df_1 = pd.DataFrame([':'.join(word for word in row if word is not np.nan) for row in rows])

    0
0  A:B
1  A:C:D
2  A:B:C:D:E
3  B:D

But now when I attempt to place it into the original dataframe, I get:

df['concatenated'] = df_1

Col_1  Col_2  Col_3  Col_4  Col_5  concatenated
  A      B      NaN    NaN    NaN    NaN
  A      C      D      NaN    NaN    NaN
  A      B      C      D      E      NaN
  B      D      NaN    NaN    NaN    NaN

What's strange is that when creating a simplified example, it works as expected. Below is the full code of what I'm doing. The original data comes to me transposed relative to the dataframe shown above.

# Sort each column individually, then transpose.
# Series.order() has been removed from pandas; sort_values() is its replacement.
df_caregiver_type = pd.concat([df_caregiver_type[col].sort_values().reset_index(drop=True) for col in df_caregiver_type], axis=1, ignore_index=False).T
df_caregiver_type.rename(columns=lambda x: 'Col_' + str(x), inplace=True)
rows = df_caregiver_type.values
df_caregiver_type1 = pd.DataFrame([':'.join(word for word in row if word is not np.nan) for row in rows])
df_caregiver_type['concatenated'] = df_caregiver_type1
df_caregiver_type = df_caregiver_type.T
df_caregiver_type

Update: I think the problem is caused by the first line of the full code. It comes from a separate but related question: pandas: sort each column individually

For your full dataset, changing the last step from df['concatenated'] = df_1 to df['concatenated'] = df_1.values will solve the issue. I think it is a bug, and I am very sure I have seen it on SO before.
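A likely reason `.values` helps is index alignment: when a Series is assigned to a column, pandas matches rows by index label, not by position. After the transpose in the full code, the frame's index presumably holds labels rather than 0..n-1, so nothing matches. The following is a minimal sketch under that assumption; the frame and its index labels `x` and `y` are made up for illustration.

```python
import pandas as pd

# Hypothetical frame whose index holds labels instead of 0..n-1,
# as can happen after a transpose.
df = pd.DataFrame({'Col_1': ['A', 'B'], 'Col_2': ['B', 'D']}, index=['x', 'y'])

# Data built from df.values gets a fresh default index (0, 1).
s = pd.Series([':'.join(row) for row in df.values])

# Assignment aligns on index labels; 'x' and 'y' do not appear in s.index,
# so every cell in the new column is NaN.
df['aligned'] = s

# Passing the raw array skips alignment and assigns by position.
df['by_position'] = s.values

print(df)
```

Passing `s.to_numpy()` or a plain list would assign by position in the same way as `s.values`.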

Or just: df['concatenated'] = [':'.join(word for word in row if word is not np.nan) for row in rows]
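As a side note, `word is not np.nan` relies on object identity, so it only filters missing values that are the `np.nan` object itself. A slightly more defensive variant, sketched here on the sample data from the question, uses `pd.notna` instead; this is a suggested alternative, not what the answer above uses.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'Col_1': ['A', 'A', 'A', 'B'],
    'Col_2': ['B', 'C', 'B', 'D'],
    'Col_3': [np.nan, 'D', 'C', np.nan],
    'Col_4': [np.nan, np.nan, 'D', np.nan],
    'Col_5': [np.nan, np.nan, 'E', np.nan],
})

# pd.notna checks the value rather than object identity, so it also
# skips None and NaN objects other than np.nan itself.
# Assigning a plain list is positional, so the frame's index does not matter.
df['concatenated'] = [
    ':'.join(word for word in row if pd.notna(word))
    for row in df.values
]

print(df)
```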

>>> d = {'Col_1' : pd.Series(['A', 'A', 'A', 'B']),
...      'Col_2' : pd.Series(['B', 'C', 'B', 'D']),
...      'Col_3' : pd.Series([np.nan, 'D', 'C', np.nan]),
...      'Col_4' : pd.Series([np.nan, np.nan, 'D', np.nan]),
...      'Col_5' : pd.Series([np.nan, np.nan, 'E', np.nan]),}
>>> df = pd.DataFrame(d)
>>> 
>>> rows = df.values
>>> df_1 = pd.DataFrame([':'.join(word for word in row if word is not np.nan) for row in rows])
>>> 
>>> df['concatenated'] = df_1[0]
>>> df
  Col_1 Col_2 Col_3 Col_4 Col_5 concatenated
0     A     B   NaN   NaN   NaN          A:B
1     A     C     D   NaN   NaN        A:C:D
2     A     B     C     D     E    A:B:C:D:E
3     B     D   NaN   NaN   NaN          B:D
>>> 
>>> df = df.join(df_1)
>>> df = df.rename(columns = {0:'concatenated'})
