Create a new column based on the values of another column in a DataFrame

Question

A snippet of my current DataFrame is:

|   | commentID | commentType | depth | parentID   |
|:--|:----------|:------------|:------|:-----------|
| 0 | 58b61d1d  | comment     | 1.0   | 0.0        |
| 1 | 58b6393b  | userReply   | 2.0   | 58b61d1d.0 |
| 2 | 58b6556e  | comment     | 1.0   | 0.0        |
| 3 | 58b657fa  | userReply   | 3.0   | 58b61d1d.0 |
| 4 | 58b657fa  | comment     | 1.0   | 0.0        |

I would like the DataFrame to look like this:

|   | commentID | commentType | depth | parentID   | receiveAReply |
|:--|:----------|:------------|:------|:-----------|--------------:|
| 0 | 58b61d1d  | comment     | 1.0   | 0.0        | 1             |
| 1 | 58b6393b  | userReply   | 2.0   | 58b61d1d.0 | 0             |
| 2 | 58b6556e  | comment     | 1.0   | 0.0        | 0             |
| 3 | 58b657fa  | userReply   | 3.0   | 58b61d1d.0 | 0             |
| 4 | 58b657fa  | comment     | 1.0   | 0.0        | 0             |

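The added column: receiveAReply
A comment is assigned 1 if it received any reply. Even if a comment has multiple replies, it is still only assigned 1 or 0.
All user replies get 0, even if the reply itself has replies, e.g. depth = 3.0. I only care about comments on the actual article and whether they received a reply, not about the number of replies or about replies to those replies.
So I am focusing on user replies with depth 2.0 and on the commentID that their parentID matches.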

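I have the following code, but it fills the entire receiveAReply column with NaN. In it I try to create another column, replies, that holds the parentID of the rows with depth 2.0. I then try to assign 1 based on whether any commentID matches those parent IDs: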


df['replies'] = df.loc[df.depth == 2.0, ['parentID']]
df['receiveAReply'] = df.loc[df.commentID == df.replies, [1]]

Answer 1

IIUC your condition, you just missed extracting the left part of the parentID column (the part before the dot):

pid = df.loc[df['depth'] == 2, 'parentID'].str.split('.').str[0].values

df['receiveAReply'] = 0
df.loc[df['commentID'].isin(pid), 'receiveAReply'] = 1

Output:

>>> df
  commentID commentType  depth    parentID  receiveAReply
0  58b61d1d     comment    1.0         0.0              1
1  58b6393b   userReply    2.0  58b61d1d.0              0
2  58b6556e     comment    1.0         0.0              0
3  58b657fa   userReply    3.0  58b61d1d.0              0
4  58b657fa     comment    1.0         0.0              0
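
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the sample rows from the question and that parentID is stored as strings; the name rowwise_match is only illustrative. It also shows one likely reason the attempt in the question finds no matches: `==` between two columns compares values row by row, whereas `isin` checks each commentID against the whole collection of parent IDs.

```python
import pandas as pd

# Sample data copied from the question; parentID is assumed to hold strings.
df = pd.DataFrame({
    "commentID": ["58b61d1d", "58b6393b", "58b6556e", "58b657fa", "58b657fa"],
    "commentType": ["comment", "userReply", "comment", "userReply", "comment"],
    "depth": [1.0, 2.0, 1.0, 3.0, 1.0],
    "parentID": ["0.0", "58b61d1d.0", "0.0", "58b61d1d.0", "0.0"],
})

# Parent IDs of depth-2 replies, with the trailing ".0" stripped.
pid = df.loc[df["depth"] == 2, "parentID"].str.split(".").str[0]

# Row-by-row comparison: each commentID is only compared with the value
# sitting on its own row, so the parent comment in row 0 is never matched.
rowwise_match = df["commentID"] == pid.reindex(df.index)

# isin() compares each commentID against all collected parent IDs instead.
df["receiveAReply"] = df["commentID"].isin(pid).astype(int)

print(rowwise_match.tolist())        # [False, False, False, False, False]
print(df["receiveAReply"].tolist())  # [1, 0, 0, 0, 0]
```

Using `isin(...).astype(int)` builds the 0/1 column in one step, which is equivalent to initialising the column to 0 and then setting the matching rows to 1.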

Answer 2

This worked for me:

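# Parent IDs of depth-2 replies, with the trailing ".0" removed so that
# they can match the values in commentID (NaN on all other rows)
df['replies'] = df.loc[df.depth == 2.0, 'parentID'].str.split('.').str[0]

def test(x, y):
    # Return 1 if comment ID x appears anywhere in the Series y, else 0
    if x in y.values:
        return 1
    else:
        return 0


df['getsReply'] = df['commentID'].apply(lambda x: test(x, df['replies']))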
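
The `apply` call above runs `test` once per row and scans the whole replies column each time, which may get slow on large frames. A possible vectorized variant is sketched below. It assumes the sample rows from the question, that parentID is stored as strings with a trailing ".0", and it adds a commentType check so that user replies always get 0; the names parents and is_comment are only illustrative.

```python
import pandas as pd

# Sample data copied from the question; parentID is assumed to hold strings.
df = pd.DataFrame({
    "commentID": ["58b61d1d", "58b6393b", "58b6556e", "58b657fa", "58b657fa"],
    "commentType": ["comment", "userReply", "comment", "userReply", "comment"],
    "depth": [1.0, 2.0, 1.0, 3.0, 1.0],
    "parentID": ["0.0", "58b61d1d.0", "0.0", "58b61d1d.0", "0.0"],
})

# Collect the parent IDs of depth-2 replies once, as a plain Python set,
# dropping the trailing ".0" so they are comparable with commentID.
parents = set(df.loc[df["depth"] == 2.0, "parentID"].str.split(".").str[0])

# A row gets 1 only if it is a top-level comment whose ID is in that set.
is_comment = df["commentType"] == "comment"
df["getsReply"] = (is_comment & df["commentID"].isin(parents)).astype(int)

print(df["getsReply"].tolist())  # [1, 0, 0, 0, 0]
```

The extra `is_comment` mask is a design choice rather than a requirement: with the sample data the result is the same without it, but it makes the "replies always get 0" rule explicit.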

2 answers

Answer 1
1 accepted 2022-01-13 11:05:01

Answer 2
1 2022-01-13 11:20:21
