Python Pandas: Append column value, based on another same column value

I have a pandas DataFrame like this.

   Time                      Source Level  County  Town
0  2021-12-01 10:01:41.443   NaN    NaN    NaN     NaN
1                      NaN   Test   3      C1      C1-T1
2                      NaN   Test   5-     C2      C2-T0
3                      NaN   Test   5-     C2      C2-T1
4  2021-12-01 10:01:46.452   NaN    NaN    NaN     NaN

I want to combine the Town values of rows that have the same Source, Level and County values.

I have tried isin, groupby and diff (but my values are strings), but I still can't figure it out.

The table below is what I want to get.

   Time                      Source Level  County  Town
0  2021-12-01 10:01:41.443   NaN    NaN    NaN     NaN
1                      NaN   Test   3      C1      C1-T1
2                      NaN   Test   5-     C2      C2-T0, C2-T1
3  2021-12-01 10:01:46.452   NaN    NaN    NaN     NaN

Really appreciate your help!

We can make this work by collecting the Town values of each group into a list with groupby() and apply(list), and then turning each list into a comma-separated string. Let's split it into 2 steps for better understanding.

Personally, I would keep this data as a list within a pandas Series and skip step 2. A comma-separated string is harder to work with afterwards.
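As a small illustration of that point, here is a sketch with a hypothetical two-row Series (the names as_lists and as_strings are made up for this example). The list form can be expanded to one town per row directly, while the string form has to be split again first.

```python
import pandas as pd

# Hypothetical mini-example: the same town groups stored as lists and as joined strings.
as_lists = pd.Series([['C1-T1'], ['C2-T0', 'C2-T1']])
as_strings = as_lists.str.join(', ')

# With lists, each town can be expanded to its own row directly.
one_per_row = as_lists.explode()

# With strings, the values must be split again before the same operation works.
round_trip = as_strings.str.split(', ').explode()

print(one_per_row.tolist())
print(round_trip.tolist())
```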

Step 1:

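# dropna=False (pandas 1.1+) keeps groups whose keys contain NaN; by default those rows are dropped
output = df.groupby(['Time', 'Source', 'Level', 'County'], dropna=False)['Town'].apply(list).reset_index()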

Returns:

                      Time Source Level County            Town
0  2021-12-01 10:01:41.443    NaN   NaN    NaN           [nan]
1  2021-12-01 10:01:46.452    NaN   NaN    NaN           [nan]
2                      NaN   Test     3     C1         [C1-T1]
3                      NaN   Test    5-     C2  [C2-T0, C2-T1]
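For anyone who wants to reproduce step 1, here is a minimal sketch. The DataFrame is rebuilt by hand from the question, and storing Time and Level as plain strings is an assumption, since the question does not show the dtypes. It also assumes pandas 1.1 or later, because every row has NaN in at least one grouping key and would otherwise be dropped without dropna=False.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (string dtypes are an assumption).
df = pd.DataFrame({
    'Time':   ['2021-12-01 10:01:41.443', np.nan, np.nan, np.nan,
               '2021-12-01 10:01:46.452'],
    'Source': [np.nan, 'Test', 'Test', 'Test', np.nan],
    'Level':  [np.nan, '3', '5-', '5-', np.nan],
    'County': [np.nan, 'C1', 'C2', 'C2', np.nan],
    'Town':   [np.nan, 'C1-T1', 'C2-T0', 'C2-T1', np.nan],
})

keys = ['Time', 'Source', 'Level', 'County']

# Collect the Town values of each group into a list, keeping NaN keys as groups.
output = df.groupby(keys, dropna=False)['Town'].apply(list).reset_index()
print(output)
```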

Now, we can join each list into a single string (step 2):

import numpy as np
output['Town'] = output['Town'].apply(lambda towns: ', '.join(t for t in towns if isinstance(t, str))).replace('', np.nan)

Which outputs our desired result:

                      Time Source Level County          Town
0  2021-12-01 10:01:41.443    NaN   NaN    NaN           NaN
1  2021-12-01 10:01:46.452    NaN   NaN    NaN           NaN
2                      NaN   Test     3     C1         C1-T1
3                      NaN   Test    5-     C2  C2-T0, C2-T1
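The two steps can also be combined. The sketch below is one possible variant, not the only way to do it: join_towns is a helper name invented for this example, and the sample DataFrame is again rebuilt by hand from the question. Passing sort=False is intended to keep the groups in the order they first appear, as in the desired output from the question; how NaN groups are ordered has changed between pandas versions, so treat the row order as version-dependent.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question (string dtypes are an assumption).
df = pd.DataFrame({
    'Time':   ['2021-12-01 10:01:41.443', np.nan, np.nan, np.nan,
               '2021-12-01 10:01:46.452'],
    'Source': [np.nan, 'Test', 'Test', 'Test', np.nan],
    'Level':  [np.nan, '3', '5-', '5-', np.nan],
    'County': [np.nan, 'C1', 'C2', 'C2', np.nan],
    'Town':   [np.nan, 'C1-T1', 'C2-T0', 'C2-T1', np.nan],
})


def join_towns(towns):
    """Join the non-missing town names with ', '; return NaN if there are none."""
    names = towns.dropna().astype(str)
    return ', '.join(names) if len(names) else np.nan


keys = ['Time', 'Source', 'Level', 'County']

# One pass: group (keeping NaN keys), then aggregate each group's towns to a string.
result = (
    df.groupby(keys, dropna=False, sort=False)['Town']
      .agg(join_towns)
      .reset_index()
)
print(result)
```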
