How to create a new group by condition from another column in a dataframe?

I have data like this:

import pandas as pd

data = [['A', 0], ['A', 1], ['A', 2], ['A', 15], ['A', 2], ['A', 12], ['B', 1], ['B', 3]]
df = pd.DataFrame(data, columns=['name', 'interval'])

    name    interval
0   A       0
1   A       1
2   A       2
3   A       15
4   A       2
5   A       12
6   B       1
7   B       3

I want to create a new name based on the interval: whenever the interval is greater than 10, a new name is generated, but it is still built from the previous name, like this (the names are just examples):

    name    interval    new_name
0   A       0           A_0
1   A       1           A_0
2   A       2           A_0
3   A       15          A_1
4   A       2           A_1
5   A       12          A_2
6   B       1           B_0
7   B       3           B_0

My current code visits every row with a for loop. Is there a better way to process this? Thank you.

###################### ######################

Credit to Rutger for the idea. This is the flow of how to do it:

    name    interval    condition  cumsum   new_name(name+"_"+cumsum)
0   A       0           False      0        A_0
1   A       1           False      0        A_0
2   A       2           False      0        A_0
3   A       15          True       1        A_1
4   A       2           False      1        A_1
5   A       12          True       2        A_2
6   B       1           False      0        B_0
7   B       3           False      0        B_0

The details of the code are in Rutger's answer.

I think the easiest approach is to start by creating a boolean Series and then build the new column like this:

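# Flag the rows whose interval is greater than 10
df['large_interval'] = df['interval'] > 10
# Count the flagged rows seen so far within each name and append that count to the name
df['new_name'] = df['name'] + '_' + df.groupby('name')['large_interval'].cumsum().astype(str)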

The second line counts how many large intervals have occurred so far within each group. That count is converted to a string and appended after the name and an underscore.
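For reference, here is a minimal end-to-end sketch of the same idea that also keeps the intermediate columns from the flow table in the question. The `THRESHOLD` constant and the `condition` and `cumsum` column names are illustrative choices, not part of the original answer, and the explicit integer cast is only a precaution.

```python
import pandas as pd

data = [['A', 0], ['A', 1], ['A', 2], ['A', 15], ['A', 2], ['A', 12], ['B', 1], ['B', 3]]
df = pd.DataFrame(data, columns=['name', 'interval'])

THRESHOLD = 10  # value from the question; the constant name is an assumption

# Step 1: flag rows whose interval exceeds the threshold.
df['condition'] = df['interval'] > THRESHOLD

# Step 2: running count of flagged rows, restarted for every name.
# Casting to int first makes the intent (counting True values) explicit.
df['cumsum'] = df['condition'].astype(int).groupby(df['name']).cumsum()

# Step 3: join the name and the counter with an underscore.
df['new_name'] = df['name'] + '_' + df['cumsum'].astype(str)

print(df)
```

The counter follows the row order within each name, so the rows belonging to one name do not have to be adjacent. If the helper columns are not needed afterwards, they can be dropped with `df.drop(columns=['condition', 'cumsum'])`.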
