
How to conditionally choose a column value per key from a long-format pandas data frame with multiple values per key? Group-by and then if-then?

I have a data frame that contains each person's name and the positions they have played each year. It is in long format, with multiple entries per person. I would like to make one data frame covering all years, with just one entry per person.

I am thinking about using groupby for this. However, I don't know how to handle the position titles. A person can have Forward, Offence, or both. What I would like is this: if a person has entries for both Forward and Offence (with or without an explicit "Both Forward and Offence" entry), their position should be "Both Forward and Offence"; if a person has only Forward or only Offence, keep what they have.

I have no idea where to start, though. I have tried googling this, but I don't think I know the right terms, because nothing useful comes up. I am thinking of using groupby followed by an if-then statement, but I am not sure. Any advice, or even a suggestion of which terms to search for, would be much appreciated!
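One way to spell out the "groupby, then if-then" idea is a small function that looks at the set of distinct positions each person has and applies the rules described above. This is a sketch, not the accepted answer's method: the helper name `combine_positions` and the `BOTH` constant are hypothetical, and it assumes the only labels are "Forward", "Offence" and "Both Forward and Offence". Useful search terms for this pattern are "pandas groupby aggregate custom function" and "split-apply-combine".

```python
import pandas as pd

# Hypothetical constant for the combined label used in the question.
BOTH = 'Both Forward and Offence'


def combine_positions(positions: pd.Series) -> str:
    """Collapse one person's positions into a single label (hypothetical helper)."""
    unique = set(positions)
    # Forward and Offence together, or an explicit 'Both' entry, mean 'Both'.
    if BOTH in unique or {'Forward', 'Offence'} <= unique:
        return BOTH
    # Otherwise assume a single kind of position (e.g. only 'Forward') and keep it.
    return positions.iloc[0]


data = {'Name': ['Tom', 'Tom', 'Aiden', 'Aiden', 'Aiden', 'Kristy', 'Kristy'],
        'Position': ['Forward', 'Offence', 'Forward', 'Offence',
                     'Both Forward and Offence', 'Forward', 'Forward']}
df = pd.DataFrame(data)

# Split by Name, apply the rule to each group's Position values, combine back.
res = (
    df.groupby('Name', sort=False)['Position']
      .agg(combine_positions)
      .reset_index()
)
print(res)
```

Because the rule lives in one named function, extra labels or priorities can be added there without touching the groupby call.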

Input dataset:

Name     Position
Tom      Forward
Tom      Offence
Aiden    Forward
Aiden    Offence
Aiden    Both Forward and Offence
Kristy   Forward
Kristy   Forward

import pandas as pd

data = {'Name': ['Tom', 'Tom', 'Aiden', 'Aiden', 'Aiden', 'Kristy', 'Kristy'],
        'Position': ['Forward', 'Offence', 'Forward', 'Offence',
                     'Both Forward and Offence', 'Forward', 'Forward']}
df = pd.DataFrame(data)

Ideal output dataset:

Name     Position
Tom      Both Forward and Offence
Aiden    Both Forward and Offence
Kristy   Forward

You were on the right track with groupby and if-else. You can frame the problem more simply: if the number of unique positions (nunique) per name is 1, you want that position; otherwise you want 'Both Forward and Offence'. A simple way to do this is:

res = (
    df.groupby('Name', sort=False)
      ['Position'].apply(lambda x: x.min() if x.nunique()==1 
                                   else 'Both Forward and Offence')
      .reset_index()
)
print(res)
#      Name                  Position
# 0     Tom  Both Forward and Offence
# 1   Aiden  Both Forward and Offence
# 2  Kristy                   Forward

x.min() is only there to pick a single value when, as with Kristy, a person has several rows with the same position; x.max(), x.iloc[0], etc. would work just as well.
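Along the same lines, the pick-one-value step can use the built-in `first` aggregation, and the if-then can be vectorised with `Series.where` instead of a per-group lambda. This is a sketch that assumes, exactly as in the answer above, that anyone with more than one distinct position gets 'Both Forward and Offence'; the intermediate `summary` frame and its `nunique`/`first` columns are illustrative names, the columns being named after the aggregations.

```python
import pandas as pd

data = {'Name': ['Tom', 'Tom', 'Aiden', 'Aiden', 'Aiden', 'Kristy', 'Kristy'],
        'Position': ['Forward', 'Offence', 'Forward', 'Offence',
                     'Both Forward and Offence', 'Forward', 'Forward']}
df = pd.DataFrame(data)

# One agg call per group: number of distinct positions and the first position.
summary = df.groupby('Name', sort=False)['Position'].agg(['nunique', 'first'])

# Vectorised if-then: keep 'first' where there is exactly one distinct position,
# otherwise replace it with the combined label.
summary['Position'] = summary['first'].where(summary['nunique'] == 1,
                                             'Both Forward and Offence')

res = summary[['Position']].reset_index()
print(res)
```

Note that `first` returns the first non-null value in each group, which only matters if the Position column can contain missing values.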

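Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.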

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM