How to conditionally choose a column value per key from a long-format pandas data frame with multiple values per key? Group-by and then if-then?
I have a data frame that has a person's name and the positions they have played per year. It is in long format with multiple entries per person. I would like to make one data frame for all years with just one entry per person.
I am thinking about using groupby for this. However, I don't know how to handle the position titles. A person can have either forward, offence, or both. What I would like to do is: if a person has entries for forward AND offence, set their position to "both forward and offence"; OR if a person has forward, offence and both, pick "both forward and offence"; OR if a person has just forward, or just offence, take what they have.
I have NO idea where to start though. I have tried googling this, but I think I don't know the right terms because nothing useful is coming up. I am thinking of using group-by with an if-then statement afterwards, but I am not sure. Any advice, or even a suggestion of what terms to google for this, would be much appreciated!
Input dataset:
Name | Position |
---|---|
Tom | Forward |
Tom | Offence |
Aiden | Forward |
Aiden | Offence |
Aiden | Both Forward and Offence |
Kristy | Forward |
Kristy | Forward |

    import pandas as pd

    # One row per (name, position) observation, in long format
    data = {'Name': ['Tom', 'Tom', 'Aiden', 'Aiden', 'Aiden', 'Kristy', 'Kristy'],
            'Position': ['Forward', 'Offence', 'Forward', 'Offence',
                         'Both Forward and Offence', 'Forward', 'Forward']}
    df = pd.DataFrame(data)

Ideal output dataset:
Name | Position |
---|---|
Tom | Both Forward and Offence |
Aiden | Both Forward and Offence |
Kristy | Forward |
You had the right idea with `groupby` and if-else. You can state your problem a bit more simply: if the number of unique positions (`nunique`) per name is 1, you want that one position; otherwise you want 'Both Forward and Offence'. So a simple way is:

    res = (
        df.groupby('Name', sort=False)
          ['Position'].apply(lambda x: x.min() if x.nunique() == 1
                             else 'Both Forward and Offence')
          .reset_index()
    )
    print(res)
    #      Name                  Position
    # 0     Tom  Both Forward and Offence
    # 1   Aiden  Both Forward and Offence
    # 2  Kristy                   Forward

`x.min()` is used to select a single value when, as with Kristy, a name has several rows with the same position. Since all values in such a group are identical, `x.max()`, `x.iloc[0]`, ... would work just as well.
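If you would rather avoid the per-group lambda, the same rule can probably be written with built-in aggregations. The following is only a sketch under the same assumption as above, namely that any name with more than one distinct position should end up as 'Both Forward and Offence'. The names `summary` and `res_alt` are made up for this illustration.

```python
import pandas as pd

data = {'Name': ['Tom', 'Tom', 'Aiden', 'Aiden', 'Aiden', 'Kristy', 'Kristy'],
        'Position': ['Forward', 'Offence', 'Forward', 'Offence',
                     'Both Forward and Offence', 'Forward', 'Forward']}
df = pd.DataFrame(data)

# Per name: the first position seen and the number of distinct positions.
summary = df.groupby('Name', sort=False)['Position'].agg(['first', 'nunique'])

# Keep the first position where there is exactly one distinct value,
# otherwise use the combined label (assumption: >1 distinct value means "both").
summary['Position'] = summary['first'].where(
    summary['nunique'] == 1, 'Both Forward and Offence')

# Back to a flat frame with one row per name.
res_alt = summary['Position'].reset_index()
print(res_alt)
```

On the sample data this should give the same three rows as the `apply` version. Here `'first'` plays the role of `x.min()`: it just picks one representative value from a group whose values are all the same.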