How do I create a new column with values from one column or another based on the value in a third column using pandas?
So, as an example, say I have the table:
Activity | Start Time | End Time |
---|---|---|
Red on | 2:15 | 3:00 |
Red on | 2:30 | 3:15 |
Red off | 1:45 | 2:30 |
Red off | 2:45 | 3:30 |
Based on the activity, I only care about one of two values: for 'Red on' I need to know the start time. For 'Red off' I need to know the end time.
I want to create a fourth column labeled 'Change Time' and, based on whether 'Activity' is 'Red on' or 'Red off', fill it with either that row's Start Time value or its End Time value. Later, I'm going to discard the Start Time and End Time columns and keep only this newly merged Change Time column. For now, I'm just trying to work out how to create it.
With this example, the result I want is:
Activity | Start Time | End Time | Change Time |
---|---|---|---|
Red on | 2:15 | 3:00 | 2:15 |
Red on | 2:30 | 3:15 | 2:30 |
Red off | 1:45 | 2:30 | 2:30 |
Red off | 2:45 | 3:30 | 3:30 |
Let's assume Red on and Red off are the only two possible values for the Activity column. I thought I had an idea of how to do this, but both things I've tried have thrown errors.
First, I tried:
df['Change time'] = df['Activity'].apply(lambda x: df['Start Time'] if x == 'Red on' else df['End Time'])
And I got an error that said "ValueError: Wrong number of items passed 35, placement implies 1". Since I have 35 rows, that leads me to believe df['Start Time'] was trying to pass the whole column. So, instead, I tried:
df['Change time'] = df['Activity'].apply(lambda x: df.loc['Start Time'] if x == 'Red on' else df.loc['End Time'])
And this one just gives me KeyError: 'Start Time'.
What am I missing to check the string value of the 'Activity' column and pass the value in the 'Start Time' column if 'Activity' == 'Red on', and the value in the 'End Time' column otherwise?
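Both attempts go wrong inside the lambda. Series.apply calls it once per Activity value, but the lambda returns the entire df['Start Time'] column each time, hence the length mismatch; and df.loc['Start Time'] looks up a row label rather than a column, hence the KeyError. A minimal sketch of the row-wise fix, assuming the four sample rows above with the times stored as plain strings, uses DataFrame.apply with axis=1 so the lambda sees one row at a time:

```python
import pandas as pd

# Sample data from the question (times kept as plain strings)
df = pd.DataFrame({
    'Activity': ['Red on', 'Red on', 'Red off', 'Red off'],
    'Start Time': ['2:15', '2:30', '1:45', '2:45'],
    'End Time': ['3:00', '3:15', '2:30', '3:30'],
})

# With axis=1, apply passes each row as a Series, so the lambda reads
# that row's own scalar 'Start Time' or 'End Time' value instead of
# referencing the whole column.
df['Change Time'] = df.apply(
    lambda row: row['Start Time'] if row['Activity'] == 'Red on' else row['End Time'],
    axis=1,
)
print(df)
```

This works, but it calls a Python function once per row, so the vectorized approaches in the answers are usually much faster on large frames.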
Assuming you only have "Red on"/"Red off" in Activity, this is a use case for numpy.where:
import numpy as np

df['Change time'] = np.where(df['Activity'].eq('Red on'),
                             df['Start Time'], df['End Time'])
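A self-contained version of the np.where line, assuming the sample rows from the question with the times stored as strings, shows the element-wise choice:

```python
import numpy as np
import pandas as pd

# Sample data from the question (times kept as plain strings)
df = pd.DataFrame({
    'Activity': ['Red on', 'Red on', 'Red off', 'Red off'],
    'Start Time': ['2:15', '2:30', '1:45', '2:45'],
    'End Time': ['3:00', '3:15', '2:30', '3:30'],
})

# np.where picks element-wise: Start Time where the condition is True,
# End Time everywhere else (so ANY value other than 'Red on' gets End Time)
df['Change time'] = np.where(df['Activity'].eq('Red on'),
                             df['Start Time'], df['End Time'])
print(df['Change time'].tolist())
```

Because every non-matching row falls through to End Time, this only fits the case where the two values really are the only ones.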
If you potentially have other values, use numpy.select:
df['Change time'] = np.select([df['Activity'].eq('Red on'),
                               df['Activity'].eq('Red off')],
                              [df['Start Time'], df['End Time']], pd.NA)
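Or, with pandas alone, assign each case through loc (rows matching neither condition are left missing):
df.loc[df['Activity'].eq('Red on'), 'Change time'] = df['Start Time']
df.loc[df['Activity'].eq('Red off'), 'Change time'] = df['End Time']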
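To see what the default argument of np.select does, here is a sketch with one hypothetical extra activity, 'Amber on' (not in the question), that matches neither condition:

```python
import numpy as np
import pandas as pd

# Two rows from the question plus one hypothetical 'Amber on' row
df = pd.DataFrame({
    'Activity': ['Red on', 'Red off', 'Amber on'],
    'Start Time': ['2:15', '1:45', '4:00'],
    'End Time': ['3:00', '2:30', '4:30'],
})

conditions = [df['Activity'].eq('Red on'), df['Activity'].eq('Red off')]
choices = [df['Start Time'], df['End Time']]

# The first matching condition wins; rows matching none get the default (pd.NA)
df['Change time'] = np.select(conditions, choices, pd.NA)
print(df)
```

The unmatched row ends up missing instead of silently receiving a time, which np.where would have given it.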
output:
Activity Start Time End Time Change time
0 Red on 2:15 3:00 2:15
1 Red on 2:30 3:15 2:30
2 Red off 1:45 2:30 2:30
3 Red off 2:45 3:30 3:30
You can do this straightforwardly in pandas:
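# Default every row to Start Time, then overwrite the 'Red off' rows with End Time.
df['Change Time'] = df['Start Time']
# Use .loc rather than chained indexing (df['Change Time'][mask] = ...), which may not write back to df
df.loc[df['Activity'] == 'Red off', 'Change Time'] = df['End Time']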
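Another pure-pandas option, not taken from the answers above, is Series.where, which keeps the original value where a condition holds and takes the replacement everywhere else. A sketch, assuming the question's sample rows with the times stored as strings:

```python
import pandas as pd

# Sample data from the question (times kept as plain strings)
df = pd.DataFrame({
    'Activity': ['Red on', 'Red on', 'Red off', 'Red off'],
    'Start Time': ['2:15', '2:30', '1:45', '2:45'],
    'End Time': ['3:00', '3:15', '2:30', '3:30'],
})

# Keep Start Time where the row is 'Red on'; take End Time (aligned by
# index) everywhere else. This builds a new Series and assigns it in one step.
is_red_on = df['Activity'].eq('Red on')
df['Change Time'] = df['Start Time'].where(is_red_on, df['End Time'])
print(df)
```

Like np.where, this treats every value other than 'Red on' as the End Time case.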