How to put two different flags based on two thresholds when column value changes in a pandas DataFrame
I have a DataFrame, say df (I believe the last two columns are datetime64[ns] rather than str):
import pandas as pd

data = [['abc', 'abc1', '1_1', '2021-06-01 06:00:00.035999', '2021-06-02 09:59:59.964000'],
['abc', 'abc1', '1_2', '2021-06-01 06:00:00.035999', '2021-06-02 09:59:59.964000'],
['abc', 'abc2', '1_1', '2021-06-01 06:00:00.035999', '2021-06-01 20:59:59.964001'],
['abc', 'abc2', '1_2', '2021-06-01 06:00:00.035999', '2021-06-01 20:59:59.964001'],
['abc', 'abc3', '1_1', '2021-06-01 06:00:00.035999', '2021-06-03 06:29:59.964000'],
['abc', 'abc3', '1_2', '2021-06-01 06:00:00.035999', '2021-06-03 06:29:59.964000'],
['abc', 'abc3', '2_1', '2021-06-04 06:30:00.000001', '2021-06-04 07:44:59.927999'],
['abc', 'abc3', '2_2', '2021-06-04 06:30:00.000001', '2021-06-04 07:44:59.927999']]
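df = pd.DataFrame(data, columns = ['vehicle', 'order', 'work', 'Start', 'Finish'])
# the literals above are strings; convert them so that datetime arithmetic works
df['Start'] = pd.to_datetime(df['Start'])
df['Finish'] = pd.to_datetime(df['Finish'])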
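I want to find the time between two pieces of work. For example, I want to compute the time between the finish time of work 1_1 (vehicle: abc and order: abc1) and the start time of work 1_2. I compute this separately for each distinct order.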
vehicle order work Start Finish
0 abc abc1 1_1 2021-06-01 06:00:00.035999 2021-06-02 09:59:59.964000
1 abc abc1 1_2 2021-06-01 06:00:00.035999 2021-06-02 09:59:59.964000
2 abc abc2 1_1 2021-06-01 06:00:00.035999 2021-06-01 20:59:59.964001
3 abc abc2 1_2 2021-06-01 06:00:00.035999 2021-06-01 20:59:59.964001
4 abc abc3 1_1 2021-06-01 06:00:00.035999 2021-06-03 06:29:59.964000
5 abc abc3 1_2 2021-06-01 06:00:00.035999 2021-06-03 06:29:59.964000
6 abc abc3 2_1 2021-06-04 06:30:00.000001 2021-06-04 07:44:59.927999
7 abc abc3 2_2 2021-06-04 06:30:00.000001 2021-06-04 07:44:59.927999
I wrote code for this and it works.
import datetime as dt

po_unique = df['order'].unique()
appended_data = []
for pos in po_unique:
    # rows of the current order, re-indexed from 0
    x1 = df.copy()
    x1 = x1.loc[x1['order'] == pos, :]
    x1.reset_index(drop = True, inplace = True)
    #print(x1)
    aList = []
    for i in range(len(x1) - 1):
        # gap in days between the next Start and the current Finish
        t = (x1.Start[i + 1] - x1.Finish[i])/ dt.timedelta(hours=24)
        aList.append(t)
    # the first row of each order has no predecessor, so its gap is 0
    aList.insert(0, 0)
    x2 = x1.copy()
    x2['flag'] = aList
    appended_data.append(x2)
appended_data = pd.concat(appended_data)
I would like some feedback on this code. Is there another way to do this? appended_data[['order', 'work', 'flag']] looks like this:
Out[112]:
order work flag
0 abc1 1_1 0.000000
1 abc1 1_2 -1.166666
0 abc2 1_1 0.000000
1 abc2 1_2 -0.624999
0 abc3 1_1 0.000000
1 abc3 1_2 -2.020833
2 abc3 2_1 1.000000
3 abc3 2_2 -0.052082
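Now I want to create another column, flag1, that holds 'F' wherever the value in the flag column is greater than some threshold. I can do that with the .apply() function: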
threshold = 0.9
appended_data['flag1'] = appended_data.apply(lambda row: 'F' if row['flag'] > threshold else ' ', axis = 1)
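But how do I do this if I want flags for two different thresholds: one for the "inner" case, e.g. 1_1 to 1_2, and another for the "outer" case (when the prefix changes), e.g. 1_2 to 2_1? Say:
threshold_sameprefix = -1.0
threshold_diffprefix = 0.8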
Expected output
vehicle order work flag flag1
abc abc1 1_1 0.000000
abc abc1 1_2 -1.166666
abc abc2 1_1 0.000000
abc abc2 1_2 -0.624999 F1
abc abc3 1_1 0.000000
abc abc3 1_2 -2.020833
abc abc3 2_1 1.000000 F2
abc abc3 2_2 -0.052082 F1
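Please don't just take the lower of the two thresholds and apply the logic I already have. I want logic that assigns the flags iteratively, so that I can customize it.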
1) Split the work ID into two parts, work_prefix and work_suffix:
df[['work_prefix', 'work_suffix']] = df['work'].str.split('_', expand=True)
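As a small illustration of what the split produces, here is a minimal sketch on a hypothetical Series of work IDs in the same form as the question's data. The n=1 argument is an added safeguard that is not in the answer's line; it limits the split to the first underscore so that exactly two columns come back. Note that both pieces are strings, not integers.

```python
import pandas as pd

# Hypothetical sample of work IDs in the "<prefix>_<suffix>" form used in the question.
work = pd.Series(['1_1', '1_2', '2_1', '2_2'], name='work')

# expand=True returns a DataFrame with one column per piece.
# n=1 (an assumption added here) splits only on the first underscore.
parts = work.str.split('_', n=1, expand=True)
parts.columns = ['work_prefix', 'work_suffix']

print(parts)
```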
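2) Then define a set of boolean masks corresponding to the conditions. The masks are built with .groupby() so that each comparison stays within the boundaries of a single order group: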
threshold_sameprefix = -1.0 # given threshold value
threshold_diffprefix = 0.8 # given threshold value
w_ne = df['work'] != df.groupby('order')['work'].shift() # work id changed
wp_eq = df['work_prefix'] == df.groupby('order')['work_prefix'].shift() # same work prefix
wp_ne = df['work_prefix'] != df.groupby('order')['work_prefix'].shift() # different work prefix
m1 = w_ne & wp_eq & (df['flag'] > threshold_sameprefix) # condition for 'F1'
m2 = w_ne & wp_ne & (df['flag'] > threshold_diffprefix) # condition for 'F2'
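To see how the group boundaries behave, the following sketch runs the same shift-and-compare pattern on a hypothetical five-row frame (the frame and the variable names prev_prefix, same_prefix and diff_prefix are illustrative only). The first row of each order has no predecessor, so the shifted value is missing there; an equality test against a missing value is False and an inequality test is True.

```python
import pandas as pd

# Hypothetical two-order frame with the prefixes already split out.
demo = pd.DataFrame({
    'order': ['abc1', 'abc1', 'abc3', 'abc3', 'abc3'],
    'work_prefix': ['1', '1', '1', '1', '2'],
})

# Previous prefix within the same order; missing on the first row of each order.
prev_prefix = demo.groupby('order')['work_prefix'].shift()

same_prefix = demo['work_prefix'] == prev_prefix  # False on the first row of each order
diff_prefix = demo['work_prefix'] != prev_prefix  # True on the first row of each order

print(pd.DataFrame({'prev': prev_prefix, 'same': same_prefix, 'diff': diff_prefix}))
```

One consequence worth keeping in mind: the first row of each order counts as "different prefix", so it would receive F2 if its flag exceeded threshold_diffprefix. With the data shown here that never happens, because flag is 0 on those rows.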
3) Finally, use .loc with the boolean masks to set flag1 to F1 and F2, as follows:
df['flag1'] = ' ' # init flag1 to blank
df.loc[m1, 'flag1'] = 'F1'
df.loc[m2, 'flag1'] = 'F2'
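If more flag levels are added later, the assignment can also be written with numpy.select, which takes a list of conditions and a matching list of labels and uses the first condition that holds. This is an alternative sketch, not the answer's method. It also makes one assumption that differs from the masks above: a has_prev mask (a name introduced here) excludes the first row of each order from both flags. The flag values are copied from the question's table.

```python
import numpy as np
import pandas as pd

# flag values copied from the table in the question
df = pd.DataFrame({
    'order': ['abc1', 'abc1', 'abc2', 'abc2', 'abc3', 'abc3', 'abc3', 'abc3'],
    'work':  ['1_1', '1_2', '1_1', '1_2', '1_1', '1_2', '2_1', '2_2'],
    'flag':  [0.0, -1.166666, 0.0, -0.624999, 0.0, -2.020833, 1.0, -0.052082],
})

threshold_sameprefix = -1.0
threshold_diffprefix = 0.8

# prefix of each work ID and the previous prefix within the same order
prefix = df['work'].str.split('_').str[0]
prev_prefix = prefix.groupby(df['order']).shift()

# assumption: a row with no predecessor in its order is never flagged
has_prev = prev_prefix.notna()
m1 = has_prev & (prefix == prev_prefix) & (df['flag'] > threshold_sameprefix)
m2 = has_prev & (prefix != prev_prefix) & (df['flag'] > threshold_diffprefix)

# the first matching condition wins; rows matching none get a blank
df['flag1'] = np.select([m1, m2], ['F1', 'F2'], default=' ')

print(df)
```

Adding a third rule then only means appending one more mask and one more label to the two lists.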
Input
vehicle order work flag
0 abc abc1 1_1 0.000000
1 abc abc1 1_2 -1.166666
2 abc abc2 1_1 0.000000
3 abc abc2 1_2 -0.624999
4 abc abc3 1_1 0.000000
5 abc abc3 1_2 -2.020833
6 abc abc3 2_1 1.000000
7 abc abc3 2_2 -0.052082
Output:
vehicle order work flag work_prefix work_suffix flag1
0 abc abc1 1_1 0.000000 1 1
1 abc abc1 1_2 -1.166666 1 2
2 abc abc2 1_1 0.000000 1 1
3 abc abc2 1_2 -0.624999 1 2 F1
4 abc abc3 1_1 0.000000 1 1
5 abc abc3 1_2 -2.020833 1 2
6 abc abc3 2_1 1.000000 2 1 F2
7 abc abc3 2_2 -0.052082 2 2 F1
Optionally, you can drop the two working columns work_prefix and work_suffix with:
df = df.drop(['work_prefix', 'work_suffix'], axis=1)
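To build your first column, flag, more efficiently without a loop, start from the same data with Start and Finish converted to datetimes: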
data = [['abc', 'abc1', '1_1', '2021-06-01 06:00:00.035999', '2021-06-02 09:59:59.964000'],
['abc', 'abc1', '1_2', '2021-06-01 06:00:00.035999', '2021-06-02 09:59:59.964000'],
['abc', 'abc2', '1_1', '2021-06-01 06:00:00.035999', '2021-06-01 20:59:59.964001'],
['abc', 'abc2', '1_2', '2021-06-01 06:00:00.035999', '2021-06-01 20:59:59.964001'],
['abc', 'abc3', '1_1', '2021-06-01 06:00:00.035999', '2021-06-03 06:29:59.964000'],
['abc', 'abc3', '1_2', '2021-06-01 06:00:00.035999', '2021-06-03 06:29:59.964000'],
['abc', 'abc3', '2_1', '2021-06-04 06:30:00.000001', '2021-06-04 07:44:59.927999'],
['abc', 'abc3', '2_2', '2021-06-04 06:30:00.000001', '2021-06-04 07:44:59.927999']]
df = pd.DataFrame(data, columns = ['vehicle', 'order', 'work', 'Start', 'Finish'])
df['Start'] = pd.to_datetime(df['Start'])
df['Finish'] = pd.to_datetime(df['Finish'])
The main line, which replaces the loop-based code:
df['flag'] = ((df['Start'] - df.groupby('order')['Finish'].shift()) / pd.Timedelta(days=1)).fillna(0)
Result:
print(df)
vehicle order work Start Finish flag
0 abc abc1 1_1 2021-06-01 06:00:00.035999 2021-06-02 09:59:59.964000 0.000000
1 abc abc1 1_2 2021-06-01 06:00:00.035999 2021-06-02 09:59:59.964000 -1.166666
2 abc abc2 1_1 2021-06-01 06:00:00.035999 2021-06-01 20:59:59.964001 0.000000
3 abc abc2 1_2 2021-06-01 06:00:00.035999 2021-06-01 20:59:59.964001 -0.624999
4 abc abc3 1_1 2021-06-01 06:00:00.035999 2021-06-03 06:29:59.964000 0.000000
5 abc abc3 1_2 2021-06-01 06:00:00.035999 2021-06-03 06:29:59.964000 -2.020833
6 abc abc3 2_1 2021-06-04 06:30:00.000001 2021-06-04 07:44:59.927999 1.000000
7 abc abc3 2_2 2021-06-04 06:30:00.000001 2021-06-04 07:44:59.927999 -0.052082
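As a quick sanity check, the sketch below compares the one-line groupby/shift expression with an explicit loop on a hypothetical three-row frame whose timestamps were chosen to give round numbers. The variable names vectorized and looped are illustrative. The comparison relies on the rows of each order being contiguous, as they are in the question's data.

```python
import pandas as pd

# Hypothetical sample: two orders, timestamps chosen for round numbers.
df = pd.DataFrame({
    'order':  ['a', 'a', 'b'],
    'Start':  pd.to_datetime(['2021-06-01 06:00', '2021-06-02 18:00', '2021-06-01 06:00']),
    'Finish': pd.to_datetime(['2021-06-02 06:00', '2021-06-03 06:00', '2021-06-01 12:00']),
})

# Vectorized: this row's Start minus the previous Finish in the same order, in days.
vectorized = ((df['Start'] - df.groupby('order')['Finish'].shift())
              / pd.Timedelta(days=1)).fillna(0)

# Explicit loop mirroring the question's logic, for comparison.
looped = []
for _, group in df.groupby('order', sort=False):
    group = group.reset_index(drop=True)
    gaps = [0.0]  # first row of each order has no predecessor
    for i in range(len(group) - 1):
        gaps.append((group['Start'][i + 1] - group['Finish'][i]) / pd.Timedelta(days=1))
    looped.extend(gaps)

print(vectorized.tolist())
print(looped)
```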