简体   繁体   English

循环遍历 Python 中多列中的不同值

[英]Loop through different values in multiple columns in Python

I would like to loop through df below to find the duration of the activity where:我想遍历下面的df以找到活动的持续时间,其中:

  1. For the same X , there are two Y values.对于相同的X ,有两个Y值。
  2. If a turns 1 for either unique values of Y (eg., for X ==18 , a becomes 1 for Y==13 or a becomes 1 for Y==14 ), it marks the start of the activity;如果a1唯一值Y (例如,用于。 X ==18a变为1Y==13a变为1Y==14 ,它标志着活动的开始); if a turns 0 for both unique values of Y , it marks the end of the activity(eg., for X ==18 , a becomes 0 for Y==13 and a becomes 0 for Y==14 ).如果a0为的两个唯一值Y ,活动的标记它的端部(例如,对X ==18a变为0Y==13a变为0Y==14 )。
    Timestamp               X   Y   a   b       Type
0   2000-10-26 10:08:27.060 18  14  0.0 24.5    medium
1   2000-10-26 10:39:24.310 18  13  1.0 24.0    low     # Start 
2   2000-10-26 11:50:48.190 18  14  1.0 23.5    medium
3   2000-10-26 17:18:07.610 18  14  1.0 23.5    medium
4   2000-10-26 17:18:09.610 18  14  0.0 23.5    medium
5   2000-10-26 17:29:10.610 18  14  0.0 26.5    medium
6   2000-10-26 17:29:10.770 18  14  1.0 26.5    medium
7   2000-10-26 17:29:12.610 18  14  1.0 53.5    medium
8   2000-10-26 17:29:14.610 18  14  1.0 62.0    medium
9   2000-10-26 17:29:14.770 18  13  1.0 24.0    low
10  2000-10-26 17:29:16.610 18  14  1.0 64.5    medium
11  2000-10-26 17:29:18.770 18  14  0.0 64.5    medium
12  2000-10-26 17:29:18.770 18  13  0.0 24.0    low     # End 
13  2000-10-26 17:29:28.770 18  14  0.0 63.5    medium
14  2000-10-26 17:29:34.770 19  16  0.0 62.0    medium
15  2000-10-26 17:29:40.770 19  16  1.0 61.0    medium  # Start 
16  2000-10-26 17:29:46.770 19  16  1.0 60.0    medium
17  2000-10-26 17:32:01.180 19  17  1.0 25.0    low
18  2000-10-26 17:32:01.180 19  16  0.0 51.5    medium
19  2000-10-26 17:32:35.180 19  17  0.0 50.0    medium  # End 

reproducible example:可重现的例子:

from datetime import *
from pandas import *

df= pd.DataFrame({'Timestamp': {0: Timestamp('2000-10-26 10:08:27.060000'),
  1: Timestamp('2000-10-26 10:39:24.310000'),
  2: Timestamp('2000-10-26 11:50:48.190000'),
  3: Timestamp('2000-10-26 17:18:07.610000'),
  4: Timestamp('2000-10-26 17:18:09.610000'),
  5: Timestamp('2000-10-26 17:29:10.610000'),
  6: Timestamp('2000-10-26 17:29:10.770000'),
  7: Timestamp('2000-10-26 17:29:12.610000'),
  8: Timestamp('2000-10-26 17:29:14.610000'),
  9: Timestamp('2000-10-26 17:29:14.770000'),
  10: Timestamp('2000-10-26 17:29:16.610000'),
  11: Timestamp('2000-10-26 17:29:18.770000'),
  12: Timestamp('2000-10-26 17:29:18.770000'),
  13: Timestamp('2000-10-26 17:29:28.770000'),
  14: Timestamp('2000-10-26 17:29:34.770000'),
  15: Timestamp('2000-10-26 17:29:40.770000'),
  16: Timestamp('2000-10-26 17:29:46.770000'),
  17: Timestamp('2000-10-26 17:32:01.180000'),
  18: Timestamp('2000-10-26 17:32:01.180000'),
  19: Timestamp('2000-10-26 17:32:35.180000')},
 'X': {0: 18,
  1: 18,
  2: 18,
  3: 18,
  4: 18,
  5: 18,
  6: 18,
  7: 18,
  8: 18,
  9: 18,
  10: 18,
  11: 18,
  12: 18,
  13: 18,
  14: 19,
  15: 19,
  16: 19,
  17: 19,
  18: 19,
  19: 19},
 'Y': {0: 14,
  1: 13,
  2: 14,
  3: 14,
  4: 14,
  5: 14,
  6: 14,
  7: 14,
  8: 14,
  9: 13,
  10: 14,
  11: 14,
  12: 13,
  13: 14,
  14: 14,
  15: 14,
  16: 14,
  17: 13,
  18: 14,
  19: 13},
 'a': {0: 0.0,
  1: 1.0,
  2: 1.0,
  3: 1.0,
  4: 0.0,
  5: 0.0,
  6: 1.0,
  7: 1.0,
  8: 1.0,
  9: 1.0,
  10: 1.0,
  11: 0.0,
  12: 0.0,
  13: 0.0,
  14: 0.0,
  15: 1.0,
  16: 1.0,
  17: 1.0,
  18: 0.0,
  19: 0.0},
 'b': {0: 24.5,
  1: 24.0,
  2: 23.5,
  3: 23.5,
  4: 23.5,
  5: 26.5,
  6: 26.5,
  7: 53.5,
  8: 62.0,
  9: 24.0,
  10: 64.5,
  11: 64.5,
  12: 24.0,
  13: 63.5,
  14: 62.0,
  15: 61.0,
  16: 60.0,
  17: 25.0,
  18: 51.5,
  19: 50.0},
 'Type': {0: 'medium',
  1: 'low',
  2: 'medium',
  3: 'medium',
  4: 'medium',
  5: 'medium',
  6: 'medium',
  7: 'medium',
  8: 'medium',
  9: 'low',
  10: 'medium',
  11: 'medium',
  12: 'low',
  13: 'medium',
  14: 'medium',
  15: 'medium',
  16: 'medium',
  17: 'low',
  18: 'medium',
  19: 'medium'}})

Expected output:预期输出:

Start                    End                      X   
2000-10-26 10:39:24.310  2000-10-26 17:29:18.770  18
2000-10-26 17:29:40.770  2000-10-26 17:32:35.180  19

How can I do this in a for-loop so that I could further manipulate the data-frame such calculating the mean for each unique value for Y in an activity?如何在 for 循环中执行此操作,以便我可以进一步操作数据框,例如计算活动中Y每个唯一值的平均值?


Edit:编辑:

Apologies for any confusion.对任何混淆表示歉意。 The activity should be determined by changing in a values and should by continuous, ie., after one activity ends we reset and count the next.活动应该通过改变a值来确定,并且应该是连续的,即,在一个活动结束后,我们重置并计算下一个。 So for example the following data-frame should generate the same output:因此,例如以下数据帧应生成相同的输出:


    Timestamp               X   Y   a   b       Type
0   2000-10-26 10:08:27.060 18  14  0.0 24.5    medium
1   2000-10-26 10:39:24.310 18  13  1.0 24.0    low
2   2000-10-26 11:50:48.190 18  14  1.0 23.5    medium
3   2000-10-26 17:18:07.610 18  14  1.0 23.5    medium
4   2000-10-26 17:18:09.610 18  14  0.0 23.5    medium
5   2000-10-26 17:29:10.610 18  14  0.0 26.5    medium
6   2000-10-26 17:29:10.770 18  14  1.0 26.5    medium
7   2000-10-26 17:29:12.610 18  14  1.0 53.5    medium
8   2000-10-26 17:29:14.610 18  14  1.0 62.0    medium
9   2000-10-26 17:29:14.770 18  13  1.0 24.0    low
10  2000-10-26 17:29:16.610 18  14  1.0 64.5    medium
11  2000-10-26 17:29:18.770 18  14  0.0 64.5    medium
12  2000-10-26 17:29:18.770 18  13  0.0 24.0    low
13  2000-10-26 17:29:28.770 18  14  0.0 63.5    medium
14  2000-10-26 17:29:34.770 18  14  0.0 62.0    medium
15  2000-10-26 17:29:40.770 18  14  1.0 61.0    medium
16  2000-10-26 17:29:46.770 18  14  1.0 60.0    medium
17  2000-10-26 17:32:01.180 18  13  1.0 25.0    low
18  2000-10-26 17:32:01.180 18  14  0.0 51.5    medium
19  2000-10-26 17:32:35.180 18  13  0.0 50.0    medium

Expected output:预期输出:

Start                    End                      X   
2000-10-26 10:39:24.310  2000-10-26 17:29:18.770  18
2000-10-26 17:29:40.770  2000-10-26 17:32:35.180  18

Here is a one possible solution, the issue is the end can be two values based on your constraints这是一个可能的解决方案,问题是end可以是基于您的约束的两个值

# Create dict for the solution
expected_dict = {'start': [], 'end': [], 'X': [], 'b_means': []}

# Loop through each group
for name, gp in df.groupby('X'):
  gp['y_shift'] = gp['Y'].shift(-1).ffill().astype(int)
  astart = gp[gp['a'].eq(1.0)].iloc[0]['pd.Timestamp']
  # End logic that can be updated
  aend = gp[gp['a'].eq(0.0) & gp['Y'].ne(gp['y_shift'])].iloc[-1]['pd.Timestamp']
  b_means = gp['b'].mean()

  # Collect all values
  expected_dict ['start'].append(astart)
  expected_dict ['end'].append(aend)
  expected_dict ['X'].append(name)
  expected_dict ['b_means'].append(name)

edf = pd.DataFrame(expected_dict)

                    start                     end   X    b_means
0 2000-10-26 10:39:24.310 2000-10-26 17:29:18.770  18  37.714286
1 2000-10-26 17:29:40.770 2000-10-26 17:32:01.180  19  51.583333

Code:代码:

from datetime import *
from pandas import *

df= pd.DataFrame({'Timestamp': {0: Timestamp('2000-10-26 10:08:27.060000'),
  1: Timestamp('2000-10-26 10:39:24.310000'),
  2: Timestamp('2000-10-26 11:50:48.190000'),
  3: Timestamp('2000-10-26 17:18:07.610000'),
  4: Timestamp('2000-10-26 17:18:09.610000'),
  5: Timestamp('2000-10-26 17:29:10.610000'),
  6: Timestamp('2000-10-26 17:29:10.770000'),
  7: Timestamp('2000-10-26 17:29:12.610000'),
  8: Timestamp('2000-10-26 17:29:14.610000'),
  9: Timestamp('2000-10-26 17:29:14.770000'),
  10: Timestamp('2000-10-26 17:29:16.610000'),
  11: Timestamp('2000-10-26 17:29:18.770000'),
  12: Timestamp('2000-10-26 17:29:18.770000'),
  13: Timestamp('2000-10-26 17:29:28.770000'),
  14: Timestamp('2000-10-26 17:29:34.770000'),
  15: Timestamp('2000-10-26 17:29:40.770000'),
  16: Timestamp('2000-10-26 17:29:46.770000'),
  17: Timestamp('2000-10-26 17:32:01.180000'),
  18: Timestamp('2000-10-26 17:32:01.180000'),
  19: Timestamp('2000-10-26 17:32:35.180000')},
 'X': {0: 18,
  1: 18,
  2: 18,
  3: 18,
  4: 18,
  5: 18,
  6: 18,
  7: 18,
  8: 18,
  9: 18,
  10: 18,
  11: 18,
  12: 18,
  13: 18,
  14: 19,
  15: 19,
  16: 19,
  17: 19,
  18: 19,
  19: 19},
 'Y': {0: 14,
  1: 13,
  2: 14,
  3: 14,
  4: 14,
  5: 14,
  6: 14,
  7: 14,
  8: 14,
  9: 13,
  10: 14,
  11: 14,
  12: 13,
  13: 14,
  14: 14,
  15: 14,
  16: 14,
  17: 13,
  18: 14,
  19: 13},
 'a': {0: 0.0,
  1: 1.0,
  2: 1.0,
  3: 1.0,
  4: 0.0,
  5: 0.0,
  6: 1.0,
  7: 1.0,
  8: 1.0,
  9: 1.0,
  10: 1.0,
  11: 0.0,
  12: 0.0,
  13: 0.0,
  14: 0.0,
  15: 1.0,
  16: 1.0,
  17: 1.0,
  18: 0.0,
  19: 0.0},
 'b': {0: 24.5,
  1: 24.0,
  2: 23.5,
  3: 23.5,
  4: 23.5,
  5: 26.5,
  6: 26.5,
  7: 53.5,
  8: 62.0,
  9: 24.0,
  10: 64.5,
  11: 64.5,
  12: 24.0,
  13: 63.5,
  14: 62.0,
  15: 61.0,
  16: 60.0,
  17: 25.0,
  18: 51.5,
  19: 50.0},
 'Type': {0: 'medium',
  1: 'low',
  2: 'medium',
  3: 'medium',
  4: 'medium',
  5: 'medium',
  6: 'medium',
  7: 'medium',
  8: 'medium',
  9: 'low',
  10: 'medium',
  11: 'medium',
  12: 'low',
  13: 'medium',
  14: 'medium',
  15: 'medium',
  16: 'medium',
  17: 'low',
  18: 'medium',
  19: 'medium'}})

print(df, '\n')

Y_ended = dict()
look_for_start = True
prv_X = 0
skip_rows_till_same_X = False

for ind,row in df.iterrows():
    if skip_rows_till_same_X and prv_X == row['X']:
        continue
    else:
        skip_rows_till_same_X = False

    Y_ended[row['Y']] = False if row['a'] else True        
    
    if look_for_start:
        if row['a']: # Start
            start_timestamp = row['Timestamp']
            look_for_start = False
            Y_ended = Y_ended.fromkeys(Y_ended.keys(), False) 
    else: 
        if all(Y_ended.values()):
            end_timestamp = row['Timestamp']
            look_for_start = True
            skip_rows_till_same_X = True
            print(start_timestamp, end_timestamp, row['X'])
            Y_ended = dict()
    prv_X = row['X']  

Output:输出:

                 Timestamp   X   Y    a     b    Type
0  2000-10-26 10:08:27.060  18  14  0.0  24.5  medium
1  2000-10-26 10:39:24.310  18  13  1.0  24.0     low
2  2000-10-26 11:50:48.190  18  14  1.0  23.5  medium
3  2000-10-26 17:18:07.610  18  14  1.0  23.5  medium
4  2000-10-26 17:18:09.610  18  14  0.0  23.5  medium
5  2000-10-26 17:29:10.610  18  14  0.0  26.5  medium
6  2000-10-26 17:29:10.770  18  14  1.0  26.5  medium
7  2000-10-26 17:29:12.610  18  14  1.0  53.5  medium
8  2000-10-26 17:29:14.610  18  14  1.0  62.0  medium
9  2000-10-26 17:29:14.770  18  13  1.0  24.0     low
10 2000-10-26 17:29:16.610  18  14  1.0  64.5  medium
11 2000-10-26 17:29:18.770  18  14  0.0  64.5  medium
12 2000-10-26 17:29:18.770  18  13  0.0  24.0     low
13 2000-10-26 17:29:28.770  18  14  0.0  63.5  medium
14 2000-10-26 17:29:34.770  19  14  0.0  62.0  medium
15 2000-10-26 17:29:40.770  19  14  1.0  61.0  medium
16 2000-10-26 17:29:46.770  19  14  1.0  60.0  medium
17 2000-10-26 17:32:01.180  19  13  1.0  25.0     low
18 2000-10-26 17:32:01.180  19  14  0.0  51.5  medium
19 2000-10-26 17:32:35.180  19  13  0.0  50.0  medium

2000-10-26 10:39:24.310000 2000-10-26 17:29:18.770000 18
2000-10-26 17:29:40.770000 2000-10-26 17:32:35.180000 19
    

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM