
Python Pandas Counting and Summing columns based on datetime values

I am trying to count the rows that meet a certain condition and store the result in another column: for each ticket, I want to know how many other tickets were open at the same time. Submit_Date and Resolved_Date are columns that look like this:

df['Submit_Date']  = 
  1   10/1/16 23:41
  2   10/1/16 23:50
  3  10/2/16 0:05
  4   10/3/16 5:17

df['Resolved_Date'] = 
  1  10/2/16 2:27
  2  3/9/17 19:39
  3  11/15/16 12:46
  4  11/14/16 17:37

I would like to look at row 2 and see which of the other three tickets were open at any point while row 2 was open. The answer would be rows 1, 3, and 4, because each of them has a submit date or a resolved date that falls between row 2's submit date (Oct 1, 2016) and resolved date (March 9, 2017).
I want to do this for every row, though, scanning through all the other rows.

Here is what I have so far:

df['newcolumn'] = ((df['Submit_Date'] < df['Submit_Date']) |   (df['Resolved_Date'] > df['Resolved_Date'])).sum()

The problem is that I want to check whether the submit date in the current row is greater than those in all the other rows and the resolved date in that row is less than those in all the other rows. I want to find all the values that match this criterion for each row and save the result in the same row in a new column.

You have to loop over the DataFrame, because each row must be compared with every other row. One possible improvement to the solution below is to sort by Submit_Date first, so that for the Submit_Date comparison each record only needs to be compared with the records above it or below it.

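# Assumes both columns are already datetime64 (e.g. via pd.to_datetime)
result = []
for _, cur in df.iterrows():
    # Count a ticket when its submit or resolved date falls strictly
    # inside the current ticket's open interval
    submit_inside = (cur['Submit_Date'] < df['Submit_Date']) & (df['Submit_Date'] < cur['Resolved_Date'])
    resolved_inside = (cur['Submit_Date'] < df['Resolved_Date']) & (df['Resolved_Date'] < cur['Resolved_Date'])
    result.append((submit_inside | resolved_inside).sum())
df['count'] = result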


           Submit_Date        Resolved_Date  count
1  2016-10-01 23:41:00  2016-10-02 02:27:00      2
2  2016-10-01 23:50:00  2017-03-09 19:39:00      3
3  2016-10-02 00:05:00  2016-11-15 12:46:00      2
4  2016-10-03 05:17:00  2016-11-14 17:37:00      0
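
Note that the condition above only counts a ticket when its submit date or resolved date falls inside the current ticket's interval. A ticket that stays open across the whole current interval is not counted, which is why row 4 gets 0 even though rows 2 and 3 were open for its entire lifetime. If those tickets should count as well, one option is the usual interval-overlap test (ticket j overlaps ticket i when j was submitted before i was resolved and i was submitted before j was resolved). The sketch below is only an illustration under that assumption: the function names, the `overlap_count` column, and the explicit date format string are hypothetical and not part of the original answer. It shows a pairwise version using NumPy broadcasting and a sorted version in the spirit of the sorting improvement mentioned above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; the explicit format string is an assumption
df = pd.DataFrame(
    {
        "Submit_Date": ["10/1/16 23:41", "10/1/16 23:50", "10/2/16 0:05", "10/3/16 5:17"],
        "Resolved_Date": ["10/2/16 2:27", "3/9/17 19:39", "11/15/16 12:46", "11/14/16 17:37"],
    },
    index=[1, 2, 3, 4],
)
for col in ["Submit_Date", "Resolved_Date"]:
    df[col] = pd.to_datetime(df[col], format="%m/%d/%y %H:%M")


def count_overlaps_pairwise(frame):
    """Compare every ticket with every other one via broadcasting (O(n^2) memory)."""
    start = frame["Submit_Date"].to_numpy()
    end = frame["Resolved_Date"].to_numpy()
    # overlap[i, j] is True when ticket j was open at some moment ticket i was open
    overlap = (start[:, None] < end[None, :]) & (start[None, :] < end[:, None])
    np.fill_diagonal(overlap, False)  # a ticket does not count itself
    return overlap.sum(axis=1)


def count_overlaps_sorted(frame):
    """Same count in O(n log n) using sorted submit and resolved dates."""
    start = frame["Submit_Date"].to_numpy()
    end = frame["Resolved_Date"].to_numpy()
    # Tickets submitted before this one was resolved ...
    started_before_end = np.searchsorted(np.sort(start), end, side="left")
    # ... minus tickets already resolved by the time this one was submitted
    ended_before_start = np.searchsorted(np.sort(end), start, side="right")
    # ... minus the ticket itself (assumes Submit_Date < Resolved_Date on every row)
    return started_before_end - ended_before_start - 1


df["overlap_count"] = count_overlaps_pairwise(df)
print(df)
```

On the four sample rows both functions give 2, 3, 3, 2. The pairwise version is the easier one to read but builds an n-by-n boolean matrix, so the sorted version is likely the better fit for a large ticket table.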

