How do I conditionally count rows in a Pandas DataFrame based on the values in another column of the same DataFrame?

Question

I have a DataFrame whose rows I want to count conditionally:

     TIME  VALUE Prev_Time
0   23:01      0       NaN
1   23:02      0       NaN
2   23:03      1     23:02
3   23:04      0       NaN
4   23:05      0       NaN
5   23:06      1     23:05
6   23:07      0       NaN
7   23:08      0       NaN
8   23:09      0       NaN
9   23:10      0       NaN
10  23:11      1     23:10
11  23:12      0       NaN
12  23:13      0       NaN
13  23:14      0       NaN
14  23:15      0       NaN
15  23:16      1     23:15
16  23:17      0       NaN

I want to count the rows based on a condition on the 'Prev_Time' column, so that:

In the first iteration, counting starts at the first row and stops one row before the first row where 'Prev_Time' has a value.
In the second and all later iterations, counting starts at (and includes) the row where the time is printed and stops one row before the next such row, or at the end of the frame.

The desired output is:

   ROW_COUNT
0          2
1          3
2          5
3          5
4          2

I also want the total count, something like len(df), which should print:

Total Count: 5

Answer 1

Find the rows of interest (those where VALUE is set):

notnull=df[df.VALUE>0]
"""
     TIME  VALUE Prev_Time
2   23:03      1     23:02
5   23:06      1     23:05
10  23:11      1     23:10
15  23:16      1     23:15
"""

Use np.split to break the frame at those rows:

row_counts=pd.DataFrame({'ROW_COUNT':[len(x) for x in np.split(df,notnull.index)]})
"""
   ROW_COUNT
0          2
1          3
2          5
3          5
4          2
"""

Then count the pieces:

len(row_counts)
"""
5
"""

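As an alternative sketch (not part of the original answer), the same counts can be obtained without splitting the frame: a running sum over the marker rows gives every row a group number, and the group sizes are the row counts. Newer pandas versions may emit a deprecation warning when np.split is applied to a DataFrame, which is one reason to consider this. The frame below is rebuilt from the question's table; the names group_id and row_counts are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame (values copied from the table in the question)
times = [f"23:{m:02d}" for m in range(1, 18)]
value = [0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0]
prev_time = [times[i - 1] if v == 1 else np.nan for i, v in enumerate(value)]
df = pd.DataFrame({"TIME": times, "VALUE": value, "Prev_Time": prev_time})

# Each non-null Prev_Time starts a new group; cumsum numbers the groups 0, 1, 2, ...
group_id = df["Prev_Time"].notna().cumsum()

# The size of each group is the number of rows counted in that iteration
row_counts = (
    df.groupby(group_id)
      .size()
      .reset_index(drop=True)
      .to_frame("ROW_COUNT")
)

print(row_counts)
print("Total Count:", len(row_counts))
```

On the question's data this should give 2, 3, 5, 5, 2 and a total count of 5. Note that if the very first row carried a Prev_Time value, the group numbering would start at 1, but the sizes would still be correct.
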
Answer 2

This mostly works. You can tweak the code to your needs, but it shows the basic idea:

import numpy as np
import pandas as pd

# Dummy data set
df1 = pd.DataFrame({'TIME': np.arange(17), 'VALUE': np.arange(-17,0), 'Prev_time': [np.nan, np.nan,1, np.nan, np.nan,2, np.nan, np.nan, np.nan, np.nan,4, np.nan, np.nan, np.nan, np.nan,5, np.nan]})
# Keep the rows that are not null; reset_index() stores their original
# row numbers in a new 'index' column
df = df1[df1['Prev_time'].notnull()].reset_index()
# If the last row of df1 is null, append a sentinel row whose 'index' is
# len(df1) so that the trailing rows are counted too
if df.loc[len(df)-1]['index'] != (len(df1)-1):
    df.loc[len(df)] = [len(df1), 0, 0, 0]
# Differences between consecutive row numbers are the row counts
count = df['index'] - df['index'].shift(1).fillna(0)
len(count)
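
The same "difference of positions" idea can be sketched without mutating the frame, assuming the markers are the non-null entries of the Prev_Time column. The Series below is a hypothetical stand-in that mirrors the question's layout; starts, boundaries and counts are illustrative names.

```python
import numpy as np
import pandas as pd

# Hypothetical marker column mirroring the question's Prev_Time layout
prev_time = pd.Series([np.nan, np.nan, "23:02", np.nan, np.nan, "23:05",
                       np.nan, np.nan, np.nan, np.nan, "23:10",
                       np.nan, np.nan, np.nan, np.nan, "23:15", np.nan])

# Positions of the rows that carry a value
starts = np.flatnonzero(prev_time.notna().to_numpy())

# Add the start and the end of the column as outer boundaries
boundaries = np.concatenate(([0], starts, [len(prev_time)]))

# Gaps between consecutive boundaries are the row counts
counts = np.diff(boundaries)

print(counts.tolist())  # [2, 3, 5, 5, 2]
print(len(counts))      # 5
```

One caveat of this sketch: if the first row itself carries a value, the first count comes out as 0 and would need to be dropped.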

Answer 3

This may not be a perfect answer, but it should get you what you are looking for:

import pandas as pd

#read the data 
d = pd.read_csv('stackdata.txt')

#we need the last row to be identified, so give it a value
#(.loc avoids chained assignment, which may not write back to d)
d.loc[len(d)-1, 'Prev_Time'] = 1

#get all the rows where Prev_Time is not null
ds = d[d.Prev_Time.notnull()]

#reset the index, you shall get an additional column named index
ds = ds.reset_index()
#get only the newly added index column
dst = ds[ds.columns[0]]

#get the diff of the series
dstr = dst.diff()

#Get the first value from the previous series and assign it. 
dstr[0] = dst[0]

#Add 1 to the last item so that the final row itself is counted.
dstr[len(dstr)-1] +=1
len(dstr)
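
The answer does not say how stackdata.txt is laid out. If it holds the whitespace-aligned table from the question, the default comma separator would not split the columns, so a whitespace separator is probably needed. A minimal sketch, assuming that layout and using an in-memory excerpt of the table (the variable raw is illustrative) instead of a file:

```python
import io

import pandas as pd

# Excerpt of the question's table; the header has one name fewer than the
# data rows have fields, so pandas uses the first field as the row index
raw = """TIME VALUE Prev_Time
0 23:01 0 NaN
1 23:02 0 NaN
2 23:03 1 23:02
3 23:04 0 NaN
4 23:05 0 NaN
5 23:06 1 23:05
"""

# sep=r"\s+" splits on any run of whitespace
d = pd.read_csv(io.StringIO(raw), sep=r"\s+")

print(d)
print(d["Prev_Time"].notnull().sum())
```

With a real file, the io.StringIO(raw) argument would be replaced by the file path.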


Answer 1: 3 votes, accepted, 2016-04-05 13:02:28

Answer 2: 0 votes, 2016-04-05 12:20:03

Answer 3: 0 votes, 2016-04-05 12:21:25
