How to count rows in a data frame in Pandas conditionally against values in another column of the same data frame?
I have a data frame whose rows I want to count conditionally:
TIME VALUE Prev_Time
0 23:01 0 NaN
1 23:02 0 NaN
2 23:03 1 23:02
3 23:04 0 NaN
4 23:05 0 NaN
5 23:06 1 23:05
6 23:07 0 NaN
7 23:08 0 NaN
8 23:09 0 NaN
9 23:10 0 NaN
10 23:11 1 23:10
11 23:12 0 NaN
12 23:13 0 NaN
13 23:14 0 NaN
14 23:15 0 NaN
15 23:16 1 23:15
16 23:17 0 NaN
I want to count the rows based on a condition on column 'Prev_Time', so that...
The desired output should be:
ROW_COUNT
0 2
1 3
2 5
3 5
4 2
And I want the total count too, something like len(df), which should print:
Total Count: 5
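For anyone who wants to try the answers on the data above, here is one way the sample frame could be rebuilt. This is only a sketch: the names `times` and `marked` are mine, and the column types are an assumption (TIME and Prev_Time as strings, missing values as NaN).

```python
import numpy as np
import pandas as pd

# TIME runs from 23:01 to 23:17, one row per minute (17 rows).
times = [f"23:{m:02d}" for m in range(1, 18)]

# Rows 2, 5, 10 and 15 have VALUE == 1 and hold the previous row's TIME.
marked = {2, 5, 10, 15}

df = pd.DataFrame({
    "TIME": times,
    "VALUE": [1 if i in marked else 0 for i in range(len(times))],
    "Prev_Time": [times[i - 1] if i in marked else np.nan
                  for i in range(len(times))],
})
print(df)
```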
Find the marker rows (those with VALUE > 0):
notnull=df[df.VALUE>0]
"""
TIME VALUE Prev_Time
2 23:03 1 23:02
5 23:06 1 23:05
10 23:11 1 23:10
15 23:16 1 23:15
"""
Use np.split to break the frame at those rows:
row_counts=pd.DataFrame({'ROW_COUNT':[len(x) for x in np.split(df,notnull.index)]})
"""
ROW_COUNT
0 2
1 3
2 5
3 5
4 2
"""
And count:
len(row_counts)
"""
5
"""
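If splitting the frame into pieces is not needed, the same counts can probably be had by labelling the blocks with a running sum and grouping on that label. This is a sketch of an alternative, not the method above; it assumes the sample data from the question (rebuilt here so the snippet runs on its own), and `block_id` is a name I made up.

```python
import numpy as np
import pandas as pd

# Sample data from the question, rebuilt so the snippet is self-contained.
times = [f"23:{m:02d}" for m in range(1, 18)]
marked = {2, 5, 10, 15}
df = pd.DataFrame({
    "TIME": times,
    "VALUE": [1 if i in marked else 0 for i in range(len(times))],
    "Prev_Time": [times[i - 1] if i in marked else np.nan
                  for i in range(len(times))],
})

# Every non-null Prev_Time starts a new block, so the running sum of the
# not-null mask gives each row a block label (0, 1, 2, ...).
block_id = df["Prev_Time"].notna().cumsum()

# Count the rows in each block.
row_counts = (
    df.groupby(block_id)
      .size()
      .reset_index(drop=True)
      .to_frame("ROW_COUNT")
)

print(row_counts)
print("Total Count:", len(row_counts))
```

One difference to be aware of: if the very first row were a marker row, np.split would yield an empty first piece (a count of 0), while the grouping version would simply have no such block.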
This somewhat works; you can tweak the code to your needs, but the basic idea is there!
import numpy as np
import pandas as pd

# Dummy data set
df1 = pd.DataFrame({'TIME': np.arange(17), 'VALUE': np.arange(-17,0), 'Prev_time': [np.nan, np.nan,1, np.nan, np.nan,2, np.nan, np.nan, np.nan, np.nan,4, np.nan, np.nan, np.nan, np.nan,5, np.nan]})
# Get the rows that are not null; reset_index() keeps their original
# row numbers in a new 'index' column
df = df1[df1['Prev_time'].notnull()].reset_index()
# If the last row of df1 is null, the trailing block has no marker row,
# so add an end marker manually
if df.loc[len(df)-1]['index'] != (len(df1)-1):
    df.loc[len(df)] = [len(df1), 0, 0, 0]
# Block sizes are the differences between consecutive marker positions
count = df['index'] - df['index'].shift(1).fillna(0)
len(count)
It may not be a perfect answer, but it should get you what you are looking for:
import pandas as pd
#read the data
d = pd.read_csv('stackdata.txt')
#we need the last row to be identified, so give it a value
d.loc[len(d)-1, 'Prev_Time'] = 1
#get all the rows where Prev_Time is not null
ds = d[d.Prev_Time.notnull()]
#reset the index, you shall get an additional column named index
ds = ds.reset_index()
#get only the newly added index column
dst = ds[ds.columns[0]]
#get the diff of the series
dstr = dst.diff()
#Get the first value from the previous series and assign it.
dstr[0] = dst[0]
#Add +1 to the last item -- result required.
dstr[len(dstr)-1] +=1
len(dstr)
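The index-difference idea can also be written without reading a file or patching the last row, by adding the start and end of the frame as explicit boundaries. The following is a sketch under the assumption that the data matches the sample in the question; `starts`, `edges` and `counts` are illustrative names, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data from the question, rebuilt so the snippet is self-contained.
times = [f"23:{m:02d}" for m in range(1, 18)]
marked = {2, 5, 10, 15}
df = pd.DataFrame({
    "TIME": times,
    "VALUE": [1 if i in marked else 0 for i in range(len(times))],
    "Prev_Time": [times[i - 1] if i in marked else np.nan
                  for i in range(len(times))],
})

# Positions of the rows where Prev_Time is not null.
starts = np.flatnonzero(df["Prev_Time"].notna().to_numpy())

# Add the start (0) and end (len(df)) of the frame as block boundaries,
# then take the differences between consecutive boundaries.
edges = np.concatenate(([0], starts, [len(df)]))
counts = np.diff(edges)

row_counts = pd.DataFrame({"ROW_COUNT": counts})
print(row_counts)
print("Total Count:", len(row_counts))
```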