How to print the first and last indices of separate groups of true values in a Pandas dataframe column
I wrote a program that analyzes HVAC data for operational faults. The program feeds the input data through a set of rules, and the output is a Pandas dataframe like this one.
From that output, I use this code to iterate through each column, print the name of the column itself, and print the values from the index (Date) wherever the value in that column is true:
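pos = 0
for column in df:
    # Look up the readable fault name for this column, if there is one.
    try:
        colname = faults[df.columns[pos]]
        print("The fault -" + str(colname) + "- occurred on:")
    except Exception:
        pass
    # Print every Date on which this column is true.
    try:
        print(df.loc[df[column] == True, 'Date'].iloc[:])
    except TypeError:
        pass
    print()
    pos += 1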
That output looks like this.
The code works fine, but I want to change the output a bit. I want to print just the first and last true values so that the output says something like "the fault occurred from 'x' to 'y'" instead of printing every time a true value occurs.
The complicated part is that sometimes there may be multiple groups of 1's in a column, so I can't just print the first and last indices where there are true values. A column could look like (0,0,1,1,1,0,0,0,1,1,1,1,1,1,0,0,1,0), in which case I would want it to print "the fault occurred from here to here, here to here, and here."
Is there a way to print the first and last indices of each group of true values in a Pandas dataframe column?
Here is my suggestion: iterate through the values to find the start and end of each run (adding the first and last positions if needed) and zip them:
import pandas as pd

df = pd.DataFrame()
df['rule_1'] = [0]*13
df['rule_2'] = [0,0,1,1,1,0,0,0,1,1,1,1,0]
df['rule_3'] = [1]*13
df.index = pd.date_range("2017-12-25 00:00", "2017-12-25 03:00",
                         freq='15min')  # 13 timestamps, 15 minutes apart

for col in df.columns:
    vals = df[col].values
    n = len(vals)
    # A run starts where a 1 follows a 0, and ends where a 1 precedes a 0.
    starts = [i for i in range(1, n) if vals[i] == 1 and vals[i-1] == 0]
    ends = [i for i in range(n-1) if vals[i] == 1 and vals[i+1] == 0]
    # Runs touching either edge of the column have no neighbouring 0 to detect.
    if vals[0] == 1:
        starts = [0] + starts
    if vals[-1] == 1:
        ends = ends + [n-1]
    print(col)
    for x in zip(df.index[starts], df.index[ends]):
        print(x)
    print()
Output:
rule_1
rule_2
(Timestamp('2017-12-25 00:30:00'), Timestamp('2017-12-25 01:00:00'))
(Timestamp('2017-12-25 02:00:00'), Timestamp('2017-12-25 02:45:00'))
rule_3
(Timestamp('2017-12-25 00:00:00'), Timestamp('2017-12-25 03:00:00'))
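The same runs can probably also be found without explicit position bookkeeping. The sketch below is not part of the original posts: `true_runs` and `describe` are hypothetical helper names, and it assumes the column holds 0/1 (or boolean) values on an index with unique labels. It numbers each stretch of equal values with the common `shift`/`cumsum` idiom, then takes the first and last index label of every true stretch. The example data is the column quoted in the question.

```python
import pandas as pd


def true_runs(flags):
    """Return (first_label, last_label) for each consecutive run of truthy values.

    Hypothetical helper; assumes `flags` is a Series with unique index labels.
    """
    mask = flags.astype(bool)
    if not mask.any():
        return []
    # The run id increases by one wherever a value differs from the previous row.
    run_id = (mask != mask.shift()).cumsum()
    labels = mask.index.to_series()
    # Keep only the truthy rows, then take the first and last label of each run.
    spans = labels[mask].groupby(run_id[mask]).agg(["first", "last"])
    return list(spans.itertuples(index=False, name=None))


def describe(name, runs):
    """Build a one-line summary; a run of length one is reported as a single label."""
    if not runs:
        return "The fault -{}- did not occur.".format(name)
    parts = ["at {}".format(a) if a == b else "from {} to {}".format(a, b)
             for a, b in runs]
    return "The fault -{}- occurred: {}".format(name, ", ".join(parts))


# The example column from the question, on a default integer index.
example = pd.Series([0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0])
runs = true_runs(example)
print(describe("rule_x", runs))
# The fault -rule_x- occurred: from 2 to 4, from 8 to 13, at 16
```

With a datetime index, as in the answer above, the labels in each pair would be timestamps rather than integer positions, so the summary line would read as a list of date ranges.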