Selecting rows of pandas dataframe according to threshold of column
I have a pandas dataframe with a column "value" and a column "timestamp". Now I would like to filter the rows according to thresholds of the timestamp. I have done the following:
idx = df.index[df['timestamp'] >= start and df['timestamp'] <= end]
df = df.loc[idx]
Here df is the dataframe, and start and end are two integers.
Somehow this does not work. I'm getting an error:
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
EDIT: There is a further problem. start is a dataframe with only one value (one row, one column). end is a dataframe with several rows and only one column (but I'm only interested in the last row). When I do the following:
print(end.iloc[-1])
print(start.iloc[0])
I'm getting the following output:
1508504026077
start_timestamp_milli 1508502348946
Name: 0, dtype: int64
When I then try to do
print(df[column] >= start.iloc[0])
I'm getting an error:
ValueError: Can only compare identically-labeled Series objects
Consequently,
mask=(df['timestamp'] >= start & df['timestamp'] <= end)
also fails.
IIUC:
mask = (df['timestamp'] >= start) & (df['timestamp'] <= end)
df = df[mask]
Each comparison needs its own parentheses, because & binds more tightly than >= and <=.
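For the situation described in the EDIT, a minimal sketch could look like the following. It assumes start and end really are one-column DataFrames as stated there, and it reduces each to a plain integer before comparing. The sample values, the column name end_timestamp_milli, and the helper names start_ts and end_ts are made up for illustration. If end turns out to be a Series rather than a DataFrame, end.iloc[-1] already yields the scalar.

```python
import pandas as pd

# Hypothetical data standing in for the frames described in the question.
df = pd.DataFrame({
    "timestamp": [1508502348000, 1508502348946, 1508503000000,
                  1508504026077, 1508505000000],
    "value": [1, 2, 3, 4, 5],
})
start = pd.DataFrame({"start_timestamp_milli": [1508502348946]})
end = pd.DataFrame({"end_timestamp_milli": [1508503500000, 1508504026077]})

# Reduce each one-column frame to a plain Python int (assumed layout:
# the wanted value is in the first column).
start_ts = int(start.iloc[0, 0])   # first row, first column
end_ts = int(end.iloc[-1, 0])      # last row, first column

# Element-wise AND of two boolean Series; both bounds are inclusive.
mask = (df["timestamp"] >= start_ts) & (df["timestamp"] <= end_ts)
filtered = df[mask]

# Series.between is inclusive on both ends by default and gives the same rows.
filtered_between = df[df["timestamp"].between(start_ts, end_ts)]
```

Comparing against scalars sidesteps the label alignment that pandas attempts when a Series is compared with another Series or a DataFrame, which is what the "identically-labeled" error is about.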