
Selecting rows of pandas dataframe according to threshold of column

I have a pandas dataframe with a column "value" and a column "timestamp". Now I would like to filter the rows according to thresholds of the timestamp. I have done the following:

idx = df.index[df['timestamp'] >= start and df['timestamp'] <= end]
df = df.loc[idx]

df is the dataframe, and start and end are two integers.

Somehow this does not work. I'm getting an error:

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

EDIT: There is a further problem. start is a dataframe with only one value (one row, one column). end is a dataframe with several rows and only one column (but I'm only interested in the last row). When I do the following

    print(end.iloc[-1])
    print(start.iloc[0])

I'm getting the following output:

1508504026077
start_timestamp_milli    1508502348946
Name: 0, dtype: int64

When I then try to do print(df[column] >= start.iloc[0]) I'm getting an error:

ValueError: Can only compare identically-labeled Series objects

Consequently, mask=(df['timestamp'] >= start & df['timestamp'] <= end) also fails.

IIUC, each comparison needs its own parentheses, because & binds more tightly than >= and <=:

mask = (df['timestamp'] >= start) & (df['timestamp'] <= end)

df=df[mask]
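
For the case in the EDIT, where start and end are themselves dataframes, a minimal sketch might look like the following. It assumes both are one-column DataFrames, so the thresholds have to be pulled out as plain scalars before comparing. The sample data and the column name end_timestamp_milli are made up for illustration; only start_timestamp_milli appears in the question. If end is actually a Series, end.iloc[-1] alone would already be a scalar. Python's and asks each operand for a single truth value, which is what raises the "truth value is ambiguous" error, whereas & combines the two boolean Series element by element.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's df, start and end.
df = pd.DataFrame({
    "timestamp": [1508502348000, 1508502348946, 1508503000000,
                  1508504026077, 1508504026078],
    "value": [1, 2, 3, 4, 5],
})
start = pd.DataFrame({"start_timestamp_milli": [1508502348946]})
# "end_timestamp_milli" is an assumed column name, not taken from the question.
end = pd.DataFrame({"end_timestamp_milli": [1508503500000, 1508504026077]})

# Extract plain scalars: first row of start, last row of end (first column each).
start_ts = start.iloc[0, 0]
end_ts = end.iloc[-1, 0]

# Element-wise comparison; each comparison is parenthesised because
# & has higher precedence than >= and <=.
mask = (df["timestamp"] >= start_ts) & (df["timestamp"] <= end_ts)
filtered = df[mask]

# Series.between is inclusive on both ends by default and gives the same rows.
filtered_between = df[df["timestamp"].between(start_ts, end_ts)]

print(filtered)
```

Comparing against a scalar also avoids the "Can only compare identically-labeled Series objects" error, since pandas no longer tries to align the index of df['timestamp'] with that of start.iloc[0].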
