Pandas - Dropping DataFrame rows based on Datetime column value

I am currently writing a script in which I want to drop some rows of my pandas DataFrame according to Datetime values spanning several years (I want to drop rows where the datetime is between February and May). So I first tried the following code:

game_df['Date'] = game_df[(game_df['Date'].dt.month < 2) & (game_df['Date'].dt.month > 5)]

It gave me the same dataframe, with NaN values in the 'Date' column over this period of time. So I tried the following code in order to drop the corresponding rows:

game_df['Date'] = game_df[(game_df['Date'].dt.month < 2) & (game_df['Date'].dt.month > 5)].drop(game_df.columns)

But it raised an error like: labels [u'Date' u'other_column1' u'other_column2' u'other_column3' u'other_column4'] not contained in axis

Can anyone solve this problem?

I think you could try something like this using a list of Timestamp objects:

If you want to exclude rows with specific dates:

game_df[~game_df['Date'].isin([pd.Timestamp('20150210'), pd.Timestamp('20150301')])]

The ~ at the beginning of the expression is the NOT operator, in case you're not familiar with it. So it's saying to return the rows of the dataframe whose timestamps are not one of the two dates mentioned.
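To make the role of ~ concrete, here is a minimal sketch on made-up data. The 'score' column and the sample dates are assumptions for illustration only; just the 'Date' column name comes from the question.

```python
import pandas as pd

# Hypothetical sample data; only the 'Date' column name comes from the question.
game_df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-02-09", "2015-02-10", "2015-03-01", "2015-03-02"]),
    "score": [1, 2, 3, 4],
})

excluded = [pd.Timestamp("20150210"), pd.Timestamp("20150301")]

mask = game_df["Date"].isin(excluded)  # True for rows whose date is in the list
kept = game_df[~mask]                  # ~ flips the mask, so those rows are dropped

print(kept)
```

Only the rows dated 2015-02-09 and 2015-03-02 should remain in kept.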

Edit: If you want to exclude a range of rows between specific dates:

game_df[~game_df['Date'].isin(pd.date_range(start='20150210', end='20150301'))]
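One caveat worth keeping in mind: pd.date_range with the default daily frequency produces timestamps at midnight, and isin matches exact values, so rows that carry a time of day would slip through. The sketch below, on invented sample data, compares the isin approach with a variant that normalizes the column to midnight and uses between. The variable names are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

# Hypothetical data with times of day attached to most rows.
game_df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2015-02-09 18:00",
        "2015-02-10 19:30",
        "2015-02-20 00:00",
        "2015-03-01 20:00",
        "2015-03-02 12:00",
    ]),
})

start, end = pd.Timestamp("2015-02-10"), pd.Timestamp("2015-03-01")

# isin + date_range only removes rows that fall exactly on midnight.
via_isin = game_df[~game_df["Date"].isin(pd.date_range(start=start, end=end))]

# Normalizing to midnight first removes every row whose calendar day is in range
# (between is inclusive on both ends by default).
in_range = game_df["Date"].dt.normalize().between(start, end)
via_between = game_df[~in_range]

print(len(via_isin), len(via_between))
```

If the 'Date' column holds pure dates with no time component, both approaches should give the same result.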

Actually, I've found what I was looking for with the following code:

game_df = game_df[(game_df['Date'].dt.month != 2) & (game_df['Date'].dt.month != 3) & (game_df['Date'].dt.month != 4)\
                      & (game_df['Date'].dt.month != 5)]

It is pretty ugly, and I truly think it can be done in a more efficient way, but it works when it comes to excluding rows whose datetime values fall within a span of time.
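A possibly tidier way to express the same filter is to test the month against a list with isin and invert the result. This is only a sketch on made-up dates; months_to_drop is a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical sample spanning two years.
game_df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2014-01-15", "2014-02-01", "2014-05-31",
        "2014-06-01", "2015-03-10", "2015-12-25",
    ]),
})

months_to_drop = [2, 3, 4, 5]  # February through May

# Keep rows whose month is NOT in the list, regardless of the year.
game_df = game_df[~game_df["Date"].dt.month.isin(months_to_drop)]

print(game_df)
```

Because the months are contiguous, game_df["Date"].dt.month.between(2, 5) could stand in for the isin call.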

Instead of dropping, I find query much more helpful. But of course you need to change the arguments so that the condition selects the part of the data you want to keep.

df.query("Date.dt.month < 2 or Date.dt.month > 5", inplace=True)

If you want to use exact dates:

df.query("Date <= '2017-01-31' or Date >= '2017-05-01'", inplace=True)
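Note that a query condition describes the rows to keep, so the two sides have to be joined with "or": no single date can be both before the range and after it, and an "and" would return an empty frame. Below is a small sketch on invented data. Passing engine="python" and referencing the bounds through @ variables are precautions assumed here, not requirements stated in the answer.

```python
import pandas as pd

# Hypothetical sample data around the boundaries.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2017-01-31", "2017-02-01", "2017-04-30", "2017-05-01", "2017-06-15",
    ]),
})

# Keep rows whose month is before February or after May.
by_month = df.query("Date.dt.month < 2 or Date.dt.month > 5", engine="python")

# Keep rows outside an exact date span; @name refers to a local Python variable.
last_kept_before = pd.Timestamp("2017-01-31")
first_kept_after = pd.Timestamp("2017-05-01")
by_date = df.query(
    "Date <= @last_kept_before or Date >= @first_kept_after", engine="python"
)

# Joining the two sides with "and" can never be satisfied.
empty = df.query("Date.dt.month < 2 and Date.dt.month > 5", engine="python")

print(len(by_month), len(by_date), len(empty))
```

The month-based and date-based conditions are not equivalent: the first drops May, while the second keeps everything from 2017-05-01 onward.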

