Removing rows with weekends from dataframe in Pandas

I have a Pandas dataframe that looks like this:

df.head()

         Date  Abscount  Year  Quarter  Month  Week Number
0  2022-01-03       7.0  2022        1      1            1
1  2022-01-04      17.0  2022        1      1            1
2  2022-01-05      16.0  2022        1      1            1
3  2022-01-06      18.0  2022        1      1            1
4  2022-01-07      18.0  2022        1      1            1

A few rows have dates that fall on weekends, and I want to remove them.

I am trying to add a column that shows the day of the week, and then I plan to drop the weekend rows with a condition.

However, the following code does not work:

df = df[df['Date'].dt.day_name()]

Error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
c:\Users\013555\Desktop\Time Series Forecasting for Absence\Time series_absence_forecast.ipynb Cell 21' in <cell line: 1>()
----> 1 df = df[df['Date'].dt.day_name()]

File c:\Users\013555\Anaconda3\lib\site-packages\pandas\core\frame.py:3511, in DataFrame.__getitem__(self, key)
   3509     if is_iterator(key):
   3510         key = list(key)
-> 3511     indexer = self.columns._get_indexer_strict(key, "columns")[1]
   3513 # take() does not accept boolean indexers
   3514 if getattr(indexer, "dtype", None) == bool:

File c:\Users\013555\Anaconda3\lib\site-packages\pandas\core\indexes\base.py:5782, in Index._get_indexer_strict(self, key, axis_name)
   5779 else:
   5780     keyarr, indexer, new_indexer = self._reindex_non_unique(keyarr)
-> 5782 self._raise_if_missing(keyarr, indexer, axis_name)
   5784 keyarr = self.take(indexer)
   5785 if isinstance(key, Index):
   5786     # GH 42790 - Preserve name from an Index

File c:\Users\013555\Anaconda3\lib\site-packages\pandas\core\indexes\base.py:5842, in Index._raise_if_missing(self, key, indexer, axis_name)
   5840     if use_interval_msg:
   5841         key = list(key)
-> 5842     raise KeyError(f"None of [{key}] are in the [{axis_name}]")
   5844 not_found = list(ensure_index(key)[missing_mask.nonzero()[0]].unique())
   5845 raise KeyError(f"{not_found} not in index")

KeyError: "None of [Index(['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday',\n       'Sunday', 'Monday', 'Tuesday', 'Wednesday',\n       ...\n       'Saturday', 'Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday',\n       'Friday', 'Saturday', 'Sunday', 'Monday'],\n      dtype='object', length=141)] are in the [columns]"

Please help. Where am I going wrong?

What you want to do is simply:你想做的很简单:

df['new_col_name'] = df['Date'].dt.day_name()

However, if you only need the day name for a condition, you don't need to add it to the DataFrame; you can filter with it directly:

# Example: remove weekends.
df = df[~df['Date'].dt.day_name().isin(['Saturday', 'Sunday'])]
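A self-contained sketch of that filter, assuming a `Date` column of dtype `datetime64` as in the question. The frame below is hypothetical: one week from Monday 2022-01-03 to Sunday 2022-01-09, with the weekday counts taken from the question and the two weekend counts made up.

```python
import pandas as pd

# Hypothetical sample: Monday 2022-01-03 through Sunday 2022-01-09.
df = pd.DataFrame({
    "Date": pd.date_range("2022-01-03", periods=7, freq="D"),
    "Abscount": [7.0, 17.0, 16.0, 18.0, 18.0, 3.0, 2.0],  # weekend values are made up
})

# Boolean mask: True where the day name is Saturday or Sunday.
is_weekend = df["Date"].dt.day_name().isin(["Saturday", "Sunday"])

# Keep only the rows where the mask is False (i.e. Monday to Friday).
weekdays_only = df[~is_weekend]
print(weekdays_only)
```

The filtered frame keeps the original index labels (0 to 4 here); chain `.reset_index(drop=True)` if consecutive labels are wanted after gaps appear.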

Your original line instead passed the result of df['Date'].dt.day_name(), a Series of day-name strings, to df[...]. When df[...] receives a list-like of strings, it treats them as column labels, and df has no columns named 'Monday', 'Tuesday', and so on, which is why it raised "None of [...] are in the [columns]".
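A small reproduction of that KeyError, using a hypothetical three-row frame shaped like the question's. It shows that a list-like of strings inside df[...] selects columns: a real column name works, while day names do not.

```python
import pandas as pd

# Hypothetical three-row frame with the question's column names.
df = pd.DataFrame({
    "Date": pd.date_range("2022-01-03", periods=3, freq="D"),
    "Abscount": [7.0, 17.0, 16.0],
})

day_names = df["Date"].dt.day_name()  # Series: Monday, Tuesday, Wednesday

# A list of strings inside df[...] is read as column labels, so this works.
print(df[["Abscount"]])

# The same kind of lookup with day names fails: no column is called 'Monday', etc.
key_error_raised = False
try:
    df[day_names]
except KeyError as err:
    key_error_raised = True
    print("KeyError:", err)
```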

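Note that using Series.dt.weekday (Monday=0 through Sunday=6) should be faster, since it compares integers instead of building day-name strings. Keep the rows whose weekday is below 5:

df = df[df['Date'].dt.weekday.lt(5)]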
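A quick check, on a hypothetical two-week frame starting Monday 2022-01-03 with placeholder counts, that the integer-based filter keeps exactly the same rows as the day-name filter. No timing claim is made here; the speed difference depends on data size and pandas version.

```python
import pandas as pd

# Hypothetical frame: two full weeks, Monday 2022-01-03 to Sunday 2022-01-16.
df = pd.DataFrame({"Date": pd.date_range("2022-01-03", periods=14, freq="D")})
df["Abscount"] = range(len(df))  # placeholder values

# Series.dt.weekday: Monday=0 ... Sunday=6, so weekdays are the values below 5.
by_number = df[df["Date"].dt.weekday.lt(5)]

# String-based filter from the answer, for comparison.
by_name = df[~df["Date"].dt.day_name().isin(["Saturday", "Sunday"])]

print(by_number.equals(by_name))  # True: same rows, same index
print(len(by_number))  # 10 weekdays in two weeks
```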

Found the answer. I just had to add a column using the following code:

df["Day"] = df["Date"].dt.day_name()
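To finish the plan from the question (add the day-name column, then drop weekend rows with a condition), a minimal sketch follows. The sample frame is hypothetical, with made-up weekend counts; the helper column is removed afterwards with `drop(columns=...)` so the result keeps only the original columns.

```python
import pandas as pd

# Hypothetical sample: Monday 2022-01-03 through Sunday 2022-01-09.
df = pd.DataFrame({
    "Date": pd.date_range("2022-01-03", periods=7, freq="D"),
    "Abscount": [7.0, 17.0, 16.0, 18.0, 18.0, 3.0, 2.0],  # weekend values are made up
})

# Step 1: helper column with the day name, as in the self-answer.
df["Day"] = df["Date"].dt.day_name()

# Step 2: keep non-weekend rows, then drop the helper column.
df = df[~df["Day"].isin(["Saturday", "Sunday"])].drop(columns="Day")

print(df)
```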
