
Python Pandas - remove rows based on a given value

I think I am close, but the following error shows up. Could you advise what the reason is?

raise KeyError(key) from err KeyError: 'DATE OF OPERATION'

The code is:

import pandas as pd
from pathlib import Path
source_files = sorted(Path(r'/Users/user/Downloads/').glob('*.csv'))

for file in source_files:
    df = pd.read_csv(file)
    # df.columns = df.columns.str.replace(' ', '_')
    df = df[~df['DATE OF OPERATION'].astype(str).str.startswith('202110')]
    # df.columns = df.columns.str.replace('_', ' ')
    name, ext = file.name.split('.')
    df.to_csv(f'{name}.{ext}', index=False)

error:

  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'DATE OF OPERATION'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/user/PycharmProjects/ShareOpe/ShareOpe.py", line 11, in <module>
    df = df.loc[~df['DATE OF OPERATION'].astype(str).str.startswith('202110')]
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/frame.py", line 3458, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 'DATE OF OPERATION'

To remove rows, you can use loc:

df = df.loc[~df['DATE OF OPERATION'].astype(str).str.startswith('202110')]
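Note that the KeyError is raised by the column lookup itself, before any rows are filtered, so it means at least one of the CSV files has no column labelled exactly 'DATE OF OPERATION'. Below is a minimal sketch for checking this. It assumes the mismatch comes from stray whitespace around the header, which is only a guess; a different delimiter or a file that simply lacks the column would produce the same error. The helper name drop_rows_with_prefix and the sample CSV text are hypothetical, made up for illustration.

```python
import io

import pandas as pd


def drop_rows_with_prefix(df, column, prefix):
    """Return a copy of df without rows whose `column` starts with `prefix`.

    Hypothetical helper: strips whitespace from the headers first and
    returns None (after printing the real headers) if the column is
    still missing.
    """
    df = df.copy()
    df.columns = df.columns.str.strip()
    if column not in df.columns:
        print(f"column {column!r} not found; columns are {list(df.columns)}")
        return None
    mask = df[column].astype(str).str.startswith(prefix)
    return df[~mask]


# Made-up CSV whose header has a trailing space (assumed cause of the KeyError)
csv_text = "DATE OF OPERATION ,AMOUNT\n20211001,10\n20210930,20\n20211015,30\n"
df = pd.read_csv(io.StringIO(csv_text))

print(list(df.columns))  # inspect the labels pandas actually read
result = drop_rows_with_prefix(df, "DATE OF OPERATION", "202110")
print(result)
```

Printing list(df.columns) for each file inside the loop is usually the quickest way to see which file is missing the expected label.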

Check out this Pandas article from May 14, 2021.

#drop rows that contain specific 'value' in 'column_name'
df = df[df.your_column_name != value_to_remove]
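As a small illustration of the pattern above, here is a sketch on a made-up DataFrame (the column names and values are invented for the example). It also shows isin for dropping several values at once, which is an addition to what the snippet above covers.

```python
import pandas as pd

# Made-up sample data
df = pd.DataFrame({"team": ["A", "B", "C", "B"], "points": [10, 12, 8, 15]})

# Keep only rows where 'team' is not "B"
without_b = df[df["team"] != "B"]

# Drop several values at once by negating an isin mask
without_b_c = df[~df["team"].isin(["B", "C"])]

print(without_b)
print(without_b_c)
```

Bracket access (df["team"]) is used here instead of attribute access because it also works for column names that contain spaces, such as 'DATE OF OPERATION'.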

