
How to skip columns in pandas dataframe and continue processing in loop

I am reading Excel files in a folder and loading them into a DataFrame. I am fetching values from some columns, but some Excel files are missing the columns I am looking for. In that case, I want to populate the missing column as N/A for that Excel file and continue processing, so I can save all my results in a DataFrame. I am learning Python and I need help here. Below is my code:

from pathlib import Path
import pandas as pd

p = Path("path to excel")  # placeholder: folder that holds the Excel files
# Skip any workbook whose file name contains "AC0"
filtered_files = [x for x in p.glob("**/*.xlsx") if "AC0" not in x.name]

result = {}
for i, file in enumerate(filtered_files):
    # sheet_name=[1] returns a dict keyed by sheet position; take sheet 1 (the second sheet)
    full_df = pd.read_excel(file, sheet_name=[1], header=1)
    df = full_df[1]
    # The third column is used to select the DED rows
    col_1_name = df.columns[2]
    ded_ind_df = df[df[col_1_name] == 'DED Individual']
    ded_fmem_df = df[df[col_1_name] == 'DED Family Member']
    result[i] = {
        'IND DED INN': list(ded_ind_df['In-Network\nVALUE']),
        'DED FAM INN': list(ded_fmem_df['In-Network\nVALUE']),
        'IND DED OON': list(ded_ind_df['Out-of-Network\nVALUE']),
        'DED FAM OON': list(ded_fmem_df['Out-of-Network\nVALUE']),
    }
result = pd.DataFrame.from_dict(result)

When I run it, I get the following error:

    'IND DED OON': list(ded_ind_df['Out-of-Network\nVALUE']),
indexer = self.columns.get_loc(key)
    raise KeyError(key) from err
KeyError: 'Out-of-Network\nVALUE'

This is because one of the Excel files does not have the Out-of-Network column. In that case, I want to skip it and continue processing the next file.

You could try one of these options before your result[i] = ... line. To create the missing columns, filled with empty values:

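for column in ["In-Network\nVALUE", "Out-of-Network\nVALUE"]:
    if column not in df.columns:
        # Missing in this file: add it as an empty (None) column to both filtered frames
        ded_ind_df = ded_ind_df.assign(**{column: None})
        ded_fmem_df = ded_fmem_df.assign(**{column: None})
# result[i] = {...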
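To try the first option without any Excel files, here is a minimal self-contained sketch. It assumes the sheets are already loaded as DataFrames; the sample frames, the "A"/"B"/"Benefit" column names and the collect_values helper are hypothetical. Here the missing value columns are added to df with DataFrame.assign before filtering, so both the individual and the family-member rows receive them.

```python
import pandas as pd

# Column names taken from the question; the sample frames below are hypothetical.
VALUE_COLUMNS = ["In-Network\nVALUE", "Out-of-Network\nVALUE"]


def collect_values(frames):
    """Build the per-file result, filling any missing value column with None."""
    result = {}
    for i, df in enumerate(frames):
        # Add each missing value column as an all-None column. assign() returns
        # a new DataFrame, so the caller's frame is left untouched.
        for column in VALUE_COLUMNS:
            if column not in df.columns:
                df = df.assign(**{column: None})
        col_1_name = df.columns[2]  # same positional lookup as in the question
        ded_ind_df = df[df[col_1_name] == "DED Individual"]
        ded_fmem_df = df[df[col_1_name] == "DED Family Member"]
        result[i] = {
            "IND DED INN": list(ded_ind_df["In-Network\nVALUE"]),
            "DED FAM INN": list(ded_fmem_df["In-Network\nVALUE"]),
            "IND DED OON": list(ded_ind_df["Out-of-Network\nVALUE"]),
            "DED FAM OON": list(ded_fmem_df["Out-of-Network\nVALUE"]),
        }
    return pd.DataFrame.from_dict(result)


# Hypothetical stand-ins for two sheets: the second lacks the Out-of-Network column.
complete = pd.DataFrame({
    "A": [1, 2],
    "B": [3, 4],
    "Benefit": ["DED Individual", "DED Family Member"],
    "In-Network\nVALUE": [500, 1000],
    "Out-of-Network\nVALUE": [1500, 3000],
})
partial = complete.drop(columns=["Out-of-Network\nVALUE"])

out = collect_values([complete, partial])
print(out)
```

The column for the second file holds [None] in its Out-of-Network entries. If a literal "N/A" string is preferred, pass "N/A" instead of None to assign.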

But if you want to skip the iteration and jump to the next one:

if any(column not in ded_ind_df.columns for column in ["In-Network\nVALUE", "Out-of-Network\nVALUE"]):
    continue  # Skip the current iteration
# result[i] = {...
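A similar self-contained sketch of the second option, under the same assumptions (hypothetical sample frames and a hypothetical collect_complete helper). Because the filtered frames have the same columns as df, checking df once covers both of them.

```python
import pandas as pd

# Column names taken from the question; the sample frames below are hypothetical.
VALUE_COLUMNS = ["In-Network\nVALUE", "Out-of-Network\nVALUE"]


def collect_complete(frames):
    """Build the per-file result, skipping any frame that lacks a value column."""
    result = {}
    for i, df in enumerate(frames):
        # Skip this file entirely if any required value column is absent
        if any(column not in df.columns for column in VALUE_COLUMNS):
            continue
        col_1_name = df.columns[2]  # same positional lookup as in the question
        ded_ind_df = df[df[col_1_name] == "DED Individual"]
        ded_fmem_df = df[df[col_1_name] == "DED Family Member"]
        result[i] = {
            "IND DED INN": list(ded_ind_df["In-Network\nVALUE"]),
            "DED FAM INN": list(ded_fmem_df["In-Network\nVALUE"]),
            "IND DED OON": list(ded_ind_df["Out-of-Network\nVALUE"]),
            "DED FAM OON": list(ded_fmem_df["Out-of-Network\nVALUE"]),
        }
    return pd.DataFrame.from_dict(result)


# Hypothetical stand-ins for two sheets: the first lacks the Out-of-Network column.
complete = pd.DataFrame({
    "A": [1, 2],
    "B": [3, 4],
    "Benefit": ["DED Individual", "DED Family Member"],
    "In-Network\nVALUE": [500, 1000],
    "Out-of-Network\nVALUE": [1500, 3000],
})
partial = complete.drop(columns=["Out-of-Network\nVALUE"])

out = collect_complete([partial, complete])
print(out)
```

Skipped files leave a gap in the result keys (here only 1 appears), because i still comes from enumerate over all files.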
