pd.Series - "Filename" KeyError

I am having trouble running a script that counts the predictions in the CSV files in a given directory. The CSV files look like this:

Sample data

and the code is the following:

    import os
    from glob import glob
    import pandas as pd

    def get_count(distribution, keyname):
        # Return the count for keyname, or 0 if that label never occurs.
        try:
            count = distribution[keyname]
        except KeyError:
            count = 0
        return count

    main_path = "K:\\...\\folder_name"

    folder_paths = glob("%s\\*" % main_path)
    data = []

    for path in folder_paths:
        file_name = os.path.splitext(os.path.basename(path))[0]
        results = pd.read_csv(path, error_bad_lines=False)
        results['Label'] = pd.Series(results['Filename'].str.split("\\").str[0])

        distribution = results.Predictions.value_counts()
        print(distribution)

        num_of_x = get_count(distribution, "x")
        num_of_y = get_count(distribution, "y")
        num_of_z = get_count(distribution, "z")

        d = {"filename": file_name, "x": num_of_x, "y": num_of_y, "z": num_of_z}
        data.append(d)

    df = pd.DataFrame(data=data)
    df.to_csv(os.path.join(main_path, "summary_counts.csv"), index=False)

The error is KeyError: 'Filename', raised on the line that calls pd.Series. Does anyone know how to solve this?

I am using Python 3.7.3 and pandas 1.0.5, and I am a beginner in programming.

Many thanks in advance.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File ".\save_counts.py", line 24, in <module>
    results['Label'] = pd.Series(results['Filename'].str.split("\\").str[0])
  File "K:\...\lib\site-packages\pandas\core\frame.py", line 2800, in __getitem__
    indexer = self.columns.get_loc(key)
  File "K:\...\site-packages\pandas\core\indexes\base.py", line 2648, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas\_libs\index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1619, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1627, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Filename'

In this part:

for path in folder_paths:
    file_name = os.path.splitext(os.path.basename(path))[0]
    results = pd.read_csv(path, error_bad_lines=False)
    results['Label'] = pd.Series(results['Filename'].str.split("\\").str[0])

You are creating the pd.Series, but results is reassigned on every pass through the for loop, so after the loop it only holds the values from the last file.

If you want to use the results DataFrame in distribution after the loop, you need to use append().

Create an empty DataFrame and append each file's results to it:

final_results = pd.DataFrame()
for path in folder_paths:
    file_name = os.path.splitext(os.path.basename(path))[0]
    results = pd.read_csv(path, error_bad_lines=False)
    results['Label'] = pd.Series(results['Filename'].str.split("\\").str[0])
    # DataFrame.append exists in pandas 1.0.5 but was removed in pandas 2.0
    final_results = final_results.append(results)
# and from this point you can continue
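
As a rough sketch of the same idea on newer pandas, the per-file frames can be collected in a list and combined with pd.concat, since DataFrame.append is no longer available from pandas 2.0 onward. The sketch also checks that each file really has the expected columns before using them. This rests on an assumption the thread does not confirm: that the KeyError comes from a file whose header has no exact "Filename" column (for example a previously written summary_counts.csv, or a header with stray spaces). The names load_results, required and source_file are hypothetical and only used for illustration.

```python
import os
from glob import glob

import pandas as pd


def load_results(folder, required=("Filename", "Predictions")):
    """Read every CSV in folder that has the required columns and stack them."""
    frames = []
    skipped = []
    for path in sorted(glob(os.path.join(folder, "*.csv"))):
        results = pd.read_csv(path)
        # Strip stray whitespace from headers such as " Filename".
        results.columns = results.columns.str.strip()
        missing = [col for col in required if col not in results.columns]
        if missing:
            # Record the file and its missing columns instead of raising KeyError.
            skipped.append((os.path.basename(path), missing))
            continue
        # Keep the text before the first backslash, as in the question.
        results["Label"] = results["Filename"].str.split("\\").str[0]
        # Hypothetical helper column: remember which file each row came from.
        results["source_file"] = os.path.splitext(os.path.basename(path))[0]
        frames.append(results)
    # pd.concat raises on an empty list, so fall back to an empty DataFrame.
    combined = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
    return combined, skipped
```

If some files end up in skipped, printing that list shows which columns pandas did not find in each one. With a non-empty combined frame, something like pd.crosstab(combined["source_file"], combined["Predictions"]) should then give the per-file counts of each prediction.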
