How do I use pkgutil.get_data with csv.reader in Python?

Question

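I have a Python module that ships with a variety of data files (a set of csv files representing curves) that need to be loaded at runtime. The csv module works very well: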

  # curvefile = "ntc.10k.csv"
  raw = csv.reader(open(curvefile, 'rb'), delimiter=',')

But if I import this module into another script, I need to find the full path to the data file.

/project
   /shared
       curve.py
       ntc.10k.csv
       ntc.2k5.csv
   /apps
       script.py

I want script.py to refer to the curves by base filename only, not by full path. In the module code, I can use:

pkgutil.get_data("curve", "ntc.10k.csv")

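This works great for finding the file, but it returns the contents of the csv file already read into memory, whereas csv.reader requires the file handle itself. Is there any way to make these two modules play well together? They are both standard library modules, so I wasn't expecting problems. I know I could start splitting the binary data that pkgutil returns myself, but then I might as well not use the csv library.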

I know I can just use the following in the module code and forget about pkgutil, but it seems like this is exactly what pkgutil is for.

this_dir, this_filename = os.path.split(__file__)
DATA_PATH = os.path.join(this_dir, curvefile)
raw = csv.reader(open(DATA_PATH, "rb"))

Answer 1

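I opened up the source code of get_data, and it is trivial to make it return the path to the file instead of the loaded file. This module should do the trick. Pass as_string=True to get the file contents read into memory, or as_string=False to get the path.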

import os, sys

from pkgutil import get_loader

def get_data_smart(package, resource, as_string=True):
    """Rewrite of pkgutil.get_data() that actually lets the user determine if data should
    be returned read into memory (aka as_string=True) or just return the file path.
    """

    loader = get_loader(package)
    if loader is None or not hasattr(loader, 'get_data'):
        return None
    mod = sys.modules.get(package) or loader.load_module(package)
    if mod is None or not hasattr(mod, '__file__'):
        return None

    # Modify the resource name to be compatible with the loader.get_data
    # signature - an os.path format "filename" starting with the dirname of
    # the package's __file__
    parts = resource.split('/')
    parts.insert(0, os.path.dirname(mod.__file__))
    resource_name = os.path.join(*parts)
    if as_string:
        return loader.get_data(resource_name)
    else:
        return resource_name

Answer 2

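This isn't ideal, especially for very large files, but you can use StringIO to turn the string into something with a read() method, which csv.reader should be able to handle.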

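import csv
import pkgutil
from StringIO import StringIO  # Python 2; on Python 3, decode the bytes and use io.StringIO

csvdata = pkgutil.get_data("curve", "ntc.10k.csv")
csvio = StringIO(csvdata)
raw = csv.reader(csvio)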
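
On Python 3, pkgutil.get_data returns bytes, so the data has to be decoded before csv.reader can consume it. Below is a minimal sketch of that variant, assuming the files are UTF-8 encoded. The helper name rows_from_package and the throwaway curve_demo package built in a temporary directory are hypothetical and exist only to make the example self-contained.

```python
import csv
import io
import os
import pkgutil
import sys
import tempfile


def rows_from_package(package, resource, encoding="utf-8"):
    """Return the rows of a CSV resource bundled with `package` (hypothetical helper)."""
    # bytes on Python 3; None if the package's loader cannot provide data
    data = pkgutil.get_data(package, resource)
    if data is None:
        raise FileNotFoundError(f"{resource!r} not found in package {package!r}")
    # newline="" leaves line endings untouched so the csv module can handle them.
    text = io.StringIO(data.decode(encoding), newline="")
    return list(csv.reader(text))


# Build a throwaway package so the sketch runs anywhere (demo assumption).
demo_root = tempfile.mkdtemp()
demo_pkg = os.path.join(demo_root, "curve_demo")
os.mkdir(demo_pkg)
open(os.path.join(demo_pkg, "__init__.py"), "w").close()
with open(os.path.join(demo_pkg, "ntc.10k.csv"), "w", newline="") as f:
    f.write("temp,resistance\r\n25,10000\r\n")

sys.path.insert(0, demo_root)
rows = rows_from_package("curve_demo", "ntc.10k.csv")
print(rows)
```

As with the snippet above, the whole file is held in memory, so this is best suited to small data files.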

Answer 3

Another way is to use json.loads() together with decode(). Because get_data() returns the data as bytes, it has to be converted to a string before it can be processed as JSON.

import json
import pkgutil
data_file = pkgutil.get_data('test.testmodel', 'data/test_data.json')
length_data_file = len(json.loads(data_file.decode()))

Answer 4

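It is more than 10 years since this question was asked, but I came here via Google and went down the rabbit hole of the other answers. Nowadays this seems to be more straightforward. Below is my implementation using the stdlib's importlib.resources, which returns the filesystem path to a package's resource as a string. It should work with Python 3.7+, the release in which importlib.resources was added.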

import importlib.resources
import os


def get_data_file_path(package: str, resource: str) -> str:
    """
    Returns the filesystem path of a resource marked as package
    data of a Python package installed.

    :param package: string of the Python package the resource is
                    located in, e.g. "mypackage.module"
    :param resource: string of the filename of the resource (do not
                     include directory names), e.g. "myfile.png"
    :return: string of the full (absolute) filesystem path to the
             resource if it exists.
    :raises ModuleNotFoundError: In case the package `package` is not found.
    :raises FileNotFoundError: In case the file in `resource` is not
                               found in the package.
    """
    # Guard against non-existing files, or else importlib.resources.path
    # may raise a confusing TypeError.
    if not importlib.resources.is_resource(package, resource):
        raise FileNotFoundError(f"Python package '{package}' resource '{resource}' not found.")

    with importlib.resources.path(package, resource) as resource_path:
        return os.fspath(resource_path)
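
For the original goal of feeding a packaged CSV file to csv.reader, a filesystem path is not strictly required, because the resource can be opened as a text stream directly. Below is a sketch of that idea, assuming Python 3.9+ where importlib.resources.files() is available. The helper read_packaged_csv and the temporary curve_demo_files package are hypothetical names used only for illustration.

```python
import csv
import importlib.resources
import os
import sys
import tempfile


def read_packaged_csv(package, resource, encoding="utf-8"):
    """Return all rows of a CSV file shipped inside `package` (hypothetical helper)."""
    resource_ref = importlib.resources.files(package).joinpath(resource)
    # Open in text mode; rows are read fully before the handle is closed.
    with resource_ref.open("r", encoding=encoding, newline="") as handle:
        return list(csv.reader(handle))


# Build a throwaway package so the sketch runs anywhere (demo assumption).
demo_root = tempfile.mkdtemp()
demo_pkg = os.path.join(demo_root, "curve_demo_files")
os.mkdir(demo_pkg)
open(os.path.join(demo_pkg, "__init__.py"), "w").close()
with open(os.path.join(demo_pkg, "ntc.2k5.csv"), "w", newline="") as f:
    f.write("temp,resistance\r\n25,2500\r\n")

sys.path.insert(0, demo_root)
curve_rows = read_packaged_csv("curve_demo_files", "ntc.2k5.csv")
print(curve_rows)
```

Reading through a stream also sidesteps one caveat of returning a path from inside the importlib.resources.path context manager: for packages that are not plain directories (for example, zipped ones), the path may point to a temporary file that is removed when the with block exits.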
