Reading an SPSS file (.sav or .zsav) in memory using pyreadstat

I've been developing a Django application. I know that there are several different ways of reading an SPSS file. One way is using pandas:

import pandas as pd

file_path = "./my_spss_file.sav"
df = pd.read_spss(file_path)

Another way is using pyreadstat:

import pyreadstat
df, meta = pyreadstat.read_sav('./my_spss_file.sav')

As you can see above, unlike pandas, pyreadstat also gives me the meta information, such as variable labels and value labels. So, that is what I am using. The problem with pyreadstat is that I cannot use it for an in-memory read. After uploading an SPSS file from a browser, I have to save it to a directory each time and then read the file from there using the pyreadstat module.

def upload_file(request):
    result = None
    # Get the context from the request.
    context = RequestContext(request)
    if request.is_ajax():
        if "POST" == request.method:
            global my_df
            global _explore
            global base_dir
            file = request.FILES['file']
            file_name = file.name
            base_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
            try:
                my_df = None
                # Determine the type of the file and get the dataframe
                if file_name.endswith('.csv'):
                    my_df = pd.read_csv(file, header=0)
                elif file_name.endswith('.xlsx') or file_name.endswith('.xls'):
                    my_df = pd.read_excel(file, header=0)
                elif file_name.endswith('.sav') or file_name.endswith('.zsav'):
                    handle_uploaded_file(file, str(file))
                    file_path = os.path.join(base_dir, "upload\\") + file_name
                    my_df = util.read_spss_file(file_path)

def read_spss_file(f_name):
    df, meta = pyreadstat.read_sav(f_name, apply_value_formats=True)
    return df

def handle_uploaded_file(file, filename):
    upload_dir = os.path.join(base_dir, "upload\\") #base_dir + 'upload/'
    if not os.path.exists(upload_dir):
        os.mkdir(upload_dir)

    with open(upload_dir + filename, 'wb+') as destination:
        for chunk in file.chunks():
            destination.write(chunk)

I don't want to write an uploaded SPSS file to the disk. So, I was wondering whether there is a way to read an in-memory SPSS file using the pyreadstat module?

Unfortunately, it is not possible at the moment.

Pyreadstat relies on the C library ReadStat, which currently requires a file on disk.

The issue has been raised here.

Pandas read_spss also uses pyreadstat in the background, so both methods are actually the same.

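import logging
import os
import pathlib
from io import BytesIO

from pyreadstat import read_sav

logger = logging.getLogger(__name__)


class TempFile(type(pathlib.Path())):  # type: ignore
    """A path whose file is deleted when the with-block exits."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        filepath = str(self.absolute())
        try:
            os.remove(filepath)
        except OSError:
            logger.error('remove temporary file: %s failed!', filepath)


def read_sav_from_buffer(buffer: BytesIO, encoding=None):
    # buffer holds the bytes data of the uploaded file
    with TempFile('/tmp/file.sav') as fp:
        try:
            fp.write_bytes(buffer.getvalue())
            return read_sav(str(fp), encoding=encoding)
        except Exception:
            # do some fallback
            pass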

This lets you read from bytes held in memory by writing them to a temporary file that is removed afterwards.
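
One caveat with the snippet above is that the fixed path /tmp/file.sav would be shared by every request, so two simultaneous uploads could overwrite each other. A minimal sketch of a variant using the standard library's tempfile module follows. The helper name upload_as_path is hypothetical, and the sketch assumes the upload object behaves like a Django uploaded file: it has chunks() (as used in the question), and larger uploads may already live on disk and expose temporary_file_path(), in which case no copy is made.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def upload_as_path(uploaded, suffix=".sav"):
    """Yield a filesystem path holding the upload's bytes (hypothetical helper)."""
    # Assumption: Django's TemporaryUploadedFile already sits on disk and
    # reports its location through temporary_file_path().
    if hasattr(uploaded, "temporary_file_path"):
        yield uploaded.temporary_file_path()
        return

    # Otherwise copy the in-memory upload to a uniquely named temporary file.
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in uploaded.chunks():
                tmp.write(chunk)
        yield path
    finally:
        # Remove the temporary copy whether or not reading succeeded.
        os.remove(path)


# Possible usage inside the view (not executed here):
#     with upload_as_path(request.FILES['file']) as path:
#         df, meta = pyreadstat.read_sav(path, apply_value_formats=True)
```

The file still touches the disk briefly, since pyreadstat needs a path, but nothing is left behind in the upload directory and concurrent requests each get their own file.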
