
How to convert a large .sav file into a CSV file

I am trying to convert a large (~2 GB) SPSS (.sav) file into CSV using Python.

For a file smaller than 500 MB, the following works without any problem:

import pandas as pd
df = pd.read_spss('stdFile.sav')
df.to_csv("stdFile.csv", encoding = "utf-8-sig")

But with the 2 GB file, I get a MemoryError.

I am looking for solutions, not necessarily in Python. However, I don't have an SPSS license, so I must convert the file with another tool.

You can use Python's pyreadstat package to read the SPSS file in chunks and append each chunk to the CSV, so that only one chunk is held in memory at a time:

import pyreadstat

fpath = "path/to/stdFile.sav"
outpath = "stdFile.csv"

# chunksize determines how many rows are read per chunk
reader = pyreadstat.read_file_in_chunks(pyreadstat.read_sav, fpath, chunksize=10000)

for i, (df, meta) in enumerate(reader):
    # first chunk: create the file and write the header;
    # later chunks: append without repeating the header
    first = i == 0
    df.to_csv(outpath, mode="w" if first else "a", header=first)


More information here: https://github.com/Roche/pyreadstat#reading-rows-in-chunks
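The "write the header once, then keep appending" idea above does not depend on pyreadstat or pandas. As a rough sketch using only the standard library, the same pattern can be written with a single open file handle instead of reopening the file for every chunk. The function name `write_chunks_to_csv` and its parameters are made up for this illustration; they are not part of pyreadstat.

```python
import csv
from typing import Iterable, Sequence


def write_chunks_to_csv(
    chunks: Iterable[Iterable[Sequence]],
    columns: Sequence[str],
    outpath: str,
) -> int:
    """Stream chunks of rows into one CSV file; return the number of data rows written.

    Hypothetical helper for illustration only: `chunks` is any iterable that
    yields batches of rows, so only one batch needs to be in memory at a time.
    """
    rows_written = 0
    # newline="" lets the csv module control line endings;
    # utf-8-sig matches the encoding used in the question.
    with open(outpath, "w", newline="", encoding="utf-8-sig") as fh:
        writer = csv.writer(fh)
        writer.writerow(columns)  # the header is written exactly once
        for chunk in chunks:
            for row in chunk:
                writer.writerow(row)
                rows_written += 1
    return rows_written
```

With the pyreadstat reader, each chunk's rows could presumably be supplied as `df.itertuples(index=False, name=None)` and the column names taken from `df.columns`, though calling `df.to_csv` per chunk as shown in the answer is usually simpler.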

First use the savReaderWriter module to convert the .sav file into a structured array, then use numpy to write the structured array to CSV:

pip install savReaderWriter


import savReaderWriter
import numpy as np

# read the .sav file into a NumPy structured array
reader_np = savReaderWriter.SavReaderNp("stdFile.sav")
array = reader_np.to_structured_array("outfile.dat")
# write the array out as comma-separated text
np.savetxt("stdFile.csv", array, delimiter=",")
reader_np.close()
