Reading an SPSS file (.sav or .zsav) in memory using pyreadstat
I've been developing a Django application. I know that there are a few different ways of reading an SPSS file. One way is using pandas:
import pandas as pd
file_path = "./my_spss_file.sav"
df = pd.read_spss(file_path)
Another way is using pyreadstat:
import pyreadstat
df, meta = pyreadstat.read_sav('./my_spss_file.sav')
As you can see above, unlike pandas, pyreadstat also returns the meta information, such as the variable labels and value labels, so that is what I am using.
The problem with pyreadstat is that I cannot use it for an in-memory read. After an SPSS file is uploaded from the browser, I have to save it to a directory each time and then read it from there using the pyreadstat module:
def upload_file(request):
    result = None
    # Get the context from the request.
    context = RequestContext(request)
    if request.is_ajax():
        if "POST" == request.method:
            global my_df
            global _explore
            global base_dir
            file = request.FILES['file']
            file_name = file.name
            base_dir = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
            try:
                my_df = None
                # Determine the type of the file and get the dataframe
                if file_name.endswith('.csv'):
                    my_df = pd.read_csv(file, header=0)
                elif file_name.endswith('.xlsx') or file_name.endswith('.xls'):
                    my_df = pd.read_excel(file, header=0)
                elif file_name.endswith('.sav') or file_name.endswith('.zsav'):
                    # SPSS files are first saved to disk, then read back
                    handle_uploaded_file(file, str(file))
                    file_path = os.path.join(base_dir, "upload\\") + file_name
                    my_df = util.read_spss_file(file_path)
            # (the snippet ends here; the except clause and the rest of the view are not shown)

def read_spss_file(f_name):
    df, meta = pyreadstat.read_sav(f_name, apply_value_formats=True)
    return df

def handle_uploaded_file(file, filename):
    upload_dir = os.path.join(base_dir, "upload\\")  # base_dir + 'upload/'
    if not os.path.exists(upload_dir):
        os.mkdir(upload_dir)
    # Write the upload to disk chunk by chunk
    with open(upload_dir + filename, 'wb+') as destination:
        for chunk in file.chunks():
            destination.write(chunk)
I don't want to write an uploaded SPSS file to disk. So, I was wondering whether there is a way to read an in-memory SPSS file using the pyreadstat module?
Unfortunately, it is not possible at the moment.
Pyreadstat relies on the C library ReadStat, which currently requires a file on disk.
The issue has been raised here.
Pandas read_spss also uses pyreadstat in the background, so both methods are actually the same.
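Since a file on disk is needed either way, one option is to let the view obtain a path without managing an upload directory by hand. The following is only a sketch under a few assumptions: `upload_as_path` is a hypothetical helper name, the upload object is assumed to expose `chunks()` as Django uploads do, and large uploads are assumed to be Django `TemporaryUploadedFile` objects, which provide `temporary_file_path()`.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def upload_as_path(uploaded, suffix=".sav"):
    """Yield a filesystem path that holds the upload's bytes.

    `upload_as_path` is a hypothetical helper, not part of Django or pyreadstat.
    """
    if hasattr(uploaded, "temporary_file_path"):
        # The upload was already spooled to disk: reuse that file.
        yield uploaded.temporary_file_path()
        return
    # In-memory upload: write it to a uniquely named temporary file.
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in uploaded.chunks():
                tmp.write(chunk)
        yield path
    finally:
        # Always clean up the file created here.
        os.remove(path)
```

Inside the view this could be used as `with upload_as_path(request.FILES['file']) as path: df, meta = pyreadstat.read_sav(path)`. The data still touches the disk, but only briefly and without a permanent upload folder.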
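import io
import logging
import os
import pathlib

from pyreadstat import read_sav

logger = logging.getLogger(__name__)


class TempFile(type(pathlib.Path())):  # type: ignore
    """A path whose file is deleted when the with-block exits."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        filepath = str(self.absolute())
        try:
            os.remove(filepath)
        except OSError:
            logger.error('remove temporary file: %s failed!', filepath)


def read_sav_from_bytes(buffer, encoding=None):
    # buffer is an io.BytesIO holding the bytes data
    with TempFile('/tmp/file.sav') as fp:
        try:
            fp.write_bytes(buffer.getvalue())
            return read_sav(str(fp), encoding=encoding)
        except Exception:
            # do some fallback here; this version simply re-raises
            raise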
This helps when the data is held in memory as bytes: it is written to a temporary file, read with pyreadstat, and the file is removed afterwards.
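A fixed path such as '/tmp/file.sav' can collide when two requests arrive at the same time, so a variant with a unique file name per call may be safer. This is a minimal sketch using only the standard library; `read_via_tempfile` is a hypothetical name, and `reader` stands for any function that takes a file path (for example `pyreadstat.read_sav`).

```python
import os
import tempfile


def read_via_tempfile(data, reader, suffix=".sav"):
    """Write `data` (bytes) to a unique temp file, call `reader(path)`, then delete the file.

    Hypothetical helper; `reader` is any callable that accepts a file path.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
        # The file is closed before reading so the reader can open it itself.
        return reader(path)
    finally:
        os.remove(path)
```

With pyreadstat installed, a call could look like `df, meta = read_via_tempfile(buffer.getvalue(), pyreadstat.read_sav)`.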