
How to automate converting a list of .dat files, with their dictionaries (in separate .dct format), to pandas data frames?

The following code converts .dat files into data frames using their dictionary files in .dct format. It works well. My problem is that I have not been able to automate the process: creating a loop that takes the pairs of these files from lists is a little tricky, at least for me. I could really use some help with that.

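# Install statadict on first use. The "!pip" line is IPython/Jupyter syntax
# and will not run in a plain Python script.
try:
    from statadict import parse_stata_dict
except ImportError:
    !pip install statadict

import pandas as pd
from statadict import parse_stata_dict

dict_file = '2015_2017_FemPregSetup.dct'
data_file = '2015_2017_FemPregData.dat'
# Parse the Stata dictionary to get column names and fixed-width column positions
stata_dict = parse_stata_dict(dict_file)
stata_dict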

nsfg = pd.read_fwf(data_file, 
                   names=stata_dict.names, 
                   colspecs=stata_dict.colspecs)
# nsfg is now a pandas DataFrame

These are the lists of files that I would like to convert into data frames. Every .dat file has its own dictionary file:

dat_name = ['2002FemResp.dat',
'2002Male.dat'...

dct_name = ['2002FemResp.dct',
'2002Male.dct'...

Assuming both lists have the same length and are in the same order, and that you want to save each data frame as a CSV file, you could try:

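# enumerate(..., start=1) numbers the output files 1, 2, 3, ...
for c, (dat, dct) in enumerate(zip(dat_name, dct_name), start=1):
    stata_dict = parse_stata_dict(dct)
    df = pd.read_fwf(dat, names=stata_dict.names, colspecs=stata_dict.colspecs)
    # 'path_name' and 'file_name' are placeholders; don't forget the '.csv'!
    df.to_csv(r'path_name\file_name_{}.csv'.format(c))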
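
The loop above relies on the two lists being in the same order, and zip() stops silently at the end of the shorter list. If each .dat file shares its base name with its .dct file, as the names shown in the question suggest, the pairs can be built by name instead of by position. This is only a sketch under that assumption; pair_by_stem is a hypothetical helper, not part of pandas or statadict, and the lists are shortened to the names visible in the question.

```python
from pathlib import Path


def pair_by_stem(dat_names, dct_names):
    """Pair each .dat file with the .dct file that has the same base name."""
    # Map base name (file name without extension) -> dictionary file
    dct_by_stem = {Path(name).stem: name for name in dct_names}
    pairs = []
    for dat in dat_names:
        stem = Path(dat).stem
        if stem not in dct_by_stem:
            # Fail loudly instead of silently skipping or mispairing a file
            raise KeyError(f"no .dct file found for {dat!r}")
        pairs.append((dat, dct_by_stem[stem]))
    return pairs


# Shortened, illustrative lists (only the names visible in the question)
dat_name = ['2002FemResp.dat', '2002Male.dat']
dct_name = ['2002Male.dct', '2002FemResp.dct']  # deliberately out of order

pairs = pair_by_stem(dat_name, dct_name)
```

The resulting pairs can then replace zip(dat_name, dct_name) in the loop. This would not work for files such as 2015_2017_FemPregSetup.dct and 2015_2017_FemPregData.dat, whose base names differ.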

Also note that if you are not using Windows you need to use '/' rather than '\' in your path (or you can use os.path.join() to avoid this issue).
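
As an alternative to os.path.join(), the standard-library pathlib module builds paths with the right separator on any platform. The sketch below names each CSV after its .dat file rather than using a counter; csv_path_for and the "converted" folder are hypothetical names chosen for illustration.

```python
from pathlib import Path


def csv_path_for(dat_file, out_dir="converted"):
    """Build '<out_dir>/<base name>.csv' for a given .dat file."""
    # Path(...).stem drops the directory and the extension: '2002FemResp'
    return Path(out_dir) / (Path(dat_file).stem + ".csv")


target = csv_path_for("2002FemResp.dat")
```

The result can be passed directly to to_csv(), which accepts path objects. The output folder has to exist first, for example via Path("converted").mkdir(parents=True, exist_ok=True).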

