
Transpose Census Flat Files in Python

I'm trying to transpose this U.S. Census flat file using Python: http://www2.census.gov/govs/retire/2013indiv_unit_reported_data.txt

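In the first column, the first 14 characters identify a row and the last three characters identify a column. The second column is the value for that row and column. I can't seem to figure out a good way to turn this into a table with Python.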
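To make the layout concrete, here is a minimal sketch of how a single record would split under that description. The function name `split_record` is hypothetical, and the sample line is only an illustration of the described layout (whitespace between the key and the value is an assumption).

```python
def split_record(line):
    """Split one 'key value' record into (row_id, col_code, value)."""
    key, value = line.split()
    # Assumption from the description above: a 14-character row ID
    # followed by a 3-character column code.
    return key[:14], key[14:], value


# Illustrative record in the described layout.
print(split_record("01000000003401V87 0131312"))
# → ('01000000003401', 'V87', '0131312')
```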

Side note: my end goal is to write a script that automatically imports these files into ArcGIS, which is why I'm trying to do this in Python.

Although you can also do this in pure Python, using pandas makes it a very simple problem, because it is a pivot operation:

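import pandas as pd

# Read every field as text so leading zeros and codes such as "NR" survive.
df = pd.read_csv("2013indiv_unit_reported_data.txt", sep=r"\s+",
                 names=["rowcol", "data"], dtype=str)
df["row"] = df["rowcol"].str[:14]   # first 14 characters: row identifier
df["col"] = df["rowcol"].str[14:]   # remaining 3 characters: column code
df_new = df.pivot(index="row", columns="col", values="data")
df_new = df_new.fillna("")          # blank out missing row/column pairs
# index=False omits the row identifiers from the file; drop it to keep them.
df_new.to_csv("table.dat", index=False)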

This produces a DataFrame whose upper-left corner looks like

>>> df_new.iloc[:5,:5]
col                 V87           X01          X02          X04           X05
row                                                                          
01000000003401  0131312  139748131312  82075131312               213456131312
01000000003402  01313NR  474241131312      01313NR               627892131312
01000000003403  01313NR       01313NR   3677131312                    0131312
01000000003701  01313NR     578131312      01313NR                 3309131312
01103703710000            122741313NR               119541313NR    27761313NR

and an output data file that looks like

>>> !head table.dat
V87,X01,X02,X04,X05,X06,X08,X11,X12,X21,X30,X33,X35,X42,X44,X46,X47,Z01,Z02,Z03,Z04,Z05,Z13,Z14,Z15,Z16,Z62,Z63,Z68,Z70,Z71,Z72,Z73,Z75,Z76,Z77,Z78,Z81,Z82,Z83,Z84,Z87,Z88,Z89,Z91,Z93,Z96,Z98,Z99
0131312,139748131312,82075131312,,213456131312,125363131312,1294714131312,895475131312,44837131312,393606131312,0131312,0131312,0131312,0131312,1309366131312,955067131312,3333131312,84169131312,10554131312,35773131312,3826131312,3498131312,780456131312,87838131312,27181131312,0131312,0131312,2266097131312,389145131312,1309366131312,172000131312,138000131312,0131312,53844131312,30325131312,2266097131312,5056820131312,9984289131312,958400131312,0131312,0131312,0131312,4461131312,0131312,01313NR,9767131312,984714131312,0131312,125363131312
01313NR,474241131312,01313NR,,627892131312,0131312,27384181313NR,1893321131312,55891131312,404296131312,932401131312,219743131312,01313NR,01313NR,29514461313NR,1963274131312,01313NR,133791131312,18568131312,69259131312,4990131312,4121131312,1720307131312,119270131312,53744131312,0131312,61902131312,3830519131312,378156131312,2951446131312,334155131312,304611131312,9006131312,01313NR,1337911313NR,38305191313NR,10514970131312,20596906131312,1963274131312,01313NR,01313NR,01313NR,26140131312,650756131312,01313NR,34803131312,2090646131312,01313NR,01313NR
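As a quick way to try the pivot without downloading the file, the sketch below runs the same steps on a few made-up records (not real census data) held in memory. The helper name `pivot_flat_file` is hypothetical. Unlike the snippet above, it keeps the index when writing, so the row identifier becomes the first CSV column. Note that `DataFrame.pivot` raises a `ValueError` if the same row/column pair occurs more than once.

```python
import io

import pandas as pd


def pivot_flat_file(source):
    """Pivot 'key value' records into one row per 14-character row ID."""
    df = pd.read_csv(source, sep=r"\s+", names=["rowcol", "data"], dtype=str)
    df["row"] = df["rowcol"].str[:14]
    df["col"] = df["rowcol"].str[14:]
    # Missing row/column pairs become empty strings.
    return df.pivot(index="row", columns="col", values="data").fillna("")


# Made-up sample records in the same layout (assumption: whitespace-separated).
sample = io.StringIO(
    "00000000000001X01 0010\n"
    "00000000000001X02 0020\n"
    "00000000000002X01 0030\n"
)

table = pivot_flat_file(sample)
csv_text = table.to_csv()  # index kept, so "row" is the first column
print(csv_text)
```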

If you really want to do this by hand, you could do something like the following:

import csv

all_data = {}
with open("2013indiv_unit_reported_data.txt") as fp:
    for line in fp:
        rowcol, data = line.split()
        row, col = rowcol[:14], rowcol[14:]  # 14-character row ID, then column code
        all_data[row, col] = data

# Collect the sorted, unique row IDs and column codes from the (row, col) keys.
rows, cols = [sorted({key[i] for key in all_data}) for i in range(2)]
# Python 3: open in text mode with newline="" (Python 2 needed "wb" instead).
with open("table2.dat", "w", newline="") as fp:
    writer = csv.writer(fp)
    writer.writerow(cols)
    for row in rows:
        line = [all_data.get((row, col), '') for col in cols]
        writer.writerow(line)
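A possible variation on the manual approach, sketched under the same layout assumptions: group the values into a nested dictionary and let `csv.DictWriter` fill the gaps via `restval`, while also writing the row identifier as the first column. The function names, the `row_label` parameter, and the sample records are hypothetical.

```python
import csv
import io
from collections import defaultdict


def transpose_records(lines, key_width=14):
    """Group 'key value' records into {row_id: {col_code: value}}."""
    table = defaultdict(dict)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        rowcol, value = line.split()
        table[rowcol[:key_width]][rowcol[key_width:]] = value
    return table


def write_table(table, out, row_label="row"):
    """Write one CSV line per row ID; missing cells are left empty."""
    cols = sorted({col for cells in table.values() for col in cells})
    writer = csv.DictWriter(out, fieldnames=[row_label] + cols, restval="")
    writer.writeheader()
    for row_id in sorted(table):
        writer.writerow({row_label: row_id, **table[row_id]})


# Made-up sample records (not real census data).
sample = [
    "00000000000001X01 0010",
    "00000000000001X02 0020",
    "00000000000002X01 0030",
]
out = io.StringIO()
write_table(transpose_records(sample), out)
print(out.getvalue())
```

For a real file, passing the open file object as `lines` and a file opened with `newline=""` as `out` should work the same way.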
