Read each line of a txt file and section parts out into a csv file using Python

I have a txt file in which each line has a maximum of 784 characters. Each line should become a row in the csv, with fixed-length runs of characters forming the columns.

I have it working perfectly on the first line, but I cannot figure out how to get it to run on every line in the file. I have tried a few different approaches, but I think I am just going down the wrong rabbit holes. I hope you guys can help! The code is below:

with open('file.txt', 'r') as f, open('file.csv', 'w') as out_f:

        each_line = f.readline()
        filet = each_line[0]
        srcky = each_line[1: 33]
        clmst = each_line[34]
        postd = each_line[35:43]
        rcvdt = each_line[43:51]
        mbrno = each_line[51:62]
        pelcd = each_line[62:64]
        plname = each_line[64:89]
        pmfnam = each_line[89:114]
        pmbidt = each_line[114:122]
        pmbsex = each_line[122]
        mmbrno = each_line[123:134]
        mlname = each_line[134:159]
        mmfnam = each_line[159:184]
        mmbsex = each_line[184]
        mmbidt = each_line[185:193]
        grpid = each_line[193:199]
        plncd = each_line[199:202]
        aprno = each_line[202:211]
        prvno = each_line[211:217]
        psnam = each_line[217:232]
        ptype = each_line[232:242]
        pazip = each_line[242:251]
        pprov = each_line[251]
        lineCounter = each_line[252:260]
        ssvdt = each_line[260:268]
        ensvdt = each_line[268:276]
        enplsv = each_line[276:278]
        ensrsn = each_line[278:280]
        becat = each_line[280:283]
        diag1 = each_line[283:291]
        diag2 = each_line[291:299]
        diag3 = each_line[299:307]
        diag4 = each_line[307:315]
        diag5 = each_line[315:323]
        diag6 = each_line[323:331]
        diag7 = each_line[331:339]
        diag8 = each_line[339:347]
        diag9 = each_line[347:355]
        diag10 = each_line[355:363]
        endxf1 = each_line[363]
        endxf2 = each_line[364]
        endxf3 = each_line[365]
        endxf4 = each_line[366]
        pcdcd = each_line[367:376]
        emod1 = each_line[376:378]
        emod2 = each_line[378:380]
        emod3 = each_line[380:382]
        emod4 = each_line[382:384]
        pcdqt = each_line[384:387]
        pcdqt1 = each_line[387:389]
        bilam = each_line[389:400]
        netam = each_line[400:411]
        alwam = each_line[411:422]
        dctam = each_line[422:433]
        copam = each_line[433:444]
        ncvam = each_line[444:455]
        cobsv = each_line[455:466]
        ncrsn = each_line[466:468]
        revcd = each_line[468:472]
        drgcd = each_line[472:475]
        sprc1 = each_line[475:483]
        sprc2 = each_line[483:491]
        sprc3 = each_line[491:499]
        sprc4 = each_line[499:507]
        sprc5 = each_line[507:515]
        sprc6 = each_line[515:523]
        filetc = each_line[523:570]
        vvndno = each_line[570:580]
        vname = each_line[580:610]
        vadd1 = each_line[610:665]
        vadd2 = each_line[665:720]
        vcity = each_line[720:735]
        vstate = each_line[735:737]
        vzip = each_line[737:746]
        hpatc = each_line[746:784]
        icdver = each_line[784]

        newfile = (filet + ',' + srcky + ',' + clmst + ',' + postd + ',' + rcvdt + ',' + mbrno + ',' + pelcd + ',' + plname + ',' + pmfnam + ',' + pmbidt + ',' + pmbsex + ',' + mmbrno + ',' + mlname + ',' + mmfnam + ',' + mmbsex + ',' + mmbidt + ',' + grpid + ',' + plncd + ',' + aprno + ',' + prvno + ',' + psnam + ',' + ptype + ',' + pazip + ',' + pprov + ',' + lineCounter + ',' + ssvdt + ',' + ensvdt + ',' + enplsv + ',' + ensrsn + ',' + becat + ',' + diag1 + ',' + diag2 + ',' + diag3 + ',' + diag4 + ',' + diag5 + ',' + diag6 + ',' + diag7 + ',' + diag8 +
                ',' + diag9 + ',' + diag10 + ',' + endxf1 + ',' + endxf2 + ',' + endxf3 + ',' + endxf4 + ',' + pcdcd + ',' + emod1 + ',' + emod2 + ',' + emod3 + ',' + emod4 + ',' + pcdqt + ',' + pcdqt1 + ',' + bilam + ',' + netam + ',' + alwam + ',' + dctam + ',' + copam + ',' + ncvam + ',' + cobsv + ',' + ncrsn + ',' + revcd + ',' + drgcd + ',' + sprc1 + ',' + sprc2 + ',' + sprc3 + ',' + sprc4 + ',' + sprc5 + ',' + sprc6 + ',' + filetc + ',' + vvndno + ',' + vname + ',' + vadd1 + ',' + vadd2 + ',' + vcity + ',' + vstate + ',' + vzip + ',' + hpatc + ',' + icdver)
        out_f.write(str(newfile))
        out_f.close

You need to iterate over each line in the input file with a for loop. Also, you do not need to close a file handle inside a with statement; the with block closes the file automatically when it exits.

with open('file.txt', 'r') as f, open('file.csv', 'w') as out_f:
    for each_line in f:
        filet = each_line[0]
        srcky = each_line[1: 33]
        clmst = each_line[34]
        postd = each_line[35:43]
        rcvdt = each_line[43:51]
        mbrno = each_line[51:62]
        pelcd = each_line[62:64]
        plname = each_line[64:89]
        pmfnam = each_line[89:114]
        pmbidt = each_line[114:122]
        pmbsex = each_line[122]
        mmbrno = each_line[123:134]
        mlname = each_line[134:159]
        mmfnam = each_line[159:184]
        mmbsex = each_line[184]
        mmbidt = each_line[185:193]
        grpid = each_line[193:199]
        plncd = each_line[199:202]
        aprno = each_line[202:211]
        prvno = each_line[211:217]
        psnam = each_line[217:232]
        ptype = each_line[232:242]
        pazip = each_line[242:251]
        pprov = each_line[251]
        lineCounter = each_line[252:260]
        ssvdt = each_line[260:268]
        ensvdt = each_line[268:276]
        enplsv = each_line[276:278]
        ensrsn = each_line[278:280]
        becat = each_line[280:283]
        diag1 = each_line[283:291]
        diag2 = each_line[291:299]
        diag3 = each_line[299:307]
        diag4 = each_line[307:315]
        diag5 = each_line[315:323]
        diag6 = each_line[323:331]
        diag7 = each_line[331:339]
        diag8 = each_line[339:347]
        diag9 = each_line[347:355]
        diag10 = each_line[355:363]
        endxf1 = each_line[363]
        endxf2 = each_line[364]
        endxf3 = each_line[365]
        endxf4 = each_line[366]
        pcdcd = each_line[367:376]
        emod1 = each_line[376:378]
        emod2 = each_line[378:380]
        emod3 = each_line[380:382]
        emod4 = each_line[382:384]
        pcdqt = each_line[384:387]
        pcdqt1 = each_line[387:389]
        bilam = each_line[389:400]
        netam = each_line[400:411]
        alwam = each_line[411:422]
        dctam = each_line[422:433]
        copam = each_line[433:444]
        ncvam = each_line[444:455]
        cobsv = each_line[455:466]
        ncrsn = each_line[466:468]
        revcd = each_line[468:472]
        drgcd = each_line[472:475]
        sprc1 = each_line[475:483]
        sprc2 = each_line[483:491]
        sprc3 = each_line[491:499]
        sprc4 = each_line[499:507]
        sprc5 = each_line[507:515]
        sprc6 = each_line[515:523]
        filetc = each_line[523:570]
        vvndno = each_line[570:580]
        vname = each_line[580:610]
        vadd1 = each_line[610:665]
        vadd2 = each_line[665:720]
        vcity = each_line[720:735]
        vstate = each_line[735:737]
        vzip = each_line[737:746]
        hpatc = each_line[746:784]
        icdver = each_line[784]

        newline = (filet + ',' + srcky + ',' + clmst + ',' + postd + ',' + rcvdt + ',' + mbrno + ',' + pelcd + ',' + plname + ',' + pmfnam + ',' + pmbidt + ',' + pmbsex + ',' + mmbrno + ',' + mlname + ',' + mmfnam + ',' + mmbsex + ',' + mmbidt + ',' + grpid + ',' + plncd + ',' + aprno + ',' + prvno + ',' + psnam + ',' + ptype + ',' + pazip + ',' + pprov + ',' + lineCounter + ',' + ssvdt + ',' + ensvdt + ',' + enplsv + ',' + ensrsn + ',' + becat + ',' + diag1 + ',' + diag2 + ',' + diag3 + ',' + diag4 + ',' + diag5 + ',' + diag6 + ',' + diag7 + ',' + diag8 +
                ',' + diag9 + ',' + diag10 + ',' + endxf1 + ',' + endxf2 + ',' + endxf3 + ',' + endxf4 + ',' + pcdcd + ',' + emod1 + ',' + emod2 + ',' + emod3 + ',' + emod4 + ',' + pcdqt + ',' + pcdqt1 + ',' + bilam + ',' + netam + ',' + alwam + ',' + dctam + ',' + copam + ',' + ncvam + ',' + cobsv + ',' + ncrsn + ',' + revcd + ',' + drgcd + ',' + sprc1 + ',' + sprc2 + ',' + sprc3 + ',' + sprc4 + ',' + sprc5 + ',' + sprc6 + ',' + filetc + ',' + vvndno + ',' + vname + ',' + vadd1 + ',' + vadd2 + ',' + vcity + ',' + vstate + ',' + vzip + ',' + hpatc + ',' + icdver)

        out_f.write(str(newline))
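
To see why the loop matters, here is a minimal sketch on an in-memory file. The three short lines in `sample` are made-up stand-ins for file.txt, not the real 784-character records. A single `readline()` call returns one line, while iterating over the file object yields every line. Each line also keeps its trailing newline, which is worth keeping in mind when slicing near the end of a record.

```python
import io

# Hypothetical three-line input standing in for file.txt.
sample = "AAAA\nBBBB\nCCCC\n"

# readline() returns only the next line, so a single call handles one record.
f = io.StringIO(sample)
first_only = [f.readline()]

# Iterating over the file object yields every line in turn.
f = io.StringIO(sample)
all_lines = [line for line in f]

# Each line keeps its trailing newline; rstrip('\n') removes it before slicing.
stripped = [line.rstrip('\n') for line in all_lines]

print(first_only)
print(all_lines)
print(stripped)
```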

In addition to @Joshua's observation about the for loop, you can generalize the algorithm for shorter code. There may also be an error in the OP's code: each_line[33] is skipped. If that is intentional, the code below accounts for it:

import csv

# starts for each column
cols = (0,1,34,35,43,51,62,64,89,114,122,123,134,159,184,185,193,199,202,211,
        217,232,242,251,252,260,268,276,278,280,283,291,299,307,315,323,331,
        339,347,355,363,364,365,366,367,376,378,380,382,384,387,389,400,411,
        422,433,444,455,466,468,472,475,483,491,499,507,515,523,570,580,610,
        665,720,735,737,746,784,785)

# newline='' per csv documentation.
with open('file.txt') as f, open('file.csv','w',newline='') as out_f:
    writer = csv.writer(out_f)
    for each_line in f:
        line = []
        for i in range(len(cols)-1):
            # compute the slice for each column
            start,end = cols[i],cols[i+1]
            # This may be an error in OP's code, but each_line[33] is skipped.
            if end == 34:
                end = 33
            line.append(each_line[start:end])
        writer.writerow(line)
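
A variation on the same idea, sketched under assumptions: keep a name next to each (start, end) pair so the layout reads like a record definition and a header row can be written. The three-field `FIELDS` layout, the sample records and the helper names `split_fixed_width` and `convert` are all hypothetical, not the OP's real layout. Stripping the padding from each field is a choice of this sketch, not something the code above does. Going through `csv.writer` also means a field containing a comma is quoted rather than splitting the row.

```python
import csv
import io

# Hypothetical layout: (column name, start, end), with end exclusive.
FIELDS = [
    ("filet", 0, 1),
    ("srcky", 1, 5),
    ("name", 5, 15),
]

def split_fixed_width(line, fields=FIELDS):
    """Slice one fixed-width record into a list of stripped field values."""
    line = line.rstrip("\n")
    return [line[start:end].strip() for _, start, end in fields]

def convert(src, dst, fields=FIELDS):
    """Write a header row, then one CSV row per input line."""
    writer = csv.writer(dst)
    writer.writerow([name for name, _, _ in fields])
    for line in src:
        writer.writerow(split_fixed_width(line, fields))

# In-memory stand-ins for file.txt and file.csv.
src = io.StringIO("AKEY1Smith, J  \nBKEY2Lee       \n")
dst = io.StringIO()
convert(src, dst)
print(dst.getvalue())
```

With real files, the same `convert` call should work with `open('file.txt')` and `open('file.csv', 'w', newline='')` in place of the `StringIO` objects.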
