

Txt file to Excel conversion in Python

I'm trying to convert a text file to an Excel sheet in Python. The txt file contains data in the format shown below.

[Image: sample data]

Column names: reg no, zip code, loc id, emp id, last name, first name. Each record has one or more error numbers, and each record has its column names listed above its values. I would like to create an Excel sheet containing the reg no, first name, last name and errors, with each record's errors listed in separate rows.

How can I put the records into an Excel sheet? Should I be using regular expressions? And how can I insert the error numbers in separate rows for the corresponding record?
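Regular expressions are not required for this, but one small pattern is a convenient way to tell record lines apart from error lines. The following is a minimal sketch, not a parser verified against `new.txt`: it assumes a record line starts with a numeric reg no and ends with the last name and first name, and that error lines start with the word `ERROR`. The helper name `parse_records` is hypothetical.

```python
import re

# Assumed layout (not verified against new.txt):
#   record line: "<reg no> ... <last name> <first name>"
#   error line:  "ERROR ..."
RECORD_RE = re.compile(r'^(?P<reg_no>\d+)\s+.*?(?P<last>\S+)\s+(?P<first>\S+)\s*$')
ERROR_RE = re.compile(r'^ERROR\b')


def parse_records(lines):
    """Group lines into dicts with keys reg_no, last, first, errors (hypothetical helper)."""
    records = []
    for raw in lines:
        line = raw.strip()
        match = RECORD_RE.match(line)
        if match:
            records.append({
                'reg_no': int(match.group('reg_no')),
                'last': match.group('last'),
                'first': match.group('first'),
                'errors': [],
            })
        elif ERROR_RE.match(line) and records:
            # Attach the error to the most recently seen record.
            records[-1]['errors'].append(line)
        # Column-name lines, blank lines and other clutter are ignored.
    return records
```

Under those assumptions, `parse_records(open('new.txt'))` would return one dict per record, and each dict's `errors` list can then be spread over separate spreadsheet rows.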

Expected output:

[Image: expected output]

Here is the link to the input file: https://github.com/trEaSRE124/Text_Excel_python/blob/master/new.txt

Any code snippets or suggestions would be appreciated.

Here is some draft code. Let me know if any changes are needed:

import csv

with open('in.txt') as f:
    # 'wb' is the correct mode for the csv module on Python 2
    with open('out.csv', 'wb') as csvfile:
        spamwriter = csv.writer(csvfile, delimiter=',', quoting=csv.QUOTE_MINIMAL)
        # Skip the initial clutter up to the "INPUT DATA" marker
        while "INPUT DATA" not in f.readline():
            continue

        header = ["REG NO", "ZIP CODE", "LOC ID", "EMP ID", "LASTNAME", "FIRSTNAME", "ERROR"]
        spamwriter.writerow(header)
        print header

        while True:
            line = f.readline()
            if not line or "END" in line:  # stop at EOF or at the END marker
                break
            try:
                int(line.split()[0])  # a record line starts with a numeric REG NO
            except (IndexError, ValueError):
                continue  # blank line or column-name line
            data = line.strip().split()
            f.readline()  # skip the blank line after the record
            errors = []
            line = f.readline()
            while "ERROR" in line:
                errors.append(line.strip())
                line = f.readline()
            spamwriter.writerow(data + errors)

Run it with Python 2. The errors are appended as extra columns rather than extra rows; producing exactly the layout you want is slightly more complicated. I can fix that if it's still needed.
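If you want one error per row instead of extra columns, the writing step can expand each record into several rows. A minimal Python 3 sketch, assuming each record has already been parsed into a `(reg_no, last, first, errors)` tuple; `record_rows`, `write_error_rows` and `error_rows_as_text` are hypothetical names. Only the first row of a record carries the reg no and names; later rows leave those cells blank.

```python
import csv
import io

HEADER = ['REG NO', 'LASTNAME', 'FIRSTNAME', 'ERROR']


def record_rows(reg_no, last, first, errors):
    """Expand one record into rows: one row per error (hypothetical helper)."""
    if not errors:
        return [[reg_no, last, first, '']]
    rows = [[reg_no, last, first, errors[0]]]
    # Later errors get their own rows with the record columns left blank.
    rows.extend(['', '', '', err] for err in errors[1:])
    return rows


def write_error_rows(records, out):
    """Write HEADER and every record's rows to the text stream `out`.

    `records` is an iterable of (reg_no, last, first, errors) tuples.
    """
    writer = csv.writer(out)
    writer.writerow(HEADER)
    for reg_no, last, first, errors in records:
        writer.writerows(record_rows(reg_no, last, first, errors))


def error_rows_as_text(records):
    """Render the CSV into a string, which is handy for checking the layout."""
    buf = io.StringIO()
    write_error_rows(records, buf)
    return buf.getvalue()
```

On Python 3, pass `write_error_rows` a file opened with `open('out.csv', 'w', newline='')`; the `'wb'` mode used above is only correct for the Python 2 `csv` module.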

Output looks like:

[Image: output]

You can do this with the openpyxl library, which can write values directly into the cells of a spreadsheet. This code shows how to do that for your particular situation.

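from openpyxl import Workbook

NEW_PERSON, ERROR_LINE = 1, 2

def Line_items():
    # Yield (NEW_PERSON, fields) for each record line and
    # (ERROR_LINE, text) for each error line; skip everything else.
    with open('katherine.txt') as katherine:
        for line in katherine:
            line = line.strip()
            if not line:
                continue  # blank line
            items = line.split()
            if items[0].isnumeric():  # record line starts with the numeric REG NO
                yield NEW_PERSON, items
            elif items[:2] == ['ERROR', 'NUM']:
                yield ERROR_LINE, line
            # other lines (e.g. the column-name lines) are ignored

wb = Workbook()
ws = wb.active

# Header row
ws['A2'] = 'REG NO'
ws['B2'] = 'LASTNAME'
ws['C2'] = 'FIRSTNAME'
ws['D2'] = 'ERROR'

row = 2
for kind, data in Line_items():
    if kind == NEW_PERSON:
        row += 2  # leave a blank row before each record
        ws['A{:d}'.format(row)] = int(data[0])
        ws['B{:d}'.format(row)] = data[-2]  # LASTNAME (second-to-last field)
        ws['C{:d}'.format(row)] = data[-1]  # FIRSTNAME (last field)
        first = True
    else:
        # The first error shares the record's row; each later error moves down one row
        if first:
            first = False
        else:
            row += 1
        ws['D{:d}'.format(row)] = data

wb.save(filename='katherine.xlsx')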

This is a screen snapshot of the result:

[Image: spreadsheet]
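The part of this answer that is easiest to get wrong is the row bookkeeping: a new record advances two rows (leaving a blank row), its first error shares the record's row, and each further error moves down one row. Below is a sketch of a hypothetical, openpyxl-free helper `cell_layout` that reproduces that logic and returns `(cell, value)` pairs instead of writing to a worksheet, so the layout can be checked before anything is saved.

```python
NEW_PERSON, ERROR_LINE = 1, 2


def cell_layout(items, start_row=2):
    """Mirror the answer's row bookkeeping and return (cell, value) pairs.

    `items` is a sequence of (kind, data) pairs like those yielded by Line_items().
    """
    cells = []
    row = start_row
    first = True
    for kind, data in items:
        if kind == NEW_PERSON:
            row += 2  # new record: leave one blank row above it
            cells.append(('A{:d}'.format(row), int(data[0])))
            cells.append(('B{:d}'.format(row), data[-2]))
            cells.append(('C{:d}'.format(row), data[-1]))
            first = True
        else:
            if first:
                first = False  # first error shares the record's row
            else:
                row += 1  # later errors go one row down
            cells.append(('D{:d}'.format(row), data))
    return cells
```

To write the result, a loop such as `for ref, value in cell_layout(Line_items()): ws[ref] = value` should work with the `Line_items()` generator and worksheet `ws` from the answer.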
