
Read and convert Crystal Report .rpt file to .csv or .xlsx in Python 3

I'm trying to write a small Python 3 script that opens and reads a batch of Crystal Reports .rpt files and converts them to .csv or .xlsx files.

I checked the pandas and official Python 3 documentation, but had no luck.

I ran the file command on one of these files on a Linux machine, and it gave me the following: Composite Document File V2 Document, Little Endian, Os: Windows, Version 4.10, Code page: 1256, Revision Number: 97

That led me to the olefile library, and I was able to load the contents, but they are in bytes.

I'd appreciate it if someone could help me load the byte data (probably after decoding it) into pandas and save it as a readable .csv or .xlsx file.

Thanks, regards
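The file output in the question identifies these Crystal Reports files as OLE2 compound documents (Compound File Binary format). Such files always begin with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1, so a quick check like the sketch below can separate them from plain-text .rpt files, such as the ones SQL Server Management Studio writes with "Results to file", which the text-parsing answers on this page assume. The helper name `is_ole_compound_file` is hypothetical. For files that pass the check, olefile's `OleFileIO(path).listdir()` lists the internal streams, but their bytes are most likely a Crystal Reports-specific binary layout rather than delimited text, so decoding them alone will probably not produce a table pandas can load.

```python
import sys

# Magic number at the start of every OLE2 / Compound File Binary file.
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def is_ole_compound_file(path):
    """Return True if the file at `path` starts with the OLE2 signature."""
    with open(path, "rb") as fh:
        return fh.read(len(OLE2_SIGNATURE)) == OLE2_SIGNATURE


if __name__ == "__main__":
    # Usage: python check_rpt.py file1.rpt file2.rpt ...
    for name in sys.argv[1:]:
        if is_ole_compound_file(name):
            print(f"{name}: OLE2 compound file (binary, e.g. Crystal Reports)")
        else:
            print(f"{name}: not OLE2 (possibly a text export)")
```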

I wrote a version by hand, because read_fwf and the other conversion approaches couldn't read my .rpt files:

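import os
import re

import pandas as pd

# Placeholders: point these at your own folders.
INPUT_PATH = "rpt_input"        # folder containing the .rpt files
PROCCESSED_PATH = "csv_output"  # folder for the generated .csv files
os.makedirs(PROCCESSED_PATH, exist_ok=True)
file_names = [f for f in os.listdir(INPUT_PATH) if f.lower().endswith('.rpt')]

for file_name in file_names:
    list_all = []
    print('Starting File:', file_name)
    # First pass: the dash line under the header (line index 1) gives the column widths.
    with open(os.path.join(INPUT_PATH, file_name), 'r', encoding="utf8") as file:
        next(file)  # skip the header line
        sizes = next(file).rstrip('\n').split(' ')
        # Each width is (number of dashes - 1); the "+ 2" below restores that dash
        # and adds the single space that separates columns.
        sizes_ = [len(re.sub(r'^-', '', x)) for x in sizes]

    # Second pass: cut every remaining content line into fixed-width fields.
    with open(os.path.join(INPUT_PATH, file_name), 'r', encoding="utf8") as file:
        for i, line in enumerate(file):
            if i == 0:
                # Remove non-ASCII characters (such as a BOM) from the header line.
                line = re.sub(r'[^\x00-\x7F]+', '', line)
            # Skip lines starting with 'Ã' or 'á', the dash line, blank or very short
            # lines, and the trailing "Completion time:" line.
            if (line[0:1] not in ['Ã', 'á'] and line[0:3] != '---'
                    and len(line.strip()) > 3 and line[:16] != 'Completion time:'):
                grabber = []
                trace = 0
                for dist in sizes_:
                    grabber.append(line[trace:dist + 2 + trace].strip())
                    trace += dist + 2
                list_all.append(grabber)

    # Column names: keep only the text after the last '|' in each header cell.
    headers = [cell.split('|')[-1] for cell in list_all[0]]
    df = pd.DataFrame(list_all[1:], columns=headers)
    new_name = os.path.splitext(file_name)[0] + '.csv'
    df.to_csv(os.path.join(PROCCESSED_PATH, new_name), index=False)
    print('Outputted File:', new_name)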
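The code above derives each column width from the dash row under the header, dropping one dash and adding 2 back. An alternative that avoids that width arithmetic is to treat each run of dashes as one column and use its exact start and end positions, which `re.finditer` provides directly. This is a sketch that assumes a column-aligned text export (header line, dash line, then data rows); the helper names `column_spans` and `split_fixed_width` are hypothetical. It accepts any iterable of lines, so an open file works as well as the list used in the demo.

```python
import csv
import re
import sys


def column_spans(dash_line):
    """Return (start, end) slice bounds for each run of dashes in the underline row."""
    return [m.span() for m in re.finditer(r"-+", dash_line)]


def split_fixed_width(lines):
    """Yield the header and data rows of a column-aligned .rpt export as lists of strings.

    Assumes line 0 is the header, line 1 is the dash underline, and the rest are data.
    Blank lines are skipped; other trailing message lines are passed through as rows.
    """
    lines = iter(lines)
    # Drop a BOM that survives when the file is opened with plain "utf-8".
    header = next(lines).lstrip("\ufeff")
    spans = column_spans(next(lines))
    yield [header[start:end].strip() for start, end in spans]
    for line in lines:
        if not line.strip():
            continue
        yield [line[start:end].strip() for start, end in spans]


if __name__ == "__main__":
    sample = [
        "Id Name       City\n",
        "-- ---------- ------\n",
        "1  Alice      Cairo\n",
        "2  Bob        Riyadh\n",
    ]
    csv.writer(sys.stdout).writerows(split_fixed_width(sample))
```

Because each span stops at the end of its dash run, the single space between columns is never included, so no width correction is needed.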
Another answer, based on the rpt2csv.py script:

import sys
import csv


def convert(inputFile, outputFile):
    """Convert a column-aligned .rpt text file to CSV.

    Line 0 is the header, line 1 is the dash underline that defines the column
    boundaries, and every later line is written as a data row.
    """
    writer = csv.writer(outputFile)
    fieldIndexes = []
    headers = ""

    for idx, val in enumerate(inputFile):
        if idx == 0:
            headers = val
        elif idx == 1:
            # Column boundaries come from the spaces in the dash line.
            fieldIndexes = list(getFieldIndexes(val, " "))
            writer.writerow(list(getFields(headers, fieldIndexes)))
        else:
            writer.writerow(list(getFields(val, fieldIndexes)))


def getFieldIndexes(line, sep):
    """Yield (start, end) slice bounds for the segments of `line` separated by `sep`."""
    lastIndex = 0
    for idx, c in enumerate(line):
        if c == sep:
            yield lastIndex, idx
            lastIndex = idx + 1
    yield lastIndex, len(line)


def getFields(line, indexes):
    """Yield the stripped text of `line` within each (start, end) slice."""
    for start, end in indexes:
        yield line[start:end].strip()


if __name__ == '__main__':
    if len(sys.argv) == 3:
        # utf-8-sig drops a leading BOM so the header lines up with the dash line.
        with open(sys.argv[1], encoding='utf-8-sig') as inputFile:
            with open(sys.argv[2], 'w', newline='', encoding='utf-8') as outputFile:
                convert(inputFile, outputFile)
    else:
        print("Usage: rpt2csv.py inputFile outputFile")

Source: https://github.com/16bytes/rpt2csv.py
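As written, `convert` turns every line after the dash row into a CSV row, so blank lines and trailing message lines end up in the output as well. If the .rpt came from SQL Server Management Studio's "Results to file" (the "Completion time:" check in the first answer suggests that kind of export), a small pre-filter can drop those lines before conversion. This is a hedged sketch: the helper name `drop_trailer_lines` and the footer patterns below are assumptions about the export, not part of rpt2csv.py.

```python
import re

# Assumed footer lines of an SSMS "Results to file" export; adjust for your files.
TRAILER_PATTERNS = [
    re.compile(r"^\(\d+ rows? affected\)$"),      # e.g. "(5 rows affected)"
    re.compile(r"^\(\d+ row\(s\) affected\)$"),   # older wording, e.g. "(5 row(s) affected)"
    re.compile(r"^Completion time:"),
]


def drop_trailer_lines(lines):
    """Yield only the lines that are neither blank nor match one of TRAILER_PATTERNS."""
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if any(pattern.match(text) for pattern in TRAILER_PATTERNS):
            continue
        yield line
```

Since `convert` only enumerates its input, it can be called as `convert(drop_trailer_lines(inputFile), outputFile)` without other changes.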

You don't need to convert them; you can read them directly as CSV:

import pandas as pd
df = pd.read_csv("yourfile.rpt")
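`read_csv` with its default `sep=','` only splits the columns correctly if the .rpt was saved in a comma-delimited format; a column-aligned export would typically come back as a single column. For column-aligned files, pandas' `read_fwf` accepts explicit `colspecs`, which can be taken from the dash row so that values containing spaces are not split apart. A sketch, assuming the same header/dash-line layout the other answers rely on; `read_rpt_text` is a hypothetical helper name.

```python
import io
import re

import pandas as pd


def read_rpt_text(text):
    """Parse the text of a column-aligned .rpt export into a DataFrame of strings.

    Assumes line 0 is the header and line 1 is the dash underline beneath it.
    """
    lines = text.splitlines()
    # One (start, end) pair per run of dashes, i.e. per column.
    colspecs = [m.span() for m in re.finditer(r"-+", lines[1])]
    df = pd.read_fwf(
        io.StringIO(text),
        colspecs=colspecs,
        header=0,       # line 0 supplies the column names
        skiprows=[1],   # skip the dash underline
        dtype=str,      # keep values such as IDs with leading zeros as text
    )
    # Drop rows that are entirely empty.
    return df.dropna(how="all")
```

Usage: `df = read_rpt_text(open("yourfile.rpt", encoding="utf-8-sig").read())`, then `df.to_csv("yourfile.csv", index=False)` or `df.to_excel("yourfile.xlsx", index=False)` (the latter needs an Excel writer such as openpyxl installed).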


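Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

© 2020-2024 STACKOOM.COM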