Python Script to extract part of txt file to an excel sheet

I have a txt file that looks like this:


 category test_1

      aaa.com; test info - tw

      bbb.com; test info - al

 category test_2

      ccc.com; test info - al

      ddd.com; test info - tw

      eee.com; test info - tw

 category test_3

      fff.com; test info - tw

      ggg.com; test info - al

      hhh.com; test info - tw

      iii.com; test info - al

I need help creating a Python script that pulls a portion of the txt file and exports it to an Excel file. For example, if I want to export the entries in category 'test_1', the script would produce the following output in an Excel file:


      |    A.   |       B.       |   C.  |
   ---------------------------------------
   1. | aaa.com | test info - tw |       |
   ---------------------------------------
   2. | bbb.com | test info - al |       |
   ---------------------------------------
   3. |         |                |       |

I have tried the code below. My txt file is saved on my desktop as autotest.txt:


 import pandas as pd

 df = pd.read_csv('C:\Users\A12345\Desktop\autotest.txt')

 df.to_excel('output.xlsx', 'Sheet1')

When I run this code, it doesn't create an Excel file. I also tried creating an Excel file named 'output.xlsx' on my desktop first, but when I ran the script it didn't add the text to that file either.

I used the XlsxWriter module; you can install it with pip3 install XlsxWriter. The code below exports the rows of one category:

import xlsxwriter

# Ask which category to export: the number n (1, 2 or 3) in "category test_<n>".
# You can add checks here if the input should be something different.
num = int(input('Give me category number: '))

start_portion_line = 'category test_{}'.format(num)
end_portion_line = 'category test_{}'.format(num + 1)

with open('path/to/your/txt/file', 'r') as f:
    lines = f.readlines()

# Find the indexes that bound the wanted portion.
start_index = None
end_index = len(lines)  # default: the category runs to the end of the file
for i, line in enumerate(lines):
    if line.strip() == start_portion_line:
        start_index = i + 1  # skip the category header itself
    elif line.strip() == end_portion_line:
        end_index = i

if start_index is None:
    raise SystemExit('{!r} not found'.format(start_portion_line))

# Keep only the wanted lines, dropping blank ones and surrounding whitespace.
lines = [line.strip() for line in lines[start_index:end_index] if line.strip()]

workbook = xlsxwriter.Workbook('output.xlsx')
worksheet = workbook.add_worksheet()
for i, line in enumerate(lines):
    # Split "aaa.com; test info - tw" into one cell per field.
    columns = [col.strip() for col in line.split(';')]
    for j, col in enumerate(columns):
        worksheet.write(i, j, col)

workbook.close()
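
The script above finds the end of a category by looking for the header numbered one higher, so it relies on the categories being named test_1, test_2, and so on. As a minimal sketch of an alternative, the parsing step could instead group every row under whichever 'category' header precedes it. The function name parse_categories and the sample string are hypothetical, introduced only for illustration:

```python
def parse_categories(text):
    """Group 'site; info' rows under the 'category <name>' header above them."""
    categories = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith('category '):
            # Start a new group named after the text following 'category '.
            current = line[len('category '):].strip()
            categories[current] = []
        elif current is not None:
            # Split "aaa.com; test info - tw" into cleaned fields.
            fields = [field.strip() for field in line.split(';')]
            categories[current].append(fields)
    return categories


# Hypothetical sample in the same layout as the txt file from the question.
sample = """
 category test_1

      aaa.com; test info - tw

      bbb.com; test info - al

 category test_2

      ccc.com; test info - al
"""

rows = parse_categories(sample)['test_1']
print(rows)
```

Each entry in rows is a list of cell values, so it could be written with the same worksheet.write loop as above, or with XlsxWriter's worksheet.write_row(i, 0, row).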

It's possible to convert that unique format to csv by using 'category' as a keyword:



with open("text_file.txt", 'r') as file:
    data = file.read()

# One approach, if every section starts with the string 'category'
# (and no data row contains that word)
categories = data.split('category')
dict_format = {}
for category_data in categories:
    items = category_data.split('\n')  # split into lines
    name = items[0].replace(" ", "")  # remove spaces from the category name
    if name:  # skip the empty chunk before the first 'category'
        dict_format[name] = items[1:]

for name in dict_format:
    print(name)

print("which category to export to.csv format?")
answer = input()

with open(answer + ".csv", 'w') as csv_file:
    for row in dict_format[answer]:
        if row.strip():  # skip blank lines
            csv_file.write(row.replace(";", ",") + "\n")

# Now you should be able to convert that csv file to xlsx using pandas
    

The console window:

>>>run.py
test_1
test_2
test_3
which category to export to.csv format?
test_1
>>> 

The test_1.csv file looks like this in text format:


      aaa.com, test info - tw
      bbb.com, test info - al
 
