Python Extract Text between two strings into Excel

I have a text file like this:

blablablabla
blablablabla
Start
Hello
World
End
blablabla

I want to extract the strings between Start and End and write them into an Excel cell. My code thus far looks like this:

import xlsxwriter
workbook = xlsxwriter.Workbook("Test1.xlsx")
worksheet = workbook.add_worksheet()

flist = open("TextTest.txt").readlines()

parsing = False
for line in flist:

    if line.startswith("End"):
       parsing = False
    if parsing:
       worksheet.write(1,1,line)
    if line.startswith("Start"):
       parsing = True

workbook.close()

However it returns only an empty workbook. What am I doing wrong?

First, you always write to the same cell, (1, 1), so every line overwrites the previous one.

Second, row and column numbering starts at 0, so (1, 1) is cell B2 rather than A1.

With these two small changes, this should do what you want:

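# workbook, worksheet and flist are set up as in the question
parsing = False
linewrite = 0  # next row to write to (rows are zero-indexed)

for line in flist:

    if line.startswith("End"):
        parsing = False
    if parsing:
        worksheet.write(linewrite, 0, line)
        print(line, end="")
        linewrite += 1
    if line.startswith("Start"):
        parsing = True

workbook.close()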
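To see why the original call landed in B2, here is a minimal sketch that converts a zero-based (row, column) pair into A1-style notation. The helper name rowcol_to_a1 is hypothetical and written only for illustration; XlsxWriter ships a similar utility, xlsxwriter.utility.xl_rowcol_to_cell, which you would normally use instead.

```python
def rowcol_to_a1(row, col):
    """Convert a zero-based (row, col) pair to A1-style notation."""
    letters = ""
    n = col + 1  # switch to 1-based for the base-26 letter conversion
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return f"{letters}{row + 1}"


print(rowcol_to_a1(1, 1))  # the cell the question's code wrote to
print(rowcol_to_a1(0, 0))  # the first cell on the sheet
```

So worksheet.write(1, 1, line) targets the second row and second column, while worksheet.write(0, 0, line) targets the top-left cell.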

The data is being written to the cell, but one problem is that worksheet.write() will overwrite the contents of the cell, so only the last item written will be present.

You can solve this by accumulating the lines between Start and End and then writing them with a single worksheet.write() call:

import xlsxwriter

workbook = xlsxwriter.Workbook("Test1.xlsx")
worksheet = workbook.add_worksheet()

with open("TextTest.txt") as data:
    lines = []
    for line in data:
        line = line.strip()
        if line == "Start":
            # Start of a block: discard anything collected so far
            lines = []
        elif line == "End":
            # End of the block: write everything collected into cell A1
            worksheet.write(0, 0, '\n'.join(lines))
        else:
            lines.append(line)

workbook.close()

Here the lines are accumulated into a list. When an End line is seen, the contents of that list are joined with newlines (you could substitute another separator, e.g. a space) and written to the cell.
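As a rough sketch of the same accumulate-then-join idea, the parsing can be kept separate from the Excel writing so it is easy to check on its own. The names collect_block and write_cell are hypothetical helpers, not part of XlsxWriter. The text_wrap format is an addition of mine: Excel generally shows embedded newlines only when the cell has wrapping enabled.

```python
def collect_block(lines, start="Start", end="End"):
    """Return the stripped lines found between the start and end markers."""
    block = []
    inside = False
    for raw in lines:
        line = raw.strip()
        if line == start:
            block = []       # a new block begins; drop anything collected earlier
            inside = True
        elif line == end and inside:
            break            # stop at the first end marker after a start
        elif inside:
            block.append(line)
    return block


def write_cell(path, text):
    """Write text into cell A1 of a new workbook (requires xlsxwriter)."""
    import xlsxwriter  # imported here so the parsing above works without it

    workbook = xlsxwriter.Workbook(path)
    worksheet = workbook.add_worksheet()
    wrap = workbook.add_format({"text_wrap": True})
    worksheet.write(0, 0, text, wrap)
    workbook.close()


sample = ["blablablabla\n", "Start\n", "Hello\n", "World\n", "End\n", "blablabla\n"]
cell_text = "\n".join(collect_block(sample))
print(repr(cell_text))
```

Calling write_cell("Test1.xlsx", cell_text) would then put both lines into a single cell.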

I don't have much experience with Excel in Python, but you can try openpyxl; I found it much easier to understand.

Solution to your problem:

import openpyxl
wb = openpyxl.Workbook()
destination_filename = "my.xlsx"
ws = wb.active
ws.title = "sheet1"
flist = open("text.txt").readlines()
row = 1
column = 'A'
parsing = False

for i in flist:      

    if i.startswith("End"):
        parsing = False
    if parsing:
        coord = column + str(row)
        ws[coord] = i
        row += 1        
    if i.startswith("Start"):
        parsing = True

wb.save(filename = destination_filename)

Edit (writing all lines in one cell):

You have to create a new variable to which you append the lines; at the end, assign that string to a cell in the worksheet.

String = ""
parsing = False
for i in flist:

    if i.startswith("End"):
        parsing = False
    if parsing:
        i = i.strip("\n")
        String += i + ","  # note: this leaves a comma after the last item
    if i.startswith("Start"):
        parsing = True

ws['A1'] = String
wb.save(filename = destination_filename)
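Building the string with += puts a separator after every item, including the last. A small sketch of an alternative, assuming the same Start/End markers: collect the items with a generator (the name lines_between is hypothetical) and combine them with str.join, which places separators only between items.

```python
def lines_between(lines, start="Start", end="End"):
    """Yield the lines strictly between the marker lines, without newlines."""
    parsing = False
    for line in lines:
        if line.startswith(end):
            parsing = False
        if parsing:
            yield line.rstrip("\n")
        if line.startswith(start):
            parsing = True


sample = ["blablablabla\n", "Start\n", "Hello\n", "World\n", "End\n", "blablabla\n"]

concatenated = ""
for item in lines_between(sample):
    concatenated += item + ","            # separator after every item

joined = ",".join(lines_between(sample))  # separator only between items

print(repr(concatenated))
print(repr(joined))
```

The joined value can be assigned to ws['A1'] in the same way as String above.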
