Disregard a line from a txt file when writing to an Excel file in Python

I need to disregard the lines that record someone entering or leaving the channel, as shown in the txt file below.

7:52:01 AM sherr entered the channel
7:52:05 AM sherr
hello GOOD morning .
おはようございます。
7:52:09 AM sherr
Who ?
誰?
7:52:16 AM sherr
OK .
わかりました。
7:52:25 AM sherr left the channel.
7:52:32 AM gigi entered the channel
7:52:45 AM gigi
OK .
わかりました。

My code is supposed to produce an Excel file in which every group of three lines goes into one row: the first line in the first column, the second line in the second column, and the third line in the third column. However, I need to skip the lines that contain "entered the channel" or "left the channel". What should I add? My code is shown below.

from openpyxl import Workbook
import copy

wb = Workbook()

with open('txtfile.txt', encoding='utf-8') as sherr:
    row = 1
    column = 1
    ws = wb.active
    for line in sherr:
        if column == 1:
            ## split the line and rejoin
            value = " ".join(line.strip().split(' ')[1:])
        else:
            value = line.strip()
            
        ws.cell(row=row, column=column, value=value)
        
        if (column := column + 1) > 3:
            row += 1
            column = 1
 
    for row in ws.iter_rows():
        for cell in row:      
            alignment = copy.copy(cell.alignment)
            alignment.wrapText=True
            cell.alignment = alignment
            
    for column_cells in ws.columns:
        length = max(len(str(cell.value)) for cell in column_cells)
        ws.column_dimensions[column_cells[0].column_letter].width = length

    wb.save('txt_to_exl.xlsx')

You can use a list comprehension before your first for loop to filter out the unwanted lines.

with open('txtfile.txt', encoding='utf-8') as sherr:
    row = 1
    column = 1
    ws = wb.active

    # Keep only the message lines; drop the join/leave notifications.
    lines = [line for line in sherr
             if 'entered the channel' not in line
             and 'left the channel' not in line]

    for line in lines:
        ...  # loop body unchanged from the question

Note that the for loop now iterates over lines instead of sherr.
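To see the filter in isolation, here is a minimal sketch that applies the same list comprehension to an in-memory sample instead of the file and then groups the remaining lines in threes. The helper names `filter_chat_lines` and `group_in_threes` are hypothetical, and the sketch assumes every message is exactly three lines (header, text, translation), as in the sample above.

```python
def filter_chat_lines(lines):
    """Drop join/leave notifications, keeping only message lines."""
    return [line for line in lines
            if 'entered the channel' not in line
            and 'left the channel' not in line]


def group_in_threes(lines):
    """Turn a flat list of lines into (sender, text, translation) rows."""
    rows = []
    # Stop early enough that an incomplete trailing group is skipped.
    for i in range(0, len(lines) - 2, 3):
        header, text, translation = (s.strip() for s in lines[i:i + 3])
        # Mirror the question's code: drop the first space-separated token
        # of the header line (the clock time), leaving e.g. "AM sherr".
        sender = " ".join(header.split(' ')[1:])
        rows.append((sender, text, translation))
    return rows


sample = [
    "7:52:01 AM sherr entered the channel\n",
    "7:52:05 AM sherr\n",
    "hello GOOD morning .\n",
    "おはようございます。\n",
    "7:52:25 AM sherr left the channel.\n",
    "7:52:45 AM gigi\n",
    "OK .\n",
    "わかりました。\n",
]

rows = group_in_threes(filter_chat_lines(sample))
for row in rows:
    print(row)
```

Each tuple in `rows` corresponds to one spreadsheet row, so it could be written with openpyxl's `ws.append(row)` instead of tracking `row` and `column` counters by hand.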

The condition above disregards any line that contains the string 'entered the channel' or 'left the channel'. Be aware that this could cause unwanted behaviour if somebody happened to include one of those strings as part of the conversation. If necessary, you could make the check more robust by also verifying that the line begins with a number (the timestamp), or by using regular expressions.
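One possible regular-expression version is sketched below. The pattern is an assumption based only on the sample shown in the question: a timestamp such as `7:52:01 AM`, a user name without spaces, then "entered the channel" or "left the channel" with an optional trailing period. The names `NOTIFICATION` and `is_notification` are hypothetical, and the pattern would need adjusting if the real log format differs.

```python
import re

# Assumed notification format, inferred from the sample in the question:
#   <h:mm:ss> <AM|PM> <username> entered the channel
#   <h:mm:ss> <AM|PM> <username> left the channel.
NOTIFICATION = re.compile(
    r'^\d{1,2}:\d{2}:\d{2} [AP]M \S+ (?:entered|left) the channel\.?\s*$'
)


def is_notification(line):
    """Return True only for lines that match the whole notification format."""
    return NOTIFICATION.match(line) is not None


sample = [
    "7:52:01 AM sherr entered the channel\n",
    "7:52:05 AM sherr\n",
    "I left the channel yesterday .\n",
    "昨日チャンネルを退出しました。\n",
    "7:52:25 AM sherr left the channel.\n",
]

# Same list comprehension as before, with the stricter test.
kept = [line for line in sample if not is_notification(line)]
```

Because the whole line has to match, a chat message that merely mentions "left the channel" is kept, while the plain substring test would have dropped it.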
