
Combine all lists (lists of strings) between two empty lists into one list in Python

I want to combine all the lists that sit between two empty lists into a single list. Example:

    []
    ['The', 'tablets', 'are', 'filled', 'into', 'cylindrically', 'shaped', 'bottles', 'made', 'of', 'white', 'coloured']
    ['polyethylene.', 'The', 'volumes', 'of', 'the', 'bottles', 'depend', 'on', 'the', 'tablet', 'strength', 'and', 'amount', 'of']
    ['tablets,', 'ranging', 'from', '20', 'to', '175', 'ml.', 'The', 'screw', 'type', 'cap', 'is', 'made', 'of', 'white', 'coloured']
    ['polypropylene', 'and', 'is', 'equipped', 'with', 'a', 'tamper', 'proof', 'ring.']
    []
    ['PVC/PVDC', 'blister', 'pack']
    []
    ['Blisters', 'are', 'made', 'in', 'a', 'thermo-forming', 'process', 'from', 'a', 'PVC/PVDC', 'base', 'web.', 'Each', 'tablet']
    ['is', 'filled', 'into', 'a', 'separate', 'blister', 'and', 'a', 'lidding', 'foil', 'of', 'aluminium', 'is', 'welded', 'on.', 'The', 'blisters']
    ['are', 'opened', 'by', 'pressing', 'the', 'tablets', 'through', 'the', 'lidding', 'foil.', 'PVDC', 'foil', 'is', 'in', 'contact', 'with']
    ['the', 'tablets.']
    []
    ['Aluminium', 'blister', 'pack']
    []

From this, the first list I want is:

['The', 'tablets', 'are', 'filled', 'into', 'cylindrically', 'shaped', 'bottles', 'made', 'of', 'white', 'coloured', 'polyethylene.', 'The', 'volumes', 'of', 'the', 'bottles', 'depend', 'on', 'the', 'tablet', 'strength', 'and', 'amount', 'of', 'tablets,', 'ranging', 'from', '20', 'to', '175', 'ml.', 'The', 'screw', 'type', 'cap', 'is', 'made', 'of', 'white', 'coloured','polypropylene', 'and', 'is', 'equipped', 'with', 'a', 'tamper', 'proof', 'ring.'] 

The next list becomes:

['PVC/PVDC', 'blister', 'pack']

And the pattern should continue. My code so far:

import csv, re
filepath = r'C:\Users\techj\Music\Data\Tagged\090388 (1.0,CURRENT,LATEST APPROVED.txt)'

with open(filepath) as f:
        content = f.readlines()
#        s = ' '.join(x for x in content if x)
#        print(s)

        for line in content:
            line = line.split()
            print(line)

This may not be exactly what you are looking for, but I think you are trying to read paragraphs from a file. This code gives you the paragraphs:

with open(path) as f:
    data=f.read()
paragraphs=data.split("\n\n")

Now, if you want the words in each paragraph, you can split each one on whitespace:

all_words=[]
for paragraph in paragraphs:
    words=paragraph.split()  # no argument: splits on any whitespace, including the newlines inside a paragraph
    all_words.append(words)
print(all_words)
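
If the blank lines in the file might contain stray spaces or tabs, splitting on the exact string "\n\n" would miss them. A minimal sketch that splits on a regular expression instead; the helper name `paragraph_words` and the sample `text` are made up for illustration and are not part of the original answer:

```python
import re

def paragraph_words(text):
    """Return one list of words per paragraph of `text` (hypothetical helper).

    Paragraphs are separated by one or more blank lines; a "blank" line
    may still contain spaces or tabs.
    """
    # A newline, optional whitespace, then another newline marks a paragraph break.
    blocks = re.split(r"\n\s*\n", text.strip())
    # str.split() with no argument also splits on the newlines inside a paragraph.
    return [block.split() for block in blocks if block.strip()]

# Illustrative sample, not the asker's real file.
text = "The tablets are filled\ninto bottles.\n \nPVC/PVDC blister pack\n\n\nAluminium blister pack\n"
for words in paragraph_words(text):
    print(words)
```

With a real file, `text` would come from `f.read()` as in the snippet above.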

Try this:

filepath = r'C:\Users\techj\Music\Data\Tagged\090388 (1.0,CURRENT,LATEST APPROVED.txt)'

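with open(filepath, 'r') as file:
    _temp = []
    for line in file:
        _line = line.split()
        if _line:
            _temp += _line
        elif _temp:  # blank line: print the group collected so far, if any
            print(_temp)
            _temp = []
    if _temp:  # the file may not end with a blank line
        print(_temp)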

For Python 3.8 and later, the same loop can use an assignment expression (the walrus operator):

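with open(filepath, 'r') as file:
    _temp = []
    for line in file:
        if (_line := line.split()):
            _temp += _line
        elif _temp:  # blank line: print the group collected so far, if any
            print(_temp)
            _temp = []
    if _temp:  # the file may not end with a blank line
        print(_temp)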
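
As an alternative to tracking a temporary list by hand, the same grouping can be sketched with `itertools.groupby` from the standard library. This is a different technique from the loop above, and the function name `merge_between_blanks` and the sample lines are assumptions made for illustration:

```python
from itertools import groupby

def merge_between_blanks(lines):
    """Merge each run of consecutive non-blank lines into one list of words."""
    split_lines = (line.split() for line in lines)
    merged = []
    # groupby clusters consecutive items that share a key. bool([]) is False,
    # so runs of blank lines and runs of text lines alternate.
    for has_words, run in groupby(split_lines, key=bool):
        if has_words:
            merged.append([word for words in run for word in words])
    return merged

# Illustrative sample; an open file object can be passed in the same way.
sample = ["\n", "The tablets are\n", "filled into bottles.\n", "\n", "PVC/PVDC blister pack\n", "\n"]
print(merge_between_blanks(sample))
```

Because the grouping is driven by runs rather than by separators, leading blank lines and a missing trailing blank line need no special handling.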

I am not sure whether your input file looks like the data you showed. If it does, this will produce the output you are looking for.

with open(filepath, 'r') as f:
    content = f.readlines()
    temp_line = []
    for line in content:
        line = line.strip("\n[]'").split("', '")
        if len(line[0]) == 0:
            if temp_line:
                print(temp_line)
                temp_line = []
        else:
            temp_line.extend(line)

The output is:

['The', 'tablets', 'are', 'filled', 'into', 'cylindrically', 'shaped', 'bottles', 'made', 'of', 'white', 'coloured', 'polyethylene.', 'The', 'volumes', 'of', 'the', 'bottles', 'depend', 'on', 'the', 'tablet', 'strength', 'and', 'amount', 'of', 'tablets,', 'ranging', 'from', '20', 'to', '175', 'ml.', 'The', 'screw', 'type', 'cap', 'is', 'made', 'of', 'white', 'coloured', 'polypropylene', 'and', 'is', 'equipped', 'with', 'a', 'tamper', 'proof', 'ring.']
['PVC/PVDC', 'blister', 'pack']
['Blisters', 'are', 'made', 'in', 'a', 'thermo-forming', 'process', 'from', 'a', 'PVC/PVDC', 'base', 'web.', 'Each', 'tablet', 'is', 'filled', 'into', 'a', 'separate', 'blister', 'and', 'a', 'lidding', 'foil', 'of', 'aluminium', 'is', 'welded', 'on.', 'The', 'blisters', 'are', 'opened', 'by', 'pressing', 'the', 'tablets', 'through', 'the', 'lidding', 'foil.', 'PVDC', 'foil', 'is', 'in', 'contact', 'with', 'the', 'tablets.']
['Aluminium', 'blister', 'pack']
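
If the file really does contain Python list literals, stripping and splitting on `', '` can go wrong when a word itself contains a quote or a comma. A hedged sketch that parses each line with `ast.literal_eval` instead; the function name `merge_literal_lines` and the sample data are hypothetical, and the code assumes every non-blank line is a valid list literal:

```python
import ast

def merge_literal_lines(lines):
    """Merge list literals found between empty-list lines ('[]')."""
    merged, current = [], []
    for line in lines:
        text = line.strip()
        if not text:
            continue  # assumption: truly blank lines carry no meaning here
        words = ast.literal_eval(text)  # evaluates literals only, never arbitrary code
        if words:
            current.extend(words)
        elif current:  # an empty list closes the current group
            merged.append(current)
            current = []
    if current:  # input may end without a closing '[]'
        merged.append(current)
    return merged

# Illustrative sample, including a word with an apostrophe.
sample = ["[]\n", "['The', 'tablets']\n", "['are', \"patient's\"]\n", "[]\n", "['PVC/PVDC', 'blister', 'pack']\n"]
for group in merge_literal_lines(sample):
    print(group)
```

`ast.literal_eval` raises an exception on a malformed line, which may be preferable to silently producing wrong words.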

If it does not, and it is just a .txt file containing text, the logic is a little simpler:

    for line in content:
        line = line.split()
        if not line:
            if temp_line:
                print(temp_line)
                temp_line = []
        else:
            temp_line.extend(line)

This may need an extra:

if temp_line:
    print(temp_line)

at the end, in case your input file ends with a line of text rather than a blank line.

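Since I do not have access to your file but wanted to test my algorithm, I created two generator functions that yield the input lines as lists of strings. The first generator function is based on your code: it reads the file and splits each line into a list of strings. The second, which I used for testing, uses a pre-split list of lists of strings. You only need to replace the call to line_producer_2 with a call to line_producer_1 to take the input from the file.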

def line_producer_1():
    import csv, re
    filepath = r'C:\Users\techj\Music\Data\Tagged\090388 (1.0,CURRENT,LATEST APPROVED.txt)'

    with open(filepath) as f:
            content = f.readlines()
    #        s = ' '.join(x for x in content if x)
    #        print(s)

            for line in content:
                line = line.split()
                yield line

def line_producer_2():
    lines = [
        [],
        ['The', 'tablets', 'are', 'filled', 'into', 'cylindrically', 'shaped', 'bottles', 'made', 'of', 'white', 'coloured'],
        ['polyethylene.', 'The', 'volumes', 'of', 'the', 'bottles', 'depend', 'on', 'the', 'tablet', 'strength', 'and', 'amount', 'of'],
        ['tablets,', 'ranging', 'from', '20', 'to', '175', 'ml.', 'The', 'screw', 'type', 'cap', 'is', 'made', 'of', 'white', 'coloured'],
        ['polypropylene', 'and', 'is', 'equipped', 'with', 'a', 'tamper', 'proof', 'ring.'],
        [],
        ['PVC/PVDC', 'blister', 'pack'],
        [],
        ['Blisters', 'are', 'made', 'in', 'a', 'thermo-forming', 'process', 'from', 'a', 'PVC/PVDC', 'base', 'web.', 'Each', 'tablet'],
        ['is', 'filled', 'into', 'a', 'separate', 'blister', 'and', 'a', 'lidding', 'foil', 'of', 'aluminium', 'is', 'welded', 'on.', 'The', 'blisters'],
        ['are', 'opened', 'by', 'pressing', 'the', 'tablets', 'through', 'the', 'lidding', 'foil.', 'PVDC', 'foil', 'is', 'in', 'contact', 'with'],
        ['the', 'tablets.'],
        [],
        ['Aluminium', 'blister', 'pack'],
        [],
    ]
    for line in lines:
        yield line

accumulated_lines = []
for line in line_producer_2():
    if line:
        accumulated_lines.extend(line)
    elif accumulated_lines:
        print(accumulated_lines)
        accumulated_lines = []
if accumulated_lines:
    print(accumulated_lines)

Prints:

['The', 'tablets', 'are', 'filled', 'into', 'cylindrically', 'shaped', 'bottles', 'made', 'of', 'white', 'coloured', 'polyethylene.', 'The', 'volumes', 'of', 'the', 'bottles', 'depend', 'on', 'the', 'tablet', 'strength', 'and', 'amount', 'of', 'tablets,', 'ranging', 'from', '20', 'to', '175', 'ml.', 'The', 'screw', 'type', 'cap', 'is', 'made', 'of', 'white', 'coloured', 'polypropylene', 'and', 'is', 'equipped', 'with', 'a', 'tamper', 'proof', 'ring.']
['PVC/PVDC', 'blister', 'pack']
['Blisters', 'are', 'made', 'in', 'a', 'thermo-forming', 'process', 'from', 'a', 'PVC/PVDC', 'base', 'web.', 'Each', 'tablet', 'is', 'filled', 'into', 'a', 'separate', 'blister', 'and', 'a', 'lidding', 'foil', 'of', 'aluminium', 'is', 'welded', 'on.', 'The', 'blisters', 'are', 'opened', 'by', 'pressing', 'the', 'tablets', 'through', 'the', 'lidding', 'foil.', 'PVDC', 'foil', 'is', 'in', 'contact', 'with', 'the', 'tablets.']
['Aluminium', 'blister', 'pack']
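
If the merged lists are needed for further processing rather than just printed, the accumulation loop can itself be written as a generator. A minimal sketch under that assumption; `merged_groups` is a name introduced here for illustration, and it accepts any iterable of word lists, such as either line producer above:

```python
def merged_groups(lines):
    """Yield one merged list for each run of non-empty lists in `lines`."""
    accumulated = []
    for line in lines:
        if line:
            accumulated.extend(line)
        elif accumulated:  # an empty list ends the current group
            yield accumulated
            accumulated = []
    if accumulated:  # last group, if the input does not end with an empty list
        yield accumulated

# Illustrative, shortened input.
lines = [[], ['The', 'tablets'], ['are', 'filled'], [], ['PVC/PVDC', 'blister', 'pack']]
groups = list(merged_groups(lines))
print(groups)
```

A fresh list is assigned after every `yield`, so the lists already handed to the caller are not modified later.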

