How to group lines by a specific word in a string in Python

Question

I have a multi-line string in Python that looks like this:

"""1234 dog list some words 1432 cat line 2 1789 cat line3 1348 dog line 4 1678 dog line 5 1733 fish line 6 1093 cat more words"""

I want to be able to group specific lines by animal in Python, so that my output looks like this:

dog
1234 dog list some words 
1348 dog line 4
1678 dog line 5

cat
1432 cat line 2 
1789 cat line3 
1093 cat more words

fish
1733 fish line 6

So far, I know that I need to split the text line by line:

def parser(txt):
    for line in txt.splitlines():
        print(line)

But I'm not sure how to continue. How can I group each line by its animal?

Answer 1

You can use a defaultdict and split each line:

from collections import defaultdict

txt = """123 dog foo
456 cat bar
1234 dog list some words
1348 dog line 4
1432 cat line 2 
1789 cat line3 
1093 cat more words
1678 dog line 5
"""


def parser(txt):
    result = defaultdict(list)
    for line in txt.splitlines():
        num, animal, _ = line.split(' ', 2)  # split on the first 2 blanks, keep the rest as one piece
        result[animal].append(line)  # add animal and the whole line into result
    return result

result = parser(txt)
for animal, lines in result.items():
    print('>>> %s' % animal)
    for line in lines:
        print(line)
    print("")

Output:

>>> dog
123 dog foo
1234 dog list some words
1348 dog line 4
1678 dog line 5

>>> cat
456 cat bar
1432 cat line 2 
1789 cat line3 
1093 cat more words
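
Note that the three-way unpacking above raises a ValueError on a blank line or on a line with fewer than three fields. A minimal sketch of a more forgiving variant, assuming such lines should simply be skipped (the name group_lines and the sample text are made up for illustration):

```python
from collections import defaultdict


def group_lines(txt):
    """Group lines by their second whitespace-separated field."""
    result = defaultdict(list)
    for line in txt.splitlines():
        parts = line.split(None, 2)  # split on any run of whitespace, at most twice
        if len(parts) < 2:
            continue  # assumption: blank or one-word lines are ignored
        result[parts[1]].append(line.rstrip())  # drop trailing spaces
    return result


# hypothetical sample containing a blank line and a two-field line
sample = "123 dog foo\n\n456 cat\n789 dog bar \n"
grouped = group_lines(sample)
for animal, lines in grouped.items():
    print(animal, lines)
```

Whether skipping is the right policy depends on your data; raising an error may be preferable if a malformed line signals a real problem.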

Answer 2

str1 = """1234 dog list some words 1432 cat line 2 1789 cat line3 1348 dog line 4 1678 dog line 5 1733 fish line 6 1093 cat more words"""

animals = ["dog", "cat", "fish"]
groups = {}          # animal -> list of lines, each line stored as a list of words
current_line = []    # words of the line currently being built
current_animal = ""
words = str1.split(" ")
for word in words:
    if word in animals:
        # the word just before an animal is the number that starts the new line
        number = current_line.pop()
        if current_line:
            # store the finished line under the animal it belongs to
            groups.setdefault(current_animal, []).append(current_line)
        current_animal = word
        current_line = [number, word]
    else:
        current_line.append(word)

# the loop never stores the last line, so store it here
if current_line:
    groups.setdefault(current_animal, []).append(current_line)

for animal, lines in groups.items():
    print(animal)
    for line in lines:
        print(" ".join(line))

Output:

dog
1234 dog list some words
1348 dog line 4
1678 dog line 5
cat
1432 cat line 2
1789 cat line3
1093 cat more words
fish
1733 fish line 6

Answer 3

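I know there are other answers, but I like mine better (hahaha).

Anyway, I parsed the original string as if it had no \n (newline) characters.

To get the animals and the sentences, I used a regular expression: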

import re

# original string with no new line characters
txt = """1234 dog list some words 1432 cat line 2 1789 cat line3 1348 dog line 4 1678 dog line 5 1733 fish line 6 1093 cat more words"""

# use findall to capture the groups
groups = re.findall(r"(?=(\d{4} (\w+) .*?(?=\d{4}|$)))", txt)

At this point, groups holds a list of tuples:

>>> groups
[('1234 dog list some words ', 'dog'),
 ('1432 cat line 2 ', 'cat'),
 ('1789 cat line3 ', 'cat'),
 ('1348 dog line 4 ', 'dog'),
 ('1678 dog line 5 ', 'dog'),
 ('1733 fish line 6 ', 'fish'),
 ('1093 cat more words', 'cat')]
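
As an aside, the same (sentence, animal) pairs can probably be obtained without nested lookaheads by first splitting the string back into records with re.split. This is only a sketch of an alternative, not the approach used in the rest of this answer, and it assumes every record starts with a four-digit number and that no four-digit number appears anywhere else in the text (the names records and pairs are made up here):

```python
import re

txt = """1234 dog list some words 1432 cat line 2 1789 cat line3 1348 dog line 4 1678 dog line 5 1733 fish line 6 1093 cat more words"""

# split on the whitespace that immediately precedes a four-digit number
records = re.split(r"\s+(?=\d{4}\b)", txt)

# the animal is assumed to be the second word of each record
pairs = [(record, record.split()[1]) for record in records]
print(pairs)
```

Unlike the findall version, the records produced this way carry no trailing space, because the separating whitespace is consumed by the split.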

Next, I wanted to group all the sentences that mention the same animal. That is why I created a data structure called a hash table (a.k.a. a dictionary in Python):

# create a dictionary to store the formatted data
dct = {}
for group in groups:
    if group[1] in dct:
        dct[group[1]].append(group[0])
    else:
        dct[group[1]] = [group[0]]

The dct dictionary looks like this:

>>> dct
{'dog': ['1234 dog list some words ', '1348 dog line 4 ', '1678 dog line 5 '],
 'cat': ['1432 cat line 2 ', '1789 cat line3 ', '1093 cat more words'],
 'fish': ['1733 fish line 6 ']}
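
The if/else in the grouping loop can also be written with dict.setdefault, which returns the existing list for a key or inserts a new empty one. A short sketch using a shortened, hand-written copy of the groups list (the three tuples below are just sample data):

```python
# shortened sample of the (sentence, animal) tuples produced by findall
groups = [
    ("1234 dog list some words ", "dog"),
    ("1432 cat line 2 ", "cat"),
    ("1348 dog line 4 ", "dog"),
]

dct = {}
for sentence, animal in groups:
    # setdefault creates the list on first sight of an animal, then we append to it
    dct.setdefault(animal, []).append(sentence)

print(dct)
```

collections.defaultdict(list) would do the same job; setdefault just keeps dct a plain dict.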

Finally, we just need to print it in the format you want:

# then print the result in the format you like
for key, value in dct.items():
    print(key)
    for sentence in value:
        print(sentence)
    print()

The output is:

dog
1234 dog list some words 
1348 dog line 4 
1678 dog line 5 

cat
1432 cat line 2 
1789 cat line3 
1093 cat more words

fish
1733 fish line 6

The final code is as follows:

import re

# original string with no new line characters
txt = """1234 dog list some words 1432 cat line 2 1789 cat line3 1348 dog line 4 1678 dog line 5 1733 fish line 6 1093 cat more words"""

# use findall to capture the groups
groups = re.findall(r"(?=(\d{4} (\w+) .*?(?=\d{4}|$)))", txt)

# create a dictionary to store the formatted data
dct = {}
for group in groups:
    if group[1] in dct:
        dct[group[1]].append(group[0])
    else:
        dct[group[1]] = [group[0]]

# then print the result in the format you like
for key, value in dct.items():
    print(key)
    for sentence in value:
        print(sentence)
    print()


3 answers

Answer 1
1 accepted 2019-10-03 20:18:58

Answer 2
1 2019-10-03 21:37:35

Answer 3
1 2019-10-04 11:46:32
