Making a dictionary of unique words in the Constitution with their sentences in Python

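I'm trying to make a dictionary of all the unique words in the US Constitution, with each word as the key and the sentence the word appears in as the value (there can be more than one sentence). So far I have a list of all the unique words and a list of all the sentences, but I'm having trouble iterating through both to check whether a word is in a sentence. How would I go about doing this? I'm using Python and am a little past beginner level.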

Thanks a lot.

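Storing the indexes of the sentences in the dictionary would be more memory-efficient, but given the data structure you want, you can skip the list of words entirely with something like this: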

word_sentences = {}

for sentence in sentences:
    # split() with no arguments splits on any run of whitespace
    for word in sentence.split():
        if word not in word_sentences:
            word_sentences[word] = []

        word_sentences[word].append(sentence)
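
The index-based variant mentioned above could look roughly like the following. This is only a sketch: the sample `sentences` list and the names `word_to_indexes` and `sentences_with` are made up for illustration and are not part of the original answer.

```python
from collections import defaultdict

# Hypothetical sample data; in practice `sentences` would be the list
# of sentences already extracted from the Constitution.
sentences = ["We the People", "the People of the several States", "We hold"]

# Map each word to the set of indexes of the sentences that contain it.
word_to_indexes = defaultdict(set)
for index, sentence in enumerate(sentences):
    for word in sentence.split():
        word_to_indexes[word].add(index)


def sentences_with(word):
    """Return the sentences containing `word`, in their original order."""
    return [sentences[i] for i in sorted(word_to_indexes.get(word, ()))]


print(sentences_with("People"))
```

Using a set of indexes also means a sentence is recorded only once per word, even if the word occurs in it several times.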

Here is one way to do it using regular expressions (regex):

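import re

slist = ['a cat', 'a dog', 'a yacht', 'Cats and Dogs']
wlist = ['cat', 'dog']

for aword in wlist:
    # search() already scans the whole string, so no ".*" padding is needed;
    # re.escape() keeps any regex metacharacters in the word literal
    w = re.compile(re.escape(aword), re.IGNORECASE)
    # collect the indexes of the sentences that contain the word
    print(aword, [i for i, s in enumerate(slist) if w.search(s)])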

This prints:

cat [0, 3]
dog [1, 3]
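
Note that the pattern above matches substrings, which is why 'cat' is found inside 'Cats'. If only whole-word matches are wanted, one possible variation is to anchor the pattern with `\b` word boundaries. The helper name `whole_word_matches` is hypothetical and only used for this sketch.

```python
import re

slist = ['a cat', 'a dog', 'a yacht', 'Cats and Dogs']
wlist = ['cat', 'dog']


def whole_word_matches(word, sentences):
    """Return the indexes of the sentences that contain `word` as a whole word."""
    # \b matches at a word boundary, so 'cat' no longer matches inside 'Cats'.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return [i for i, s in enumerate(sentences) if pattern.search(s)]


for aword in wlist:
    print(aword, whole_word_matches(aword, slist))
```

With the same sample data this prints `cat [0]` and `dog [1]`, since the plural forms in the last sentence no longer count.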

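This actually sounds like a fun project! I think the best approach is to parse the file and keep a set of unique sentences while also keeping a set of unique words. See the inline comments for explanations. We strip the punctuation so we don't end up with awkward words that have commas attached.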

import string
from collections import defaultdict

with open('const.txt') as f:
    data = f.readlines()

# translation table that deletes all punctuation characters (Python 3)
remove_punctuation = str.maketrans('', '', string.punctuation)

word_to_sentence_cache = defaultdict(set) # to make sure we don't repeat sentences with the same word multiple times
for line in data:
    cleaned_line = line.translate(remove_punctuation) # we ignore commas and such when finding words
    words = cleaned_line.split()
    for word in words:
        word_to_sentence_cache[word].add(line)


def print_sentences_from_constitution_with_word(word_to_sentence_cache, word):
    sentences = word_to_sentence_cache.get(word)
    sentences = [sentence.rstrip() for sentence in sentences] if sentences is not None else 'Not in Constitution'
    print(sentences)

print_sentences_from_constitution_with_word(word_to_sentence_cache,'people')

['right of the people to keep and bear Arms, shall not be infringed.', 'The right of the people to be secure in their persons, houses, papers, and', 'of the press; or the right of the people peaceably to assemble, and to petition', 'executive thereof to make temporary appointments until the people fill the', 'State, elected by the people thereof, for six years; and each Senator shall']

print_sentences_from_constitution_with_word(word_to_sentence_cache,'People') # note the capitalization

['Year by the People of the several States, and the Electors in each State shall', 'We the People of the United States, in Order to form a more perfect Union,']

print_sentences_from_constitution_with_word(word_to_sentence_cache,'dinosaur')

Not in Constitution
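
As the 'people' and 'People' lookups show, the cache is case-sensitive. If that is not wanted, one option is to lowercase the words both when building the cache and when looking them up. The following is a minimal sketch, assuming two sample lines in place of const.txt; the names `lowercase_cache` and `sentences_with_word` are made up for illustration.

```python
import string
from collections import defaultdict

# Hypothetical sample lines standing in for the contents of const.txt.
lines = [
    "We the People of the United States, in Order to form a more perfect Union,",
    "right of the people to keep and bear Arms, shall not be infringed.",
]

# Translation table that deletes all punctuation characters.
strip_punctuation = str.maketrans('', '', string.punctuation)

lowercase_cache = defaultdict(set)
for line in lines:
    # Lowercase every word so 'People' and 'people' share one key.
    for word in line.translate(strip_punctuation).lower().split():
        lowercase_cache[word].add(line.rstrip())


def sentences_with_word(word):
    """Look up a word regardless of capitalization; returns a sorted list."""
    return sorted(lowercase_cache.get(word.lower(), ()))


print(sentences_with_word('PEOPLE'))
```

The original lines are stored unchanged, so the capitalization of the text itself is preserved in the results.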

Here is the reference txt file I used: https://www.usconstitution.net/const.txt
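
Note that the cache above is keyed to lines of the text file rather than full sentences, which is why the printed results are fragments. A rough sketch of one way to split the text into sentences first is shown below. It assumes a sentence ends with '.', '!' or '?' followed by whitespace, so headings and abbreviations in the real file would likely be split incorrectly. The function name `split_into_sentences` and the sample text are hypothetical.

```python
import re


def split_into_sentences(text):
    """Naively split text at '.', '!' or '?' followed by whitespace."""
    # Collapse line breaks and repeated spaces so sentences that span
    # several lines are rejoined.
    flattened = " ".join(text.split())
    # The lookbehind keeps the closing punctuation attached to each sentence.
    return [s for s in re.split(r"(?<=[.!?])\s+", flattened) if s]


# Hypothetical sample text; in practice this would be the contents of const.txt.
sample = (
    "We the People of the United States,\n"
    "in Order to form a more perfect Union. All legislative Powers\n"
    "shall be vested in a Congress."
)
print(split_into_sentences(sample))
```

The resulting list could then be used in place of `data` when building the cache.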
