Finding a word following two input words from txt file in Python
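I am building a dictionary in which each key is a tuple of two consecutive words from a txt file, and the value for each key is the list of words found immediately after that pair. For example,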
>>> with open('alice.txt') as f:
...     d = associated_words(f)
...
>>> d[('among', 'the')]
['people', 'party.', 'trees,', 'distant', 'leaves,', 'trees', 'branches,', 'bright']
My code so far is below, but it is not finished. Can anyone help?
def associated_words(f):
    from collections import defaultdict
    d = defaultdict(list)
    with open('alice.txt', 'r') as f:
        lines = f.read().replace('\n', '')
    a, b, c = [], [], []
    lines.replace(",", "").replace(".", "")
    lines = line.split(" ")
    for (i, word) in enumerate(lines):
        d['something to replace'].append(lines[i+2])
Something like this? (It should be easy to adapt...)
from pathlib import Path
from collections import defaultdict

DATA_PATH = Path(__file__).parent / '../data/alice.txt'

def next_word(fh):
    '''
    a generator that yields the next word from the file, with special
    characters removed and converted to lower case.
    '''
    unwanted = ',.`:;()?!—'
    # map each unwanted char to a space (both arguments must have equal length)
    transtab = str.maketrans(unwanted, ' ' * len(unwanted))
    for line in fh:
        for word in line.translate(transtab).split():
            yield word.lower()

def handle_triplet(dct, triplet):
    '''
    add a triplet to the dictionary dct
    '''
    dct[(triplet[0], triplet[1])].append(triplet[2])

dct = defaultdict(list)  # dictionary that defaults to []

with DATA_PATH.open('r') as fh:
    generator = next_word(fh)
    # prime the sliding window with the first three words
    triplet = (next(generator), next(generator), next(generator))
    handle_triplet(dct, triplet)
    for word in generator:
        # shift the window by one word
        triplet = (triplet[1], triplet[2], word)
        handle_triplet(dct, triplet)

print(dct)
Output (excerpt...; not run on the whole text)
defaultdict(<class 'list'>, {
('enough', 'under'): ['her'], ('rattle', 'of'): ['the'],
('suppose', 'they'): ['are'], ('flung', 'down'): ['his'],
('make', 'with'): ['the'], ('ring', 'and'): ['begged'],
('taken', 'his'): ['watch'], ('could', 'show'): ['you'],
('said', 'tossing'): ['his'], ('a', 'bottle'): ['marked', 'they'],
('dead', 'silence'): ['instantly', 'alice', "'it's"], ...
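As a rough sketch of how the idea above might be adapted to the `associated_words(f)` signature from the question: the version below assumes the whole file fits in memory and splits on whitespace only, so punctuation stays attached to words (the question's expected output contains entries such as 'party.' and 'trees,'). The sample text is made up for illustration and stands in for alice.txt.

```python
import io
from collections import defaultdict


def associated_words(f):
    """Map each pair of consecutive words to the list of words that follow it."""
    words = f.read().split()  # split on any whitespace; punctuation is kept
    d = defaultdict(list)
    # three staggered views of the word list give every sliding triplet
    for first, second, third in zip(words, words[1:], words[2:]):
        d[(first, second)].append(third)
    return d


# hypothetical sample text used in place of alice.txt
sample = io.StringIO("among the people and among the trees, among the people")
d = associated_words(sample)
print(d[('among', 'the')])
```

Because `zip` stops at the shortest input, a text with fewer than three words simply yields an empty dictionary instead of raising an error.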
Assuming your file looks like this
each them theirs tree life what not hope
Code:
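with open('test.txt') as fh:
    # split each line into words on whitespace
    lines = [line.split() for line in fh]
d = {}
for each in lines:
    # the first two words form the key; the rest of the line is the value
    d[(each[0], each[1])] = each[2:]
print(d)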
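One caveat with this per-line approach: plain assignment overwrites the earlier entry whenever two lines start with the same pair of words, and a line with fewer than two words raises an IndexError. A possible variant that accumulates values and skips short lines is sketched below; the in-memory `lines` list is a hypothetical stand-in for the contents of test.txt.

```python
from collections import defaultdict

# hypothetical in-memory stand-in for the lines of test.txt
lines = [
    "each them theirs tree life what not hope",
    "each them again",
    "short",
]

d = defaultdict(list)
for line in lines:
    words = line.split()
    if len(words) < 3:
        continue  # need a two-word key plus at least one following word
    # extend rather than assign, so repeated keys accumulate their words
    d[(words[0], words[1])].extend(words[2:])

print(dict(d))
```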