
How to create dictionary with the name being a word from other array in python?

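Whenever a new word appears in a file, I want to create a dictionary for that word that stores the file name and the word's position in that file.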

For example:

file1="This is apple"
file2="This is mango"

The dictionaries would look like:

this={file1:0,file2:0}
is={file1:5,file2:5}
apple={file1:8}
mango={file2:8}

My code to retrieve the words:

files=['sample1.txt']
for filename in files:
    file = open(filename, 'r')

    dict={}
    for line in file:
        for word in line.split():
            word_name=word
            if((word_name not in dict.keys())):
                word={}      # here the different dictionaries should be created 
                dict[word_name]=0
            dict[word_name]+=1

Here, the dict dictionary stores each word and its number of occurrences.

Any suggestions?

If you need the structure {word : {file1: count1, file2: count2}}:

file1="This is apple"
file2="This is mango"
# you can read from a file incrementally and update the Counter
from collections import Counter
c1 = Counter(file1.split())
c2 = Counter(file2.split())
# do a dict comp
result = {i:{"file1": c1[i], "file2": c2[i]} for i in c1.keys() | c2.keys()}
# see if it worked
In[440]: result
Out[440]:
{'This': {'file1': 1, 'file2': 1},
 'apple': {'file1': 1, 'file2': 0},
 'is': {'file1': 1, 'file2': 1},
 'mango': {'file1': 0, 'file2': 1}}
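
The comment about reading a file incrementally can be sketched as follows. This is only an illustration, not part of the answer above: the helper name count_words_per_file is hypothetical, and it assumes UTF-8 text files with whitespace-separated words. Unlike the result above, a file that does not contain a word is left out of that word's dictionary instead of being listed with a count of 0.

```python
from collections import Counter, defaultdict


def count_words_per_file(filenames):
    """Return {word: {filename: count}} for whitespace-separated words.

    Hypothetical helper for illustration; assumes UTF-8 text files.
    """
    result = defaultdict(dict)
    for filename in filenames:
        counts = Counter()
        with open(filename, encoding="utf-8") as handle:
            for line in handle:  # one line at a time, so large files are fine
                counts.update(line.split())
        for word, n in counts.items():
            result[word][filename] = n
    return dict(result)
```

Calling count_words_per_file(['sample1.txt', 'sample2.txt']) would then give one inner dictionary per word, keyed by file name.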

Update:

If you want the structure {word : {file1: [pos1, pos2...], file2: [pos1, pos2...]}}:

import re

from collections import defaultdict
result = defaultdict(lambda: {"file1": [], "file2": []})

for name, f in zip(["file1", "file2"], [file1, file2]):
    ps = [match.start() for match in re.finditer(r"\b\S+\b", f)] 
    for word, p in zip(f.split(), ps):
        result[word][name].append(p)

In [489]: dict(result)
Out[489]:
{'This': {'file1': [0], 'file2': [0]},
 'apple': {'file1': [8], 'file2': []},
 'is': {'file1': [5], 'file2': [5]},
 'mango': {'file1': [], 'file2': [8]}}
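
A variation on the same idea, offered as a sketch and not as the answerer's code: taking both the word and its offset from a single regex match keeps the two in step. Pairing f.split() with the matches of \b\S+\b can drift when a token contains no word characters (a lone "-", for instance), because split() returns that token while the regex skips it. The function name word_positions is hypothetical, and the sketch assumes that a "word" is any run of non-whitespace characters.

```python
import re
from collections import defaultdict


def word_positions(texts):
    """Return {word: {name: [start offsets]}} for a {name: text} mapping."""
    result = defaultdict(lambda: defaultdict(list))
    for name, text in texts.items():
        # Each match supplies the token and its offset together.
        for match in re.finditer(r"\S+", text):
            result[match.group()][name].append(match.start())
    # Convert the nested defaultdicts to plain dicts for clean output.
    return {word: dict(per_file) for word, per_file in result.items()}


positions = word_positions({"file1": "This is apple", "file2": "This is mango"})
print(positions["is"])  # {'file1': [5], 'file2': [5]}
```

Here a file that lacks a word is simply absent from that word's dictionary, which also removes the need to hard-code the file names in the default value.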

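I don't believe there is a way to name an actual dictionary after each word for every input, but this code should give the desired output (unsorted, since these are dictionaries):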

import re

files = "sample1.txt"
wordlist = []
filenum = {}

with open(files) as handle:
    for line in handle:
        line = line.rstrip()
        if not line.startswith("file"):
            continue
        sent = re.findall('"([^"]*)"', line)  # regexp to capture text between quotations
        if not sent:
            continue  # skip lines without a quoted sentence
        filenum[line[:line.find("=")]] = sent[0]  # store file names (file1, file2) with the sentence as value
        words = sent[0].split()  # collect words in sentence
        for word in words:
            if word not in wordlist:  # only add words not already added
                wordlist.append(word)

for word in wordlist:
    wordpos = {}
    for k, v in filenum.items():
        # match whole whitespace-delimited words only, so "is" is not found inside "This"
        match = re.search(r"(?<!\S)" + re.escape(word) + r"(?!\S)", v)
        if match:
            wordpos[k] = match.start()
    print(word + "=" + str(wordpos))

This should produce:

This={'file2': 0, 'file1': 0}
is={'file2': 5, 'file1': 5}
apple={'file1': 8}
mango={'file2': 8}
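
Since variables cannot sensibly be named after each word, one dictionary keyed by word gives the same lookup. The following is a minimal sketch under stated assumptions, not the code from the answer above: the function name first_positions is hypothetical, the texts are passed in as strings instead of being read from sample1.txt, and only the first occurrence of each word per file is kept.

```python
def first_positions(texts):
    """Return {word: {name: offset of first occurrence}} for a {name: text} mapping."""
    index = {}
    for name, text in texts.items():
        offset = 0
        for word in text.split():
            # Search from the end of the previous token, so "is" is never
            # found inside an earlier word such as "This".
            start = text.index(word, offset)
            # setdefault keeps the first recorded position for this word and file.
            index.setdefault(word, {}).setdefault(name, start)
            offset = start + len(word)
    return index


index = first_positions({"file1": "This is apple", "file2": "This is mango"})
for word, where in index.items():
    print(f"{word}={where}")
```

index["apple"] then plays the role of the dictionary named apple in the question. On Python 3.7 and later, dictionaries keep insertion order, so the loop prints the words in the order they were first seen.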

