How do I create dictionaries in Python that are named after the words in another array?

Question

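Whenever a new word appears in a file, I want to create a dictionary for that word that stores the file name and the word's position in that file.

For example: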

file1="This is apple"
file2="This is mango"

The dictionaries should look like this:

this={file1:0,file2:0}
is={file1:5,file2:5}
apple={file1:8}
mango={file2:8}

My code for retrieving the words:

files=['sample1.txt']
for filename in files:
    file = open(filename, 'r')

    dict={}
    for line in file:
        for word in line.split():
            word_name=word
            if((word_name not in dict.keys())):
                word={}      # here the different dictionaries should be created 
                dict[word_name]=0
            dict[word_name]+=1

Here, the dictionary named dict stores each word and its number of occurrences.

Any suggestions?

Answer 1

If you need a structure like {word : {file1: count1, file2: count2}}:

file1="This is apple"
file2="This is mango"
# you can read from a file incrementally and update the Counter
from collections import Counter
c1 = Counter(file1.split())
c2 = Counter(file2.split())
# do a dict comp
result = {i:{"file1": c1[i], "file2": c2[i]} for i in c1.keys() | c2.keys()}
# see if it worked
In[440]: result
Out[440]:
{'This': {'file1': 1, 'file2': 1},
 'apple': {'file1': 1, 'file2': 0},
 'is': {'file1': 1, 'file2': 1},
 'mango': {'file1': 0, 'file2': 1}}
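
The comment about reading a file incrementally could be sketched roughly as follows. The helper name `count_words_by_file` and the shape of its `sources` argument are assumptions made for illustration, not part of the answer above; any iterable of lines, including an open file object, should work as a value.

```python
from collections import Counter


def count_words_by_file(sources):
    """Return {word: {name: count}} for a mapping of name -> iterable of lines.

    `sources` is a hypothetical input shape; an open file object works as a
    value because iterating over it yields one line at a time.
    """
    counters = {}
    for name, lines in sources.items():
        counter = Counter()
        for line in lines:
            # update() adds to the existing counts instead of replacing them
            counter.update(line.split())
        counters[name] = counter

    # union of every word seen in any source
    all_words = set().union(*counters.values())
    # a Counter returns 0 for missing keys, so absent words get a zero count
    return {
        word: {name: counter[word] for name, counter in counters.items()}
        for word in all_words
    }


result = count_words_by_file({
    "file1": ["This is apple"],
    "file2": ["This is mango"],
})
```

With real files, each value could be a file object opened in a `with open(...)` block, so only one line is held in memory at a time.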

Update:

If you want a structure like {word : {file1: [pos1, pos2...], file2: [pos1, pos2...]}}:

import re

from collections import defaultdict
result = defaultdict(lambda: {"file1": [], "file2": []})

for name, f in zip(["file1", "file2"], [file1, file2]):
    # take each word and its offset from the same match, so the two cannot drift apart
    for match in re.finditer(r"\S+", f):
        result[match.group()][name].append(match.start())

In [489]: dict(result)
Out[489]:
{'This': {'file1': [0], 'file2': [0]},
 'apple': {'file1': [8], 'file2': []},
 'is': {'file1': [5], 'file2': [5]},
 'mango': {'file1': [], 'file2': [8]}}
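
To avoid hard-coding the file names in the default value, a nested `defaultdict` may be an option. This is a minimal sketch, assuming the texts are held in a dictionary called `texts` (a name introduced here purely for illustration):

```python
import re
from collections import defaultdict

# hypothetical input: name -> full text of that file
texts = {"file1": "This is apple", "file2": "This is mango"}

# word -> file name -> list of character offsets
positions = defaultdict(lambda: defaultdict(list))

for name, text in texts.items():
    for match in re.finditer(r"\S+", text):
        positions[match.group()][name].append(match.start())

# convert to plain dicts for display or serialisation
plain = {word: dict(by_file) for word, by_file in positions.items()}
```

One difference from the two-file version: a file that does not contain a word is simply absent from that word's dictionary rather than mapped to an empty list.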

Answer 2

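I don't believe there is a way to create an actual dictionary named after each word for every input, but this code should give the desired output (unsorted, since these are dictionaries):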

import re
files="sample1.txt"
handle = open(files)
wordlist=[]
filenum={}

for line in handle:
    line = line.rstrip()
    if not line.startswith("file"): 
        continue
    sent = re.findall('"([^"]*)"',line) #regexp to capture text between quotations
    filenum[(line[:line.find("=")])]=sent[0] #store file numbers (file1, file2) in dictionary with sentence as value
    words=sent[0].split(" ") #collect words in sentence
    for word in words:
        if word not in wordlist: #only add words not already added
            wordlist.append(word)

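for word in wordlist:
    wordpos = {}
    # match only whole whitespace-delimited words, so "is" does not hit "This"
    pattern = r"(?<!\S)" + re.escape(word) + r"(?!\S)"
    for k, v in filenum.items():
        match = re.search(pattern, v)
        if match:
            wordpos[k] = match.start()  # position of the first occurrence
    print("{}={}".format(word, wordpos))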

This should produce (the key order within each dictionary may vary):

This={'file2': 0, 'file1': 0}
is={'file2': 5, 'file1': 5}
apple={'file1': 8}
mango={'file2': 8}
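
Rather than creating a variable named after each word, the per-word dictionaries can live inside one outer dictionary keyed by the word. The sketch below builds the structure from the question out of line-based input, keeping a running offset so that positions are relative to the start of the file. The function name `first_positions`, the `sources` argument, and the lower-casing of words (to match `this=...` in the question) are all assumptions.

```python
import re


def first_positions(sources):
    """Return {word: {name: offset of first occurrence}}.

    `sources` maps a name to an iterable of lines (hypothetical input shape).
    """
    index = {}
    for name, lines in sources.items():
        offset = 0  # characters consumed before the current line
        for line in lines:
            for match in re.finditer(r"\S+", line):
                word = match.group().lower()
                # setdefault keeps the first position recorded for this file
                index.setdefault(word, {}).setdefault(name, offset + match.start())
            offset += len(line)
    return index


index = first_positions({
    "file1": ["This is apple\n", "an apple\n"],
    "file2": ["This is mango\n"],
})
print(index["apple"])
```

Looking up `index["apple"]` then plays the role that a variable called `apple` would have played, without generating variable names at run time.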

Answer 1: 2016-04-17 13:05:53
Answer 2: 2016-04-17 15:44:10
