Adding words from a file to a dictionary

I want to add each word from a text file to a dictionary. How do I do this?

(I have a file 'words.txt', I have opened and read the file and the list of words is in the variable "lines" below)

d = {}

for i in lines:
    for word in i.split():
        d[???] = word

What code do I put where the '???' is?

I basically want the dictionary to look like this:

{0: firstword, 1: secondword, 2: thirdword, 3: fourthword...}

I figured that getting the index position of each word in the list could work but I'm not exactly sure how to do this.

It doesn't seem too complicated to do but I'm stuck.

Say you have a variable words holding a list of words, ['firstword', 'secondword', 'thirdword', 'fourthword'].

Then your code would look like this:

d = {}
for k, v in enumerate(words):
    d[k] = v
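
If you are starting from lines, as in the question, rather than from a flat list of words, the same idea can be sketched as below. The contents of lines here are made-up sample data, not the asker's actual file.

```python
# Hypothetical sample data standing in for the lines read from words.txt
lines = ["firstword secondword", "thirdword fourthword"]

# Flatten the lines into a single stream of words, then number them from 0
words = (word for line in lines for word in line.split())
d = dict(enumerate(words))

print(d)  # {0: 'firstword', 1: 'secondword', 2: 'thirdword', 3: 'fourthword'}
```

dict(enumerate(...)) builds the same mapping as the explicit loop, because enumerate yields (index, word) pairs.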

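You can keep track of the "current index" in a separate variable c and use that as the value for the word in your dictionary. Note that this maps each word to its index, which is the reverse of the layout shown in the question: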

d = {}
c = 0

for i in lines:
    for word in i.split():
        d[word] = c
        c += 1

Note that if a word appears more than once, the dictionary will store only its highest (last) index.
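
To make the effect on duplicates concrete, here is a small run of the same loop. The sample lines are invented for illustration.

```python
# Hypothetical sample data containing a repeated word ("the")
lines = ["the cat", "the dog"]

d = {}
c = 0
for line in lines:
    for word in line.split():
        d[word] = c  # a repeated word overwrites its earlier index
        c += 1

print(d)  # {'the': 2, 'cat': 1, 'dog': 3}
```

The index 0 is lost, because the second "the" replaced it with 2.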

As written, each entry overwrites the one before it in your dictionary. You can work around that by using a counter as the key:

d = {}
k = 0
for i in lines:
    for word in i.split():
        d[str(k)] = word
        k = k + 1
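
One thing to be aware of with this version is that str(k) produces string keys, not the integer keys shown in the question. A quick check, using made-up sample data:

```python
# Hypothetical sample data
lines = ["firstword secondword"]

d = {}
k = 0
for line in lines:
    for word in line.split():
        d[str(k)] = word  # keys are the strings '0', '1', ...
        k += 1

print(d)         # {'0': 'firstword', '1': 'secondword'}
print('0' in d)  # True
print(0 in d)    # False
```

If you want integer keys, use d[k] = word instead.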

Why are you using a dictionary for this? Dictionaries are useful when their keys carry meaning. You could have just used a list for this task.

Also, you may be able to improve performance by preallocating your list and then filling it with your algorithm.
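
A minimal sketch of the list approach, assuming lines holds the lines of the file (the sample data below is made up). A list already gives you access by position, so no explicit index bookkeeping is needed:

```python
# Hypothetical sample data standing in for the lines read from words.txt
lines = ["firstword secondword", "thirdword fourthword"]

# Collect every word, in order, into one flat list
words = [word for line in lines for word in line.split()]

print(words[0])    # firstword
print(words[2])    # thirdword
print(len(words))  # 4
```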

Many answers question why you need to do this, which is valid, but I'll try to answer the question directly. I also think handling duplicates is necessary. Here the lower index (the first time a word is seen) takes precedence. That is an assumption on my part, but it makes sense given your question.

# First populate a word:index dictionary.
# Ensure duplicates don't overwrite; for this use "in", which is fast.
d1 = {}
ix = 0
for i in lines:
    for word in i.split():
        if word not in d1:
            # Only add the word if it is NOT already in the dict (addressing duplicates)
            d1[word] = ix
            ix += 1

# Now "reverse" the dict
d = {}  # new dict
for word in d1:
    d[d1[word]] = word

Now d is an index:word dict containing only unique words, and d1 is the matching word:index dict.
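
As a more compact alternative to the two loops, the same "first occurrence wins" result can be sketched with dict.fromkeys. This relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7 onward, and the sample data is invented for illustration.

```python
# Hypothetical sample data containing a repeated word ("the")
lines = ["the cat", "the dog"]

words = (word for line in lines for word in line.split())

# dict.fromkeys keeps each word once, in order of first appearance
unique_words = dict.fromkeys(words)

# Number the unique words from 0
d = dict(enumerate(unique_words))

print(d)  # {0: 'the', 1: 'cat', 2: 'dog'}
```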

First open a file and write some lines.

fname = 'textfile.txt'
with open(fname, 'w') as textfile:
    textfile.write('zero one two three four five\n')
    textfile.write('six seven eight nine ten')

Enumerate the words in whichever fashion you like. A generator expression works nicely with a dict comprehension.

with open(fname, 'r') as textfile:
    # Iterate over the file line by line and split each line into words
    words = (word for line in textfile for word in line.split())
    # Consume the generator while the file is still open
    word_positions = {i: word for i, word in enumerate(words)}

This yields:

word_positions

{0: 'zero',
 1: 'one',
 2: 'two',
 3: 'three',
 4: 'four',
 5: 'five',
 6: 'six',
 7: 'seven',
 8: 'eight',
 9: 'nine',
 10: 'ten'}
