Python dict from mobypos.txt file

I have a file from the Moby Project that pairs words with one or more letters indicating their part of speech. For example:

hemoglobin\N
hemogram\N
hemoid\A
hemolysin\N
hemolysis\N
hemolytic\A
hemophile\NA
hemophiliac\N

Hemoglobin is a noun, hemoid is an adjective, and hemophile can be used as a noun or an adjective.

I have created a dict from this file that pairs a word with the letters indicating its part of speech using the following code:

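# Build a dict mapping each word to its part-of-speech letters.
pairs = []
with open("mobypos.txt") as f:
    for line in f:
        # Strip the trailing newline so the value is "N", not "N\n".
        pairs.append(line.rstrip("\n").split("\\"))
posdict = dict(pairs)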

This works successfully. What I want to do is generate lists called nouns, verbs, adjectives, etc. that contain all the words with that part of speech. What is the fastest way to do this, given that len(posdict.keys()) returns 233340?

You can use a generator expression to get an iterator over the relevant words:

# 'N' in pos also matches multi-tag entries such as 'NA'.
# items() works on Python 2 and 3; on Python 2, iteritems() avoids building an intermediate list.
nouns = (w for w, pos in posdict.items() if 'N' in pos)

Note that a generator is a one-shot iterable. It uses very little memory, so prefer it when you only need to loop over the words once and don't need indexing or functions such as len. If you need to use the result several times, build a list with a list comprehension instead:

nouns = [w for w, pos in posdict.items() if 'N' in pos]
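To make the one-shot behaviour concrete, here is a minimal sketch. It assumes a small hand-made dict, sample_posdict, built from three of the words in the question rather than from the real file:

```python
# sample_posdict is a hypothetical stand-in for the real posdict.
sample_posdict = {"hemoglobin": "N", "hemoid": "A", "hemophile": "NA"}

# Generator expression: yields matching words lazily, and only once.
nouns_gen = (w for w, pos in sample_posdict.items() if "N" in pos)
first_pass = sorted(nouns_gen)
second_pass = sorted(nouns_gen)  # the generator is already exhausted here

# List comprehension: built up front, so it can be reused and measured.
nouns_list = [w for w, pos in sample_posdict.items() if "N" in pos]

print(first_pass)       # ['hemoglobin', 'hemophile']
print(second_pass)      # []
print(len(nouns_list))  # 2
```

The second pass over the generator comes back empty, which is why a list is the safer choice when the words are needed more than once.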

You can use a list comprehension:

nouns = [word for word, pos in posdict.items() if 'N' in pos]

adjs = [word for word, pos in posdict.items() if 'A' in pos]

verbs = [word for word, pos in posdict.items() if 'V' in pos]

Using the in operator in the if clause means that a word with several part-of-speech letters, such as hemophile (NA), ends up in every list it belongs to.
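With more than 230,000 entries, it may be worth building all the lists in a single pass instead of scanning the dict once per part of speech. The following is only a sketch and has not been timed against the real file. The names sample_posdict and by_pos are assumptions made for illustration:

```python
from collections import defaultdict

# Hypothetical sample standing in for the full posdict.
sample_posdict = {
    "hemoglobin": "N",
    "hemoid": "A",
    "hemophile": "NA",
}

# by_pos maps each part-of-speech letter to the words that carry it.
by_pos = defaultdict(list)
for word, tags in sample_posdict.items():
    for tag in tags:  # a word tagged "NA" is added under both "N" and "A"
        by_pos[tag].append(word)

nouns = by_pos["N"]
adjectives = by_pos["A"]
print(sorted(nouns))       # ['hemoglobin', 'hemophile']
print(sorted(adjectives))  # ['hemoid', 'hemophile']
```

Grouping by letter this way also picks up any other tags present in the file without a separate comprehension for each one.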

