How to create a dictionary from a list of strings?

I want to create a dictionary from a list of strings. For example, I have this list:

AAAA
AAAA
AAAA
BBBB
BBBB
CCCC
CCCC
CCCC
....

I want to create a dictionary from it that maps each unique string to a number. How can I do that?

I have tried some code but still have no idea how to proceed:

import os
path = "directoryA"
dirList = os.listdir(path)


with open("check.txt", "w") as a:
    for path, subdirs, files in os.walk(path):
        for filename in files:
            # The file names have the format AAAA_0; splitting on "_"
            # gives a list such as ['AAAA', '0']
            mylist = filename.split("_")

            k = mylist[0]  # keep only the 'AAAA' part
            print(k)  # this only prints the text; I want to put these
                      # values into a dictionary instead

This is the output of print(k). These are separate strings, not a list:

AAAA
AAAA
AAAA
BBBB
BBBB
CCCC
CCCC
CCCC
....

This is my expected result

myDic ={
    'AAAA': 0,
    'BBBB': 1,
    'CCCC': 2,
    'DDDD': 3,
    # ... and so on
}

Assuming the contents of check.txt look like li below, start by getting the unique elements of the list with a set, then sort them alphabetically.

After that, use a dictionary comprehension with enumerate to generate the dictionary:

li = [
    "AAAA",
    "AAAA",
    "AAAA",
    "BBBB",
    "BBBB",
    "CCCC",
    "CCCC",
    "CCCC"]

#Get the list of unique strings by converting to a set
li = (list(set(li)))

#Sort the list lexicographically
li = sorted(li)

#Create your dictionary via dictionary comprehension and enumerate
dct =  {item:idx for idx, item in enumerate(li)}
print(dct)

The output will be

{'AAAA': 0, 'BBBB': 1, 'CCCC': 2}

We should be able to create the list of strings li like so

import os

path = "directoryA"
li = []

# Walk the directory tree and collect the prefix of every file name
for dirpath, subdirs, files in os.walk(path):
    for filename in files:
        # File names have the format AAAA_0; keep the part before "_"
        k = filename.split("_")[0]
        # Append the prefix to li
        li.append(k)
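
Putting the two steps together, a minimal sketch might look like the following. The function name build_prefix_index is hypothetical, and the sketch assumes every file name has the form AAAA_0 described in the question.

```python
import os


def build_prefix_index(root):
    """Map each unique file-name prefix under `root` to a number.

    Hypothetical helper: assumes file names look like AAAA_0.
    """
    prefixes = set()
    for _dirpath, _subdirs, files in os.walk(root):
        for filename in files:
            # Keep only the part before the first underscore
            prefixes.add(filename.split("_")[0])
    # Sort so the numbering does not depend on directory listing order
    return {prefix: idx for idx, prefix in enumerate(sorted(prefixes))}
```

Calling build_prefix_index("directoryA") would then return the dictionary directly, without needing an intermediate list or text file.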

You can use itertools.groupby to group the strings, assuming they are sorted as you have them (if not, sort them first). Then enumerate() over the groups to get a running index for each unique string:

from itertools import groupby
l = [
    "AAAA", 
    "AAAA", 
    "AAAA", 
    "BBBB",
    "BBBB",
    "CCCC",
    "CCCC",
    "CCCC"]

d = {key:i for i, (key, group) in enumerate(groupby(l))}
# {'AAAA': 0, 'BBBB': 1, 'CCCC': 2}
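
To see why sorting matters here, the following small sketch runs the same comprehension on an unsorted list (the sample data is made up for illustration). groupby only groups adjacent equal items, so repeated strings that are not neighbours end up in separate groups.

```python
from itertools import groupby

unsorted = ["BBBB", "AAAA", "BBBB", "CCCC", "AAAA"]

# Without sorting, "BBBB" and "AAAA" each form two separate groups,
# so the later group overwrites the number assigned by the earlier one
wrong = {key: i for i, (key, _group) in enumerate(groupby(unsorted))}
print(wrong)  # {'BBBB': 2, 'AAAA': 4, 'CCCC': 3}

# Sorting first puts equal strings next to each other
right = {key: i for i, (key, _group) in enumerate(groupby(sorted(unsorted)))}
print(right)  # {'AAAA': 0, 'BBBB': 1, 'CCCC': 2}
```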

If you are reading from a file and the strings are not sorted, you can add an entry and increment a counter each time you find a string that is not yet in the dict. The values then follow the order in which each string is first seen. For example:

from itertools import count, filterfalse

i = count()  # start at 0 to match the expected output in the question
d = {}

with open('test.txt') as f:
    for line in filterfalse(lambda l: l.strip() in d, f):
        d[line.strip()] = next(i)
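
A possible alternative that avoids the separate counter is dict.setdefault with the current size of the dict as the default value. This is only a sketch: the in-memory list lines stands in for the lines of a file, and numbering starts at 0 as in the expected output in the question.

```python
# Hypothetical sample data standing in for the lines of a file
lines = ["BBBB\n", "AAAA\n", "BBBB\n", "CCCC\n", "AAAA\n"]

d = {}
for line in lines:
    key = line.strip()
    # len(d) is evaluated before the key is inserted, so each new key
    # receives the next unused number; existing keys are left untouched
    d.setdefault(key, len(d))

print(d)  # {'BBBB': 0, 'AAAA': 1, 'CCCC': 2}
```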

You can use dict.fromkeys() to build a dict and count() to fill values:

from itertools import count

lst = ["AAAA", "AAAA", "AAAA", "BBBB", "BBBB", "CCCC", "CCCC", "CCCC"]

dct = dict.fromkeys(lst)
c = count()

for key in dct:
    dct[key] = next(c)

print(dct)
# {'AAAA': 0, 'BBBB': 1, 'CCCC': 2}
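
The same idea can be sketched as a single comprehension by enumerating the keys that dict.fromkeys() produces. This relies on dicts preserving insertion order (guaranteed from Python 3.7), so the numbers follow the order of first appearance; the sample list is deliberately unsorted to show that.

```python
lst = ["CCCC", "AAAA", "CCCC", "BBBB", "AAAA"]

# dict.fromkeys() drops duplicates but keeps first-seen order;
# enumerate() then numbers the remaining keys from 0
dct = {key: idx for idx, key in enumerate(dict.fromkeys(lst))}

print(dct)  # {'CCCC': 0, 'AAAA': 1, 'BBBB': 2}
```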

Assuming the keys of the dictionary are:

keys = ['A', 'B', 'C']

Then:

ids = range(len(keys))  # named ids to avoid shadowing the built-in id()
d = dict(zip(keys, ids))

I would do it the following way:

data = ['A','A','A','B','B','C','C','D','C']
unique = [i for inx,i in enumerate(data) if data.index(i)==inx]
print(unique) # ['A', 'B', 'C', 'D']
d = {i: inx for inx,i in enumerate(unique)}
print(d) # {'A': 0, 'B': 1, 'C': 2, 'D': 3}

The idea behind this method: keep a value from the list only the first time it occurs (that is, when the same value has not appeared earlier). It relies on the fact that list.index always returns the lowest index at which a value occurs. Note that with this method equal values do not have to be adjacent.

First you have to remove the duplicates, based on this answer: How do you remove duplicates from a list whilst preserving order?

So it will look like this:

def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

l = [
"AAAA", 
"AAAA", 
"AAAA", 
"BBBB",
"BBBB",
"CCCC",
"CCCC",
"CCCC"]

#first remove duplicates (order of first appearance is preserved)
s = f7(l)

#create desired dict
d = dict(zip(s,range(len(s))))
print(d)
#{'AAAA': 0, 'BBBB': 1, 'CCCC': 2}
