
How to create a nested python dictionary with keys as strings?

Summary of issue: I'm trying to create a nested Python dictionary whose keys come from pre-defined variables and strings, and I'm populating it from regular expression output. This mostly works, but I get an error because the nested dictionary (not the main one) won't accept a string as a key; it wants an integer. This is confusing me, so I'd like to ask how I can get a nested Python dictionary with string keys.

Below I'll walk you through what I've done, what works, and what doesn't, starting from the top:

# Regular expressions module
import re

# Read text data from a file (the with block closes the file afterwards)
with open("dt.cc", "r") as file:
    dtcc = file.read()

# Create a list of stations from regular expression matches
stations = sorted(set(re.findall(r"\n(\w+)\s", dtcc)))

The result is good, and looks something like this: stations = ['AAAA','BBBB','CCCC','DDDD']

# Initialize a new dictionary
rows = {}

# Loop over each station in the station list and populate the dictionary
for station in stations:
    rows[station] = re.findall(r"%s\s(.+)" % station, dtcc)

The result is good, and is something like this: rows['AAAA'] = ['AAAA 0.1132 0.32 P',...]

However, when I try to create a sub-dictionary with a string key:

for station in stations:
    rows[station] = re.findall(r"%s\s(.+)" % station, dtcc)
    rows[station]["dt"] = re.findall(r"%s\s(\S+)" % station, dtcc)

I get the following error.

"TypeError: list indices must be integers, not str"

It doesn't seem to like that I'm specifying the second dictionary key as "dt". If I give it a number instead, it works just fine. But then my dictionary key name is a number, which isn't very descriptive.

Any thoughts on how to get this working?

The issue is that by doing

rows[station] = re.findall(...)

you are creating a dictionary with the station names as keys and the return values of re.findall as values, and those return values are lists. So when you then write

rows[station]["dt"] = re.findall(...)

the rows[station] on the left-hand side is a list, and lists are indexed by integers, which is what the TypeError is complaining about. For example, rows[station][0] would give you the first match from the regex. You said you want a nested dictionary, so you could do

rows[station] = dict()
rows[station]["dt"] = re.findall(...)
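To see both the failure and the fix side by side, here is a minimal sketch. The sample text is made up for illustration and only stands in for the contents of dt.cc; the station name is hard-coded to keep it short.

```python
import re

# Hypothetical sample text standing in for the contents of dt.cc
dtcc = "\nAAAA 0.1132 0.32 P\nBBBB 0.2210 0.41 S\n"

rows = {}
rows["AAAA"] = re.findall(r"AAAA\s(.+)", dtcc)  # the value is a list

try:
    # Indexing a list with a string raises TypeError
    rows["AAAA"]["dt"] = ["0.1132"]
except TypeError as err:
    error_message = str(err)

# Fix: make the value a dict first, then add string keys to it
rows["AAAA"] = dict()
rows["AAAA"]["dt"] = re.findall(r"AAAA\s(\S+)", dtcc)
```

After the fix, rows["AAAA"] is a dictionary, so any string key is accepted.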

To make it a bit nicer, a data structure that you could use instead is a defaultdict from the collections module.

A defaultdict is a dictionary that takes a default factory: a callable that is used to create the value for any missing key. You pass that callable (often a type such as list or dict) as its argument. For example, dictlist = defaultdict(list) defines a dictionary whose values are lists. Doing dictlist[key].append(item1) right away is then legal, because the empty list is created automatically the first time the key is accessed.
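As a small illustration of that behaviour (the keys and values here are made up), the following sketch shows a defaultdict of lists and a defaultdict of dicts:

```python
from collections import defaultdict

# Values default to an empty list the first time a key is accessed
dictlist = defaultdict(list)
dictlist["AAAA"].append("0.1132")
dictlist["AAAA"].append("0.2210")

# Values default to an empty dict, which gives one level of nesting
nested = defaultdict(dict)
nested["AAAA"]["dt"] = ["0.1132"]
```

One thing to keep in mind is that merely reading a missing key, such as nested["ZZZZ"], also creates an empty entry for it.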

In your case you could do

from collections import defaultdict

rows = defaultdict(dict)

for station in stations:
    rows[station]["bulk"] = re.findall(r"%s\s(.+)" % station, dtcc)
    rows[station]["dt"] = re.findall(r"%s\s(\S+)" % station, dtcc)

Note that you have to assign the first regex result to a new key ("bulk" here, but you can call it whatever you like). Hope this helps.
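Putting the whole thing together, a self-contained sketch might look like the following. The sample text is invented in place of the real dt.cc file, and the use of re.escape is an extra precaution I'm assuming is harmless here: it only matters if a station name ever contains characters that are special in a regex.

```python
import re
from collections import defaultdict

# Hypothetical sample text standing in for the contents of dt.cc
dtcc = (
    "\nAAAA 0.1132 0.32 P"
    "\nBBBB 0.2210 0.41 S"
    "\nAAAA 0.0950 0.28 S\n"
)

# Unique station names found at the start of each line
stations = sorted(set(re.findall(r"\n(\w+)\s", dtcc)))

rows = defaultdict(dict)
for station in stations:
    name = re.escape(station)  # guard against regex metacharacters
    # Everything after the station name on each matching line
    rows[station]["bulk"] = re.findall(r"%s\s(.+)" % name, dtcc)
    # Only the first whitespace-delimited field after the name
    rows[station]["dt"] = re.findall(r"%s\s(\S+)" % name, dtcc)

# Optional: convert back to a plain dict once populated
rows = dict(rows)
```

Converting back to a plain dict at the end means a later lookup of a misspelled station name raises KeyError instead of silently creating an empty entry.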
