
How do I remove special characters from the values in a nested dictionary

I have a dictionary like the one below:

{
'file1.txt': {'address': [],  'ORG': []},
'file2.txt': {'address': [],  'ORG': ['DEF Pvt. Ltd','One Solutions (Asia) Limited' ]}
}

I need to remove the special characters from the values of the 'ORG' key.

I know that for a flat dictionary I can do {key.strip(): item.strip() for key, item in my_dict.items()},

but I'm not sure how to do it for a nested one. Any ideas?
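For reference, the flat comprehension from the question can be nested one level deeper. This is a minimal sketch, assuming each inner dictionary keeps a list of strings under 'ORG' and that "special" means anything that is neither alphanumeric nor whitespace. It uses str.isalnum() and str.isspace() instead of a regex, and the helper name strip_special is made up for illustration:

```python
my_dict = {
    'file1.txt': {'address': [], 'ORG': []},
    'file2.txt': {'address': [], 'ORG': ['DEF Pvt. Ltd', 'One Solutions (Asia) Limited']},
}


def strip_special(s):
    # Hypothetical helper: keep letters, digits and whitespace, drop the rest.
    return ''.join(ch for ch in s if ch.isalnum() or ch.isspace())


# Outer comprehension walks the file names, inner one walks each file's keys.
# Only the list under 'ORG' is cleaned; every other value is left as-is.
cleaned = {
    fname: {
        k: [strip_special(s) for s in v] if k == 'ORG' else v
        for k, v in inner.items()
    }
    for fname, inner in my_dict.items()
}
print(cleaned)
```

This builds a new outer dictionary and new 'ORG' lists, so my_dict itself is not modified. Unlike [a-zA-Z] in a regex, str.isalnum() also keeps accented letters such as 'é'.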

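For a flat dictionary whose values are strings, you could do:

for key, value in my_dict.items():
    # keep only the characters that are not in list_special
    my_dict[key] = ''.join(char for char in value if char not in list_special)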

Here, list_special is a list of special characters.

If the values are lists, as with 'ORG' in your dictionary, you would have to add a nested for loop.
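A sketch of that nested loop for the dictionary in the question, assuming list_special holds single characters (the three below are only an example) and that only the lists under 'ORG' need cleaning:

```python
my_dict = {
    'file1.txt': {'address': [], 'ORG': []},
    'file2.txt': {'address': [], 'ORG': ['DEF Pvt. Ltd', 'One Solutions (Asia) Limited']},
}
# Example set of "special" characters; extend it as needed.
list_special = ['.', '(', ')']

for key, value in my_dict.items():  # outer loop: one entry per file
    # inner loop: rebuild each string in the 'ORG' list without special characters
    value['ORG'] = [
        ''.join(char for char in org if char not in list_special)
        for org in value['ORG']
    ]
print(my_dict)
```

Assigning to value['ORG'] while iterating over my_dict.items() is safe here because only the inner dictionaries change, not the keys of my_dict.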

Because the number of elements in the ORG list is not known in advance, I prefer to use two for loops and a regex.

So, you need to import the re module and then use re.sub, which replaces every match of a pattern; replacing with an empty string deletes the matched characters.

The regex I used matches any character that is not an ASCII letter, a digit, whitespace or a colon:

[^a-zA-Z\d\s:]
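To see what the pattern keeps and deletes, here is a quick check on a few sample strings (the last two are made-up inputs, not from the question):

```python
import re

# Matches one character that is NOT a letter (a-z, A-Z), a digit (\d),
# whitespace (\s) or a colon; re.sub replaces each match with "".
PATTERN = r"[^a-zA-Z\d\s:]"

samples = ['DEF Pvt. Ltd', 'One Solutions (Asia) Limited', 'Tel: +1-555', 'Café (Paris)']
cleaned = [re.sub(PATTERN, "", s) for s in samples]
print(cleaned)  # ['DEF Pvt Ltd', 'One Solutions Asia Limited', 'Tel: 1555', 'Caf Paris']
```

Note that a-zA-Z only covers ASCII letters, so an accented letter like 'é' is removed as well. Add it to the class (or use a different approach) if that matters for your data.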

The whole script:

import re

myDict = {
    'file1.txt': {'address': [], 'ORG': []},
    'file2.txt': {'address': [], 'ORG': ['DEF Pvt. Ltd', 'One Solutions (Asia) Limited']}
}

for key, item in myDict.items():
    tempList = item["ORG"]  # the list object stored in the inner dict
    for index, value in enumerate(tempList):
        # delete every character that is not a letter, digit, whitespace or colon
        tempList[index] = re.sub(r"[^a-zA-Z\d\s:]", "", value)
    # optional: tempList is already the same object as item["ORG"]
    myDict[key]["ORG"] = tempList
print(myDict)

Output:

{
   'file1.txt': {'address': [], 'ORG': []}, 
   'file2.txt': {'address': [], 'ORG': ['DEF Pvt Ltd', 'One Solutions Asia Limited']}
}
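The script above edits the lists in place and raises a KeyError if some file has no 'ORG' key. If either matters, a variant along these lines could help; it is a sketch, and the function name clean_org and its pattern parameter are hypothetical:

```python
import re


def clean_org(d, pattern=r"[^a-zA-Z\d\s:]"):
    """Return a new dict with special characters removed from every 'ORG' list.

    Files without an 'ORG' key are copied unchanged instead of raising KeyError.
    """
    result = {}
    for fname, inner in d.items():
        new_inner = dict(inner)  # shallow copy, so the input dict is not mutated
        if 'ORG' in new_inner:
            new_inner['ORG'] = [re.sub(pattern, "", s) for s in new_inner['ORG']]
        result[fname] = new_inner
    return result


my_dict = {
    'file1.txt': {'address': []},  # no 'ORG' key at all
    'file2.txt': {'address': [], 'ORG': ['DEF Pvt. Ltd', 'One Solutions (Asia) Limited']},
}
print(clean_org(my_dict))
```

The copy is shallow: values other than 'ORG' (such as the 'address' lists) are still shared with the input.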

Following up on my earlier comment, here is a solution that works regardless of the depth of the nested dictionary and without any hard-coded key names:

Original code

Modified code:

import re


def change_dict_naming_convention(d, convert_function):
    """
    Apply convert_function to every key and every string value of a
    (possibly nested) dictionary.
    Args:
        d (dict): dictionary (nested or not) to be converted.
        convert_function (func): function that takes a string and returns the converted string.
    Returns:
        A new dictionary with converted keys and string values.
    """

    if isinstance(d, dict):
        new = {}
        for k, v in d.items():
            new_v = v
            if isinstance(v, dict):
                new_v = change_dict_naming_convention(v, convert_function)
            elif isinstance(v, list):
                new_v = list()
                for x in v:
                    new_v.append(change_dict_naming_convention(x, convert_function))
            elif isinstance(v, str):
                new_v = convert_function(v)
            new[convert_function(k)] = new_v
    elif isinstance(d, str):
        new = convert_function(d)
    else:
        new = d

    return new


def convert_function(value):
    # keep letters, digits, whitespace, periods and colons; delete everything else
    return re.sub(r"[^a-zA-Z\d\s\.:]", "", value)


original_dict: dict = {
    'file1.txt': {'address': {'street': 'Any Street No. 22 (a)',
                              'zipCode': 1234,
                              'city': 'AnyCity (Asia)'},
                  'ORG': ['123 (DEF) #Pvt. -Ltd', 'One Solutions (Asia) Limited']},
    'file2.txt': {'address': [],
                  'ORG': ['DEF Pvt. Ltd', 'One Solutions (Asia) Limited']}
}

new_dict: dict = change_dict_naming_convention(original_dict, convert_function)

print(new_dict)

Output:

{'file1.txt': {'address': {'street': 'Any Street No. 22 a', 'zipCode': 1234, 'city': 'AnyCity Asia'}, 'ORG': ['123 DEF Pvt. Ltd', 'One Solutions Asia Limited']}, 
 'file2.txt': {'address': [], 'ORG': ['DEF Pvt. Ltd', 'One Solutions Asia Limited']}}
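Note that change_dict_naming_convention also runs convert_function on the keys; the file names survive here only because the pattern keeps '.'. If only the values should be cleaned, a shorter recursive sketch could look like this (clean_values is a hypothetical name, and the sample data is made up to show the difference):

```python
import re


def clean_values(obj, pattern=r"[^a-zA-Z\d\s\.:]"):
    """Recursively strip special characters from string values only.

    Keys are left untouched; lists and tuples are walked element by element
    (including lists nested in lists); other leaves such as ints are returned as-is.
    """
    if isinstance(obj, dict):
        return {k: clean_values(v, pattern) for k, v in obj.items()}
    if isinstance(obj, list):
        return [clean_values(x, pattern) for x in obj]
    if isinstance(obj, tuple):
        return tuple(clean_values(x, pattern) for x in obj)
    if isinstance(obj, str):
        return re.sub(pattern, "", obj)
    return obj


data = {
    'file (1).txt': {'address': {'zipCode': 1234, 'city': 'AnyCity (Asia)'},
                     'ORG': [['nested (list)'], '123 (DEF) #Pvt. -Ltd']},
}
print(clean_values(data))
```

With this input, change_dict_naming_convention would turn the key into 'file 1.txt' and leave ['nested (list)'] unchanged, whereas clean_values keeps the key and cleans the inner list to ['nested list'].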

I hope this helps you ;)
