
String replacement loop using Python dictionary

Can someone suggest the best way to do an iterative string replacement from a dictionary?

I'm going to go down a column of addresses, which look like:

Address1=" 122 S 102 ct,"

I have my conversion logic as:

CT=['ct','ct,','ct.','court']
DR=['drive,','drive.','drive','driv','dr,','dr.','dr']
dictionary={"CT":CT, "DR":DR}

How should I search for all dictionary values within Address1 and replace them with the corresponding key?

The goal is for " 122 S 102 ct," to become " 122 S 102 CT", and so on.

I can't quite get the syntax to replace with the corresponding key.

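Have you tried something like this?

Splits = Address1.split(" ")
for idx, word in enumerate(Splits):
    # assign back into the list; rebinding the loop variable changes nothing
    if word in CT:
        Splits[idx] = 'CT'
    elif word in DR:
        Splits[idx] = 'DR'

print(" ".join(Splits))  # " " will keep the spacing between words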

Here is a sketch of a solution.

With Original Approach

You should use a dictionary that maps each canonical string to the list of strings it replaces, e.g.

conversions = {
    'CT': [ 'ct', 'ct,', 'ct.', 'court' ],
    'DR': [ 'drive', 'drive.', 'driv', 'dr', 'dr.' ]
}

Now, you can step through each word in the input, and replace it:

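def get_transformed_address(address):
    # split(' ') keeps the empty strings produced by leading or repeated
    # spaces, so joining on ' ' restores the original spacing
    words = [maybe_convert(word) for word in address.split(' ')]
    return ' '.join(words)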

Where maybe_convert() is:

def maybe_convert(phrase):
    for canonical, representations in conversions.items():
        if phrase in representations:
            return canonical

    # default is pass-through of input
    return phrase

With Regex

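A cleaner solution is probably to apply a map of replacement regexes to the input, e.g.

conversions = {
    r'court_pattern_here': 'CT',   # placeholder; Python patterns take no /.../ delimiters
    r'drive_pattern_here': 'DR'    # placeholder
}

and then:

import re

for regex, replacement in conversions.items():
    # str.replace() only matches literal text; re.sub() applies the pattern
    address = re.sub(regex, replacement, address)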
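As a concrete illustration of the regex map, here is a minimal sketch. The two patterns are assumptions written for this example (they are not from the answer above) and cover only the spellings listed in the question. The `re.IGNORECASE` flag and the `normalize` helper are likewise additions.

```python
import re

# Hypothetical patterns: \b keeps "dr" from matching inside "Andrews",
# and [.,]? swallows one trailing period or comma.
conversions = {
    r'\b(?:court|ct)\b[.,]?': 'CT',
    r'\b(?:drive|driv|dr)\b[.,]?': 'DR',
}

def normalize(address):
    # apply every pattern in turn to the running result
    for pattern, replacement in conversions.items():
        address = re.sub(pattern, replacement, address, flags=re.IGNORECASE)
    return address

first = normalize(" 122 S 102 ct,")       # ' 122 S 102 CT'
second = normalize("45 Andrews drive.")   # '45 Andrews DR'
```

Because the patterns are anchored on word boundaries, the order of the dictionary entries should not matter here.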

You can prebuild an inverse dictionary with this ActiveState dictionary-inversion recipe:

http://code.activestate.com/recipes/415100-invert-a-dictionary-where-values-are-lists-one-lin/

def invert(d):
   return dict( (v,k) for k in d for v in d[k] ) 
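To show how the inverted dictionary might be used, here is a small sketch. The `lookup` name and the `normalize_address` helper are hypothetical; the lists are the ones from the question.

```python
def invert(d):
    # every list element becomes a key pointing back at its original key
    return dict((v, k) for k in d for v in d[k])

dictionary = {
    "CT": ['ct', 'ct,', 'ct.', 'court'],
    "DR": ['drive,', 'drive.', 'drive', 'driv', 'dr,', 'dr.', 'dr'],
}
lookup = invert(dictionary)   # {'ct': 'CT', 'ct,': 'CT', ..., 'dr': 'DR'}

def normalize_address(address):
    # hypothetical helper: swap each whole word found in lookup, keep the rest
    return " ".join(lookup.get(word, word) for word in address.split(" "))

converted = normalize_address(" 122 S 102 ct,")   # ' 122 S 102 CT'
```

Building the inverse once means each word costs a single dictionary lookup instead of a scan through every list.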

This is a sample that might help. Your mileage may vary.

CT=['ct','ct,','ct.','court']
DR=['drive,','drive.','drive','driv','dr,','dr.','dr']
dictionary={"CT":CT, "DR":DR}
address1 =' 122 S 102 ct,'

We start by looking at each key and its matching value (i.e. your list of elements). We then iterate over each element in the value and check whether it occurs in the address. If it does, we use the string's replace() method to swap the offending element for the key from the dictionary. Note that replace() matches substrings, so a short element such as 'dr' is also replaced inside longer words.

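address_new = address1  # start from the original so a result always exists
for key, value in dictionary.items():
    # try longer elements first so 'ct,' is replaced before 'ct'
    for element in sorted(value, key=len, reverse=True):
        if element in address_new:
            # keep building on address_new so earlier replacements are not lost
            address_new = address_new.replace(element, key)
print(address_new)
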
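The substring behaviour of `str.replace` is easy to see on an address that happens to contain one of the short elements. The sketch below contrasts it with a whole-word comparison; the sample address and the `replace_whole_words` helper are made up for illustration.

```python
dictionary = {
    "CT": ['ct', 'ct,', 'ct.', 'court'],
    "DR": ['drive,', 'drive.', 'drive', 'driv', 'dr,', 'dr.', 'dr'],
}

address = "12 Andrews dr"

# Substring replacement also rewrites the "dr" inside "Andrews".
naive = address.replace("dr", "DR")   # '12 AnDRews DR'

def replace_whole_words(address, dictionary):
    # hypothetical helper: same nested loop, but comparing whole words
    words = address.split(" ")
    for key, value in dictionary.items():
        for element in value:
            words = [key if word == element else word for word in words]
    return " ".join(words)

safe = replace_whole_words(address, dictionary)   # '12 Andrews DR'
```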
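Another option is to strip the trailing punctuation first and then compare whole words:

from string import punctuation

def transform_input(column):
    # drop punctuation at the end of the string, then split on whitespace
    words = column.rstrip(punctuation).split()
    for key, values in conversions.items():
        for ind, word in enumerate(words):
            if word in values:
                words[ind] = key
    return ' '.join(words)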


Address1=" 122 S 102 ct,"

conversions = {
    'CT': [ 'ct', 'ct,', 'ct.', 'court' ],
    'DR': [ 'drive', 'drive.', 'driv', 'dr', 'dr.' ]
}

print(transform_input(Address1)) # 122 S 102 CT
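Note that `column.rstrip(punctuation)` only strips the end of the whole string, so punctuation attached to a word in the middle of the address stays in place. A possible variant strips each word separately, which also means the lists only need the bare spellings. This is a sketch: the `transform_words` name, the shortened lists, and the lower-casing are assumptions, not part of the answer above.

```python
from string import punctuation

# With punctuation stripped per word, only the bare spellings are needed.
conversions = {
    'CT': ['ct', 'court'],
    'DR': ['drive', 'driv', 'dr'],
}

def transform_words(address):
    result = []
    for raw in address.split():
        bare = raw.strip(punctuation).lower()   # 'Ct.,' -> 'ct'
        # first key whose list contains the bare word, else the untouched word
        match = next((key for key, values in conversions.items() if bare in values), raw)
        result.append(match)
    return ' '.join(result)

converted = transform_words("122 S 102 Ct., Apt 4")   # '122 S 102 CT Apt 4'
```

One trade-off: punctuation attached to a replaced word is dropped along with it, so the comma after "Ct." does not survive.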

Thank you to everyone for the help. Here is what I ended up with.

    import pandas as pd
    import re 

    inputinfo="C:\\path"
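    # usecols replaces parse_cols, which newer pandas versions no longer accept;
    # read_excel already returns a DataFrame, so no extra wrapping is needed
    data=pd.read_excel(inputinfo,usecols="A",converters={"A":str})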

    TRL=['trl']
    WAY=['wy']                                                                 #do before HWY
    HWY=['hwy','hy']
    PATH=['path','pth']
    LN=['lane','ln.','ln']
    AVE=['avenue','ave.','av']
    CIR=['circle','circ.','cir']
    TER=['terrace','terace','te']
    CT=['ct','ct,','ct.','court']
    PL=['place','plc','pl.','pl']
    CSWY=['causeway','cswy','csw']
    PKWY=['parkway','pkway','pkwy','prkw']
    DR=['drive,','drive.','drive','driv','dr,','dr.','dr']
    PSGE=['passageway','passage','pasage','pass.','pass','pas']
    BLVD=['boulevard','boulevar','blvd.','blv.','blvb','blvd','boul','bvld','bl.','blv','bl']

    regex=r'(\d)(th)|(\d)(nd)|(3)(rd)|(1)(st)'
    Lambda= lambda m: m.group(1) or m.group(3) or m.group(5) or m.group(7) or ''
    # the above takes care of situations like "123 153*rd* st"

    # map each abbreviation to its list of accepted spellings
    suffixes={"AVE":AVE,"TRL":TRL,"WAY":WAY,"HWY":HWY,"PATH":PATH,"TER":TER,"LN":LN,"CIR":CIR,
              "CT":CT,"PL":PL,"CSWY":CSWY,"PKWY":PKWY,"DR":DR,"PSGE":PSGE,"BLVD":BLVD}
    # invert it once so every spelling points straight at its abbreviation
    lookup={spelling:key for key,spellings in suffixes.items() for spelling in spellings}

    for row in range(0,data.shape[0]):
        String = re.sub(regex,Lambda,str(data.loc[row,"Street Name"]))
        Splits = String.split(" ")
        print (str(row)+" of "+str(data.shape[0]))
        # replace each recognised word; anything else passes through unchanged
        Splits = [lookup.get(i,i) for i in Splits]
        data.loc[row,"Street Name Modified"]=(" ".join(Splits))

    data.to_csv("C:\\path\\StreetnameSample_output.csv",encoding='utf-8')
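The ordinal-stripping step in the code above can be tried on its own. The sketch below uses a shorter pattern with a single capture group and a backreference. It is an assumed variant rather than the original regex, and it is slightly broader: it drops any of the four suffixes after any digit, provided the suffix ends the word. The `ORDINAL` and `strip_ordinal_suffix` names are hypothetical.

```python
import re

# Assumed variant of the original pattern: one digit followed by an ordinal
# suffix, with \b so the suffix must end the word.
ORDINAL = re.compile(r'(\d)(?:st|nd|rd|th)\b')

def strip_ordinal_suffix(street):
    # \1 keeps the digit and drops the suffix
    return ORDINAL.sub(r'\1', street)

cleaned = strip_ordinal_suffix("123 153rd st")   # '123 153 st'
```

The street type "st" is left alone because it is not preceded by a digit.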

