Can someone suggest the best way to do an iterative string replacement from a dictionary?
I'm going to go down a column of addresses, which look like:
Address1=" 122 S 102 ct,"
I have my conversion logic as:
CT=['ct','ct,','ct.','court']
DR=['drive,','drive.','drive','driv','dr,','dr.','dr']
dictionary={"CT":CT, "DR":DR}
How should I search for all dictionary values within Address1
and replace them with the corresponding key?
The goal is for " 122 S 102 ct," to become " 122 S 102 CT", etc.
I can't quite get the syntax to replace with the corresponding key.
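Did you try something like this?
Splits = Address1.split(" ")
for ind, i in enumerate(Splits):
    # assign back into the list; rebinding i alone would not change Splits
    if i in CT:
        Splits[ind] = 'CT'
    elif i in DR:
        Splits[ind] = 'DR'
print(" ".join(Splits)) # " " will keep the spacing between words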
Here is a sketch of a solution.
Use a dictionary that maps each canonical string to a list of its variants, e.g.
conversions = {
    'CT': [ 'ct', 'ct,', 'ct.', 'court' ],
    'DR': [ 'drive', 'drive.', 'driv', 'dr', 'dr.' ]
}
Now you can step through each word in the input and replace it:
def get_transformed_address(address):
    # split on single spaces so the original spacing survives the join
    words = [maybe_convert(word) for word in address.split(' ')]
    return ' '.join(words)
where maybe_convert() is:
def maybe_convert(phrase):
    for canonical, representations in conversions.items():
        if phrase in representations:
            return canonical
    # default is pass-through of input
    return phrase
A probably cleaner solution is to apply a map of replacement regexes to the input, e.g.
conversions = {
    r'court_pattern_here': 'CT',
    r'drive_pattern_here': 'DR'
}
and then (with import re at the top, since str.replace() does not interpret regexes):
for regex, replacement in conversions.items():
    address = re.sub(regex, replacement, address)
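To make the regex idea concrete, here is a minimal sketch. The two patterns and the function name normalize_with_regex are my own assumptions, not something from the question; they only cover the "ct" and "dr" variants listed there, and real address data would likely need more care.

```python
import re

# Hypothetical patterns: \b keeps "dr" from matching inside "Andrew",
# and [.,]? swallows one trailing period or comma.
CONVERSIONS = {
    r"\b(?:court|ct)\b[.,]?": "CT",
    r"\b(?:drive|driv|dr)\b[.,]?": "DR",
}

def normalize_with_regex(address):
    # Apply every pattern in turn; matching is case-insensitive.
    for pattern, replacement in CONVERSIONS.items():
        address = re.sub(pattern, replacement, address, flags=re.IGNORECASE)
    return address

print(repr(normalize_with_regex(" 122 S 102 ct,")))  # ' 122 S 102 CT'
```

The word boundaries are the main reason to prefer this over a plain substring replace: "driveway" and "Andrew" are left alone.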
You can prebuild an inverse dictionary with this ActiveState dictionary-inversion snippet:
http://code.activestate.com/recipes/415100-invert-a-dictionary-where-values-are-lists-one-lin/
def invert(d):
    return dict( (v,k) for k in d for v in d[k] )
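As a rough illustration of how the inverted dictionary could then be used, the sketch below looks each word up directly instead of scanning every list. The names lookup and replace_tokens are mine, chosen only for this example.

```python
def invert(d):
    # Same one-liner as above: map every variant back to its key.
    return dict((v, k) for k in d for v in d[k])

CT = ['ct', 'ct,', 'ct.', 'court']
DR = ['drive,', 'drive.', 'drive', 'driv', 'dr,', 'dr.', 'dr']
lookup = invert({"CT": CT, "DR": DR})

def replace_tokens(address):
    # lookup.get(word, word) passes unknown words through unchanged.
    return " ".join(lookup.get(word, word) for word in address.split(" "))

print(repr(replace_tokens(" 122 S 102 ct,")))  # ' 122 S 102 CT'
```

Each word costs one dictionary lookup, which should matter once there are many abbreviation lists and many rows.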
This is a sample that might help. Your mileage may vary.
CT=['ct','ct,','ct.','court']
DR=['drive,','drive.','drive','driv','dr,','dr.','dr']
dictionary={"CT":CT, "DR":DR}
address1 =' 122 S 102 ct,'
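We start by looking at each key and its matching value (i.e. your list of elements). We then iterate over each element in the value and check whether it occurs in the address. If it does, we use replace() to swap the offending element for the key from the dictionary. Note that this is a substring match, so a short element such as 'dr' would also match inside a longer word.
address_new = address1
for key, value in dictionary.items():
    # try longer variants first so 'ct,' is replaced before 'ct'
    for element in sorted(value, key=len, reverse=True):
        if element in address_new:
            address_new = address_new.replace(element, key)
print(address_new)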
from string import punctuation
def transform_input(column):
    words = column.rstrip(punctuation).split()
    for key, values in conversions.items():
        for ind, word in enumerate(words):
            if word in values:
                words[ind] = key
    return ' '.join(words)
Address1=" 122 S 102 ct,"
conversions = {
    'CT': [ 'ct', 'ct,', 'ct.', 'court' ],
    'DR': [ 'drive', 'drive.', 'driv', 'dr', 'dr.' ]
}
print(transform_input(Address1)) # 122 S 102 CT
Thank you to everyone for the help. Here is what I ended up with.
import pandas as pd
import re
inputinfo="C:\\path"
data=pd.read_excel(inputinfo,usecols="A",converters={"A":str}) # usecols replaces the older parse_cols argument
TRL=['trl']
WAY=['wy'] #do before HWY
HWY=['hwy','hy']
PATH=['path','pth']
LN=['lane','ln.','ln']
AVE=['avenue','ave.','av']
CIR=['circle','circ.','cir']
TER=['terrace','terace','te']
CT=['ct','ct,','ct.','court']
PL=['place','plc','pl.','pl']
CSWY=['causeway','cswy','csw']
PKWY=['parkway','pkway','pkwy','prkw']
DR=['drive,','drive.','drive','driv','dr,','dr.','dr']
PSGE=['passageway','passage','pasage','pass.','pass','pas']
BLVD=['boulevard','boulevar','blvd.','blv.','blvb','blvd','boul','bvld','bl.','blv','bl']
regex=r'(\d)(th)|(\d)(nd)|(3)(rd)|(1)(st)'
Lambda = lambda m: m.group(1) or m.group(3) or m.group(5) or m.group(7) or ''
# the above keeps only the digit, which takes care of situations like "123 153*rd* st"
for row in range(0,data.shape[0]):
    String = re.sub(regex,Lambda,str(data.loc[row,"Street Name"]))
    Splits = String.split(" ")
    print (str(row)+" of "+str(data.shape[0]))
    # enumerate gives the correct position even when a word appears twice
    for ind, i in enumerate(Splits):
        if i in AVE:
            Splits[ind]="AVE"
        if i in TRL:
            Splits[ind]="TRL"
        if i in WAY:
            Splits[ind]="WAY"
        if i in HWY:
            Splits[ind]="HWY"
        if i in PATH:
            Splits[ind]="PATH"
        if i in TER:
            Splits[ind]="TER"
        if i in LN:
            Splits[ind]="LN"
        if i in CIR:
            Splits[ind]="CIR"
        if i in CT:
            Splits[ind]="CT"
        if i in PL:
            Splits[ind]="PL"
        if i in CSWY:
            Splits[ind]="CSWY"
        if i in PKWY:
            Splits[ind]="PKWY"
        if i in DR:
            Splits[ind]="DR"
        if i in PSGE:
            Splits[ind]="PSGE"
        if i in BLVD:
            Splits[ind]="BLVD"
    data.loc[row,"Street Name Modified"]=(" ".join(Splits))
data.to_csv("C:\\path\\StreetnameSample_output.csv",encoding='utf-8')
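For anyone adapting this, the long chain of if statements could probably be collapsed into a single lookup table. The sketch below is only an illustration under my own assumptions: VARIANTS holds just a few of the lists above, and LOOKUP, strip_ordinal and normalize_street are names I made up. It leaves pandas out so it can be run on its own.

```python
import re

# A small subset of the lists above; extend with the remaining ones as needed.
VARIANTS = {
    "AVE": ['avenue', 'ave.', 'av'],
    "CT": ['ct', 'ct,', 'ct.', 'court'],
    "DR": ['drive,', 'drive.', 'drive', 'driv', 'dr,', 'dr.', 'dr'],
    "HWY": ['hwy', 'hy'],
}
# Invert once: variant -> canonical abbreviation.
LOOKUP = {variant: key for key, variants in VARIANTS.items() for variant in variants}

# Same ordinal-suffix pattern as in the script above.
ORDINAL = re.compile(r'(\d)(th)|(\d)(nd)|(3)(rd)|(1)(st)')

def strip_ordinal(m):
    # Keep only the digit from whichever alternative matched.
    return m.group(1) or m.group(3) or m.group(5) or m.group(7) or ''

def normalize_street(name):
    text = ORDINAL.sub(strip_ordinal, str(name))
    # Words without an entry in LOOKUP are passed through unchanged.
    return " ".join(LOOKUP.get(word, word) for word in text.split(" "))

print(normalize_street("123 153rd ct,"))  # 123 153 CT
```

With a DataFrame like the one above, something along the lines of data["Street Name"].map(normalize_street) should fill the new column without the explicit row loop, though I have not run it against the original spreadsheet.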