
String replacement loop using Python dictionary

Can someone suggest the best way to do an iterative string replacement from a dictionary?

I'm going to go down a column of addresses, which look like:

Address1=" 122 S 102 ct,"

I have my conversion logic as:

CT=['ct','ct,','ct.','court']
DR=['drive,','drive.','drive','driv','dr,','dr.','dr']
dictionary={"CT":CT, "DR":DR}

How should I search for all dictionary values within Address1 and replace them with the corresponding key?

The goal is for " 122 S 102 ct," to become " 122 S 102 CT", etc.

I can't quite get the syntax right for replacing a match with its corresponding key.

Did you try this?

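Splits = Address1.split()  # split on whitespace; split("") raises ValueError
for idx, word in enumerate(Splits):
    # assign through the index; rebinding the loop variable would not change the list
    if word in CT:
        Splits[idx] = 'CT'
    elif word in DR:
        Splits[idx] = 'DR'

print(" ".join(Splits))  # rejoins with single spaces (the leading space is dropped)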
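Assigning to the loop variable, as in i = 'CT', only rebinds that name; it never writes back into the list being iterated. A minimal demonstration, using a hard-coded word list as an example input:

```python
words = ["122", "S", "102", "ct,"]

# Rebinding the loop variable leaves the list untouched.
for w in words:
    if w == "ct,":
        w = "CT"
print(words)  # ['122', 'S', '102', 'ct,']

# Writing through the index changes the list element itself.
for idx, w in enumerate(words):
    if w == "ct,":
        words[idx] = "CT"
print(words)  # ['122', 'S', '102', 'CT']
```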

Here is a sketch of a solution.

With the Original Approach

You should use a dictionary mapping each canonical string to a list of strings, e.g.

conversions = {
    # note the comma after 'ct,': without it Python joins 'ct,' 'ct.' into 'ct,ct.'
    'CT': [ 'ct', 'ct,', 'ct.', 'court' ],
    'DR': [ 'drive', 'drive.', 'driv', 'dr', 'dr.' ]
}

Now you can step through each word in the input and replace it:

def get_transformed_address(address):
    # split on single spaces so a leading space survives as an empty first word
    words = []
    for word in address.split(' '):
        words.append(maybe_convert(word))
    return ' '.join(words)

where maybe_convert() is:

def maybe_convert(phrase):
    for canonical, representations in conversions.items():
        if phrase in representations:  # lists have no .contains(); use `in`
            return canonical

    # default is pass-through of input
    return phrase

With Regex

A cleaner solution is probably to apply a map of replacement regexes to the input, e.g.

import re

conversions = {
    r'court_pattern_here': 'CT',
    r'drive_pattern_here': 'DR'
}

and then:

for regex, replacement in conversions.items():
    # str.replace() matches literal text only; re.sub() applies the pattern
    address = re.sub(regex, replacement, address)
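To make the placeholders concrete, here is a sketch with hypothetical patterns (assumptions, not from the answer): each one matches a whole word via \b, ignores case, and swallows one optional trailing '.' or ','.

```python
import re

# Hypothetical patterns: whole word (\b on both sides), optional trailing '.' or ','.
conversions = {
    r'\b(?:court|ct)\b[.,]?': 'CT',
    r'\b(?:drive|driv|dr)\b[.,]?': 'DR',
}

def normalize(address):
    for pattern, replacement in conversions.items():
        address = re.sub(pattern, replacement, address, flags=re.IGNORECASE)
    return address

print(normalize(" 122 S 102 ct,"))    # ' 122 S 102 CT'
print(normalize("10 Octavia court"))  # '10 Octavia CT'
```

The word boundaries are what keep the "ct" inside "Octavia" from being replaced.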

You can prebuild an inverse dictionary with an ActiveState dictionary-inversion snippet:

http://code.activestate.com/recipes/415100-invert-a-dictionary-where-values-are-lists-one-lin/

def invert(d):
    # map every variant back to its canonical key, e.g. {'ct,': 'CT', 'court': 'CT', ...}
    return {v: k for k in d for v in d[k]}
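Once the dictionary is inverted, each word needs only a single lookup, with dict.get() falling back to the word itself when it is not a known variant. A minimal sketch using the lists from the question (the convert name is illustrative):

```python
def invert(d):
    """Map every variant string back to its canonical key."""
    return {v: k for k in d for v in d[k]}

CT = ['ct', 'ct,', 'ct.', 'court']
DR = ['drive,', 'drive.', 'drive', 'driv', 'dr,', 'dr.', 'dr']
inverse = invert({"CT": CT, "DR": DR})

def convert(address):
    # one dict lookup per word; unknown words pass through unchanged
    return ' '.join(inverse.get(word, word) for word in address.split(' '))

print(convert(" 122 S 102 ct,"))  # ' 122 S 102 CT'
```

Splitting on a single space (rather than split()) keeps the leading space of the original address.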

This is a sample that might help. Your mileage may vary.

CT=['ct','ct,','ct.','court']
DR=['drive,','drive.','drive','driv','dr,','dr.','dr']
dictionary={"CT":CT, "DR":DR}
address1 =' 122 S 102 ct,'

We start by looking at each key and its matching value (i.e. your list of elements). We then iterate over each element in the value and check whether it is present in the address. If it is, we use str.replace() to replace the offending element with the key from the dictionary.

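address_new = address1  # start from the original and keep accumulating replacements
for key, value in dictionary.items():
    # try longer variants first, so 'ct,' is replaced before its prefix 'ct'
    for element in sorted(value, key=len, reverse=True):
        if element in address_new:
            address_new = address_new.replace(element, key)
print(address_new)
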
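One caveat with str.replace(): it matches substrings, so short variants such as 'ct' and 'dr' also fire inside longer words. A small comparison on made-up street names (the function names here are illustrative):

```python
CT = ['ct', 'ct,', 'ct.', 'court']
DR = ['drive,', 'drive.', 'drive', 'driv', 'dr,', 'dr.', 'dr']
dictionary = {"CT": CT, "DR": DR}

def substring_replace(text):
    # str.replace on every variant, longest first
    for key, variants in dictionary.items():
        for element in sorted(variants, key=len, reverse=True):
            text = text.replace(element, key)
    return text

def word_replace(text):
    # compare whole space-separated words against an inverted lookup
    lookup = {v: k for k, vs in dictionary.items() for v in vs}
    return ' '.join(lookup.get(w, w) for w in text.split(' '))

print(substring_replace("15 victoria ct"))  # 15 viCToria CT
print(word_replace("15 victoria ct"))       # 15 victoria CT
```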
from string import punctuation

def transform_input(column):
    # strip trailing punctuation from the whole string, then split on whitespace
    words = column.rstrip(punctuation).split()
    for key, values in conversions.items():
        for ind, word in enumerate(words):
            if word in values:
                words[ind] = key
    return ' '.join(words)


Address1=" 122 S 102 ct,"

conversions = {
    'CT': [ 'ct', 'ct,', 'ct.', 'court' ],  # comma after 'ct,' is required
    'DR': [ 'drive', 'drive.', 'driv', 'dr', 'dr.' ]
}

print(transform_input(Address1)) # 122 S 102 CT
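rstrip(punctuation) above only cleans the end of the whole string, so a variant followed by more text (e.g. "Dr., Unit 2") keeps its punctuation. A sketch of a per-word variant, assuming that after stripping only the bare forms of each suffix need to be listed and that case should be ignored (both assumptions, not part of the answer):

```python
from string import punctuation

# Assumption: after stripping punctuation only the bare forms are needed.
bare_conversions = {
    'CT': ['ct', 'court'],
    'DR': ['drive', 'driv', 'dr'],
}
lookup = {v: k for k, vs in bare_conversions.items() for v in vs}

def transform_words(column):
    out = []
    for word in column.split():
        bare = word.strip(punctuation)  # 'ct,' -> 'ct', 'Dr.,' -> 'Dr'
        # unmatched words are kept exactly as written, punctuation included
        out.append(lookup.get(bare.lower(), word))
    return ' '.join(out)

print(transform_words(" 122 S 102 ct,"))     # 122 S 102 CT
print(transform_words("9 Elm Dr., Unit 2"))  # 9 Elm DR Unit 2
```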

Thank you to everyone for the help. Here is what I ended up with.

import pandas as pd
import re

inputinfo="C:\\path"
# usecols replaces the old parse_cols argument, which newer pandas versions no longer accept
data=pd.read_excel(inputinfo,usecols="A",converters={"A":str})

TRL=['trl']
WAY=['wy']  #do before HWY
HWY=['hwy','hy']
PATH=['path','pth']
LN=['lane','ln.','ln']
AVE=['avenue','ave.','av']
CIR=['circle','circ.','cir']
TER=['terrace','terace','te']
CT=['ct','ct,','ct.','court']
PL=['place','plc','pl.','pl']
CSWY=['causeway','cswy','csw']
PKWY=['parkway','pkway','pkwy','prkw']
DR=['drive,','drive.','drive','driv','dr,','dr.','dr']
PSGE=['passageway','passage','pasage','pass.','pass','pas']
BLVD=['boulevard','boulevar','blvd.','blv.','blvb','blvd','boul','bvld','bl.','blv','bl']

regex=r'(\d)(th)|(\d)(nd)|(3)(rd)|(1)(st)'
Lambda= lambda m: m.group(1) if m.group(1) else m.group(3) if m.group(3) else m.group(5) if m.group(5) else m.group(7) if m.group(7) else ''
# the above strips ordinal suffixes, e.g. "123 153rd st" becomes "123 153 st"

for row in range(0,data.shape[0]):
    String = re.sub(regex,Lambda,str(data.loc[row,"Street Name"]))
    Splits = String.split(" ")
    print(str(row)+" of "+str(data.shape[0]))
    # enumerate yields each word's position directly instead of searching with Splits.index(i)
    for ind, i in enumerate(Splits):
        if i in AVE:
            Splits[ind]="AVE"
        if i in TRL:
            Splits[ind]="TRL"
        if i in WAY:
            Splits[ind]="WAY"
        if i in HWY:
            Splits[ind]="HWY"
        if i in PATH:
            Splits[ind]="PATH"
        if i in TER:
            Splits[ind]="TER"
        if i in LN:
            Splits[ind]="LN"
        if i in CIR:
            Splits[ind]="CIR"
        if i in CT:
            Splits[ind]="CT"
        if i in PL:
            Splits[ind]="PL"
        if i in CSWY:
            Splits[ind]="CSWY"
        if i in PKWY:
            Splits[ind]="PKWY"
        if i in DR:
            Splits[ind]="DR"
        if i in PSGE:
            Splits[ind]="PSGE"
        if i in BLVD:
            Splits[ind]="BLVD"
    data.loc[row,"Street Name Modified"]=(" ".join(Splits))

data.to_csv("C:\\path\\StreetnameSample_output.csv",encoding='utf-8')
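The fifteen if checks can be collapsed into one inverted dictionary, the same idea as the invert() answer. Below is a sketch of that refactor as a plain function, so it runs without pandas. Two changes from the script are assumptions worth flagging: the ordinal regex is rewritten with lookbehinds (it should strip the same suffixes: th/nd after any digit, rd after 3, st after 1), and words are lowercased before lookup so that "Court" or "DRIVE" also match. In the script it could be applied as data["Street Name Modified"] = data["Street Name"].map(normalize_street).

```python
import re

# Canonical abbreviation -> variants (same lists as the script above).
SUFFIXES = {
    "TRL": ['trl'],
    "WAY": ['wy'],
    "HWY": ['hwy', 'hy'],
    "PATH": ['path', 'pth'],
    "LN": ['lane', 'ln.', 'ln'],
    "AVE": ['avenue', 'ave.', 'av'],
    "CIR": ['circle', 'circ.', 'cir'],
    "TER": ['terrace', 'terace', 'te'],
    "CT": ['ct', 'ct,', 'ct.', 'court'],
    "PL": ['place', 'plc', 'pl.', 'pl'],
    "CSWY": ['causeway', 'cswy', 'csw'],
    "PKWY": ['parkway', 'pkway', 'pkwy', 'prkw'],
    "DR": ['drive,', 'drive.', 'drive', 'driv', 'dr,', 'dr.', 'dr'],
    "PSGE": ['passageway', 'passage', 'pasage', 'pass.', 'pass', 'pas'],
    "BLVD": ['boulevard', 'boulevar', 'blvd.', 'blv.', 'blvb', 'blvd',
             'boul', 'bvld', 'bl.', 'blv', 'bl'],
}

# Invert once so each word needs a single dict lookup instead of 15 list scans.
LOOKUP = {v: key for key, variants in SUFFIXES.items() for v in variants}

# Lookbehind form of the ordinal regex: removes the suffix, keeps the digit.
ORDINAL = re.compile(r'(?<=\d)(?:th|nd)|(?<=3)rd|(?<=1)st')

def normalize_street(name):
    name = ORDINAL.sub('', str(name))
    # lower() is an addition: it lets capitalized words match the lowercase lists
    return " ".join(LOOKUP.get(word.lower(), word) for word in name.split(" "))

print(normalize_street("123 153rd st"))  # 123 153 st
print(normalize_street("9 2nd ave."))    # 9 2 AVE
```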

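Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM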