
Split column in DataFrame based on item in list

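I have the following table and want to split each row into three columns: state, postcode and city. State and postcode were easy, but I am unable to extract the city. I thought about splitting each string after the street synonym and before the state, but I seem to be getting the loop wrong, because it only uses the last item in my list.

Input data: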

    Address Text
0   11 North Warren Circle Lisbon Falls ME 04252
1   227 Cony Street Augusta ME 04330
2   70 Buckner Drive Battle Creek MI
3   718 Perry Street Big Rapids MI
4   14857 Martinsville Road Van Buren MI
5   823 Woodlawn Ave Dallas TX 75208
6   2525 Washington Avenue Waco TX 76710
7   123 South Main St Dallas TX 75201

The output I am trying to achieve (for all rows, but I only wrote out the first two to save time):

    City          State    Postcode 
0   Lisbon Falls  ME       04252
1   Augusta       ME       04330

My code:

# Extract postcode and state
df["Zip"] = df["Address Text"].str.extract(r'(\d{5})', expand = True)
df["State"] = df["Address Text"].str.extract(r'([A-Z]{2})', expand = True)

# Split after these substrings
street_synonyms = ["Circle", "Street", "Drive", "Road", "Ave", "Avenue", "St"]

# This is where I got stuck
df["Syn"] = df["Address Text"].apply(lambda x: x.split(syn))
df

Here is one approach:

import pandas as pd

# data
df = pd.DataFrame(
    ['11 North Warren Circle Lisbon Falls ME 04252',
     '227 Cony Street Augusta ME 04330',
     '70 Buckner Drive Battle Creek MI',
     '718 Perry Street Big Rapids MI',
     '14857 Martinsville Road Van Buren MI',
     '823 Woodlawn Ave Dallas TX 75208',
     '2525 Washington Avenue Waco TX 76710',
     '123 South Main St Dallas TX 75201'],
    columns=['Address Text'])

# Extract postcode and state
# Anchor the ZIP to the end of the string so that a 5-digit house number
# (e.g. "14857 Martinsville Road") is not mistaken for it
df["Zip"] = df["Address Text"].str.extract(r'(\d{5})$', expand=True)
# The state is a standalone two-letter uppercase token
df["State"] = df["Address Text"].str.extract(r'\b([A-Z]{2})\b', expand=True)

# Split after these substrings
street_synonyms = ["Circle", "Street", "Drive", "Road", "Ave", "Avenue", "St"]


def find_city(address, state, street_synonyms):
    # Compare whole words, so "Ave" cannot match inside "Avenue"
    words = address.split()
    for i, word in enumerate(words):
        rest = words[i + 1:]
        if word in street_synonyms and state in rest:
            # keep what lies between the street word and the state
            return " ".join(rest[:rest.index(state)])
    # no street word (or no state after it) was found
    return None


df['City'] = df.apply(lambda x: find_city(x['Address Text'], x['State'], street_synonyms), axis=1)

print(df[['City', 'State', 'Zip']])

"""
           City State    Zip
0  Lisbon Falls    ME  04252
1       Augusta    ME  04330
2  Battle Creek    MI    NaN
3    Big Rapids    MI    NaN
4     Van Buren    MI    NaN
5        Dallas    TX  75208
6          Waco    TX  76710
7        Dallas    TX  75201
"""
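
As an alternative to checking each synonym in turn, the same idea (city = text after the street word and before the state) can be sketched as a single regular expression passed to `str.extract`. This is only a sketch under stated assumptions, not part of the original answer: the pattern, the group names and the `parts` variable are illustrative. It assumes the state is a two-letter uppercase code and that the ZIP, when present, is the last five digits of the string.

```python
import re

import pandas as pd

df = pd.DataFrame(
    ['11 North Warren Circle Lisbon Falls ME 04252',
     '227 Cony Street Augusta ME 04330',
     '70 Buckner Drive Battle Creek MI',
     '718 Perry Street Big Rapids MI',
     '14857 Martinsville Road Van Buren MI',
     '823 Woodlawn Ave Dallas TX 75208',
     '2525 Washington Avenue Waco TX 76710',
     '123 South Main St Dallas TX 75201'],
    columns=['Address Text'])

street_synonyms = ["Circle", "Street", "Drive", "Road", "Ave", "Avenue", "St"]

# Build "Circle|Street|..." from the list; re.escape guards against
# synonyms that contain regex metacharacters (assumption: none do here).
alternation = "|".join(map(re.escape, street_synonyms))

pattern = (
    rf"\b(?:{alternation})\b\s+"   # street word as a whole word, not captured
    r"(?P<City>.+?)\s+"            # city: shortest text up to the state
    r"(?P<State>[A-Z]{2})\b"       # two-letter state code
    r"(?:\s+(?P<Zip>\d{5}))?$"     # optional 5-digit ZIP at the very end
)

# Named groups become the column names of the resulting DataFrame.
parts = df["Address Text"].str.extract(pattern)
print(parts)
```

The `\b` word boundaries keep "Ave" from matching inside "Avenue" and "St" inside "Street", and anchoring the ZIP with `$` keeps a five-digit house number from being read as a postcode. Addresses whose city itself contains a street word (for example "St Louis") would still need extra handling.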

