
Extract last term after comma into new column

I have a pandas DataFrame with essentially 2 columns and 9000 rows:

CompanyName  |  CompanyAddress

and the address is in the form

Line1, Line2, ..LineN, PostCode

i.e. each address is a string (dtype 'object') holding a varying number of comma-separated items, and I just want to pull out the postcode, i.e. the item after the last comma in the field.

I've tried the dot-notation string manipulation suggestions (possibly badly):

df_address['CompanyAddress'] = df_address['CompanyAddress'].str.rsplit(', ') 

which just put '[ ]' around the fields. I had no success trying to isolate the last component of any split-up/partitioned string, with maxsplit kicking up errors.

I had a small degree of success following EdChum's comment on "Pandas split Column into multiple columns by comma":

pd.concat([df_address[['CompanyName']], df_address['CompanyAddress'].str.rsplit(', ', expand=True)], axis=1)

However, whilst this isolates the postcode, it just creates multiple columns, and the postcode ends up anywhere in columns 3-6... equally no good.

It feels incredibly close, please advise.

    EmployerName    Address
0   FAUCET INN LIMITED  [Union, 88-90 George Street, London, W1U 8PA]
1   CITIBANK N.A    [Citigroup Centre,, Canary Wharf, Canada Squar...
2   AGENCY 2000 LIMITED     [Sovereign House, 15 Towcester Road, Old Strat...
3   Transform Trust     [Unit 11 Castlebridge Office Village, Kirtley ...
4   R & R.C.BOND (WHOLESALE) LIMITED    [One General Street, Pocklington Industrial Es...
5   MARKS & SPENCER FINANCIAL SERVICES PLC  [Marks & Spencer Financial, Services Kings Mea...

Given the DataFrame,

df = pd.DataFrame({'Name': ['ABC'], 'Address': ['Line1, Line2, LineN, PostCode']})

    Address                         Name
0   Line1, Line2, LineN, PostCode   ABC

If you need only the postcode, you can extract it using rsplit and re-assign it to the Address column. This saves you the concat step. The trailing strip() removes the space left after the comma.

df['Address'] = df['Address'].str.rsplit(',', n=1).str[-1].str.strip()

You get

    Address     Name
0   PostCode    ABC
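
As a slightly fuller sketch of the same idea, the example below keeps the original column and writes the postcode to a new one. The sample rows and the `PostCode` column name are invented for illustration; `n=1` limits the split to the last comma.

```python
import pandas as pd

# Hypothetical sample data with a varying number of address lines
df = pd.DataFrame({
    'Name': ['ABC', 'DEF'],
    'Address': ['Line1, Line2, LineN, PostCode',
                'Union, 88-90 George Street, London, W1U 8PA'],
})

# Split once from the right, take the last piece, trim whitespace
df['PostCode'] = df['Address'].str.rsplit(',', n=1).str[-1].str.strip()

print(df['PostCode'].tolist())
```

Writing to a new column leaves `Address` untouched, which may be preferable if the full address is still needed later.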

Edit: Given that you have a DataFrame with the address values in a list

df = pd.DataFrame({'Name': ['FAUCET INN LIMITED'], 'Address': [['Union, 88-90 George Street, London, W1U 8PA']]})

    Address                                         Name
0   [Union, 88-90 George Street, London, W1U 8PA]   FAUCET INN LIMITED

You can get the last element using

df['Address'] = df['Address'].apply(lambda x: x[0].split(',')[-1].strip())

You get

    Address     Name
0   W1U 8PA     FAUCET INN LIMITED
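
The sample output in the question looks like the result of the earlier `str.rsplit(', ')` assignment, in which case each cell would hold a list of several strings rather than a list containing one string. That is an assumption about the asker's data; under it, a minimal sketch is to index the list directly through the `.str` accessor:

```python
import pandas as pd

# Assumed shape: each cell is already a list of address parts
df = pd.DataFrame({
    'Name': ['FAUCET INN LIMITED'],
    'Address': [['Union', '88-90 George Street', 'London', 'W1U 8PA']],
})

# .str[-1] also indexes into list-valued cells, element by element
df['PostCode'] = df['Address'].str[-1]

print(df['PostCode'].tolist())
```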

Just rsplit the existing column into 2 columns: the existing one and a new one. Or two new ones if you want to keep the existing column intact.

df[['Address', 'PostCode']] = df['Address'].str.rsplit(', ', n=1, expand=True)

Edit: Since OP's Address column is a list with 1 string in it, here is a solution for that specifically:

df[['Address', 'PostCode']] = df['Address'].map(lambda x: x[0]).str.rsplit(', ', n=1, expand=True)
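
To illustrate how a two-column split behaves when rows have different numbers of commas, here is a small sketch on invented rows. The separator is `', '` (comma plus space), so the pieces need no stripping, but an address lacking that exact separator would come back with a missing postcode.

```python
import pandas as pd

# Hypothetical rows with different numbers of address lines
df = pd.DataFrame({
    'Name': ['A', 'B'],
    'Address': ['Line1, PostCode1', 'Line1, Line2, Line3, PostCode2'],
})

# n=1 splits only at the last ', '; expand=True returns two columns
parts = df['Address'].str.rsplit(', ', n=1, expand=True)
df['Address'], df['PostCode'] = parts[0], parts[1]

print(df)
```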

rsplit returns a list; try rsplit(',')[-1] to get the last element of each row (index [0] would give the first element).
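
The indexing can be checked on a plain Python string, where `[-1]` selects the last piece of the list that `rsplit` returns and `[0]` selects the first. The address is the one from the question's sample output.

```python
address = 'Union, 88-90 George Street, London, W1U 8PA'

parts = address.rsplit(', ')   # a list of four strings
first, last = parts[0], parts[-1]

print(first)  # Union
print(last)   # W1U 8PA
```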
