Insert a space after the second or third capital letter (Python)

Question

I have a pandas dataframe containing addresses. Some are formatted correctly, like 481 Rogers Rd York ON. Others are missing a space between the city quadrant and the city name, for example 101 9 Ave SWCalgary AB, or possibly 101 9 Ave SCalgary AB, where SW refers to southwest and S to south.

I'm trying to find a regex that will add a space between the second and third capital letters if they are followed by lowercase letters or, if there are only two capitals followed by a lowercase letter, add a space between the first and second.

So far, I've found that ([A-Z]{2,3}[a-z]) matches the situation correctly, but I can't figure out how to look back into the match and substitute at position 2 or 3. Ideally, I'd like to use an index to split the match at [-2:], but I can't figure out how to do this.

I found that re.findall('(?<=[A-Z][A-Z])[A-Z][a-z].+', '101 9 Ave SWCalgary AB') returns the last part of the string, and I could use a lookahead regex to find the start and then join the two parts, but this seems very inefficient.

Thanks

Answer 1

You can use

([A-Z]{1,2})(?=[A-Z][a-z])

to capture the first (or first and second) capital letters, with a lookahead that requires a capital letter followed by a lowercase letter right after them. Then replace the match with the first group and a space:

re.sub(r'([A-Z]{1,2})(?=[A-Z][a-z])', r'\1 ', str)

https://regex101.com/r/TcB4Ph/1
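As a quick check, the substitution above can be run on the three sample addresses from the question. This is a minimal sketch; the variable names `pattern`, `addresses`, and `fixed` are only illustrative.

```python
import re

# Pattern from the answer: one or two capitals (group 1), which must be
# followed by a capital letter and then a lowercase letter (lookahead).
pattern = re.compile(r'([A-Z]{1,2})(?=[A-Z][a-z])')

addresses = [
    '481 Rogers Rd York ON',
    '101 9 Ave SWCalgary AB',
    '101 9 Ave SCalgary AB',
]

# The lookahead consumes nothing, so only the space after group 1 is added.
fixed = [pattern.sub(r'\1 ', address) for address in addresses]

for address in fixed:
    print(address)
# 481 Rogers Rd York ON
# 101 9 Ave SW Calgary AB
# 101 9 Ave S Calgary AB
```

Note that this pattern has no word boundary, so it can also match capitals in the middle of a longer run of uppercase letters.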

Answer 2

You may use

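# regex=True is needed in pandas 2.0+, where str.replace treats the pattern as a literal string by default
df['Test'] = df['Test'].str.replace(r'\b([A-Z]{1,2})([A-Z][a-z])', r'\1 \2', regex=True)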

See this regex demo

Details

\b - a word boundary
([A-Z]{1,2}) - Capturing group 1 (referred to later with \1 in the replacement pattern): one or two uppercase letters
([A-Z][a-z]) - Capturing group 2 (referred to later with \2 in the replacement pattern): an uppercase letter followed by a lowercase one.
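To make the roles of the two groups and the word boundary concrete, here is a small sketch using plain `re`. The string 'ABCDef' is a made-up input, chosen only to show what `\b` rules out; the variable names are illustrative.

```python
import re

pattern = re.compile(r'\b([A-Z]{1,2})([A-Z][a-z])')

# On a sample address, group 1 holds the quadrant and group 2 the first
# two letters of the city name.
m = pattern.search('101 9 Ave SWCalgary AB')
quadrant, city_start = m.group(1), m.group(2)
print(quadrant, city_start)  # SW Ca

# Made-up input: more than two capitals precede the capital + lowercase
# pair, and \b only allows a match to start at the beginning of the word,
# so nothing matches and the string is left as it was.
unchanged = pattern.sub(r'\1 \2', 'ABCDef')
print(unchanged)  # ABCDef
```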

If you want to match city quadrants specifically, you can use a slightly more specific regex:

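# regex=True is needed in pandas 2.0+, where str.replace treats the pattern as a literal string by default
df['Test'] = df['Test'].str.replace(r'\b([NS][EW]|[NESW])([A-Z][a-z])', r'\1 \2', regex=True)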

See this regex demo. Here, [NS][EW]|[NESW] matches N or S followed by E or W, or a single N, E, S, or W.
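The difference between the generic and the quadrant-specific pattern only shows up on inputs that have a non-compass capital prefix. The sketch below compares the two with plain `re`; the address '12 Main St XYTown AB' is invented for illustration and does not come from the question.

```python
import re

generic = re.compile(r'\b([A-Z]{1,2})([A-Z][a-z])')
quadrant_only = re.compile(r'\b([NS][EW]|[NESW])([A-Z][a-z])')

samples = [
    '101 9 Ave SWCalgary AB',   # from the question
    '101 9 Ave SCalgary AB',    # from the question
    '12 Main St XYTown AB',     # made-up: 'XY' is not a compass quadrant
]

generic_out = [generic.sub(r'\1 \2', s) for s in samples]
quadrant_out = [quadrant_only.sub(r'\1 \2', s) for s in samples]

for before, g, q in zip(samples, generic_out, quadrant_out):
    print(f'{before!r}: generic={g!r}, quadrant_only={q!r}')
```

Both patterns fix the two addresses from the question, but only the generic one splits 'XYTown' into 'XY Town'.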

Pandas demo:

import pandas as pd
df = pd.DataFrame({'Test':['481 Rogers Rd York ON', 
'101 9 Ave SWCalgary AB',
'101 9 Ave SCalgary AB']})
>>> df['Test'].str.replace(r'\b([A-Z]{1,2})([A-Z][a-z])', r'\1 \2', regex=True)
0      481 Rogers Rd York ON
1    101 9 Ave SW Calgary AB
2     101 9 Ave S Calgary AB
Name: Test, dtype: object
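The question's original idea of splitting the match by index at [-2:] can also be expressed directly, by passing a function as the replacement to `re.sub`. This is only a sketch of that alternative, not part of either answer; the helper name `split_before_last_two` is hypothetical.

```python
import re

def split_before_last_two(match):
    """Insert a space before the last two characters of the match."""
    text = match.group()
    return text[:-2] + ' ' + text[-2:]

# Pattern from the question: two or three capitals followed by a lowercase letter.
pattern = re.compile(r'[A-Z]{2,3}[a-z]')

addresses = [
    '481 Rogers Rd York ON',
    '101 9 Ave SWCalgary AB',
    '101 9 Ave SCalgary AB',
]

# re.sub calls the function once per match and uses its return value.
split_results = [pattern.sub(split_before_last_two, a) for a in addresses]

for address in split_results:
    print(address)
# 481 Rogers Rd York ON
# 101 9 Ave SW Calgary AB
# 101 9 Ave S Calgary AB
```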

2 answers

Answer 1
0 2018-10-06 21:46:26

Answer 2
0 accepted 2018-10-06 22:27:50
