Insert space after the second or third capital letter python

I have a pandas dataframe containing addresses. Some are formatted correctly, like 481 Rogers Rd York ON. Others are missing a space between the city quadrant and the city name, for example 101 9 Ave SWCalgary AB or possibly 101 9 Ave SCalgary AB, where SW means south west and S means south.

I'm trying to find a regex that adds a space between the second and third capital letters when they are followed by lowercase letters or, when only two capitals are followed by lowercase letters, between the first and second.

So far, I've found that ([A-Z]{2,3}[a-z]) matches the situation correctly, but I can't figure out how to look back into the match and substitute at position 2 or 3. Ideally, I'd like to use an index to split the match at [-2:], but I can't figure out how to do this.

I found that re.findall('(?<=[A-Z][A-Z])[A-Z][a-z].+', '101 9 Ave SWCalgary AB') returns the last part of the string. I could use a lookahead regex to find the start and then join the two parts, but this seems very inefficient.

Thanks

You can use

([A-Z]{1,2})(?=[A-Z][a-z])

to capture the first (or first and second) capital letters, followed by a lookahead for a capital letter and then a lowercase letter. Then replace each match with the first group and a space:

re.sub(r'([A-Z]{1,2})(?=[A-Z][a-z])', r'\1 ', str)

https://regex101.com/r/TcB4Ph/1
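As a rough, runnable sketch of the substitution above, applied to the sample addresses from the question. The function name add_space_lookahead is made up for illustration; the pattern and replacement are the ones given in this answer.

```python
import re

# Pattern from the answer: one or two capitals (group 1) that are
# immediately followed by a capital plus a lowercase letter (lookahead,
# which is checked but not consumed).
PATTERN = re.compile(r'([A-Z]{1,2})(?=[A-Z][a-z])')


def add_space_lookahead(address):
    """Insert a space after the quadrant letters (hypothetical helper name)."""
    return PATTERN.sub(r'\1 ', address)


for sample in ['481 Rogers Rd York ON',
               '101 9 Ave SWCalgary AB',
               '101 9 Ave SCalgary AB']:
    print(add_space_lookahead(sample))
```

Because the lookahead does not consume the city name, only group 1 needs to be written back in the replacement.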

You may use

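# regex=True is required on pandas 2.0+, where str.replace treats the pattern as a literal string by default
df['Test'] = df['Test'].str.replace(r'\b([A-Z]{1,2})([A-Z][a-z])', r'\1 \2', regex=True)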

See this regex demo

Details

  • \b - a word boundary
  • ([A-Z]{1,2}) - Capturing group 1 (referred to later as \1 in the replacement pattern): one or two uppercase letters
  • ([A-Z][a-z]) - Capturing group 2 (referred to later as \2 in the replacement pattern): an uppercase letter followed by a lowercase one.
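To make the two capturing groups concrete, here is a small sketch using the standard re module on the question's sample strings. The helper name show_groups is an assumption for illustration only.

```python
import re

# Same pattern as above: word boundary, group 1 (one or two capitals),
# group 2 (a capital followed by a lowercase letter).
pattern = re.compile(r'\b([A-Z]{1,2})([A-Z][a-z])')


def show_groups(text):
    """Return (group 1, group 2) for the first match, or None if nothing matches."""
    match = pattern.search(text)
    return match.groups() if match else None


print(show_groups('101 9 Ave SWCalgary AB'))  # ('SW', 'Ca')
print(show_groups('101 9 Ave SCalgary AB'))   # ('S', 'Ca')
print(show_groups('481 Rogers Rd York ON'))   # None
```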

If you want to match city quadrants specifically, you may use a slightly more specific regex:

df['Test'] = df['Test'].str.replace(r'\b([NS][EW]|[NESW])([A-Z][a-z])', r'\1 \2', regex=True)

See this regex demo. Here, [NS][EW]|[NESW] matches N or S followed by E or W, or a single N, E, S or W.

Pandas demo:
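The difference between the generic and the quadrant-specific pattern can be sketched with plain re. The address '7 OReilly St' is a made-up example (not from the question), chosen because it starts a word with a capital that is not a quadrant letter; the function names are also assumptions.

```python
import re

GENERIC = re.compile(r'\b([A-Z]{1,2})([A-Z][a-z])')
QUADRANT = re.compile(r'\b([NS][EW]|[NESW])([A-Z][a-z])')


def split_generic(address):
    """Split after any one or two leading capitals (hypothetical helper)."""
    return GENERIC.sub(r'\1 \2', address)


def split_quadrant(address):
    """Split only after N, E, S, W, NE, NW, SE or SW (hypothetical helper)."""
    return QUADRANT.sub(r'\1 \2', address)


# Both patterns agree on the question's samples.
print(split_generic('101 9 Ave SWCalgary AB'))   # 101 9 Ave SW Calgary AB
print(split_quadrant('101 9 Ave SWCalgary AB'))  # 101 9 Ave SW Calgary AB

# They differ on a made-up address whose leading capital is not a quadrant.
print(split_generic('7 OReilly St'))   # 7 O Reilly St
print(split_quadrant('7 OReilly St'))  # 7 OReilly St
```

The quadrant-specific pattern can still split names that happen to begin with N, E, S or W followed by another capitalised part, so it narrows false positives rather than eliminating them.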

import pandas as pd
df = pd.DataFrame({'Test':['481 Rogers Rd York ON', 
'101 9 Ave SWCalgary AB',
'101 9 Ave SCalgary AB']})
>>> df['Test'].str.replace(r'\b([A-Z]{1,2})([A-Z][a-z])', r'\1 \2', regex=True)
0      481 Rogers Rd York ON
1    101 9 Ave SW Calgary AB
2     101 9 Ave S Calgary AB
Name: Test, dtype: object
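The question also asked about splitting the match by index at [-2:]. One possible way to do that, sketched here as an alternative rather than as part of either answer, is to pass a function as the replacement to re.sub. The pattern is the one from the question; the names split_before_last_two and fix_address are made up for illustration.

```python
import re

# Pattern from the question: two or three capitals followed by a lowercase letter.
PATTERN = re.compile(r'[A-Z]{2,3}[a-z]')


def split_before_last_two(match):
    """Insert a space before the last two characters of the matched text."""
    text = match.group(0)
    return text[:-2] + ' ' + text[-2:]


def fix_address(address):
    # re.sub calls the function once per match and uses its return value
    # as the replacement string.
    return PATTERN.sub(split_before_last_two, address)


print(fix_address('101 9 Ave SWCalgary AB'))  # 101 9 Ave SW Calgary AB
print(fix_address('101 9 Ave SCalgary AB'))   # 101 9 Ave S Calgary AB
print(fix_address('481 Rogers Rd York ON'))   # 481 Rogers Rd York ON
```

The capture-group versions above are shorter; the callable form is mainly useful when the replacement needs logic that a replacement string cannot express.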


