Using regex to identify characters and digits in Python
I have phone numbers that might look like:
927-6847
611-6701p3715ou264-5435
869-6289fillemichelinemoisan
613-5000p4238soirou570-9639cel
and so on...
I want to identify them and break them into:
9276847
6116701
2645435
8696289
6135000
5709639
Strings to store somewhere else:
611-6701p3715ou264-5435
869-6289fillemichelinemoisan
613-5000p4238soirou570-9639cel
- When there is a "p" between digits, the number after "p" is an extension: get the number before "p" and save the whole string somewhere else.
- When there is "ou", another number starts after it.
- When there is "cel" or any other random string, get the number part and save the whole string somewhere else.
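The rules above can be sketched as a small parser. This is a minimal sketch, not the asker's code: the helper `parse_entry` and the pattern `ENTRY` are hypothetical names, it assumes every number has the xxx-xxxx shape (hyphen optional), and it splits on the literal text "ou", so trailing text that happens to contain "ou" is also split. Pieces that do not start with a number are simply skipped.

```python
import re

# Hypothetical pattern: 3 digits, optional "-", 4 digits, then an optional
# "p" followed by the extension digits. Any other trailing letters are ignored.
ENTRY = re.compile(r"(\d{3})-?(\d{4})(?:p(\d+))?")


def parse_entry(raw):
    """Return a list of (number, extension) pairs; extension is None if absent."""
    results = []
    # Assumption: "ou" always separates two numbers.
    for part in raw.split("ou"):
        m = ENTRY.match(part)  # the number must start the piece
        if m:
            results.append((m.group(1) + m.group(2), m.group(3)))
    return results


for raw in ["927-6847", "611-6701p3715ou264-5435", "613-5000p4238soirou570-9639cel"]:
    print(raw, "->", parse_entry(raw))
```

Using `re.match` on each piece (rather than `re.search`) means the extension digits after "p" are never mistaken for a second phone number.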
Edit: This is what I have tried:
import re
phNumber = '928-4612cel'
if not re.match(r'^\d*$', phNumber):
    # remove non-word characters, then keep everything before the first lowercase letter
    # (raises AttributeError when no letter is left, because re.match returns None)
    res = re.match(r"(.*?)[a-z]", re.sub(r'[^\d\w]', '', phNumber)).group(1)
I am looking to handle these cases and to identify which strings had extra characters before they were chopped off by the regex.
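A variant of the attempt that avoids the `None` crash and also reports whether anything was chopped off might look like this. It is a sketch under assumptions: `leading_number` is a hypothetical helper, and it swaps the asker's `(.*?)[a-z]` pattern for `re.match(r"\d+", ...)`, which takes the leading run of digits directly.

```python
import re


def leading_number(ph_number):
    """Return (digits, had_extra) for one raw phone string.

    Hypothetical helper: had_extra is True when anything is left after the
    leading run of digits, i.e. the string was "chopped off".
    """
    cleaned = re.sub(r"\W", "", ph_number)  # same as [^\d\w]: drops '-', '.', spaces
    m = re.match(r"\d+", cleaned)           # None if the string does not start with a digit
    digits = m.group(0) if m else ""
    return digits, len(digits) < len(cleaned)


for s in ["928-4612cel", "927-6847", "611-6701p3715ou264-5435"]:
    print(s, leading_number(s))
```

Comparing the lengths of the digit run and the cleaned string is what flags the strings that should also be saved somewhere else.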
First, let me confirm your request: extract every 7-digit phone number into one file, and save every string that contains letters to another file.
import re

# list of all the strings we want to test
EXAMPLE = [
    "927-6847",
    "9276847",
    "927.6847",
    "611-6701p3715ou264-5435",
    "6116701p3715ou264-5435",
    "869-6289fillemichelinemoisan",
    "869.6289fillemichelinemoisan",
    "8696289fillemichelinemoisan",
    "613-5000p4238soirou570-9639cel",
]

def save_phone_number(test_string, output_file_name):
    number_to_save = []
    # regex pattern of "xxx-xxxx" where x is a digit; the "-" or "." separator is optional
    regex_pattern = r"[0-9]{3}[-.]?[0-9]{4}"
    phone_numbers = re.findall(regex_pattern, test_string)
    # keep only the digits (drops "-" or ".")
    for item in phone_numbers:
        number_to_save.append(re.sub(r"\D", "", item))
    # save to file
    with open(output_file_name, "a") as file_object:
        for item in number_to_save:
            file_object.write(item + "\n")

def save_somewhere_else(test_string, output_file_name):
    # regex pattern: the string contains at least one letter
    # (.*) means any sequence of characters, of any length
    # [a-zA-Z] means one lowercase or uppercase letter
    regex_pattern = r"(.*)[a-zA-Z](.*)"
    if re.match(regex_pattern, test_string) is not None:
        with open(output_file_name, "a") as file_object:
            file_object.write(test_string + "\n")

if __name__ == "__main__":
    phone_number_file = "phone_number.txt"
    somewhere_file = "somewhere.txt"
    for each_string in EXAMPLE:
        save_phone_number(each_string, phone_number_file)
        save_somewhere_else(each_string, somewhere_file)
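If you want to check the results before writing any files, the same two regex tests can return lists instead. This is a sketch, not a drop-in replacement: `split_entries`, `PHONE` and `HAS_LETTER` are hypothetical names, and `re.search(r"[a-zA-Z]")` stands in for the equivalent `re.match(r"(.*)[a-zA-Z](.*)")` check.

```python
import re

PHONE = re.compile(r"\d{3}[-.]?\d{4}")  # xxx-xxxx, xxx.xxxx or xxxxxxx
HAS_LETTER = re.compile(r"[a-zA-Z]")


def split_entries(entries):
    """Return (numbers, leftovers) instead of appending to files."""
    numbers, leftovers = [], []
    for entry in entries:
        # digits only, e.g. "611-6701" -> "6116701"
        numbers.extend(re.sub(r"\D", "", m) for m in PHONE.findall(entry))
        if HAS_LETTER.search(entry):
            leftovers.append(entry)
    return numbers, leftovers


numbers, leftovers = split_entries([
    "927-6847",
    "611-6701p3715ou264-5435",
    "869-6289fillemichelinemoisan",
    "613-5000p4238soirou570-9639cel",
])
print(numbers)    # ['9276847', '6116701', '2645435', '8696289', '6135000', '5709639']
print(leftovers)
```

With the four inputs from the question, `numbers` is exactly the expected list of six numbers and `leftovers` holds the three strings that contain letters.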