Regex - get everything before first comma - python

My input data is UTF-8 encoded.

I'm applying a regular expression to the input to find everything before the comma.
However, my regex returns None, even though I can see the comma visually.

What's wrong with it?
I tested ',' in MyString, which works fine.

Here is my input data:

 ID            MyString
765427       Units G2 and G3, kings Drive
207162       Unit 5/165,Elizabeth Palace
47568        Unit 766 - 767 Gate 7,Jacks Way,
15498        Unit F, Himalayas Street,

Based on my regex, re.search(r".*?,", s['MyString']),
I expect my output to be:

 ID            MyString
765427       Units G2 and G3,
207162       Unit 5/165,
47568        Unit 766 - 767 Gate 7,
15498        Unit F,

But what I am getting is:

 ID            MyString
765427       Units G2 and G3,
207162       None
47568        Unit 766 - 767 Gate 7,
15498        None

Please correct me if my understanding of the regex is wrong, or tell me what else is wrong. I can't figure out what the problem is.

As @idjaw suggested above, an easier way to accomplish this is to use the split() method:

my_string = 'Unit 5/165,Elizabeth Palace'
ans = my_string.split(',', 1)[0]  # maxsplit=1: split only at the first comma
print(ans)

Result:
Unit 5/165

In this case, you could even leave off the maxsplit=1 argument:

ans = my_string.split(',')[0]
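A minimal sketch applying the split() approach to every sample row from the question, assuming each MyString value is an ordinary Python 3 str. The helper name before_first_comma is hypothetical. Unlike re.search, split() never returns None: a string with no comma comes back whole. Note that split() also drops the comma itself, so append ',' if you need the output to match the question's expected output exactly.

```python
def before_first_comma(text):
    """Hypothetical helper: return everything before the first comma,
    or the whole string if it contains no comma."""
    # maxsplit=1: split at most once; element 0 is the part before the first comma
    return text.split(",", 1)[0]


# MyString values copied from the sample data in the question
rows = [
    "Units G2 and G3, kings Drive",
    "Unit 5/165,Elizabeth Palace",
    "Unit 766 - 767 Gate 7,Jacks Way,",
    "Unit F, Himalayas Street,",
]

heads = [before_first_comma(s) for s in rows]
for head in heads:
    print(head)

# Leaving off maxsplit splits on every comma, but element 0 is the same
assert heads == [s.split(",")[0] for s in rows]
```

Passing maxsplit=1 only saves splitting the remainder of the string. The first element is identical either way.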

Also note that, while not technically an error, it is considered best practice in Python to reserve capitalized names (such as MyString) for classes and to use lowercase names with underscores (such as my_string) for variables. See "What is the naming convention in Python for variable and function names?" and the PEP 8 naming conventions.

Regex solution:
I noticed that in your example results, you got the expected result when there was a space after the comma in the string being analyzed.
However, when there was no space after the comma, your regex returned None.

Try using the regex pattern (.*?,) rather than .*?,
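Here is a sketch of the suggested (.*?,) pattern applied to the sample rows, assuming plain Python 3 strings. The helper name up_to_first_comma is hypothetical. Because of the group, m.group(1) holds the text up to and including the first comma. The if m guard covers strings with no comma, where re.search returns None and calling .group() directly would raise AttributeError.

```python
import re

# Lazy .*? stops at the first comma; the parentheses capture it as group 1
FIRST_COMMA = re.compile(r"(.*?,)")


def up_to_first_comma(text):
    """Hypothetical helper: return the text up to and including the first
    comma, or None if the string contains no comma."""
    m = FIRST_COMMA.search(text)
    # re.search returns None when the pattern cannot match anywhere
    return m.group(1) if m else None


# MyString values copied from the sample data in the question
rows = [
    "Units G2 and G3, kings Drive",
    "Unit 5/165,Elizabeth Palace",
    "Unit 766 - 767 Gate 7,Jacks Way,",
    "Unit F, Himalayas Street,",
]

for s in rows:
    print(up_to_first_comma(s))
```

Compiling the pattern once with re.compile is optional. It only avoids looking the pattern up again on each call.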

Here are a couple of online tools for debugging and testing regular expressions:
http://pythex.org/
https://regex101.com/ (has an option to generate the code for you, though it may be more verbose than necessary)
