Split a string on a certain character only if it doesn't follow directly after another particular character

I have the following line of code, which splits the string data2 into a list on whitespace:

string_list = data2.split()

However, some of my data contains dates in the format "28, Dec". The code above splits on the whitespace between the day and the month, which I don't want. Is there a way to say "split on whitespace, but not if it comes after a comma"?

You need to use regular expressions.

>>> import re
>>> re.split(r'(?<!,) ', 'blah blah, blah')
['blah', 'blah, blah']

From the linked documentation:

(?<!...) Matches if the current position in the string is not preceded by a match for .... This is called a negative lookbehind assertion. Similar to positive lookbehind assertions, the contained pattern must only match strings of some fixed length. Patterns which start with negative lookbehind assertions may match at the beginning of the string being searched.
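To make the quoted rules concrete, here is a small sketch of the three points: the assertion itself, the match at the start of the string, and the fixed-length requirement. The variable names are only illustrative.

```python
import re

# Match a space only when the character before it is not a comma.
pattern = re.compile(r'(?<!,) ')

normal_case = pattern.split('blah blah, blah')
print(normal_case)  # ['blah', 'blah, blah']

# At position 0 nothing precedes the space, so the negative lookbehind
# succeeds and the pattern matches at the very beginning of the string.
leading_case = pattern.split(' leading')
print(leading_case)  # ['', 'leading']

# The lookbehind body must have a fixed length; ',+' does not, so
# compiling the pattern fails with re.error.
variable_width_rejected = False
try:
    re.compile(r'(?<!,+) ')
except re.error:
    variable_width_rejected = True
print(variable_width_rejected)  # True
```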

Use re.split with a negative lookbehind expression:

import re
re.split(r'(?<!,)\s', 'I went on 28, Dec')
# ['I', 'went', 'on', '28, Dec']

You can split using a regular expression and use a lookbehind assertion to make sure that you don't split on a whitespace character that is preceded by a comma:

>>> import re
>>> s = 'foo bar 28, Dec bar baz'
>>> re.split(r'(?<!,)\s', s)
['foo', 'bar', '28, Dec', 'bar', 'baz']
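One difference from plain data2.split() is worth noting: the patterns above split on every single whitespace character, so runs of spaces or leading whitespace produce empty strings. The following is a minimal sketch of one way to get closer to str.split() behaviour. The helper name split_keep_comma_groups is hypothetical, and the pattern assumes a whitespace run should be skipped whenever it starts right after a comma.

```python
import re

# A whitespace run that starts neither after a comma nor in the middle of
# another whitespace run. Excluding \s in the lookbehind stops the regex
# from matching the second space in "28,  Dec".
_SPLIT_RE = re.compile(r'(?<![,\s])\s+')

def split_keep_comma_groups(text):
    """Split on whitespace runs, except runs that directly follow a comma."""
    text = text.strip()  # drop leading/trailing whitespace, like str.split()
    return _SPLIT_RE.split(text) if text else []

print(split_keep_comma_groups('foo bar 28, Dec bar baz'))
# ['foo', 'bar', '28, Dec', 'bar', 'baz']
print(split_keep_comma_groups('  foo   bar 28,  Dec  '))
# ['foo', 'bar', '28,  Dec']
```

The whitespace after the comma is kept exactly as it appeared in the input, so "28,  Dec" keeps both spaces.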

Sorry to revive this thread, but I was trying to decode SQLite cells, and something seems odd to me. I'll explain. I'm trying to encode two different numbers into one cell by building a string with a 0 in between and then converting it to a number, for example:

a = 4
b = 7
c = str(a) + '0' + str(b)

The problem is when the first number is 10, so I'm using this:

>>> re.split('0([1-9])', '1003')
['10', '3', '']

Why am I getting a list of length three when it should be just two?
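A likely explanation, sketched below: re.split always returns the text after the last match, and here the match '03' ends exactly at the end of the string, so that trailing piece is the empty string. The '3' appears only because the capturing group is included in the result. One possible alternative, assuming the second number never starts with 0, is to put the digit in a lookahead so that it is not consumed by the separator. The variable names are illustrative.

```python
import re

encoded = '1003'

# The separator '03' is consumed; the captured '3' is inserted into the
# result, and the empty remainder after the match is still returned.
with_group = re.split('0([1-9])', encoded)
print(with_group)  # ['10', '3', '']

# Lookahead variant: only the '0' is the separator, and the following digit
# stays in the second piece. maxsplit=1 guards against later zeros.
with_lookahead = re.split('0(?=[1-9])', encoded, maxsplit=1)
print(with_lookahead)  # ['10', '3']

first, second = (int(part) for part in with_lookahead)
print(first, second)  # 10 3
```

Note that the encoding itself can be ambiguous: '10203' could mean (1, 203) or (102, 3), so a separator character that cannot appear in a number may be a safer choice.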
