
How do I stop regex from matching unwanted empty strings?

I'm working on a problem set to count sentences. I decided to implement it by using regular expressions to split the string at the characters ".", "!" and "?". When I pass my text to re.split, it includes an empty string at the end of the list.

Source code:

from cs50 import get_string
import re


def main():
    text = get_string("Text: ")
    cole_liau(text)


# Implement 0.0588 * L - 0.296 * S - 15.8; L = avg num of letters per 100 words, S = avg num of sentences per 100 words
def cole_liau(intext):

    words = []
    letters = []

    sentences = re.split(r"[.!?]+", intext)
    print(sentences)
    print(len(sentences))

main()

Output:

Text: Congratulations. Today is your day. You're off to Great Places! You're off and away!

['Congratulations', ' Today is your day', " You're off to Great Places", " You're off and away", '']

5

I tried adding the + quantifier to make sure it matches at least one of [.!?], but that did not work either.
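The + quantifier only controls how many consecutive punctuation marks are treated as a single separator; it has no effect on what comes after the last separator. A quick comparison, using a made-up sample string:

```python
import re

sample = "Wait?! Really."

# Without +, each punctuation mark is its own separator, so "?!" yields an
# empty string between "?" and "!", plus another one after the final "."
print(re.split(r"[.!?]", sample))   # ['Wait', '', ' Really', '']

# With +, "?!" is consumed as one separator, but the trailing "." still
# leaves an empty string at the end of the list
print(re.split(r"[.!?]+", sample))  # ['Wait', ' Really', '']
```

So + fixes runs like "?!" but not the trailing empty string.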

You may use a list comprehension to drop the empty strings:

def cole_liau(intext):

    words = []
    letters = []

    sentences = [sent for sent in re.split(r"[.!?]+", intext) if sent]
    print(sentences)
    print(len(sentences))

Which yields:

['Congratulations', ' Today is your day', " You're off to Great Places", " You're off and away"]
4
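One caveat with `if sent`: it only drops strings that are completely empty. If the input ends with punctuation followed by whitespace (for example, text read from a file that ends with a newline; this is an assumption about where the text comes from), the last piece is whitespace rather than `''` and survives the filter. Testing `sent.strip()` instead also discards whitespace-only pieces:

```python
import re

text = "Hello there. Bye! "  # note the trailing space

parts = re.split(r"[.!?]+", text)
print(parts)  # ['Hello there', ' Bye', ' ']

# `if s` keeps the whitespace-only piece
print([s for s in parts if s])  # ['Hello there', ' Bye', ' ']

# `if s.strip()` discards both empty and whitespace-only pieces
print([s for s in parts if s.strip()])  # ['Hello there', ' Bye']
```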

As to why re.split() returns an empty string, see this answer.
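The behaviour itself is easy to observe: re.split() returns the text between separators, including the text before the first separator and after the last one. When a separator sits at the very start or end of the string, the piece on that side is empty. Python's `str.split()` with an explicit separator follows the same rule. A small illustration with made-up strings:

```python
import re

# Separators at both ends: empty strings appear at both ends
print(re.split(r"[.!?]+", "!Hi."))     # ['', 'Hi', '']

# No separator at the end: no trailing empty string
print(re.split(r"[.!?]+", "Hi. Bye"))  # ['Hi', ' Bye']

# str.split with an explicit separator behaves the same way
print("a,b,".split(","))               # ['a', 'b', '']
```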

re.split is working fine here. You have a ! at the end of the last sentence, so the text is split into the part before it (a sentence) and the part after it (an empty string).

You can just add [:-1] at the end of your line to remove the last element of the list (this assumes the text always ends with ., ! or ?; otherwise the last element is a real sentence and gets dropped):

sentences = re.split(r"[.!?]+", intext)[:-1]

Output:

['Congratulations', ' Today is your day', " You're off to Great Places", " You're off and away"]
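Since the question's comment says the goal is the Coleman-Liau index, 0.0588 * L - 0.296 * S - 15.8, here is a minimal sketch that plugs the filtered sentence count into that formula. It assumes letters are alphabetic characters, words are whitespace-separated tokens, and sentences are the non-blank pieces left after splitting on ., ! and ?. These counting rules are assumptions, so check them against your problem set's spec; the function names `count_sentences` and `coleman_liau` are hypothetical.

```python
import re


def count_sentences(text: str) -> int:
    # Split on runs of terminators and ignore empty/whitespace-only pieces,
    # so a trailing "!" (or a missing one) does not skew the count
    return len([s for s in re.split(r"[.!?]+", text) if s.strip()])


def coleman_liau(text: str) -> float:
    letters = sum(ch.isalpha() for ch in text)  # excludes apostrophes, digits, punctuation
    words = len(text.split())                   # whitespace-separated tokens
    if words == 0:
        raise ValueError("text contains no words")
    sentences = count_sentences(text)

    # L = average letters per 100 words, S = average sentences per 100 words
    L = letters / words * 100
    S = sentences / words * 100
    return 0.0588 * L - 0.296 * S - 15.8


text = "Congratulations. Today is your day. You're off to Great Places! You're off and away!"
print(count_sentences(text))         # 4
print(round(coleman_liau(text), 2))  # 3.04
```

Counting non-blank pieces, rather than slicing off the last element, keeps the count correct whether or not the text ends with punctuation.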


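Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.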

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM