
String separation by commas, but with a condition (ignore comma-separated single words)

With the code below (a bit messy, I admit), I split a string on commas, with one condition: it does not split where the string contains a comma-separated single word. For example, it does not split "Yup, there's a reason why you want to hit the sack just minutes after climax", but it does split "The increase in heart rate, which you get from masturbating, is directly beneficial to the circulation, and can reduce the likelihood of a heart attack" into ['The increase in heart rate', 'which you get from masturbating', 'is directly beneficial to the circulation', 'and can reduce the likelihood of a heart attack'].

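The problem is that the code fails on strings like this one: "When men ejaculate, it releases a slew of chemicals including oxytocin, vasopressin, and prolactin, all of which naturally help you hit the pillow." I don't want it to split after "oxytocin", but after "prolactin". I need a regular expression that does this.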

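import re
from textblob import TextBlob

# Example input taken from the question
input_string = "When men ejaculate, it releases a slew of chemicals including oxytocin, vasopressin, and prolactin, all of which naturally help you hit the pillow."
string = str(input_string)

# Split on every comma, then clean up each piece
listy = [x.strip() for x in string.split(',')]
listy = [x.replace('\n', '') for x in listy]
listy = [re.sub(r'(?<!\d)\.(?!\d)', '', x) for x in listy]  # drop periods that are not between digits
listy = [x for x in listy if x]  # remove empty strings; a list (not filter()) so it can be indexed below

newstring = []

for i, segment in enumerate(listy):

    wc = TextBlob(segment).word_counts  # word -> frequency; len(wc) is the number of distinct words

    if i < len(listy) - 1:  # every segment except the last one

        if len(wc) > 3:  # len(segment.split(' ')) > 7:
            newstring.append(segment + "&&")  # long segment: mark a split point
        else:
            newstring.append(segment + ",")  # short segment: keep its comma, glue it to the next one

    else:

        newstring.append(segment)

# Rejoin everything and split only at the "&&" markers
sep = [x.strip() for x in ' '.join(newstring).split('&&')]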
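To see why this approach splits after "oxytocin", here is a TextBlob-free sketch of the same heuristic. The helper name heuristic_split is hypothetical, and counting distinct lower-cased \w+ tokens is an assumption standing in for len(TextBlob(segment).word_counts); it is a reproduction for illustration, not the original code.

```python
import re


def heuristic_split(text, min_words=3):
    """Hypothetical re-creation of the question's comma heuristic."""
    # Split on every comma and drop periods that are not between digits
    parts = [re.sub(r"(?<!\d)\.(?!\d)", "", p.strip()) for p in text.split(",")]
    parts = [p for p in parts if p]
    pieces = []
    for i, seg in enumerate(parts):
        # Assumption: distinct lower-cased \w+ tokens approximate TextBlob's word_counts
        distinct_words = len(set(re.findall(r"\w+", seg.lower())))
        if i == len(parts) - 1:
            pieces.append(seg)          # last segment: nothing to append
        elif distinct_words > min_words:
            pieces.append(seg + "&&")   # long segment: mark a split point after it
        else:
            pieces.append(seg + ",")    # short segment: keep the comma, glue to the next
    return [p.strip() for p in " ".join(pieces).split("&&")]


print(heuristic_split("When men ejaculate, it releases a slew of chemicals including "
                      "oxytocin, vasopressin, and prolactin, all of which naturally "
                      "help you hit the pillow."))
```

On that sentence the clause ending in "oxytocin" has eight distinct words, so it counts as "long" and a split point is placed right after it, giving ['When men ejaculate, it releases a slew of chemicals including oxytocin', 'vasopressin, and prolactin, all of which naturally help you hit the pillow']. The heuristic only looks at the length of the segment before each comma, so it cannot tell a list item from a clause.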

Consider the following:

import re

mystr = "When men ejaculate, it releases a slew of chemicals including oxytocin, vasopressin, and prolactin, all of which naturally help you hit the pillow."

rExp = r",(?!\s+(?:and\s+)?\w+,)"  # split on commas that are not followed by a single list item
mylst = re.compile(rExp).split(mystr)
print(mylst)

This should give the following output:

['When men ejaculate', ' it releases a slew of chemicals including oxytocin, vasopressin, and prolactin', ' all of which naturally help you hit the pillow.']

Let's look at how the string is split...

,(?!\s+\w+,)

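Split at every comma that is not followed by \s+\w+, (the "(?!" opens a negative lookahead), i.e. whitespace, then a single word, then another comma.
This fails on "vasopressin, and": there is no comma directly after "and", so the comma after "vasopressin" would be split. Hence the optional and\s+ is introduced: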

,(?!\s+(?:and\s+)?\w+,)
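A quick comparison of the two patterns on the problem sentence makes the effect of the optional and\s+ visible (variable names are illustrative):

```python
import re

mystr = ("When men ejaculate, it releases a slew of chemicals including oxytocin, "
         "vasopressin, and prolactin, all of which naturally help you hit the pillow.")

# First attempt: no allowance for "and" before the final list item
naive = re.split(r",(?!\s+\w+,)", mystr)
# Refined pattern: an optional "and " may precede the single word
with_and = re.split(r",(?!\s+(?:and\s+)?\w+,)", mystr)

print(naive)
print(with_and)
```

The naive pattern splits off ' and prolactin' as a piece of its own, because after the comma following "vasopressin" it sees "and" followed by a space rather than a comma; the refined pattern keeps "oxytocin, vasopressin, and prolactin" together and produces the three pieces shown earlier.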

Though I would probably use the following, which also accepts "or" before the last list item:

,(?!\s+(?:(?:and|or)\s+)?\w+,)
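The difference shows up on a list that ends in "or". The sentence below is a made-up example, not taken from the question:

```python
import re

AND_ONLY = r",(?!\s+(?:and\s+)?\w+,)"
AND_OR = r",(?!\s+(?:(?:and|or)\s+)?\w+,)"

# Made-up sentence whose inline list ends with "or"
text = "You can pay with cash, cheque, or card, whichever is easiest."

and_only_parts = re.split(AND_ONLY, text)  # splits before "or card"
and_or_parts = re.split(AND_OR, text)      # keeps the whole list together

print(and_only_parts)
print(and_or_parts)
```

With the "and"-only pattern the result is ['You can pay with cash, cheque', ' or card', ' whichever is easiest.']; with the and/or pattern it is ['You can pay with cash, cheque, or card', ' whichever is easiest.'].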


In short, consider replacing your line

listy = [x.strip() for x in string.split(',')]

with

listy = [x.strip() for x in re.split(r",(?!\s+(?:and\s+)?\w+,)", string)]
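A minimal sketch of that replacement as a reusable function, assuming the and/or variant and stripped pieces are wanted. The name split_clauses and the merge step are illustrative additions, not part of the answer: the lookahead only inspects what follows a comma, so a leading single word such as "Yup" in the question's first example is still split off, and the sketch re-attaches any one-word piece to the clause after it.

```python
import re

# Split on commas unless the comma is followed by a single word and another
# comma (optionally preceded by "and"/"or"), i.e. the items of an inline list.
LIST_COMMA = re.compile(r",(?!\s+(?:(?:and|or)\s+)?\w+,)")


def split_clauses(text):
    """Hypothetical helper: regex split plus re-attaching one-word pieces."""
    pieces = [p.strip() for p in LIST_COMMA.split(text)]
    pieces = [p for p in pieces if p]  # drop empty strings
    merged = []
    for p in pieces:
        # Assumption: a one-word piece (e.g. "Yup") belongs with the next clause
        if merged and len(merged[-1].split()) == 1:
            merged[-1] = merged[-1] + ", " + p
        else:
            merged.append(p)
    return merged


print(split_clauses("Yup, there's a reason why you want to hit the sack just minutes after climax"))
```

Without the merge loop, the "Yup" sentence would come back as ['Yup', "there's a reason why you want to hit the sack just minutes after climax"]; with it, the sentence stays whole while the "prolactin" and "heart rate" examples split exactly as the regex alone splits them. A one-word piece at the very end of the text is left on its own.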
