
How to retain the delimiter within list items in Python

I'm writing a program that jumbles the clauses within a text, using punctuation marks as the delimiters at which to split the text.

At the moment, my code has a large list in which each item is a group of clauses.

import re
from random import shuffle

clause_split_content = []

text = ["this, is. a test?", "this: is; also. a test!"]

for i in text:
    clause_split = re.split(r'[,;:".?!]', i)
    # drop the empty string left after the final punctuation mark
    clause_split.pop()
    clause_split_content.extend(clause_split)

shuffle(clause_split_content)
print(*clause_split_content, sep='')

At the moment, the result jumbles the text but does not retain the punctuation used as the delimiter to split it. The output looks something like this:

a test this also this is a test is

I want to retain the punctuation in the final output, so that it looks something like this:

a test! this, also. this: is. a test? is;
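For reference, this is what re.split() returns for one item when the pattern has no capturing group: only the text between the delimiters survives, plus a trailing empty string (the one the code above removes). The variable names are illustrative only.

```python
import re

item = "this, is. a test?"

# Without a capturing group, re.split() returns only the text between delimiters
parts = re.split(r'[,;:".?!]', item)
print(parts)  # ['this', ' is', ' a test', '']
```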

Option 1: Shuffle the words within each list item and combine them into a sentence.

from random import shuffle

sentence = ''
new_text = []
text = ["this, is. a test?", "this: is; also. a test!"]

# split each string on whitespace (punctuation stays attached to its word)
# and shuffle the words within that string only
for item in text:
    words = item.split()
    shuffle(words)
    new_text.append(words)

for words in new_text:
    for word in words:
        sentence += word + ' '

print(sentence)

Sample shuffled output:

test? this, a is. is; test! this: a also. 
test? a is. this, is; test! a this: also. 
is. test? a this, test! a this: also. is; 
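The punctuation survives here because str.split() with no arguments splits on whitespace only. The trade-off, visible in the sample outputs, is that a multi-word clause such as "a test?" is broken into separate words that are shuffled independently. A quick check on one item:

```python
from random import shuffle

# str.split() with no arguments splits on runs of whitespace only,
# so each punctuation mark stays attached to the word in front of it
words = "this, is. a test?".split()
print(words)  # ['this,', 'is.', 'a', 'test?']

# shuffling reorders these whole tokens, so "a" and "test?" can end up apart
shuffle(words)
print(words)
```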

Option 2: Combine the words of all list items into a single list, then shuffle the words and combine them into a sentence.

import random

sentence = ''
text_combined = []
text = ["this, is. a test?", "this: is; also. a test!"]

# flatten every string into one list of whitespace-separated words
for item in text:
    text_combined.extend(item.split())

# random.sample() with k == len(...) returns a shuffled copy of the list
shuffled_list = random.sample(text_combined, len(text_combined))

for word in shuffled_list:
    sentence += word + ' '

print(sentence)

Sample output:

this, is; also. a this: is. a test? test! 
test! is. this: test? a this, a also. is; 
is. a a is; also. test! test? this, this:
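Option 2 uses random.sample() where Option 1 uses shuffle(). With k equal to the list length, random.sample() returns a new, shuffled list and leaves its input unchanged, whereas random.shuffle() reorders the list in place and returns None. A minimal comparison on illustrative data:

```python
import random

words = ['this,', 'is.', 'a', 'test?']

# random.sample(population, k) with k == len(population) returns a new,
# shuffled list and leaves the original untouched
shuffled_copy = random.sample(words, len(words))

# random.shuffle() instead reorders the list in place and returns None
in_place = list(words)
result = random.shuffle(in_place)

print(words, shuffled_copy, in_place, result)
```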

I think you are simply using the wrong function of re for your purpose. With a pattern that has no capturing group, split() excludes the separators, but you can use another function, e.g. findall(), to select exactly the pieces you want, separators included. For example, the following code creates your desired output:

import re
from random import shuffle

clause_split_content = []

text = ["this, is. a test?", "this: is; also. a test!"]

for i in text:
    # each match is one clause plus the separator that ends it
    words_with_separator = re.findall(r'([^,;:".?!]*[,;:".?!])\s?', i)
    clause_split_content.extend(words_with_separator)

shuffle(clause_split_content)
print(*clause_split_content, sep=' ')

Output:

this, this: is. also. a test! a test? is;

The pattern ([^,;:".?!]*[,;:".?!])\s? takes every character that is not a separator, up to and including the next separator. These characters all fall inside the capturing group, which is what findall() returns, so each match is one clause together with its punctuation. The trailing \s? only consumes the space after each separator, so that the space does not become part of the next clause.
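To see what the trailing \s? changes, compare findall() on one item with and without it. Because the pattern has exactly one capturing group, findall() returns the text of that group for each match; without \s?, the space after each separator becomes the first character of the next match.

```python
import re

item = "this, is. a test?"

# \s? consumes the space after each separator outside the capturing group
with_space = re.findall(r'([^,;:".?!]*[,;:".?!])\s?', item)
# without it, the space is picked up by [^...]* at the start of the next match
without_space = re.findall(r'([^,;:".?!]*[,;:".?!])', item)

print(with_space)     # ['this,', 'is.', 'a test?']
print(without_space)  # ['this,', ' is.', ' a test?']
```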

Here's a way to do what you've asked:

import re
from random import shuffle
text = ["this, is. a test?", "this: is; also. a test!"]
content = [y for x in text for y in re.findall(r'([^,;:".?!]*[,;:".?!])', x)]
shuffle(content)
print(*content, sep=' ')

Output:

 is;  is.  also.  a test? this,  a test! this:

Explanation:

  • the regex pattern r'([^,;:".?!]*[,;:".?!])' matches 0 or more non-separator characters followed by a separator character, and findall() returns a list of all such non-overlapping matches
  • the list comprehension iterates over the input strings in the list text and has an inner loop that iterates over the findall() results for each input string, so that we build a single list of every matched pattern within every string.
  • shuffle() and print() are used as in your original code.
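As an alternative to findall(), re.split() also keeps the separators when the pattern wraps them in a capturing group: the delimiters are then returned as separate list items. The sketch below pairs each clause with the delimiter that follows it; split_keep_delims is a hypothetical helper name, and, like the findall() patterns above, it drops any trailing text that has no delimiter after it.

```python
import re
from random import shuffle

text = ["this, is. a test?", "this: is; also. a test!"]

def split_keep_delims(s):
    # hypothetical helper: a capturing group makes re.split() return the
    # delimiters too, e.g. ['this', ',', ' is', '.', ' a test', '?', '']
    parts = re.split(r'([,;:".?!])', s)
    # pair each clause (even indices) with the delimiter after it (odd indices);
    # zip() stops at the shorter slice, dropping text with no delimiter after it
    return [clause.strip() + delim for clause, delim in zip(parts[0::2], parts[1::2])]

# flatten the clauses of every string, as in the list comprehension above
content = [clause for s in text for clause in split_keep_delims(s)]
shuffle(content)
print(*content, sep=' ')
```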

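Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.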

 