How do I remove punctuation from a string and then add it back later at the same index?

Question

Say my word_str is "This is 'Cambridge University' for example." I want the program to keep the first and last letter of each word and scramble the inside of the word if the word is more than 3 characters long. My problem is that words with punctuation at the beginning or end are shuffled incorrectly. I need the punctuation to stay at its original index, the first and last letters of the word to stay in place, and only the inside of the word to be shuffled, with any trailing punctuation added back at the end. Any ideas?

import random

def scramble_word(word_str):
    char = ".,!?';:"
    if len(word_str) <= 3:
        return word_str + ' '
    else:
        word_str = word_str.strip(char)
        word_str = list(word_str)
        scramble = word_str[1:-1]
        random.shuffle(scramble)
        scramble = ''.join(scramble)
        word_str = ''.join(word_str)
        new_word = word_str[0] + scramble + word_str[-1]
        return new_word + ' '

Answer 1

Using regular expressions:

import random
import re

random.seed(1234)  # fixed seed only so the result can be reproduced; remove in production

def shuffle_word(m):
    # m is the match object for one word of four or more characters
    word = m.group()
    inner = ''.join(random.sample(word[1:-1], len(word) - 2))
    return '%s%s%s' % (word[0], inner, word[-1])

s = """This is 'Cambridge University' for example."""

# \b\w{3}\w+\b matches only runs of 4+ word characters, so punctuation is never touched
print(re.sub(r'\b\w{3}\w+\b', shuffle_word, s))

Which prints (with the seed above, under Python 2):

Tihs is 'Cadibrgme Uinrtvsiey' for exlampe.

re.sub allows you to pass it a function (which accepts a regex match object) instead of a replacement string. The function is called once per match, and its return value is used as the replacement.

EDIT - without regex

# Python 3 version: io.StringIO replaces the Python 2 StringIO module
import random
from io import StringIO

def shuffle_word(m):
    # Words of three characters or fewer have nothing to shuffle
    if len(m) <= 3:
        return m
    inner = ''.join(random.sample(m[1:-1], len(m) - 2))
    return '%s%s%s' % (m[0], inner, m[-1])

def scramble(text):
    sio = StringIO(text)
    accum = []
    start = None  # index where the current word began, or None between words
    while sio.tell() < len(text):
        char = sio.read(1)
        if start is None:
            if char.isalnum():
                start = sio.tell() - 1
            else:
                accum.append(char)
        elif not char.isalnum():
            # The word just ended: rewind, read it whole and shuffle it.
            # The stream is left on the non-alphanumeric character, which
            # the next iteration appends unchanged.
            end = sio.tell() - 1
            sio.seek(start)
            accum.append(shuffle_word(sio.read(end - start)))
            start = None
    else:
        # The text ended in the middle of a word
        if start is not None:
            sio.seek(start)
            accum.append(shuffle_word(sio.read()))

    return ''.join(accum)

s = """This is 'Cambridge University' for example."""
print(scramble(s))
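
Closer to what the question literally asks for (strip the punctuation, scramble, then put the punctuation back), a minimal sketch could look like the following. It assumes words are separated by single spaces and that punctuation only needs preserving at the start or end of a token; scramble_token, scramble_text and the rng parameter are hypothetical names introduced for this example.

```python
import random
import string

def scramble_token(token, rng=random):
    """Scramble the inside of one space-delimited token, leaving any
    leading or trailing punctuation exactly where it was."""
    core = token.strip(string.punctuation)
    if len(core) <= 3:
        return token  # nothing to shuffle
    # Number of punctuation characters stripped from the front
    start = len(token) - len(token.lstrip(string.punctuation))
    prefix, suffix = token[:start], token[start + len(core):]
    inner = list(core[1:-1])
    rng.shuffle(inner)
    return prefix + core[0] + ''.join(inner) + core[-1] + suffix

def scramble_text(text, rng=random):
    # Splitting on a single space keeps the token count and spacing stable
    return ' '.join(scramble_token(t, rng) for t in text.split(' '))

print(scramble_text("This is 'Cambridge University' for example."))
```

One limitation of this sketch: punctuation inside a word (the apostrophe in "don't", for instance) is treated as part of the inner letters and may move. The regex version above does not have that problem, because it only ever matches runs of word characters.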

Answer 2

Extremely easy with a regex:

import re
import random

s = ('Pitcairn Islands, Saint Helena, '
     'Ascension and Tristan da Cunha, '
     'Saint Kitts and Nevis, '
     'Saint Vincent and the Grenadines, Singapore')

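# The lookbehind and lookahead keep the first and last letter outside the match,
# so only the inner letters of words with four or more letters are shuffled
reg = re.compile(r'(?<=[a-zA-Z])[a-zA-Z]{2,}(?=[a-zA-Z])')

def ripl(m):
    g = list(m.group())
    random.shuffle(g)
    return ''.join(g)

print(reg.sub(ripl, s))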

Result:

Piictran Islands, Sanit Heelna, Asnioecsn and Tiastrn da Cunha, Sniat Ktits and Neivs, Snait Vnnceit and the Giearndens, Snoiaprge
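
To see why the first and last letters stay put, it may help to look at what the pattern actually matches. A small sketch, where inner_runs is a hypothetical helper name and the sample sentence is made up for illustration:

```python
import re

# Same pattern as above: at least two letters that have a letter on both sides
reg = re.compile(r'(?<=[a-zA-Z])[a-zA-Z]{2,}(?=[a-zA-Z])')

def inner_runs(text):
    """Return the substrings that would be handed to the shuffling callback."""
    return reg.findall(text)

print(inner_runs("Saint Kitts and Nevis, the end."))
# ['ain', 'itt', 'evi']
```

Three-letter words such as "and" or "the" have only one inner letter, so they never match, and punctuation is never part of a match because the character class contains letters only.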

Answer 1
5 2013-03-11 22:27:56

Answer 2
1 2013-03-11 22:41:02
