Remove specific duplicated characters from a string in Python

Question

How can I delete specific duplicated characters from a string in Python, but only when they appear one right after another? For example:

I have this string:

string = "Hello _my name is __Alex"

I need to remove the duplicate _ only where the underscores are consecutive (__), and get a string like this:

string = "Hello _my name is _Alex"

If I use set, I get this:

string = "_yoiHAemnasxl"

Answer 1

(Big edit: oops, I missed that you only want to de-duplicate certain characters and not others. Retrofitting solutions...)

I assume you have a string that holds all the characters you want to de-duplicate. Let's call it to_remove, and say that it's equal to "_.-". So only underscores, periods, and hyphens will be de-duplicated.

You could use a regex to match multiple successive repeats of a character, and replace them with a single character.

>>> import re
>>> to_remove = "_.-"
>>> s = "Hello... _my name -- is __Alex"
>>> pattern = "(?P<char>[" + re.escape(to_remove) + "])(?P=char)+"
>>> re.sub(pattern, r"\1", s)
'Hello. _my name - is _Alex'

Quick breakdown:

?P<char> assigns the symbolic name char to the first group.
We put to_remove inside the character set, []. Calling re.escape is necessary because hyphens and other characters could otherwise have special meaning inside the set.
(?P=char) refers back to the character matched by the named group "char".
The + matches one or more repetitions of that character.

So in aggregate, this means "match any character from to_remove that appears more than once in a row". The second argument to sub, r"\1", then replaces that match with the first group, which is only one character long.
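As a minimal sketch, the regex approach above could be wrapped in a reusable function. The name collapse_repeats and its parameters are assumptions made for illustration; they are not part of the original answer. It uses a numbered group instead of the named one, which behaves the same way here.

```python
import re

def collapse_repeats(text, chars):
    """Collapse consecutive runs of any character in `chars` to one copy.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if not chars:
        # An empty character class "[]" would not be a valid pattern.
        return text
    # Capture one character from the escaped set, then require that the
    # same character repeats at least once more right after it.
    pattern = "([" + re.escape(chars) + r"])\1+"
    return re.sub(pattern, r"\1", text)

print(collapse_repeats("Hello... _my name -- is __Alex", "_.-"))
# Hello. _my name - is _Alex
```

Characters outside chars are left alone, so the double l in Hello survives.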

Alternative approach: write a generator expression that keeps a character unless it is in to_remove and matches the character preceding it.

>>> "".join(s[i] for i in range(len(s)) if i == 0 or not (s[i-1] == s[i] and s[i] in to_remove))
'Hello. _my name - is _Alex'
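The one-liner above can be hard to read, so here is a sketch of the same logic as an explicit loop. The function name collapse_repeats_loop is hypothetical, and converting the characters to a set is an optional choice for faster membership tests.

```python
def collapse_repeats_loop(text, chars):
    """Loop version of the generator expression (illustrative name)."""
    targets = set(chars)
    kept = []
    for ch in text:
        # kept[-1] always equals the previous source character: that
        # character was either kept, or skipped for matching kept[-1].
        if kept and kept[-1] == ch and ch in targets:
            continue  # drop a targeted character that repeats its neighbor
        kept.append(ch)
    return "".join(kept)

print(collapse_repeats_loop("Hello... _my name -- is __Alex", "_.-"))
# Hello. _my name - is _Alex
```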

Alternative approach #2: use groupby to identify groups of consecutive identical characters, then join the values together, using to_remove membership testing to decide how many values should be added.

>>> import itertools
>>> "".join(k if k in to_remove else "".join(v) for k,v in itertools.groupby(s, lambda c: c))
'Hello. _my name - is _Alex'
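To make the groupby version easier to follow, this short sketch prints the groups it produces for a small sample string. The sample input and the variable names are assumptions chosen for illustration.

```python
import itertools

sample = "a__b--"
to_remove = "_"

# Each group is (key, run), where run iterates over the consecutive
# identical characters that share that key.
groups = [(key, "".join(run)) for key, run in itertools.groupby(sample)]
print(groups)
# [('a', 'a'), ('_', '__'), ('b', 'b'), ('-', '--')]

# Emit the key once for targeted characters, the whole run otherwise.
result = "".join(
    key if key in to_remove else "".join(run)
    for key, run in itertools.groupby(sample)
)
print(result)
# a_b--
```

With no key function, groupby groups by the items themselves, so the lambda c: c in the answer is optional.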

Alternative approach #3: call re.sub once for each member of to_remove. A bit expensive if to_remove contains a lot of characters.

>>> for c in to_remove:
...     s = re.sub(rf"({re.escape(c)})\1+", r"\1", s)
...
>>> s
'Hello. _my name - is _Alex'

Answer 2

Simple re.sub() approach:

import re

s = "Hello _my name is __Alex aa"
result = re.sub(r'(\S)\1+', '\\1', s)

print(result)

\S - any non-whitespace character
\1+ - backreference to the 1st parenthesized captured group (one or more occurrences)

The output:

Helo _my name is _Alex a
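Note that this pattern collapses every repeated non-whitespace character, which is why Hello becomes Helo and aa becomes a in the output above. As a hedged sketch, assuming only underscores should be collapsed as in the question, the pattern could target that one character directly:

```python
import re

s = "Hello _my name is __Alex aa"
# Match runs of two or more underscores and replace each run with one.
result = re.sub(r"_{2,}", "_", s)

print(result)
# Hello _my name is _Alex aa
```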

Answer 1 (score 3, accepted): 2018-04-06 14:56:49

Answer 2 (score 2): 2018-04-06 14:55:20