
Replacing a specific character in a string

This may be very basic, but I've been struggling with it.

I have something like:

one = ['L', 'K', 'M']

two = ['P', 'N', 'S']

I also have a string, say "LooK at Me", which I want to transform into "PooN at Se". My idea is to loop through each letter of the string and through the first list, compare the two, and, if they match, replace that letter with its pair from list two.

The looping proves to be very inefficient, as I am working with large texts.
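The per-letter loop described above can avoid the inner scan over list one by looking each character up in a dictionary built once from the two lists. A minimal sketch using the question's lists and sample string; the helper name `swap_chars` is hypothetical:

```python
one = ['L', 'K', 'M']
two = ['P', 'N', 'S']

# Pair each character in `one` with its counterpart in `two`, once.
mapping = dict(zip(one, two))


def swap_chars(s, mapping):
    """Replace every character of s found in mapping; keep the rest unchanged."""
    # dict.get(ch, ch) falls back to the original character when there is no match.
    return ''.join(mapping.get(ch, ch) for ch in s)


print(swap_chars("LooK at Me", mapping))  # PooN at Se
```

Each character is still visited once, but the lookup is a constant-time dictionary access instead of a scan through list one.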

The strings I am accessing are in fact rows in a pandas DataFrame:

import pandas as pd
data = pd.read_csv('train.txt', delimiter='\t', header=None, names=['category', 'text'], dtype=str)

and print(data.head()) gives something like:

0     MUSIC  Today at the recording studio, John...
1  POLITICS  The tensions inside the government have...
2      NEWS  The new pictures of NASA show...

I extract the text column like this:

text = data['text']

The catch is that I am actually working with text written in Cyrillic, and I can't use any of the built-in functions to convert uppercase letters to lowercase, which is my goal. The best I've come up with is the approach described at the top: locate each uppercase letter and replace it with its lowercase equivalent.

Any words of advice?
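Regarding the lowercasing goal: in Python 3, `str` is Unicode, so `str.lower()` (and pandas' `Series.str.lower()`) already handles Cyrillic. What does not work is lowercasing encoded bytes: `bytes.lower()` only changes ASCII letters. That is a plausible cause if the text ends up as byte strings, for example under Python 2 (an assumption; the question does not say which version is used). A small check, assuming Python 3:

```python
# Assumes Python 3: str is Unicode-aware, bytes are not.
text = "Сегодня В Студии"

# str.lower() converts Cyrillic capitals as well as Latin ones.
print(text.lower())  # сегодня в студии

# The same text as UTF-8 bytes: bytes.lower() only changes ASCII letters,
# so the Cyrillic capitals survive unchanged.
raw = text.encode("utf-8")
print(raw.lower() == raw)  # True

# Decoding to str first restores normal Unicode case mapping.
print(raw.decode("utf-8").lower())  # сегодня в студии
```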

It seems you need replace (Series.replace with regex=True):

print (data)
                                                text
0         MUSIC  Today at the recording studio, John
1  POLITICS  The tensions inside the government have
2                NEWS  The new pictures of NASA show

one = ['L', 'K', 'M']
two = ['P', 'N', 'S']
data['text'] = data['text'].replace(one, two, regex=True)
print (data)
                                                text
0         SUSIC  Today at the recording studio, John
1  POPITICS  The tensions inside the government have
2                NEWS  The new pictures of NASA show
# or, for a plain string, use a list comprehension
s = "LooK at Me"
''.join([two[one.index(e)] if e in one else e for e in s])
# 'PooN at Se'
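Outside pandas, the regex idea from the replace call above can be applied to a plain string with a single re.sub: a character class matches any letter from one, and a callable returns its pair from two. This is a stdlib sketch, not how pandas implements replace; `mapping` and `pattern` are illustrative names:

```python
import re

one = ['L', 'K', 'M']
two = ['P', 'N', 'S']
mapping = dict(zip(one, two))

# Build a character class such as [LKM]; re.escape guards against
# characters that would be special inside a regex.
pattern = re.compile('[' + ''.join(re.escape(c) for c in one) + ']')

# For each match, the callable returns the paired character from `two`.
result = pattern.sub(lambda m: mapping[m.group()], "LooK at Me")
print(result)  # PooN at Se
```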

You can create a translation table, and use the translate method.

translation_table = str.maketrans("ABÉПЯ", "abéпя")

text = "Éléphant Язы́к"

text.translate(translation_table)
# 'éléphant язы́к'

We use maketrans to create the translation table, passing it two parameters: two strings of equal length. Then we call the translate method on our string.
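`str.maketrans` also accepts the question's two lists once they are joined into strings, and a full uppercase-to-lowercase table for the basic Russian alphabet can be generated from code points rather than typed by hand. A sketch, assuming only А to Я (U+0410 to U+042F) plus Ё (U+0401) need converting; other Cyrillic letters such as Ukrainian Є or І are not covered:

```python
one = ['L', 'K', 'M']
two = ['P', 'N', 'S']

# Joining the lists gives two equal-length strings for maketrans.
pair_table = str.maketrans(''.join(one), ''.join(two))
print("LooK at Me".translate(pair_table))  # PooN at Se

# Basic Russian capitals А..Я (U+0410..U+042F) sit exactly 0x20 below
# their lowercase forms а..я (U+0430..U+044F); Ё/ё lie outside that run.
upper = ''.join(chr(cp) for cp in range(0x0410, 0x0430)) + 'Ё'
lower = ''.join(chr(cp + 0x20) for cp in range(0x0410, 0x0430)) + 'ё'
cyr_table = str.maketrans(upper, lower)

print("ЛЮБОВЬ И Ёлка".translate(cyr_table))  # любовь и ёлка
```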

I'd use the vectorized .str.translate() method, which is designed for exactly this kind of task:

In [62]: one = ['S','o','a']

In [63]: two = ['$', '0', '@']

In [64]: tran_tab = str.maketrans(''.join(one), ''.join(two))

In [65]: data.text.str.translate(tran_tab)
Out[65]:
0           MU$IC  T0d@y @t the rec0rding studi0, J0hn
1    POLITIC$  The tensi0ns inside the g0vernment h@ve
2                  NEW$  The new pictures 0f NA$A sh0w
Name: text, dtype: object
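One advantage of a translation table over applying replacements one pair at a time is that translate maps each original character in a single pass, so a replacement character is never replaced again. Chained str.replace calls, by contrast, can cascade when a target character also appears in the source list. A plain-Python illustration with hypothetical lists:

```python
# Hypothetical lists where a replacement ('B') is also a source character.
one = ['A', 'B']
two = ['B', 'C']

s = "AB"

# Chained replacements: 'A' -> 'B' first, then every 'B' (including the new one) -> 'C'.
chained = s
for old, new in zip(one, two):
    chained = chained.replace(old, new)
print(chained)  # CC

# translate looks at each original character once: A -> B, B -> C.
single_pass = s.translate(str.maketrans(''.join(one), ''.join(two)))
print(single_pass)  # BC
```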
