Replacing a specific character in a string

This may be very basic, but I've been struggling with it.

I have something like:

one = ['L', 'K', 'M']

two = ['P', 'N', 'S']

I also have a string, say "LooK at Me", which I want to transform into "PooN at Se". My idea is to loop through each letter of the string, loop through the first list, compare the two, and, if they match, replace that letter in the string with its pair from list two.

The looping proves to be very inefficient, as I am working with large texts.

The strings I am accessing are in fact rows in a pandas dataframe:

data = pd.read_csv('train.txt', delimiter='\t', header=None, names=['category', 'text'], dtype=str)

and print data.head() gives something like:

0  MUSIC     Today at the recording studio, John...
1  POLITICS  The tensions inside the government have...
2  NEWS      The new pictures of NASA show...

I separate the texts like

text = data['text']

The trick here is that I am actually working with text written in Cyrillic, and I can't use any of the functions that convert upper case letters to lower case, which is my goal. The best I've come up with is the approach I introduced at the top: locate each upper case letter and replace it with its lower case equivalent.

Any words of advice?

It seems you need replace:

print (data)
                                                text
0         MUSIC  Today at the recording studio, John
1  POLITICS  The tensions inside the government have
2                NEWS  The new pictures of NASA show

one = ['L', 'K', 'M']
two = ['P', 'N', 'S']
data['text'] = data['text'].replace(one, two, regex=True)
print (data)
                                                text
0         SUSIC  Today at the recording studio, John
1  POPITICS  The tensions inside the government have
2                NEWS  The new pictures of NASA show

# for a single string, use a list comprehension
s = "LooK at Me"
''.join([two[one.index(e)] if e in one else e for e in s])
Out[523]: 'PooN at Se'
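
As a variation on the list comprehension above, the two lists can be zipped into a dictionary so that each character is looked up directly instead of scanning `one` with `index` for every letter. This is only a minimal sketch; the name `mapping` is an illustrative choice, not something from the original answer.

```python
one = ['L', 'K', 'M']
two = ['P', 'N', 'S']

# Pair each character in `one` with its replacement in `two`.
# `mapping` is a hypothetical helper name used only for this sketch.
mapping = dict(zip(one, two))

s = "LooK at Me"

# dict.get falls back to the character itself when it has no replacement.
result = ''.join(mapping.get(ch, ch) for ch in s)
print(result)  # PooN at Se
```

On long texts the dictionary lookup avoids the repeated list scans, which is likely where most of the time goes in the nested-loop version.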

You can create a translation table, and use the translate method.

translation_table = str.maketrans("ABÉПЯ", "abéпя")

text = "Éléphant Язы́к"

text.translate(translation_table)
# 'éléphant язы́к'

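We use maketrans to create the translation table. We call it with two parameters, two strings of equal length, so that each character of the first string maps to the character at the same position in the second. Then we call the translate method of our string with that table.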
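
Applied to the lists from the question, the same idea might look like the sketch below. It relies on `str.maketrans` also accepting a single dictionary, and it shows what happens when the two-string form is given strings of different lengths. The names `table` and `converted` are illustrative only.

```python
one = ['L', 'K', 'M']
two = ['P', 'N', 'S']

# str.maketrans also accepts one dict mapping single characters to replacements.
table = str.maketrans(dict(zip(one, two)))

converted = "LooK at Me".translate(table)
print(converted)  # PooN at Se

# With two string arguments the lengths must match; otherwise ValueError is raised.
try:
    str.maketrans("LKM", "PN")
except ValueError as exc:
    print("maketrans rejected the arguments:", exc)
```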

I'd use the vectorized .str.translate() method, which is designed for such things:

In [62]: one = ['S','o','a']

In [63]: two = ['$', '0', '@']

In [64]: tran_tab = str.maketrans(''.join(one), ''.join(two))

In [65]: data.text.str.translate(tran_tab)
Out[65]:
0           MU$IC  T0d@y @t the rec0rding studi0, J0hn
1    POLITIC$  The tensi0ns inside the g0vernment h@ve
2                  NEW$  The new pictures 0f NA$A sh0w
Name: text, dtype: object
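
For the Cyrillic case described in the question, a translation table covering the Russian alphabet could be used in the same way. The sketch below assumes Python 3, where strings are Unicode; under that assumption the built-in `.str.lower()` should give the same result, so the table is mainly useful when a custom mapping is needed. The sample rows are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical sample rows standing in for the Cyrillic texts in the question.
sample = pd.Series(["Привет Мир", "НОВОСТИ Дня"], name="text")

# The 33 upper case letters of the Russian alphabet and their lower case pairs.
upper = "АБВГДЕЁЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ"
lower = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
tran_tab = str.maketrans(upper, lower)

# Vectorized translation over the whole column.
translated = sample.str.translate(tran_tab)

# Assumption: Python 3 Unicode strings, where lower() handles Cyrillic directly.
lowered = sample.str.lower()

print(translated.tolist())
print(lowered.tolist())
```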
