Python: How to replace a lot of strings
I'm trying to replace a lot of strings (only a few are shown in this example, but I actually have thousands) with other strings defined in "replaceWord".
However, the code I wrote does not work as I expected.
After running the script, the output is as below:
before after
0 test1234 test1234
1 test1234 test1234
2 test1234 1349
3 test1234 test1234
4 test1234 test1234
I need the output as below:
before after
1 test1234 1349
2 test9012 te1210st
3 test5678 8579
4 april I was born August
5 mcdonalds i like checkin
Script:
import os.path, time, re
import pandas as pd
import csv
body01_before="test1234"
body02_before="test9012"
body03_before="test5678"
body04_before="i like mcdonalds"
body05_before="I was born april"
replaceWord = [
["test9012","te1210st"],
["test5678","8579"],
["test1234","1349"],
["april","August"],
["mcdonalds","chicken"],
]
cols = ['before','after']
df = pd.DataFrame(index=[], columns=cols)
for word in replaceWord:
    body01_after = re.sub(word[0], word[1], body01_before)
    body02_after = re.sub(word[0], word[1], body02_before)
    body03_after = re.sub(word[0], word[1], body03_before)
    body04_after = re.sub(word[0], word[1], body04_before)
    body05_after = re.sub(word[0], word[1], body05_before)
    df=df.append({'before':body01_before,'after':body01_after}, ignore_index=True)
#df.head()
print(df)
df.to_csv('test_replace.csv')
Use a regular expression to capture the non-digits (\D+) as the first group and the digits (\d+) as the second group. Then replace the text by putting the second group \2 first, followed by the first group \1.
df['after'] = df['before'].str.replace(r'(\D+)(\d+)', r'\2\1', regex = True)
df
before after
1 test1234 1234test
2 test9012 9012test
3 test5678 5678test
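The same group swap can also be tried in plain Python without pandas. This is only a minimal sketch, assuming each string is a run of non-digits followed by a run of digits; the helper name swap_groups is hypothetical and not part of the answer above.

```python
import re

def swap_groups(text):
    # Put the digit group (\2) before the non-digit group (\1).
    # Strings without a non-digit/digit pair are returned unchanged.
    return re.sub(r'(\D+)(\d+)', r'\2\1', text)

samples = ["test1234", "test9012", "test5678"]
swapped = [swap_groups(s) for s in samples]
print(swapped)  # ['1234test', '9012test', '5678test']
```

Note that this only reorders the two parts of each string; it does not look anything up in replaceWord.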
It seems that you do not have a dataset. You have variables:
body01_before="test1234"
body02_before="test9012"
body03_before="test5678"
body04_before="i like mcdonalds"
body05_before="I was born april"
replaceWord = [
["test9012","te1210st"],
["test5678","8579"],
["test1234","1349"],
["april","August"],
["mcdonalds","chicken"],
]
# Gather the variables in a list
vars = re.findall('body0\\d[^,]+', ','.join(globals().keys()))
df = pd.DataFrame(vars, columns = ['before_1'])
# Obtain the values of the variable
df['before'] = df['before_1'].apply(lambda x:eval(x))
# replacement function
repl = lambda x: x[0] if (rp:=dict(replaceWord).get(x[0])) is None else rp
# Do the replacement
df['after'] = df['before'].str.replace('(\\w+)',repl, regex= True)
df
before_1 before after
0 body01_before test1234 1349
1 body02_before test9012 te1210st
2 body03_before test5678 8579
3 body04_before i like mcdonalds i like chicken
4 body05_before I was born april I was born August
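If the strings can be kept in a list instead of separate variables, a similar dictionary lookup might be done with the standard library alone. The sketch below is an assumption-laden variant, not the method above: the names texts, lookup, pattern and replace_all are hypothetical, and it builds one alternation pattern from all keys so each string is scanned once.

```python
import re

replaceWord = [
    ["test9012", "te1210st"],
    ["test5678", "8579"],
    ["test1234", "1349"],
    ["april", "August"],
    ["mcdonalds", "chicken"],
]
# Hypothetical container for the strings to process
texts = ["test1234", "test9012", "test5678",
         "i like mcdonalds", "I was born april"]

lookup = dict(replaceWord)
# Longest keys first so a longer key wins over a shorter prefix;
# re.escape keeps regex metacharacters in the keys literal.
pattern = re.compile("|".join(
    re.escape(k) for k in sorted(lookup, key=len, reverse=True)))

def replace_all(text):
    # Each match is looked up in the dict; replaced text is not rescanned
    return pattern.sub(lambda m: lookup[m.group(0)], text)

results = [replace_all(t) for t in texts]
print(results)
```

Unlike the (\w+) pattern above, this variant also replaces a key that appears inside a longer word, which may or may not be what you want.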
Does this suit your purpose?
words = ["test9012", "test5678", "test1234"]
updated = []
for word in words:
    for i, char in enumerate(word):
        if 47 < ord(char) < 58:  # character codes 48-57 are the digits 0-9
            # Move everything from the first digit onwards to the front
            updated.append(f"{word[i:]}{word[:i]}")
            break
print(updated)
The code prints: ['9012test', '5678test', '1234test']
As I understand it, you have a list of strings and a mapping dictionary of the form {oldString1: newString1, oldString2: newString2, ...} that you want to use to replace the original list of strings. The fastest (and maybe most Pythonic) approach I can think of is to simply save your mapping dictionary as a Python dict. For example:
mapping = {
    "test9012": "9012test",
    "test5678": "5678test",
    "test1234": "1234test",
}
If your list of strings is stored as a Python list, you can get the replaced list with the following code:
new_list = [mapping.get(old_string, old_string) for old_string in old_list]
Note: We call mapping.get() with old_string as the default value so that it returns old_string when that string is not in the mapping dictionary.
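A small runnable sketch of the list case follows. The contents of old_list are made up for illustration, and "unknown" is deliberately absent from the mapping to show the fallback. Note that dict.get() accepts its arguments by position only, so the key and the default are passed without keyword names.

```python
mapping = {
    "test9012": "9012test",
    "test5678": "5678test",
    "test1234": "1234test",
}
# Hypothetical input; "unknown" has no entry in the mapping
old_list = ["test1234", "unknown", "test9012"]

# Fall back to the original string when there is no mapping for it
new_list = [mapping.get(s, s) for s in old_list]
print(new_list)  # ['1234test', 'unknown', '9012test']
```

This replaces whole strings only: a key that occurs inside a longer string is left untouched.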
If your list of strings is stored in a Pandas Series (or a column of a Pandas DataFrame), you can quickly replace the strings with:
new_list = old_list.map(mapping).fillna(old_list)
Note: Series.map() returns NaN for values that are not in the mapping dictionary (na_action='ignore' only skips values that are already NaN), so we call fillna(old_list) to put the original string back in that case.
You can use regex to match the pattern.
import re
import pandas as pd

words = ["test9012", "test5678", "test1234"]
rows = []
for word in words:
    # Match the leading run of letters (may be empty)
    textOnly = re.match(r"[A-Za-z]*", word).group(0)
    # Everything after the letters is the number part
    numberPart = word[len(textOnly):]
    result = numberPart + textOnly
    rows.append({'before': word, 'after': result})
# DataFrame.append was removed in pandas 2.0, so build the frame in one go
df = pd.DataFrame(rows, columns=['before', 'after'])
print(df)
df.to_csv('test_replace.csv')
So by using a regex match you can separate the letters-only part from the number-only part.