Replacing specific characters in a Python list
I have a text file named university_towns.txt. Read into a list, its lines look like this:
['Alabama[edit]\n',
'Auburn (Auburn University)[1]\n',
'Florence (University of North Alabama)\n',
'Jacksonville (Jacksonville State University)[2]\n',
'Livingston (University of West Alabama)[2]\n',
'Montevallo (University of Montevallo)[2]\n',
'Troy (Troy University)[2]\n',
'Tuscaloosa (University of Alabama, Stillman College, Shelton State)[3] [4]\n',
'Tuskegee (Tuskegee University)[5]\n']
I want to clean this text file so that everything in parentheses is replaced by ''. So, I want my text file to look like:
['Alabama',
'Auburn',
'Florence',
'Jacksonville',
'Livingston',
'Montevallo',
'Troy',
'Tuscaloosa',
'Tuskegee',
'Alaska',
'Fairbanks',
'Arizona',
'Flagstaff',
'Tempe',
'Tucson']
I am trying to do this as follows:
import pandas as pd
import numpy as np
file = open('university_towns.txt', 'r')
lines = file.readlines()
for i in range(0, len(lines)):
    lines[i] = lines[i].replace('[edit]', '')
    lines[i] = lines[i].replace(r' \(.*\)', '')
With this, I am able to remove '[edit]', but I am not able to remove the string in '( )'.
You may use a regex along with a list comprehension:
import re
# re.match anchors at the start of each string; \w+ matches the first word
new_list = [re.match(r'\w+', i).group(0) for i in my_list]
where my_list is the original list mentioned in the question. The final value held by new_list will be:
['Alabama',
'Auburn',
'Florence',
'Jacksonville',
'Livingston',
'Montevallo',
'Troy',
'Tuscaloosa',
'Tuskegee']
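One caveat with the first-word approach: re.match returns None when a line does not start with a word character (a blank line, for example), so calling .group(0) on it would raise an AttributeError. The following is a minimal sketch of a guarded variant. The helper name first_word and the shortened sample list are assumptions made for illustration and are not part of the original answer.

```python
import re

# Shortened sample of the list from the question (assumed for illustration).
my_list = [
    'Alabama[edit]\n',
    'Auburn (Auburn University)[1]\n',
    'Tuscaloosa (University of Alabama, Stillman College, Shelton State)[3] [4]\n',
]

def first_word(line):
    """Return the leading run of word characters, or '' if there is none."""
    m = re.match(r'\w+', line)
    return m.group(0) if m else ''

new_list = [first_word(line) for line in my_list]
print(new_list)  # ['Alabama', 'Auburn', 'Tuscaloosa']
```

Note that this keeps only the first word, so a multi-word name (a hypothetical 'Fort Payne (…)' line, say) would be shortened to 'Fort'.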
The replace method on a string replaces a literal substring; it does not interpret its argument as a pattern. You need to use a regex:
import re
#...
lines[i] = re.sub(r' \(.*\)', '', lines[i])
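To make the difference between literal replacement and pattern substitution concrete, here is a small sketch that applies both calls to one line from the question. The variable names are chosen for illustration only.

```python
import re

line = 'Auburn (Auburn University)[1]\n'

# str.replace searches for the literal text " \(.*\)" (backslashes included),
# which never occurs in the line, so nothing changes.
literal = line.replace(r' \(.*\)', '')

# re.sub interprets the same text as a pattern: a space, an opening
# parenthesis, any characters, and a closing parenthesis.
substituted = re.sub(r' \(.*\)', '', line)

print(repr(literal))      # 'Auburn (Auburn University)[1]\n'
print(repr(substituted))  # 'Auburn[1]\n'
```

The pattern removes only the parenthesised part, so the footnote marker and the trailing newline remain and would need a separate step.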
A simple regex should do the trick.
import re
# split at the first '[' or '(' and keep the text before it
output = [re.split(r'[\[(]', s)[0].strip() for s in your_list]
You can use re.sub instead of replace:
import re
# your code here
lines[i] = re.sub(r' \(.*\)','', lines[i])
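Putting the pieces together, the sketch below cleans every line in one pass by deleting everything from the first '[' or '(' to the end of the line, which covers '[edit]', the parenthesised university names, and the footnote markers. The function name clean_lines and the single combined pattern are assumptions for illustration, not the method given in the answers above.

```python
import re

def clean_lines(lines):
    """Strip everything from the first '[' or '(' onward, plus surrounding whitespace."""
    return [re.sub(r'\s*[\[(].*', '', line).strip() for line in lines]

# A few lines from the question, standing in for the contents of the file.
sample = [
    'Alabama[edit]\n',
    'Auburn (Auburn University)[1]\n',
    'Florence (University of North Alabama)\n',
    'Tuscaloosa (University of Alabama, Stillman College, Shelton State)[3] [4]\n',
]

print(clean_lines(sample))  # ['Alabama', 'Auburn', 'Florence', 'Tuscaloosa']
```

Because a file object is an iterable of lines, the same function should also accept the open file directly, for example inside a with open('university_towns.txt') as f: block.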