Removing characters from a txt file using Python
I'm writing a program in Python that asks the user for a file name, opens the file, counts the number of M's and F's, and reports them as a ratio. I can get it to do that and remove whitespace, but I can't figure out how to remove characters that are not M or F. I want to remove all incorrect characters and write them to a new file. Here's what I have so far:
fname = raw_input('Please enter the file name: ')  # Requests input from user
try:  # Makes sure the file input is valid
    fhand = open(fname, 'r')  # Opens the file for reading
except IOError:
    print 'Error. Invalid file name entered.'
    exit()
entireFile = fhand.read()
fhand.close()
entireFile = ''.join(entireFile.split())  # Removes whitespace and rejoins the characters
entireFile = entireFile.upper()  # Converts all characters to capital letters
males = entireFile.count('M')
print males
females = entireFile.count('F')
print females
length = float(males + females)
print length
totalMales = (males / length) * 100  # Percentage of males
totalFemales = (females / length) * 100  # Percentage of females
print "There are %", totalMales, " males and %", totalFemales, " females in the file."
The easiest way is to use a regex:
import re
data = re.findall(r'[FM]', entireFile)
If you use r'[FMfm]' instead, you don't need to uppercase the whole file, because the regex matches both upper and lower case. Either way, findall returns all the F's and M's, so there is no need to remove whitespace at all.
Example:
entireFile = "MLKMADG FKFLJKASDM LKMASDLKMADF MASDLDF"
data = re.findall(r'[FM]', entireFile)
# data == ['M', 'M', 'F', 'F', 'M', 'M', 'M', 'F', 'M', 'F']
You can do whatever you want with this list.
Hope this helps.
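As one possible follow-up, here is a minimal sketch of turning the findall result into the percentages the question asks for. The function name gender_percentages is made up for illustration, and the code is written to run under Python 3 as well as Python 2; it is not part of the original answer.

```python
import re
from collections import Counter


def gender_percentages(text):
    """Return (male_pct, female_pct) for the M/F characters found in text."""
    # [FMfm] matches either case, so no upper() or whitespace stripping is needed.
    matches = re.findall(r'[FMfm]', text)
    # Normalise case before counting so 'm' and 'M' land in the same bucket.
    counts = Counter(ch.upper() for ch in matches)
    total = counts['M'] + counts['F']
    if total == 0:
        # No M or F at all: avoid dividing by zero.
        return 0.0, 0.0
    return 100.0 * counts['M'] / total, 100.0 * counts['F'] / total


male_pct, female_pct = gender_percentages("MLKMADG FKFLJKASDM LKMASDLKMADF MASDLDF")
print("%.1f%% males, %.1f%% females" % (male_pct, female_pct))
```

With the sample string from the example above (6 M's and 4 F's), this should report 60.0% males and 40.0% females.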
m, f, other = [], [], []
for ch in entireFile:
    if ch == "M":
        m.append(ch)
    elif ch == "F":
        f.append(ch)
    else:
        other.append(ch)
print str(len(m)) + " Males, " + str(len(f)) + " Females"  # len() returns an int, so convert before concatenating
print "Other:", other
Use a regular expression to strip every M and F, which leaves only the characters that are not M or F:
import re
remainder = re.sub(r'M|F', '', entireFile)
with open('new_file', 'wb') as f:
    f.write(remainder)
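If both halves are wanted (the valid characters for counting and the rejected ones for the new file), the same re.sub idea can be applied twice. The sketch below is only an illustration: the helper names split_characters and write_removed are hypothetical, it assumes the text has already been uppercased as in the question, and it assumes whitespace should not be reported as an incorrect character.

```python
import re


def split_characters(text):
    """Return (kept, removed): the M/F characters and the rejected ones."""
    kept = re.sub(r'[^MF]', '', text)      # drop everything that is not M or F
    removed = re.sub(r'[MF\s]', '', text)  # drop M, F and whitespace (assumption)
    return kept, removed


def write_removed(text, out_name):
    """Write the rejected characters to out_name and return the kept ones."""
    kept, removed = split_characters(text)
    with open(out_name, 'w') as out:
        out.write(removed)
    return kept


kept, removed = split_characters("MXF 1M\nFQ")
print(kept)
print(removed)
```

Calling write_removed(entireFile, 'new_file') would then save the rejected characters and hand back a string containing only M's and F's, ready for count().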