How to remove all non-alphabetic characters from a string?
I have been working on a program that takes a hex file and, if the file name starts with "CID", removes the first 104 characters. After that point there are a few words. I also want to remove everything after those words, but the part I want to isolate varies in length.
My code is currently like this:
y = 0
import os
files = os.listdir(".")
filenames = []
for names in files:
    if names.endswith(".uexp"):
        filenames.append(names)
        y +=1
print(y)
print(filenames)
for x in range(1,y):
    filenamestart = (filenames[x][0:3])
    print(filenamestart)
    if filenamestart == "CID":
        openFile = open(filenames[x],'r')
        fileContents = (openFile.read())
        ItemName = (fileContents[104:])
        print(ItemName)
Input Example file (pulled from HxD):
.........................ýÿÿÿ................E.................!...1AC9816A4D34966936605BB7EFBC0841.....Sun Tan Specialist.................9.................!...9658361F4EFF6B98FF153898E58C9D52.....Outfit.................D.................!...F37BE72345271144C16FECAFE6A46F2A.....Don't get burned............................................................................................................................Áƒ*ž
I have got it working to remove the first 104 characters, but I would also like to remove the characters after 'Sun Tan Specialist', which will differ in length, so that I am left with only that part.
I appreciate any help that anyone can give me.
One way to remove non-alphabetic characters from a string is to use regular expressions [1].
>>> import re
>>> re.sub(r'[^a-z]', '', "lol123\t")
'lol'
EDIT
The first argument, r'[^a-z]', is the pattern that captures what will be removed (here, by replacing it with an empty string ''). The square brackets denote a character class (the pattern matches any single character in that class), the leading ^ is a "not" operator, and a-z denotes all lowercase alphabetic characters. More information here:
https://docs.python.org/3/library/re.html#regular-expression-syntax
So, for instance, to also keep capital letters and spaces, it would be:
>>> re.sub(r'[^a-zA-Z ]', '', 'Lol !this *is* a3 -test\t12378')
'Lol this is a test'
However, from the data you give in your question, the exact process you need seems to be a bit more complicated than just "getting rid of non-alphabetical characters".
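As a rough sketch of what that more involved process could look like: the sample in the question appears to alternate 32-character uppercase hexadecimal IDs with readable phrases, so one option is to blank out the IDs first and then collect runs of letters. The names `HEX_ID`, `PHRASE` and `extract_phrases` are made up for this illustration, and both patterns are assumptions based only on the sample shown, not on any knowledge of the file format.

```python
import re

# Assumption: IDs in the dump are exactly 32 uppercase hexadecimal characters.
HEX_ID = re.compile(r"[0-9A-F]{32}")
# Assumption: a phrase starts with an ASCII letter and continues with
# letters, apostrophes or spaces, at least three characters in total.
PHRASE = re.compile(r"[A-Za-z][A-Za-z' ]{2,}")

def extract_phrases(text):
    """Return the readable phrases found in text, in order of appearance."""
    # Replace each ID with a dot so its letters cannot be mistaken for words.
    without_ids = HEX_ID.sub(".", text)
    return [match.strip() for match in PHRASE.findall(without_ids)]

sample = (
    "....E.....!...1AC9816A4D34966936605BB7EFBC0841.....Sun Tan Specialist"
    "......9......!...9658361F4EFF6B98FF153898E58C9D52.....Outfit"
    "......D......!...F37BE72345271144C16FECAFE6A46F2A.....Don't get burned...."
)
print(extract_phrases(sample))
# => ['Sun Tan Specialist', 'Outfit', "Don't get burned"]
```

With this approach the item name would simply be the first element of the returned list, so the fixed offset of 104 characters is no longer needed. Phrases containing digits or non-ASCII letters would be cut short by these patterns.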
You can use filter:
import string
print(''.join(filter(lambda character: character in string.ascii_letters + string.digits, '(ABC), DEF!'))) # => ABCDEF
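If spaces should survive as well, the same idea can be written as a generator expression over a precomputed set of allowed characters. This is only a sketch; `ALLOWED` and `keep_allowed` are names invented for this example.

```python
import string

# Characters to keep: ASCII letters, digits and the space character.
ALLOWED = set(string.ascii_letters + string.digits + " ")

def keep_allowed(text, allowed=ALLOWED):
    """Return text with every character outside `allowed` removed."""
    return "".join(ch for ch in text if ch in allowed)

print(keep_allowed("(ABC), DEF!"))  # => ABC DEF
```

Building the set once avoids concatenating the two strings again for every character, which the lambda version does on each call.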
You mentioned in a comment that you got the string down to:
Sun Tan SpecialistFEFFBFFECDOutfitDFBECFECAFEAFADont get burned
Essentially, your goal at this point is to remove any uppercase letter that isn't immediately followed by a lowercase letter, because an uppercase letter followed by a lowercase one indicates the start of a phrase. You can use a for loop to do this.
h = "Sun Tan SpecialistFEFFBFFECDOutfitDFBECFECAFEAFADont get burned"
output = ""
for i in range(len(h)):
    # Keep spaces
    if h[i] == " ":
        output += h[i]
    # Start of a phrase found, so separate with a space and store the character
    # (the bounds check avoids an IndexError if the string ends in a capital)
    elif h[i].isupper() and i + 1 < len(h) and h[i + 1].islower():
        if output and not output.endswith(" "):
            output += " "
        output += h[i]
    # We want all lowercase characters
    elif h[i].islower():
        output += h[i]
print(output)
# If you don't care about "Outfit" since it is always there, remove it
print(output.replace(" Outfit", ""))
Output:
Sun Tan Specialist Outfit Dont get burned
Sun Tan Specialist Dont get burned
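The same rule can also be expressed with a regular expression instead of a loop. The sketch below treats any run of capitals that is itself followed by another capital as leftover ID characters and splits on it; `LEFTOVER` and `split_phrases` are hypothetical names, and the pattern is an assumption that only holds when no real phrase begins with two capitals in a row.

```python
import re

# A run of uppercase letters that is followed by another uppercase letter
# is treated as leftover ID characters. The final capital is left in place
# because it may start the next phrase.
LEFTOVER = re.compile(r"[A-Z]+(?=[A-Z])")

def split_phrases(text):
    """Split text on leftover uppercase runs and drop empty pieces."""
    return [part.strip() for part in LEFTOVER.split(text) if part.strip()]

h = "Sun Tan SpecialistFEFFBFFECDOutfitDFBECFECAFEAFADont get burned"
print(split_phrases(h))
# => ['Sun Tan Specialist', 'Outfit', 'Dont get burned']
```

Returning a list rather than one joined string makes it easy to pick out the first phrase or to skip "Outfit" by position.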