How do I remove all non-alphabetic characters from a string?

Question

I have been working on a program that takes a hex file and, if the file name starts with "CID", removes the first 104 characters. After that point there are a few words. I also want to remove everything after those words, but the problem is that the part I want to isolate varies in length.

My code is currently like this:

import os

# Collect the names of all .uexp files in the current directory
filenames = []
for name in os.listdir("."):
    if name.endswith(".uexp"):
        filenames.append(name)
print(filenames)

# Iterate over every file (range(1, y) would skip the first one)
for filename in filenames:
    filenamestart = filename[0:3]
    print(filenamestart)
    if filenamestart == "CID":
        # The with block closes the file once it has been read
        with open(filename, 'r') as openFile:
            fileContents = openFile.read()
        # Drop the first 104 characters
        ItemName = fileContents[104:]
        print(ItemName)

Example input file (pulled from HxD):

.........................ýÿÿÿ................E.................!...1AC9816A4D34966936605BB7EFBC0841.....Sun Tan Specialist.................9.................!...9658361F4EFF6B98FF153898E58C9D52.....Outfit.................D.................!...F37BE72345271144C16FECAFE6A46F2A.....Don't get burned............................................................................................................................Áƒ*ž

I have got it working to remove the first 104 characters, but I would also like to remove the characters after 'Sun Tan Specialist', which will differ in length, so that I am left with only that part.

I appreciate any help that anyone can give me.

Answer 1

One way to remove non-alphabetic characters in a string is to use regular expressions [1].

>>> import re
>>> re.sub(r'[^a-z]', '', "lol123\t")
'lol'

EDIT

The first argument, r'[^a-z]', is the pattern that matches what will be removed (here, by replacing it with the empty string ''). The square brackets denote a character class (the pattern matches any single character in the class), the ^ is a "not" operator, and a-z denotes all lowercase alphabetic characters. More information here:

https://docs.python.org/3/library/re.html#regular-expression-syntax

So, for instance, to also keep capital letters and spaces it would be:

>>> re.sub(r'[^a-zA-Z ]', '', 'Lol !this *is* a3 -test\t12378')
'Lol this is a test'

However, from the data you give in your question, the exact process you need seems to be a bit more complicated than just "getting rid of non-alphabetical characters".
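As a rough illustration of what that more involved process could look like, here is a minimal sketch. It assumes the item name is the first run of letters, spaces and apostrophes that begins with an uppercase letter followed by a lowercase one, and that the hex IDs contain only digits and uppercase letters. The names `sample` and `first_phrase` are made up for this example, and `sample` is a shortened stand-in for the file contents shown in the question.

```python
import re

# Shortened stand-in for the file contents from the question (assumption:
# a hex ID, filler dots, then the item name, then more filler).
sample = "!...1AC9816A4D34966936605BB7EFBC0841.....Sun Tan Specialist.................9...."


def first_phrase(text):
    """Return the first Upper+lower run of letters, spaces and apostrophes, or None."""
    match = re.search(r"[A-Z][a-z][A-Za-z ']*", text)
    return match.group(0).strip() if match else None


print(first_phrase(sample))  # Sun Tan Specialist
```

Because the pattern stops at the first character that is not a letter, space or apostrophe, the varying length of the name does not matter. If the real files contain lowercase hex digits or names that start differently, the pattern would need adjusting.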

Answer 2

You can use filter:

import string
print(''.join(filter(lambda character: character in string.ascii_letters + string.digits, '(ABC), DEF!'))) # => ABCDEF
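Note that the snippet above keeps digits as well as letters. If only letters should survive, one possible variant is to pass `str.isalpha` to `filter`. The helper name `letters_only` is invented for this sketch, and `str.isalpha` is Unicode-aware, so accented letters such as `ý` are kept too.

```python
def letters_only(text):
    """Keep only the characters for which str.isalpha() is true."""
    return "".join(filter(str.isalpha, text))


print(letters_only("(ABC), DEF! 123"))  # ABCDEF
```

To restrict the result to ASCII letters only, testing membership in `string.ascii_letters` as in the original snippet is the safer choice.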

Answer 3

You mentioned in a comment that you got the string down to Sun Tan SpecialistFEFFBFFECDOutfitDFBECFECAFEAFADont get burned

Essentially, your goal at this point is to remove any uppercase letter that isn't immediately followed by a lowercase letter, because an uppercase letter followed by a lowercase letter indicates the start of a phrase. You can use a for loop to do this.

h = "Sun Tan SpecialistFEFFBFFECDOutfitDFBECFECAFEAFADont get burned"

output = ""
for i in range(len(h)):
    # Keep spaces
    if h[i] == " ":
        output += h[i]
    # Start of a phrase found, so store the character, adding a separating
    # space unless the output is empty or already ends with one
    elif h[i].isupper() and i + 1 < len(h) and h[i + 1].islower():
        if output and not output.endswith(" "):
            output += " "
        output += h[i]
    # We want all lowercase characters
    elif h[i].islower():
        output += h[i]

print(output)
# If you don't care about "Outfit" since it is always there, remove it
print(output.replace(" Outfit", ""))

Output:

Sun Tan Specialist Outfit Dont get burned

Sun Tan Specialist Dont get burned
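The same rule can also be written as a regular expression. This is only a sketch under the assumption stated above, namely that every phrase starts with an uppercase letter followed by at least one lowercase letter and that the leftover hex characters are all uppercase. The names `PHRASE` and `split_phrases` are invented for this example.

```python
import re

h = "Sun Tan SpecialistFEFFBFFECDOutfitDFBECFECAFEAFADont get burned"

# A phrase is an uppercase letter plus lowercase letters, optionally
# followed by more space-separated words.
PHRASE = re.compile(r"[A-Z][a-z]+(?: [A-Za-z][a-z]*)*")


def split_phrases(text):
    """Return every phrase found in text, in order of appearance."""
    return PHRASE.findall(text)


print(split_phrases(h))  # ['Sun Tan Specialist', 'Outfit', 'Dont get burned']
```

Returning a list keeps the phrases separate, so the item name is simply the first element and "Outfit" can be skipped by position rather than removed with replace.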

Answer 1
4 2018-07-27 15:33:39

Answer 2
1 2021-04-21 22:50:52

Answer 3
0 2018-07-27 16:40:58
