How to remove all non-alphabetic characters from a string?

I have been working on a program that takes a hex file and, if the file name starts with "CID", removes the first 104 characters; after that point there are a few words. I also want to remove everything after the words, but the problem is that the part I want to isolate varies in length.

My code is currently like this:

import os

files = os.listdir(".")

# Collect every .uexp file in the current directory
filenames = []
y = 0
for names in files:
    if names.endswith(".uexp"):
        filenames.append(names)
        y += 1
        print(y)
print(filenames)

# Start at index 0 so the first file is not skipped
for x in range(y):
    filenamestart = filenames[x][0:3]
    print(filenamestart)
    if filenamestart == "CID":
        with open(filenames[x], 'r') as openFile:
            fileContents = openFile.read()
        # Drop the first 104 characters
        ItemName = fileContents[104:]
        print(ItemName)

Example input file (pulled from HxD):

.........................ýÿÿÿ................E.................!...1AC9816A4D34966936605BB7EFBC0841.....Sun Tan Specialist.................9.................!...9658361F4EFF6B98FF153898E58C9D52.....Outfit.................D.................!...F37BE72345271144C16FECAFE6A46F2A.....Don't get burned............................................................................................................................Áƒ*ž

I have got it working to remove the first 104 characters, but I would also like to remove the characters after 'Sun Tan Specialist', which will differ in length, so I am left with only that part.

I appreciate any help that anyone can give me.

One way to remove non-alphabetic characters from a string is to use regular expressions [1].

>>> import re
>>> re.sub(r'[^a-z]', '', "lol123\t")
'lol'

EDIT

The first argument, r'[^a-z]', is the pattern that matches what will be removed (here, by replacing it with an empty string ''). The square brackets denote a character class (the pattern matches any single character in that class), the ^ at the start of the class acts as a "not" operator, and a-z denotes all lowercase ASCII letters. More information here:

https://docs.python.org/3/library/re.html#regular-expression-syntax

So, for instance, to also keep capital letters and spaces it would be:

>>> re.sub(r'[^a-zA-Z ]', '', 'Lol !this *is* a3 -test\t12378')
'Lol this is a test'

However, from the data you give in your question, the exact process you need seems to be a bit more complicated than just "getting rid of non-alphabetical characters".
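As a rough sketch of what that more involved process might look like: assuming the dots in the HxD dump stand for non-printable bytes and the file is read in binary mode, one could collect the runs of printable ASCII text and skip the 32-character hexadecimal identifiers. The function name extract_phrases and the sample bytes are made up for illustration; the real .uexp layout may differ.

```python
import re

def extract_phrases(data):
    """Return the printable ASCII runs in data, skipping 32-character hex IDs."""
    phrases = []
    # A run is 4 or more consecutive bytes from space (0x20) to tilde (0x7e)
    for run in re.findall(rb"[\x20-\x7e]{4,}", data):
        text = run.decode("ascii").strip()
        # Assumption: identifiers are exactly 32 uppercase hex digits, as in the sample
        if re.fullmatch(r"[0-9A-F]{32}", text):
            continue
        if text:
            phrases.append(text)
    return phrases

# Made-up bytes that imitate the layout of the sample, not a real .uexp file
sample = (
    b"\x00" * 8
    + b"1AC9816A4D34966936605BB7EFBC0841\x00\x13\x00\x00\x00"
    + b"Sun Tan Specialist\x00\x07\x00\x00\x00"
    + b"Outfit\x00"
)
print(extract_phrases(sample))
```

With a real file, the bytes would come from open(name, 'rb').read(), and the first element of the returned list would be the item name if the file follows the same layout.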

You can use filter:

import string
print(''.join(filter(lambda character: character in string.ascii_letters + string.digits, '(ABC), DEF!'))) # => ABCDEF
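The same idea can be written with a generator expression and a precomputed set of allowed characters, which avoids rebuilding the allowed string for every character. This is only a sketch: the names ALLOWED and keep_allowed are hypothetical, and the space is added to the allowed set on the assumption that word boundaries should survive.

```python
import string

# Characters to keep: ASCII letters, digits, and the space
ALLOWED = set(string.ascii_letters + string.digits + " ")

def keep_allowed(text, allowed=ALLOWED):
    """Return text with every character outside `allowed` removed."""
    return "".join(ch for ch in text if ch in allowed)

print(keep_allowed("(ABC), DEF!"))  # => ABC DEF
```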

You mentioned in a comment that you got the string down to Sun Tan SpecialistFEFFBFFECDOutfitDFBECFECAFEAFADont get burned

Essentially, your goal at this point is to remove any uppercase letter that isn't immediately followed by a lowercase letter, because an uppercase letter followed by a lowercase one indicates the start of a phrase. You can use a for loop to do this.

h = "Sun Tan SpecialistFEFFBFFECDOutfitDFBECFECAFEAFADont get burned"

output = ""
for i in range(len(h)):
    # Keep spaces
    if h[i] == " ":
        output += h[i]
    # Start of a phrase found (uppercase followed by lowercase)
    elif h[i].isupper() and i + 1 < len(h) and h[i + 1].islower():
        # Separate from the previous phrase unless a space is already there
        if output and not output.endswith(" "):
            output += " "
        output += h[i]
    # We want all lowercase characters
    elif h[i].islower():
        output += h[i]

print(output)
# If you don't care about Outfit since it is always there, remove it
print(output.replace(" Outfit", ""))

Output:

Sun Tan Specialist Outfit Dont get burned

Sun Tan Specialist Dont get burned
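For comparison, the rule described above (drop any uppercase letter not followed by a lowercase one, then separate the phrases) can also be sketched with two regular-expression substitutions. The function name extract_words is hypothetical, and unlike the loop this version leaves digits and punctuation untouched, so it assumes the input has already been reduced to letters and spaces.

```python
import re

def extract_words(text):
    # Drop every uppercase letter that is not immediately followed by a lowercase letter
    kept = re.sub(r"[A-Z](?![a-z])", "", text)
    # Insert a space wherever a lowercase letter is directly followed by an uppercase one
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", kept)

h = "Sun Tan SpecialistFEFFBFFECDOutfitDFBECFECAFEAFADont get burned"
print(extract_words(h))
```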
