
How to replace all those special characters with white spaces in Python?

I have a list of company names.

Example (myfiles.txt):

MY company.INC

Old Wine pvt

master-minds ltd

"apex-labs ltd"

"India-New corp"

Indo-American pvt/ltd

As in the example above, I need every special character [-, ", /, .] in the file myfiles.txt to be replaced with a single white space, and the result saved to another text file, myfiles1.txt.

Can anyone please help me out?

Assuming you mean to change everything non-alphanumeric, you can do this on the command line:

sed 's/[^A-Za-z0-9]/ /g' foo.txt > bar.txt

Or in Python with the re module:

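import re

# Replace everything except letters, digits, and newlines with a space.
# (The period is replaced too, since the question lists it as a special character.)
with open('foo.txt') as src:
    original_string = src.read()
new_string = re.sub(r'[^a-zA-Z0-9\n]', ' ', original_string)
with open('bar.txt', 'w') as dst:
    dst.write(new_string)

specials = '-"/.'  # etc.
# str.maketrans is the Python 3 spelling; Python 2 used string.maketrans
trans = str.maketrans(specials, ' ' * len(specials))
with open('foo.txt') as f:
    for line in f:
        cleanline = line.translate(trans)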

For example:

>>> line = "Indo-American pvt/ltd"
>>> line.translate(trans)
'Indo American pvt ltd'
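
Putting the pieces together for the files named in the question, a minimal sketch might look like the following. It assumes Python 3 (where `str.maketrans` replaces the Python 2 `string.maketrans`); the names `replace_specials` and `convert_file` are made up for illustration and do not come from the answer above.

```python
# Characters from the question that should each become a space.
SPECIALS = '-"/.'
# Translation table mapping every special character to a space (Python 3 API).
TRANS = str.maketrans(SPECIALS, ' ' * len(SPECIALS))


def replace_specials(text):
    """Return text with every character in SPECIALS replaced by a space."""
    return text.translate(TRANS)


def convert_file(src_path, dst_path):
    """Copy src_path to dst_path line by line, replacing special characters."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            dst.write(replace_specials(line))
```

Calling `convert_file('myfiles.txt', 'myfiles1.txt')` would then write the cleaned copy. Each special character becomes exactly one space, so adjacent specials produce adjacent spaces.
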

import re

strs = "how much for the maple syrup? $20.99? That's ridiculous!!!"
strs = re.sub(r'[?$.!]', '', strs)  # remove particular special characters
strs = re.sub(r'[^a-zA-Z0-9 ]', '', strs)  # remove everything except letters, digits, and spaces
strs = re.sub(r'\d', '', strs)  # remove digits
strs = re.sub(r' +', ' ', strs)  # collapse runs of spaces
print(strs)

Ans: how much for the maple syrup Thats ridiculous
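
Since the question asks for a single white space, one variation is to replace each run of unwanted characters in one pass instead of cleaning up extra spaces afterwards. The sketch below assumes that ASCII letters and digits are kept and everything else is collapsed; `collapse_specials` is a hypothetical helper name.

```python
import re

# One or more consecutive characters that are not ASCII letters or digits.
NON_ALNUM_RUN = re.compile(r'[^A-Za-z0-9]+')


def collapse_specials(line):
    """Replace each run of non-alphanumeric characters with one space."""
    # strip() drops the space left behind by leading or trailing punctuation.
    return NON_ALNUM_RUN.sub(' ', line).strip()


for name in ['MY company.INC', '"apex-labs ltd"', 'Indo-American pvt/ltd']:
    print(collapse_specials(name))
```

This is meant to be applied line by line: a newline is also non-alphanumeric, so running it over a whole file at once would join the lines together.
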

While maketrans is the fastest way to do it, I never remember the syntax. Since speed is rarely an issue and I know regular expressions, I would tend to do this:

>>> line = "-[myfiles.txt] MY company.INC"
>>> import re
>>> re.sub(r'[^a-zA-Z0-9]', ' ',line)
'  myfiles txt  MY company INC'

This has the additional benefit of declaring the characters you accept instead of the ones you reject, which feels easier in this case.

Of course, if you are using non-ASCII characters you'll have to go back to removing the characters you reject. If there are just punctuation signs, you can do:

>>> import string
>>> chars = re.escape(string.punctuation)
>>> re.sub(r'['+chars+']', ' ',line)
'  myfiles txt  MY company INC'

But you'll notice
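
For non-ASCII text, an allow-list can still work in Python 3, because `\w` in a `str` pattern matches Unicode letters and digits by default. This is a sketch under that assumption; `\w` also matches the underscore, so the underscore is handled separately here. The name `replace_punctuation` and the sample string are illustrative only.

```python
import re

# Anything that is neither a word character nor whitespace, plus the
# underscore (which \w would otherwise keep).
NOT_WORD = re.compile(r'[^\w\s]|_')


def replace_punctuation(text):
    """Replace punctuation with spaces while keeping accented letters."""
    return NOT_WORD.sub(' ', text)


print(replace_punctuation('Société-Générale pvt/ltd'))
```
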

At first I thought to provide a string.maketrans/translate example, but maybe you are using some UTF-8 encoded strings and the ord()-sorted translate table will blow up in your face, so I thought about another solution:

conversion = '-"/.'
with open('myfiles.txt') as f:
    text = f.read()
# Replace each character listed in conversion with a space
newtext = ''
for c in text:
    newtext += ' ' if c in conversion else c

It's not the fastest way, but easy to grasp and modify.

So if your text is non-ASCII, you could decode conversion and the text strings to unicode and afterwards re-encode in whichever encoding you want.
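
In Python 3 the decode and re-encode steps can be delegated to `open()` by passing an explicit `encoding`. The following sketch assumes the input is UTF-8; the helper name `convert_text_file` and its parameters are invented for illustration.

```python
CONVERSION = '-"/.'


def convert_text_file(src_path, dst_path, encoding='utf-8'):
    """Replace every character in CONVERSION with a space, file to file."""
    # open() decodes bytes to str on read and encodes str on write.
    with open(src_path, encoding=encoding) as src:
        text = src.read()
    # Same logic as the loop above, written as a single join.
    newtext = ''.join(' ' if c in CONVERSION else c for c in text)
    with open(dst_path, 'w', encoding=encoding) as dst:
        dst.write(newtext)
```

Using `''.join(...)` avoids building the result with repeated string concatenation, but the behavior is otherwise the same as the character loop.
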
