
Python cleaning words in a sentence

I am trying to write a function that accepts a string (a sentence), cleans it, and returns only the letters, digits and hyphens. However, the code does not seem to work correctly. What am I doing wrong here?

Example: Blake D'souza is an !d!0t
Should return: Blake D'souza is an d0t

Python:

def remove_unw2anted(str):
    str = ''.join([c for c in str if c in 'ABCDEFGHIJKLNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890\''])
    return str

def clean_sentence(s):
    lst = [word for word in s.split()]
    #print lst
    for items in lst:
        cleaned = remove_unw2anted(items)
    return cleaned

s = 'Blake D\'souza is an !d!0t'
print clean_sentence(s)

You only return the last cleaned word! The loop overwrites cleaned on every iteration, so only the result for the final word survives.

It should be:

def clean_sentence(s):
    lst = [word for word in s.split()]

    lst_cleaned = []
    for items in lst:
        lst_cleaned.append(remove_unw2anted(items))
    return ' '.join(lst_cleaned)
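
For comparison, here is a minimal sketch of the same fix, assuming Python 3 (the code above is Python 2). The names remove_unwanted and clean_sentence_buggy are hypothetical and only used for illustration. It builds the allowed set from string.ascii_letters rather than a hand-typed alphabet, which avoids slips such as the missing "M" in the question's character list.

```python
import string

# Assumption: keep ASCII letters, digits and the apostrophe, matching the
# question's expected output.
ALLOWED = set(string.ascii_letters + string.digits + "'")

def remove_unwanted(word):
    # Drop every character that is not in the allowed set.
    return ''.join(c for c in word if c in ALLOWED)

def clean_sentence_buggy(s):
    cleaned = ''
    for word in s.split():
        cleaned = remove_unwanted(word)  # overwritten on every iteration
    return cleaned                       # only the last word is left

def clean_sentence(s):
    # Clean each word, then join all of them back into one sentence.
    return ' '.join(remove_unwanted(word) for word in s.split())

s = "Blake D'souza is an !d!0t"
print(clean_sentence_buggy(s))  # d0t
print(clean_sentence(s))        # Blake D'souza is an d0t
```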

A shorter method could be this:

def is_ok(c):
    return c.isalnum() or c in " '"

def clean_sentence(s):
    return filter(is_ok, s)

s = "Blake D'souza is an !d!0t"
print clean_sentence(s)
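
Note that the snippet above relies on Python 2, where filter applied to a string returns a string. In Python 3, filter returns a lazy iterator, so the result has to be joined back together. A possible Python 3 adaptation (an assumption about the target version, not part of the original answer) might look like this:

```python
def is_ok(c):
    # Keep letters, digits, spaces and apostrophes.
    # str.isalnum() also accepts non-ASCII letters and digits.
    return c.isalnum() or c in " '"

def clean_sentence(s):
    # filter() yields the kept characters one by one; join them into a str.
    return ''.join(filter(is_ok, s))

s = "Blake D'souza is an !d!0t"
print(clean_sentence(s))  # Blake D'souza is an d0t
```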

A variation using string.translate, which has the benefit of being easy to extend and is part of the string module.

import string

allchars = string.maketrans('','')

tokeep = string.letters + string.digits + '-'

toremove = allchars.translate(None, tokeep)

s = "Blake D'souza is an !d!0t"

print s.translate(None, toremove)

Output:

BlakeDsouzaisand0t

The OP said to keep only letters, digits and hyphens; perhaps they meant to keep whitespace as well?
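
If whitespace and the apostrophe should be kept too, the translate approach extends naturally. The following is a rough sketch, assuming Python 3, where string.maketrans and string.letters no longer exist and str.maketrans takes the characters to delete as its third argument. The TO_KEEP set is an assumption about what the OP wants, based on the expected output in the question.

```python
import string

# Assumption: keep ASCII letters, digits, hyphen, apostrophe and spaces.
TO_KEEP = set(string.ascii_letters + string.digits + "-' ")

def clean_sentence(s):
    # Collect the characters of s that are not in TO_KEEP and build a
    # translation table that deletes exactly those characters.
    to_remove = ''.join(set(s) - TO_KEEP)
    return s.translate(str.maketrans('', '', to_remove))

s = "Blake D'souza is an !d!0t"
print(clean_sentence(s))  # Blake D'souza is an d0t
```

Extending the behaviour then only means adding characters to TO_KEEP.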

