Cleaning the words in a sentence with Python

Question

I am trying to write a function that accepts a string (a sentence), cleans it, and returns only the letters, digits and hyphens. However, the code does not seem to work. Kindly let me know what I am doing wrong here.

Example: Blake D'souza is an !d!0t
Should return: Blake D'souza is an d0t

Python:

def remove_unw2anted(str):
    str = ''.join([c for c in str if c in 'ABCDEFGHIJKLNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz1234567890\''])
    return str

def clean_sentence(s):
    lst = [word for word in s.split()]
    #print lst
    for items in lst:
        cleaned = remove_unw2anted(items)
    return cleaned

s = 'Blake D\'souza is an !d!0t'
print clean_sentence(s)

Answer 1

You only return the last cleaned word! The loop overwrites cleaned on every iteration, so only the final word survives.

It should be:

def clean_sentence(s):
    lst = [word for word in s.split()]

    lst_cleaned = []
    for items in lst:
        lst_cleaned.append(remove_unw2anted(items))
    return ' '.join(lst_cleaned)
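
For reference, here is a minimal Python 3 sketch of the same fix. It assumes the apostrophe should be kept (as in the question's example) along with the hyphen mentioned in the text. It builds the allowed set from string.ascii_letters because the hand-typed alphabet in the question is missing an uppercase 'M'. The names ALLOWED and remove_unwanted are illustrative and not from the original post.

```python
import string

# Assumption: keep letters, digits, the apostrophe and the hyphen.
ALLOWED = set(string.ascii_letters + string.digits + "'-")

def remove_unwanted(word):
    """Return word with every character outside ALLOWED dropped."""
    return ''.join(c for c in word if c in ALLOWED)

def clean_sentence(s):
    # Clean each whitespace-separated word, then rejoin with single spaces.
    return ' '.join(remove_unwanted(word) for word in s.split())

print(clean_sentence("Blake D'souza is an !d!0t"))
```

Collecting the cleaned words and joining them once at the end is what the original loop was missing.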

A shorter method could be this:

def is_ok(c):
    return c.isalnum() or c in " '"

def clean_sentence(s):
    return filter(is_ok, s)

s = "Blake D'souza is an !d!0t"
print clean_sentence(s)
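
Note that in Python 2, filter applied to a string returns a string, which is why the version above works as written. In Python 3, filter returns a lazy iterator instead, so the surviving characters have to be joined back together. A minimal sketch under that assumption, reusing the is_ok predicate from above:

```python
def is_ok(c):
    # Keep alphanumeric characters, spaces and apostrophes.
    return c.isalnum() or c in " '"

def clean_sentence(s):
    # filter() is lazy in Python 3, so join the characters it yields.
    return ''.join(filter(is_ok, s))

print(clean_sentence("Blake D'souza is an !d!0t"))
```

Keep in mind that str.isalnum is Unicode-aware in Python 3, so accented letters would be kept as well.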

Answer 2

A variation using string.translate, which has the benefit of being easy to extend and is part of the string module.

import string

allchars = string.maketrans('','')

tokeep = string.letters + string.digits + '-'

toremove = allchars.translate(None, tokeep)

s = "Blake D'souza is an !d!0t"

print s.translate(None, toremove)

Output:

BlakeDsouzaisand0t

The OP said to keep only letters, digits and hyphens. Perhaps they meant to keep whitespace as well?
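
If whitespace and the apostrophe should be kept, as the example in the question suggests, the same idea can be sketched for Python 3. There, string.maketrans and string.letters no longer exist, and str.translate takes a mapping from code points to replacements instead. The KEEP set and its contents are an assumption about what the OP wanted, not something the post confirms.

```python
import string

# Assumption: keep letters, digits, hyphen, apostrophe and the space.
KEEP = set(string.ascii_letters + string.digits + "-' ")

def clean_sentence(sentence):
    # Map every character that occurs in the sentence but is not in KEEP
    # to None, which tells str.translate to delete it.
    table = {ord(c): None for c in set(sentence) - KEEP}
    return sentence.translate(table)

print(clean_sentence("Blake D'souza is an !d!0t"))
```

Extending the behaviour is then a matter of adding characters to KEEP.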

2 answers

Answer 1 (score 5, accepted), 2013-02-02 15:35:21

Answer 2 (score 1), 2013-02-02 19:14:09
