How to take a user sentence and create a list of words out of it?

I'm unsure what the user will enter, but I want to break their input sentence up into a list of words.

User_input = raw_input("Please enter a search criterion: ")
User_Input_list[""]

# input example: steve at the office

# compiling the regular expression:
keyword = re.compile(r"\b[aA-zZ]\b")
     for word in User_input:
         User_Input_list.append(word?)

# going by the example input, I'd want
# User_Input_list = ["steve", "at", "the", "office"]

I'm unsure how to split the input up into separate words. I will give cookies for help!

User_Input_list = User_input.split()

The easiest solution is probably to use split:

>>> "steve at the office".split()
['steve', 'at', 'the', 'office']

But this won't remove punctuation, which may or may not be a problem for you:

>>> "steve at the office.".split()
['steve', 'at', 'the', 'office.']

You could use re.split() to split on runs of non-word characters, so that only the words are left:

>>> re.split(r'\W+', 'steve at the office.')
['steve', 'at', 'the', 'office', '']

But as you can see above, you might end up with empty entries to deal with, and things get worse when you have more subtle punctuation:

>>> re.split(r"\W+", "steve isn't at the office.")
['steve', 'isn', 't', 'at', 'the', 'office', '']

So you could do some work here to pick a better regular expression, but you'll need to decide how you want to handle text like steve isn't at 'the office'.

So to select the right solution for you, you'll have to think about what input you'll get and what output you want.
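As one possible "better regular expression", the sketch below matches words directly with re.findall instead of splitting on separators. The pattern and the name extract_words are illustrative assumptions, not the only reasonable choice: it keeps an apostrophe only when letters sit on both sides of it, and it recognises ASCII letters only.

```python
import re

# Hypothetical pattern: a run of letters, optionally followed by
# apostrophe-plus-letters groups, so "isn't" stays in one piece.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)*")

def extract_words(sentence):
    """Return every match of WORD_RE in the sentence, in order."""
    return WORD_RE.findall(sentence)

print(extract_words("steve isn't at 'the office'."))
# ['steve', "isn't", 'at', 'the', 'office']
```

Matching the words themselves also avoids the empty entries that re.split produces at the ends of the list. Whether quotes, digits, or hyphens should count as part of a word is still a decision that depends on the input you expect.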

Basically, you should do this:

User_Input_list = User_input.split(' ')

and that's it...

User_input = raw_input("Please enter a search criterion: ")
User_Input_list = User_input.split(" ")

See:

http://docs.python.org/library/stdtypes.html
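One caveat when passing an explicit separator: split(" ") treats every single space as a boundary, so repeated or trailing spaces produce empty strings, while split() with no argument collapses any run of whitespace. A short comparison (the sample text is made up for illustration):

```python
text = "steve  at the office "  # two spaces after "steve", one trailing space

by_space = text.split(" ")   # every single space is a separator
by_whitespace = text.split() # runs of whitespace are collapsed

print(by_space)
# ['steve', '', 'at', 'the', 'office', '']
print(by_whitespace)
# ['steve', 'at', 'the', 'office']
```

For free-form user input, where stray spaces are likely, the no-argument form is usually the safer default.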

Do the following:

User_input = raw_input("Please enter a search criterion: ")

User_Input_list = User_input.split()

You've already found re; its documentation has a nice example of splitting a string:

re.split(r'\W+', 'Words, words, words.')

This way you get all the words, with all punctuation removed.
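Note that re.split can leave an empty string at either end of the result when the text begins or ends with a non-word character. A small filter tidies the list; this is a minimal sketch, and words_only is a hypothetical helper name:

```python
import re

def words_only(sentence):
    """Split on runs of non-word characters and discard empty pieces."""
    # re.split leaves '' at either end when the text starts or ends with
    # a non-word character, so filter those out.
    return [word for word in re.split(r"\W+", sentence) if word]

print(words_only("Words, words, words."))
# ['Words', 'words', 'words']
```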
