How to untokenize words back to their original form in a Python list

Question

libOfSentences = ["Get help with the display",
                 "Display is not working properly", "I need some help"]
#removing stopwords

for i in libOfSentences:
     sentence = word_tokenize(i) #tokenize each individual word
     sentence = filter(lambda x: x not in string.punctuation, sentence) 
     cleaned_text = filter(lambda x: x not in stop_words, sentence) 

     removedStopwordsList = " ".join(cleaned_text)

removedStopwordsList has now joined the sentences back together, but I want to keep it in a list. The desired output is like this:

["Get help display", "Display not working properly", "I need some help"]

I want removedStopwordsList to still be a list I can loop through. For example,

removedStopwordsList[0]

gives me

"G D I"

right now, but I want removedStopwordsList[0]

to output

"Get help display"

The join function is what is stopping this from happening right now, but I can't find a better workaround.

Answer 1

I want to have removedStopwordsList still be a list

Then just make it a list instead of making it a string:

removedStopwordsList = list(cleaned_text)

Although you can do this even more simply by using a list comprehension instead of calling filter:

removedStopwordsList = [x for x in sentence if x not in stop_words]

map and filter are great when you have a function you want to call on each element, but when you have an arbitrary expression, which you have to wrap up in lambda to turn into a function call, it's simpler and more readable to just use a list comprehension or generator expression.
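As a minimal sketch of that comparison, the two forms below give the same result. The tiny stop_words set is a hand-written stand-in used only for illustration, not NLTK's actual stop word list, and the sentence is assumed to be already tokenized.

```python
# Hypothetical stand-in for a real stop word list, kept tiny for illustration.
stop_words = {"with", "the", "is"}
sentence = ["Get", "help", "with", "the", "display"]

# filter needs the condition wrapped in a lambda, and in Python 3 it
# returns a lazy iterator, so list() is needed to get an actual list.
via_filter = list(filter(lambda x: x not in stop_words, sentence))

# The list comprehension states the same condition directly.
via_comprehension = [x for x in sentence if x not in stop_words]

print(via_filter)         # ['Get', 'help', 'display']
print(via_comprehension)  # ['Get', 'help', 'display']
```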

And you can similarly simplify the previous line. So:

for i in libOfSentences:
    sentence = word_tokenize(i) #tokenize each individual word
    sentence = (x for x in sentence if x not in string.punctuation)
    removedStopwordsList = [x for x in sentence if x not in stop_words]

If you need to have the joined-up string around as well, that's fine; you can have a second variable:

removedStopwordsString = " ".join(removedStopwordsList)

If you really want a single object that can behave both ways, it wouldn't be hard to write such a class, but it would just be ugly. And under the covers, it's just going to have a self.list_of_words and self.joined_string that it delegates to anyway. So, what would be the point?

At any rate, I doubt you need to keep the string around. If you ever want to print it out, you can just join it on the fly:

print(" ".join(removedStopwordsList))

… or even expand it into separate printables:

print(*removedStopwordsList)
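To illustrate the two printing options, here is a small sketch that assumes removedStopwordsList already holds the cleaned words of the first sentence. Unpacking with * passes each word as a separate argument, and print puts a space between arguments by default, so both calls show the same text; the sep argument changes that separator.

```python
# Assumed contents after stop word removal for the first sentence.
removedStopwordsList = ["Get", "help", "display"]

# Build the string explicitly, then print it.
joined = " ".join(removedStopwordsList)
print(joined)                           # Get help display

# Unpack the list into separate arguments; print separates them with a space.
print(*removedStopwordsList)            # Get help display

# sep controls the separator between the unpacked arguments.
print(*removedStopwordsList, sep=", ")  # Get, help, display
```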

If you're trying to gather all of those lists into one big list, you have to actually write code to do that. Obviously if you do removedStopwordsList = <anything> each time through the loop, you're just replacing it each time through. You need to append that to some bigger list if you want to keep all those lists around. For example:

listOfLists = []
for i in libOfSentences:
    sentence = word_tokenize(i) #tokenize each individual word
    sentence = (x for x in sentence if x not in string.punctuation)
    removedStopwordsList = [x for x in sentence if x not in stop_words]
    listOfLists.append(removedStopwordsList)

And now, if you print out listOfLists, it'll be a list of three lists of words, one per sentence; listOfLists[0] will be the first list; listOfLists[0][0] will be the first word of the first list; etc.
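If what you actually want is the output shown in the question, a list of cleaned-up strings rather than a list of word lists, one possible approach is to join each word list before appending it. The following is only a sketch under assumptions: str.split and a tiny hand-written stop_words set stand in for word_tokenize and the real stop word list so that it runs without NLTK, and cleanedSentences is a name introduced here for illustration.

```python
import string

libOfSentences = ["Get help with the display",
                  "Display is not working properly", "I need some help"]

# Assumed stand-in for the real stop word list (not NLTK's actual list).
stop_words = {"with", "the", "is"}

cleanedSentences = []
for sentence in libOfSentences:
    words = sentence.split()  # stand-in for word_tokenize(sentence)
    words = (w for w in words if w not in string.punctuation)
    words = [w for w in words if w not in stop_words]
    # Join this sentence's words, then keep the string in the outer list.
    cleanedSentences.append(" ".join(words))

print(cleanedSentences)
# ['Get help display', 'Display not working properly', 'I need some help']
print(cleanedSentences[0])
# Get help display
```

The key difference from the original loop is that the joined string is appended to a list created before the loop, instead of being assigned to the same variable on every pass.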

Answer 1
1 2018-06-21 21:53:58