How to remove a string of a certain length from a nested list?
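I have a nested list of strings: a corpus made up of lists of different lengths. I only want to keep the strings whose length is greater than 2.
There is a similar question about how to remove an element from a nested list. I tried every answer there that let me specify the condition length > 2.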
corpus = list(r_corpus('teeny.txt'))
print('initial corpus here ',corpus)
# Current attempt
[[subelt for subelt in elt if len(subelt) > 2] for elt in corpus]
# Previous attempt 1
## for thing in corpus:
##     [y for y in thing if len(y) > 2]
# Previous attempt 2
## for sentence in corpus:
##     sentence = [x for x in sentence if len(x) > 2]
print('\n\n corpus here without any string of length 2 or smaller',corpus)
This is the output of the current attempt, which is identical to the output of the two previous attempts.
initial corpus here
[['extracting', 'opinions'],
['soo', 'min', 'kim', 'and'],
['abstract'],
['this', 'paper', 'presents', 'method', 'for', 'identifying', 'an'],
['this', 'section', 'reviews', 'previous', 'works', 'in'],
['subjectivity', 'detection', 'is'],
['work', 'is', 'similar', 'to', 'ours', 'but', 'different']]
corpus here without any string of length 2 or smaller
[['extracting', 'opinions'],
['soo', 'min', 'kim', 'and'],
['abstract'],
['this', 'paper', 'presents', 'method', 'for', 'identifying', 'an'],
['this', 'section', 'reviews', 'previous', 'works', 'in'],
['subjectivity', 'detection', 'is'],
['work', 'is', 'similar', 'to', 'ours', 'but', 'different']]
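What is the quickest way to get this second version of the corpus, without any string of length 2 or smaller?
corpus here without any string of length 2 or smaller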
[['extracting', 'opinions'],
['soo', 'min', 'kim', 'and'],
['abstract'],
['this', 'paper', 'presents', 'method', 'for', 'identifying'],
['this', 'section', 'reviews', 'previous', 'works'],
['subjectivity', 'detection'],
['work', 'similar', 'ours', 'but', 'different']]
Thanks.
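@Vera, you can try the code below. It uses concepts such as lambda functions, map(), and filter().
Using list comprehensions, lambda functions, map(), filter(), reduce() and similar tools is a Pythonic way to solve problems like this one more easily and concisely.
You can look up list comprehensions and the map(), filter(), reduce(), and lambda functions to find worked examples of these concepts with explanations.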
import json
corpus = [['extracting', 'opinions'],
['soo', 'min', 'kim', 'and'],
['abstract'],
['this', 'paper', 'presents', 'method', 'for', 'identifying', 'an'],
['this', 'section', 'reviews', 'previous', 'works', 'in'],
['subjectivity', 'detection', 'is'],
['work', 'is', 'similar', 'to', 'ours', 'but', 'different']]
new_corpus = list( map(lambda words: list(filter(lambda word: len(word)> 2, words)), corpus))
# Pretty printing list of lists of words of length > 2
print(json.dumps(new_corpus, indent=2))
"""
[
  [
    "extracting",
    "opinions"
  ],
  [
    "soo",
    "min",
    "kim",
    "and"
  ],
  [
    "abstract"
  ],
  [
    "this",
    "paper",
    "presents",
    "method",
    "for",
    "identifying"
  ],
  [
    "this",
    "section",
    "reviews",
    "previous",
    "works"
  ],
  [
    "subjectivity",
    "detection"
  ],
  [
    "work",
    "similar",
    "ours",
    "but",
    "different"
  ]
]
"""
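For comparison, here is a minimal sketch of the same filtering written as a list comprehension. It also suggests why the attempts in the question seem to leave corpus unchanged: a comprehension builds a new list, so the result has to be assigned to a name, and rebinding the loop variable sentence does not touch the inner list stored in corpus. The names filtered and in_place are made up for this illustration, and the sample data is copied from the question rather than read from teeny.txt.

```python
# Sample data copied from the question.
corpus = [
    ['extracting', 'opinions'],
    ['soo', 'min', 'kim', 'and'],
    ['abstract'],
    ['this', 'paper', 'presents', 'method', 'for', 'identifying', 'an'],
    ['this', 'section', 'reviews', 'previous', 'works', 'in'],
    ['subjectivity', 'detection', 'is'],
    ['work', 'is', 'similar', 'to', 'ours', 'but', 'different'],
]

# Option 1: build a new nested list and keep it by assigning the result.
# `filtered` is a hypothetical name; `corpus` itself is left untouched.
filtered = [[word for word in sentence if len(word) > 2] for sentence in corpus]

# Option 2: filter in place. Slice assignment replaces the contents of each
# inner list object instead of rebinding the loop variable.
# A copy is used here only so that `corpus` stays intact for comparison.
in_place = [list(sentence) for sentence in corpus]
for sentence in in_place:
    sentence[:] = [word for word in sentence if len(word) > 2]

print(filtered[3])
print(filtered == in_place)
```

If the original list is no longer needed, assigning the result of option 1 back to corpus is probably the simplest choice. Option 2 may be preferable when other parts of the program hold references to the same inner lists.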