
How to remove a string of a certain length from a nested list?

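I have a nested list of strings: a corpus made up of lists of different lengths. I want to keep only the strings whose length is greater than 2.

I looked at similar questions about removing elements from nested lists and tried every answer that would let me express the condition length > 2.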

corpus = list(r_corpus('teeny.txt'))
print('initial corpus here ',corpus)

#Current attempt
[[ subelt for subelt in elt if len(subelt) >2 ] for elt in corpus] 

#previous attempt 1
##for thing in corpus:
##    [y for y in thing if len(y)>2]

#previous attempt 2
##for sentence in corpus:
##    sentence = [x for x in sentence if len(x) > 2 ]

print('\n\n corpus here without any string of length 2 or smaller',corpus)

This is the output of the current attempt, which is identical to the output of the two previous attempts.

initial corpus here

[['extracting', 'opinions'],
['soo', 'min', 'kim', 'and'],
['abstract'],
['this', 'paper', 'presents', 'method', 'for', 'identifying', 'an'], 
['this', 'section', 'reviews', 'previous', 'works', 'in'], 
['subjectivity', 'detection', 'is'], 
['work', 'is', 'similar', 'to', 'ours', 'but', 'different']]

corpus here without any string of length 2 or smaller

[['extracting', 'opinions'],
['soo', 'min', 'kim', 'and'], 
['abstract'], 
['this', 'paper', 'presents', 'method', 'for', 'identifying', 'an'], 
['this', 'section', 'reviews', 'previous', 'works', 'in'], 
['subjectivity', 'detection', 'is'], 
['work', 'is', 'similar', 'to', 'ours', 'but', 'different']]

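What I need

The fastest way to get a second version of the corpus that contains no string of length 2 or smaller:

corpus here without any string of length 2 or smaller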

[['extracting', 'opinions'], 
['soo', 'min', 'kim', 'and'], 
['abstract'], 
['this', 'paper', 'presents', 'method', 'for', 'identifying'], 
['this', 'section', 'reviews', 'previous', 'works'],
['subjectivity', 'detection'],
['work','similar','ours', 'but', 'different']]

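Thanks.

@Vera, you can try the code below. It uses lambda functions together with map() and filter().

List comprehensions, lambda functions, map(), filter(), reduce(), and similar tools are idiomatic Python that often let you solve problems like this more simply and concisely.

You can look up list comprehensions, map(), filter(), reduce(), and lambda functions to find worked examples and explanations of these concepts.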

import json

corpus = [['extracting', 'opinions'],
['soo', 'min', 'kim', 'and'],
['abstract'],
['this', 'paper', 'presents', 'method', 'for', 'identifying', 'an'], 
['this', 'section', 'reviews', 'previous', 'works', 'in'], 
['subjectivity', 'detection', 'is'], 
['work', 'is', 'similar', 'to', 'ours', 'but', 'different']]

new_corpus = list( map(lambda words: list(filter(lambda word: len(word)> 2, words)), corpus))

# Pretty printing list of lists of words of length > 2
print(json.dumps(new_corpus, indent=2))

"""
[
  [
    "extracting",
    "opinions"
  ],
  [
    "soo",
    "min",
    "kim",
    "and"
  ],
  [
    "abstract"
  ],
  [
    "this",
    "paper",
    "presents",
    "method",
    "for",
    "identifying"
  ],
  [
    "this",
    "section",
    "reviews",
    "previous",
    "works"
  ],
  [
    "subjectivity",
    "detection"
  ],
  [
    "work",
    "similar",
    "ours",
    "but",
    "different"
  ]
]
"""

