
Pythonic way to find 2 items from two lists existing in another list

I have some Twitter data, and I split it, in an elegant and Pythonic way, into tweets with happy emoticons and tweets with sad emoticons, like this:

happy_set = [":)", ":-)", "=)", ":D", ":-D", "=D"]
sad_set = [":(", ":-(", "=("]

# any() adds each tweet at most once, even if it contains several matching faces
happy = [tweet.split() for tweet in data if any(face in tweet for face in happy_set)]
sad = [tweet.split() for tweet in data if any(face in tweet for face in sad_set)]

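However, emoticons from both happy_set and sad_set may be found in a single tweet. What is the Pythonic way to ensure that the happy list only contains tweets with emoticons from happy_set, and vice versa?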

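You could try using sets, specifically set.isdisjoint, to check whether the set of tokens in a happy tweet is disjoint from sad_set. If it is, the tweet definitely belongs in happy.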

happy_set = {":)", ":-)", "=)", ":D", ":-D", "=D"}
sad_set = {":(", ":-(", "=("}

# happy is your existing list of potentially happy tweets. Each item is already a
# token list (from tweet.split()), and isdisjoint() accepts any iterable, so no
# further splitting is needed. To remove any tweets with sad tokens:
happy = [tokens for tokens in happy if sad_set.isdisjoint(tokens)]
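A self-contained run of the isdisjoint filter, as a sketch. It assumes the three sample tweets used in a later answer on this page and assumes that happy holds token lists, as produced by the question's tweet.split():

```python
happy_set = {":)", ":-)", "=)", ":D", ":-D", "=D"}
sad_set = {":(", ":-(", "=("}

# Sample tweets (the same ones used in another answer on this page)
data = ["Hi, I am sad :( but don't worry =D", "Happy day :-)", "Boooh :-("]

# Step 1: candidate happy tweets as token lists (substring match, as in the question)
happy = [tweet.split() for tweet in data if any(face in tweet for face in happy_set)]

# Step 2: keep only candidates whose tokens share nothing with sad_set
happy = [tokens for tokens in happy if sad_set.isdisjoint(tokens)]

print(happy)  # [['Happy', 'day', ':-)']]
```

The mixed first tweet passes step 1 because of "=D" but is dropped in step 2 because its tokens include ":(". Note that step 2 only sees whitespace-separated tokens, so an emoticon glued to a word (e.g. "sad:(") would not be caught there.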

I would use lambdas:

>>> is_happy = lambda tweet: any(map(lambda x: x in happy_set, tweet.split()))
>>> is_sad = lambda tweet: any(map(lambda x: x in sad_set, tweet.split()))

>>> data = ["Hi, I am sad :( but don't worry =D", "Happy day :-)", "Boooh :-("]
>>> # In Python 3, filter() returns an iterator, so list() is used to display the result
>>> list(filter(lambda tweet: is_happy(tweet) and not is_sad(tweet), data))
['Happy day :-)']
>>> list(filter(lambda tweet: is_sad(tweet) and not is_happy(tweet), data))
['Boooh :-(']

This avoids creating intermediate copies of data.

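And if data is really large, in Python 2 you can replace filter with ifilter from the itertools module to get an iterator instead of a list. In Python 3, the built-in filter already returns an iterator (itertools.ifilter no longer exists), so only wrap it in list() when you need all the results at once.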
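A Python 3 sketch of the same lambda-based answer, with the lambdas rewritten as named functions (a style choice, not a requirement). It shows that both filter() and a generator expression stay lazy, so no result list is built until you iterate:

```python
happy_set = {":)", ":-)", "=)", ":D", ":-D", "=D"}
sad_set = {":(", ":-(", "=("}


def is_happy(tweet):
    # True if any whitespace-separated token is a happy emoticon
    return any(token in happy_set for token in tweet.split())


def is_sad(tweet):
    # True if any whitespace-separated token is a sad emoticon
    return any(token in sad_set for token in tweet.split())


data = ["Hi, I am sad :( but don't worry =D", "Happy day :-)", "Boooh :-("]

# filter() in Python 3 is lazy: the predicate runs only as the result is iterated
happy_iter = filter(lambda t: is_happy(t) and not is_sad(t), data)
# An equivalent lazy generator expression
sad_iter = (t for t in data if is_sad(t) and not is_happy(t))

for tweet in happy_iter:
    print("happy:", tweet)  # happy: Happy day :-)
for tweet in sad_iter:
    print("sad:", tweet)    # sad: Boooh :-(
```

An iterator can be consumed only once; if you need to walk the results several times, materialize them with list().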

Are you looking for something like this?

happy_set = {":)", ":-)", "=)", ":D", ":-D", "=D"}
sad_set = {":(", ":-(", "=("}

# Each tweet's tokens appear at most once per list (any() stops at the first match)
happy_maybe_sad = [tweet.split() for tweet in data if any(face in tweet for face in happy_set)]
sad_maybe_happy = [tweet.split() for tweet in data if any(face in tweet for face in sad_set)]

# Keep only the tweets that did not also match the other emoticon list
happy = [item for item in happy_maybe_sad if item not in sad_maybe_happy]
sad = [item for item in sad_maybe_happy if item not in happy_maybe_sad]

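For happy and sad, I stuck with a list-based solution because the order of the items may be relevant. If it isn't, it's better to use set() for performance (each tweet's token list must first be converted to a tuple, since lists are unhashable). In addition, sets already provide the basic set operations (union, intersection, difference, etc.).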
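A sketch of the set-based variant mentioned above, assuming order does not matter. It reuses the sample tweets from another answer on this page and stores each tweet's tokens as a tuple so they can go into a set:

```python
happy_set = {":)", ":-)", "=)", ":D", ":-D", "=D"}
sad_set = {":(", ":-(", "=("}

data = ["Hi, I am sad :( but don't worry =D", "Happy day :-)", "Boooh :-("]

# Tuples instead of lists: lists are unhashable and cannot be set members
happy_maybe_sad = {tuple(t.split()) for t in data if any(f in t for f in happy_set)}
sad_maybe_happy = {tuple(t.split()) for t in data if any(f in t for f in sad_set)}

# Set difference removes tweets that matched both lists; original order is lost
happy = happy_maybe_sad - sad_maybe_happy
sad = sad_maybe_happy - happy_maybe_sad
# Intersection gives the tweets containing both kinds of emoticon
mixed = happy_maybe_sad & sad_maybe_happy
```

Two trade-offs versus the list version: membership tests are O(1) on average instead of a linear scan, but identical tweets collapse into a single entry and the input order is not preserved.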
