Matching each element in a list to each element in a list of lists, using Python
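I'm trying to find the most efficient way, using Python, to match each element in a list against each element in a list of lists. For example, given the input:
>>> myList = [['hi', 'no', 'bye', 'Global', 24],['morning', 'X', 'place'],['so', 'large', 'mall','test'], ['hi', 'X', 'place', 'bye']]
>>> check_against_myLIst = ['bye','place','hi','australia']
I'm not sure whether the best approach is the map function, a for loop, or some other Python data-analysis method.
The output needs to be converted to a DataFrame, for example: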
Index  Value                              Result
0      ['hi', 'no', 'bye', 'Global', 24]  True
1      ['morning', 'X', 'place']          True
2      ['so', 'large', 'mall','test']     False
3      ['hi', 'X', 'place', 'bye']        True
Thanks!
You can store the results in a list and construct a DataFrame from it.
result = [False] * len(myList)
for n, _list in enumerate(myList):
    # Mark the row True if any of its items appears in the check list
    if [i for i in _list if i in check_against_myLIst]:
        result[n] = True
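The loop above builds the full list of matches for every row just to test whether it is empty. A minimal sketch of the same idea, assuming all items are hashable, converts the check list to a set and lets `any()` stop at the first match. The names `my_list`, `checks`, and `rows` are illustrative and not from the original answer.

```python
my_list = [['hi', 'no', 'bye', 'Global', 24], ['morning', 'X', 'place'],
           ['so', 'large', 'mall', 'test'], ['hi', 'X', 'place', 'bye']]
checks = {'bye', 'place', 'hi', 'australia'}  # a set gives O(1) lookups

# One (value, result) pair per sublist; any() stops at the first match.
rows = [(sub, any(item in checks for item in sub)) for sub in my_list]

# The enumerate index plays the role of the "Index" column in the question.
for index, (value, matched) in enumerate(rows):
    print(index, value, matched)
```

A list of pairs shaped like `rows` is the kind of input that could then be handed to a DataFrame constructor with two column names.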
You can create a function that tests for an intersection between the set of each sublist and the comparison list.
Given
import pandas as pd
cmp = ["bye","place","hi","australia"]
lst = [
    ["hi", "no", "bye", "Global", 24],
    ["morning", "X", "place"],
    ["so", "large", "mall", "test"],
    ["hi", "X", "place", "bye"]
]
Code
def is_in(nested, compare):
    """Return a list of (row, bool) tuples; bool is True if the row intersects compare."""
    compared = set(compare)
    return [(x, bool(set(x) & compared)) for x in nested]

bool_lst = is_in(lst, cmp)
bool_lst
Output
[(['hi', 'no', 'bye', 'Global', 24], True),
(['morning', 'X', 'place'], True),
(['so', 'large', 'mall', 'test'], False),
(['hi', 'X', 'place', 'bye'], True)]
This looks similar to your desired output. From here, we just need to make a DataFrame:
df = pd.DataFrame(bool_lst, columns=["Value", "Result"])
df.rename_axis("Index")
The last two steps can be collapsed into a single line:
pd.DataFrame([(x, bool(set(x) & set(cmp))) for x in lst], columns=["Value", "Result"]).rename_axis("Index")
First, create sets from myList, for O(1) membership testing.
>>> myList = [['hi', 'no', 'bye', 'Global', 24],['morning', 'X', 'place'],['so', 'large', 'mall','test'], ['hi', 'X', 'place', 'bye']]
>>> checks = ['bye','place','hi','australia'] # renamed from check_against_myLIst
>>> sets = map(set, myList)
Use an efficient any check to find out whether any element of checks is in a given set. (Unlike computing the intersection, any is lazy.)
>>> result = [(lst, any(s in set_ for s in checks)) for lst, set_ in zip(myList, sets)]
Build the DataFrame.
>>> import pandas as pd
>>> df = pd.DataFrame(result, columns=['Value', 'Result'])
>>> df.index.name = 'Index'
>>> df
Value Result
Index
0 [hi, no, bye, Global, 24] True
1 [morning, X, place] True
2 [so, large, mall, test] False
3 [hi, X, place, bye] True
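To illustrate the laziness claim in this answer, here is a small sketch that records which check items are actually consumed. The helper `noisy` and the sample `row` are made up for the demonstration and are not part of the answer.

```python
def noisy(items, seen):
    """Yield items one by one, recording each item that was actually consumed."""
    for item in items:
        seen.append(item)
        yield item

row = {'hi', 'no', 'bye'}
checks = ['hi', 'bye', 'place', 'australia']

# any() returns as soon as one membership test succeeds.
seen = []
found = any(c in row for c in noisy(checks, seen))
print(found, seen)

# The intersection, by contrast, has to walk every element of checks.
common = row & set(checks)
print(common)
```

On small inputs like the one in the question the difference is negligible; it only starts to matter when the check list is long and matches tend to occur early.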
Here is an example of how to get the result without using pandas, as another point of view:
# list
my_list = (['hi', 'no', 'bye', 'Global', 24], ['morning', 'X', 'place'], ['so', 'large', 'mall', 'test', 'TESSTTTT'], ['hi', 'X', 'place', 'bye'])
# check list
check_against_myLIst = ('bye', 'place', 'hi', 'australia')

# Function to find the intersection of two arrays and print one result row
def interSection(index, arr1, arr2):
    result = 'false'
    # Keep the items of arr2 that also occur in arr1
    output = list(filter(lambda x: x in arr1, arr2))
    if output:
        result = 'true'
    print('index', "\t" * 1, 'Value', "\t" * 6, 'Result')
    print(index, "\t" * 1, arr1, "\t" * 4, result)
    print('')

# Driver program
if __name__ == "__main__":
    count = 0
    for arrayItem in my_list:
        interSection(count, arrayItem, check_against_myLIst)
        count += 1
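The function above prints inside the loop and repeats the header for every row. A possible variant, sketched under the assumption that all items are hashable, returns the rows instead and prints the header once. The name `match_rows` is hypothetical, and `set.isdisjoint` is swapped in for the `filter` call.

```python
def match_rows(nested, checks):
    """Return (index, row, matched) for each row; matched is True on any overlap."""
    check_set = set(checks)
    # isdisjoint() is True when there is no common item, so negate it.
    return [(i, row, not check_set.isdisjoint(row)) for i, row in enumerate(nested)]

my_list = (['hi', 'no', 'bye', 'Global', 24], ['morning', 'X', 'place'],
           ['so', 'large', 'mall', 'test', 'TESSTTTT'], ['hi', 'X', 'place', 'bye'])
checks = ('bye', 'place', 'hi', 'australia')

rows = match_rows(my_list, checks)

# Print the header once, then one line per row.
print('Index\tResult\tValue')
for i, row, matched in rows:
    print(f'{i}\t{matched}\t{row}')
```

Returning the rows keeps the matching logic separate from the output formatting, so the same result can be printed, tested, or passed on to another step.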