How to merge two lists while preserving same-list duplicates for set operations?

Question

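I've spent the better part of a day or two drawing Venn diagrams, coding loops, trying different set operations (symmetric_difference, union, intersection, isdisjoint) and enumerating by line number, trying to figure out how to implement this in code.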

a = [1, 2, 2, 3] # <-------------|
b = [1, 2, 3, 3, 4] # <----------| Do not need to be in order.
result = [1, 2, 2, 3, 3, 4] # <--|

Or:

A = [1,'d','d',3,'x','y']
B = [1,'d',3,3,'z']
result =  [1,'d','d',3,3,'x','y','z']

Edit:

I'm not trying to do a + b = [1, 1, 2, 2, 2, 3, 3, 3, 4]

I'm trying to do something like:

a - b = [2]

b - a = [3, 4]

a ∩ b = [1, 2, 3]

So:

[a - b] + [b - a] + [a ∩ b] = [2] + [3, 4] + [1, 2, 3] = [1, 2, 2, 3, 3, 4] (ignoring order)?

I'm not sure where to go from here.

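I have two spreadsheets, each with several thousand rows. I want to compare the two spreadsheets by column type.

I have already created lists from each column to compare/merge.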

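def returnLineList(fn):
    with open(fn,'r') as f:
        lines = f.readlines()
    line_list = []
    for line in lines:
        # Strip the trailing newline first; otherwise the last column's
        # values (and its header name) keep a '\n' suffix.
        line = line.rstrip('\n').split('\t')
        line_list.append(line)
    return line_list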

def returnHeaderIndexDictionary(titles):
    tmp_dict = {}
    for x in titles:
        tmp_dict.update({x:titles.index(x)})
    return tmp_dict

def returnColumn(index, l):
    column = []
    for row in l:
        column.append(row[index])
    return column

def enumList(column):
    tmp_list = []
    for row, item in enumerate(column):
        tmp_list.append([row,item])
    return tmp_list

def compareAndMergeEnumerated(L1,L2):
    less = []
    more = []
    same = []
    for row1,item1 in enumerate(L1):
        for row2,item2 in enumerate(L2):
            if item1 in item2:
                count1 = L1.count(item1)
                count2 = L2.count(item2)
                dif = count1 - count2
                if dif != 0:
                    if dif < 0:
                        less.append(["dif:"+str(dif),[item1,row1],[item2,row2]])
                    if dif > 0:
                        more.append(["dif:"+str(dif),[item1,row1],[item2,row2]])
                else:
                    same.append(["dif:"+str(dif),[item1,row1],[item2,row2]])
                break
    return less,more,same,len(less+more+same),len(L1),len(L2)

def main():
    unsorted_lines = returnLineList('unsorted.csv')
    manifested_lines = returnLineList('manifested.csv')

    indexU = returnHeaderIndexDictionary(unsorted_lines[0])
    indexM = returnHeaderIndexDictionary(manifested_lines[0])

    u_j_column = returnColumn(indexU['jnumber'],unsorted_lines)
    m_j_column = returnColumn(indexM['jnumber'],manifested_lines)

    print(compareAndMergeEnumerated(u_j_column,m_j_column))

if __name__ == '__main__':
    main()

Update:

A = [1,'d','d',3,'x','y']
B = [1,'d',3,3,'z']
M = A + B
R = [1,'d','d',3,3,'x','y','z']


# Count how many times each element occurs (keys stored as str, as before)
ACount = {str(x): A.count(x) for x in A}
BCount = {str(x): B.count(x) for x in B}
MCount = {str(x): M.count(x) for x in M}
RCount = {str(x): R.count(x) for x in R}


print('^sym_difAB',set(A) ^ set(B)) # set(A).symmetric_difference(set(B))
print('^sym_difBA',set(B) ^ set(A)) # set(B).symmetric_difference(set(A))
print('|union    ',set(A) | set(B)) # set(A).union(set(B))
print('&intersect',set(A) & set(B)) # set(A).intersection(set(B))
print('-dif AB   ',set(A) - set(B)) # set(A).difference(set(B))
print('-dif BA   ',set(B) - set(A)) # set(B).difference(set(A))
print('<=subsetAB',set(A) <= set(B)) # set(A).issubset(set(B))
print('<=subsetBA',set(B) <= set(A)) # set(B).issubset(set(A))
print('>=supsetAB',set(A) >= set(B)) # set(A).issuperset(set(B))
print('>=supsetBA',set(B) >= set(A)) # set(B).issuperset(set(A))

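# Note: sorting a list that mixes ints and strs works only in Python 2;
# Python 3 raises TypeError for the mixed-type sorted() calls below.
print(sorted(A + [x for x in (set(A) ^ set(B))]))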
#[1, 3, 'd', 'd', 'x', 'x', 'y', 'y', 'z']

print(sorted(B + [x for x in (set(A) ^ set(B))]))
#[1, 3, 3, 'd', 'x', 'y', 'z', 'z']
cA = lambda y: A.count(y)
cB = lambda y: B.count(y)
cM = lambda y: M.count(y)
cR = lambda y: R.count(y)
print(sorted([[y,cA(y)] for y in (set(A) ^ set(B))]))
#[['x', 1], ['y', 1], ['z', 0]]

print(sorted([[y,cB(y)] for y in (set(A) ^ set(B))]))
#[['x', 0], ['y', 0], ['z', 1]]

print(sorted([[y,cA(y)] for y in A]))
print(sorted([[y,cB(y)] for y in B]))
print(sorted([[y,cM(y)] for y in M]))
print(sorted([[y,cR(y)] for y in R]))
#[[1, 1], [3, 1], ['d', 2], ['d', 2], ['x', 1], ['y', 1]]
#[[1, 1], [3, 2], [3, 2], ['d', 1], ['z', 1]]
#[[1, 2], [1, 2], [3, 3], [3, 3], [3, 3], ['d', 3], ['d', 3], ['d', 3], ['x', 1], ['y', 1], ['z', 1]]
#[[1, 1], [3, 2], [3, 2], ['d', 2], ['d', 2], ['x', 1], ['y', 1], ['z', 1]]

cAL = sorted([[y,cA(y)] for y in A])


Update 2:

Basically, I think it's time to learn:

pandas

It looks like a combination of aggregation, grouping, and summing.

Answer 1

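No need to learn pandas! (Though it is an excellent library.) I'm not sure I fully understand your question, but the collections.Counter data type is designed to be a bag/multiset. One of the operators it implements is "or" (|), which may be what you need. Read the comments in this code example to see whether it fits your needs: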

a = [1, 2, 2, 3]
b = [1, 2, 3, 3, 4]

from collections import Counter

# A Counter data type counts the elements fed to it and holds
# them in a dict-like type.

a_counts = Counter(a) # {1: 1, 2: 2, 3: 1}
b_counts = Counter(b) # {1: 1, 2: 1, 3: 2, 4: 1}

# The union of two Counter types is the max of each value
# in the (key, value) pairs in each Counter. Similar to
# {(key, max(a_counts[key], b_counts[key])) for key in ...}

result_counts = a_counts | b_counts

# Return an iterator over the keys repeating each as many times as its count.

result = list(result_counts.elements())

# Result:
# [1, 2, 2, 3, 3, 4]
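The a - b, b - a and a ∩ b from the question also have direct Counter counterparts. The sketch below is an illustration using only the standard Counter operators (`-` keeps only positive counts, `&` takes the per-key minimum, `+` adds counts); the variable names are illustrative, not from the question. It checks that adding the three pieces gives the same counts as the `|` union above, and applies the union to the mixed-type example too.

```python
from collections import Counter

a = [1, 2, 2, 3]
b = [1, 2, 3, 3, 4]
ca, cb = Counter(a), Counter(b)

# Multiset ("bag") versions of the operations written in the question:
a_minus_b = ca - cb   # keeps only positive counts: {2: 1}
b_minus_a = cb - ca   # {3: 1, 4: 1}
a_and_b = ca & cb     # per-key minimum: {1: 1, 2: 1, 3: 1}

# (a - b) + (b - a) + (a ∩ b) has the same counts as the union a | b, since
# max(x, y) == max(x - y, 0) + max(y - x, 0) + min(x, y) for every key.
combined = a_minus_b + b_minus_a + a_and_b
print(combined == (ca | cb))        # True
print(sorted(combined.elements()))  # [1, 2, 2, 3, 3, 4]

# The same union on the mixed-type example from the question:
A = [1, 'd', 'd', 3, 'x', 'y']
B = [1, 'd', 3, 3, 'z']
merged = Counter(A) | Counter(B)
print(merged == Counter([1, 'd', 'd', 3, 3, 'x', 'y', 'z']))  # True
```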

Answer 2

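So you're asking how to remove the duplicated elements and keep the unique ones? You definitely want sets for that.

When you say:

(a - b) + (b - a)

what you want is this: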

set(a) ^ set(b)

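That is the symmetric difference of the two.

If your elements are lists, they can't be hashed (a prerequisite for set elements), so you'll need to convert them to tuples: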

set(tuple(i) for i in a) ^ set(tuple(i) for i in b)
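A minimal illustration of the hashing point, using hypothetical spreadsheet-style rows (`a_rows` and `b_rows` are made-up data, not from the question):

```python
# Hypothetical rows: each element is itself a list.
a_rows = [[1, 'x'], [2, 'y']]
b_rows = [[2, 'y'], [3, 'z']]

# Putting lists directly into a set fails because lists are unhashable.
raised = False
try:
    set(a_rows)
except TypeError as exc:
    raised = True
    print('lists are not hashable:', exc)

# Converting each row to a tuple makes it hashable, so set operations work.
sym = set(tuple(r) for r in a_rows) ^ set(tuple(r) for r in b_rows)
print(sorted(sym))  # [(1, 'x'), (3, 'z')]
```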

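Edit

Now that you've edited the question, it seems you're looking for:

(a - b) + (b - a) + a ∩ b

That is the union of the two sets (assuming you mean set union by +; if + meant intersection, the result would be the empty set, and that ambiguity is why sets don't support the + operator):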

set(tuple(i) for i in a) | set(tuple(i) for i in b)

The code above returns a result equivalent to the final value of my_set after calling the in-place method update:

my_set = set(tuple(i) for i in a)
my_set.update(tuple(i) for i in b)
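For reference, here is how these operations behave on the question's flat integer lists (no tuple conversion is needed there, since ints are already hashable). The sketch highlights two things worth checking: `union()` returns a new set while `update()` modifies the set in place and returns None, and a set keeps each value only once, so the duplicated 2 and 3 collapse.

```python
a = [1, 2, 2, 3]
b = [1, 2, 3, 3, 4]

union = set(a) | set(b)        # a new set; duplicates collapse

other = set(a)
other.union(b)                 # returns a new set, which is discarded here
print(other == set(a))         # True: `other` itself is unchanged

my_set = set(a)
result = my_set.update(b)      # modifies my_set in place
print(result)                  # None
print(my_set == union)         # True
print(sorted(union))           # [1, 2, 3, 4], not [1, 2, 2, 3, 3, 4]
```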

Answer 3

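After further review (now that I'm home and have tried it in the Python interpreter), I understand what you're trying to do, although it seems to contradict the title's wording about duplicates. I see that you're treating each additional occurrence of an element as a new item, made unique by its index.

This is conceptually similar to the decorate-sort-undecorate pattern, except with "join" or "set operation" in place of "sort".

So here's the setup. First, import itertools so that we can group like elements together and enumerate each group: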

import itertools

def indexed_set(a_list):
    '''
    assuming given a sorted list, 
    groupby like items, 
    and index from 0 for each group
    return a set of tuples with like items and their index for set operations
    '''
    return set((like, like_index) for _like, like_iter in itertools.groupby(a_list)
                          for like_index, like in enumerate(like_iter))
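One caveat, as the docstring says: `itertools.groupby` only groups *adjacent* equal items, so on unsorted input a repeated item can receive the same index twice and be lost from the set. The sketch below contrasts that with a hypothetical variant, `indexed_set_any_order` (not part of the original answer), which numbers each occurrence with a running per-item count instead.

```python
import itertools
from collections import defaultdict


def indexed_set(a_list):
    # Same groupby-based tagging as above: only *consecutive* equal
    # items share a group, so numbering restarts for each new run.
    return set((like, like_index)
               for _like, like_iter in itertools.groupby(a_list)
               for like_index, like in enumerate(like_iter))


def indexed_set_any_order(a_list):
    # Hypothetical variant: tag each item with how many times it has
    # been seen so far, so equal items need not be adjacent.
    seen = defaultdict(int)
    tagged = set()
    for item in a_list:
        tagged.add((item, seen[item]))
        seen[item] += 1
    return tagged


print(sorted(indexed_set([2, 1, 2])))            # [(1, 0), (2, 0)]  one 2 is lost
print(sorted(indexed_set_any_order([2, 1, 2])))  # [(1, 0), (2, 0), (2, 1)]
```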

Later, we'll need to convert the indexed set back into a list:

def remove_index_return_list(an_indexed_set):
    '''
    given a set of two-length tuples (or other iterables)
    drop the index and 
    return a sorted list of the items 
    (sorted by str() for comparison of mixed types)
    '''
    return sorted((item for item, _like_index in an_indexed_set), key=str)

Finally, we need our data (taken from what you provided):

a = [1, 2, 2, 3] 
b = [1, 2, 3, 3, 4] 
expected_result = [1, 2, 2, 3, 3, 4]

Here's my suggested usage:

a_indexed = indexed_set(a)
b_indexed = indexed_set(b)
actual_result = remove_index_return_list(a_indexed | b_indexed)

assert expected_result == actual_result

This doesn't raise an AssertionError, and

print(actual_result)

prints:

[1, 2, 2, 3, 3, 4]

Edit: since I made the function handle mixed types, I'd like to demonstrate that as well:

c = [1,'d','d',3,'x','y']
d = [1,'d',3,3,'z']
expected_result =  [1,'d','d',3,3,'x','y','z']
c_indexed = indexed_set(c)
d_indexed = indexed_set(d)
actual_result = remove_index_return_list(c_indexed | d_indexed)
assert actual_result == expected_result

We see that it's not exactly what we expected, but the result is very close; the difference comes from the sort order:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AssertionError
>>> actual_result
[1, 3, 3, 'd', 'd', 'x', 'y', 'z']
>>> expected_result
[1, 'd', 'd', 3, 3, 'x', 'y', 'z']
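If the goal is the exact order of `expected_result`, one option is to order by first appearance instead of by `str()`. The helper `remove_index_keep_order` below is a hypothetical addition, not part of the answer above: a sketch assuming all items are hashable (they already have to be, to live in a set).

```python
import itertools


def indexed_set(a_list):
    # Same groupby-based tagging as indexed_set above.
    return set((like, like_index)
               for _like, like_iter in itertools.groupby(a_list)
               for like_index, like in enumerate(like_iter))


def remove_index_keep_order(an_indexed_set, *source_lists):
    '''Hypothetical alternative to remove_index_return_list: order items by
    where they first appear in the source lists instead of by str().'''
    first_seen = {}
    for lst in source_lists:
        for item in lst:
            if item not in first_seen:
                first_seen[item] = len(first_seen)
    # Sort keys are (position, index) pairs of ints, so mixed types are fine.
    ordered = sorted(an_indexed_set, key=lambda pair: (first_seen[pair[0]], pair[1]))
    return [item for item, _like_index in ordered]


c = [1, 'd', 'd', 3, 'x', 'y']
d = [1, 'd', 3, 3, 'z']
expected_result = [1, 'd', 'd', 3, 3, 'x', 'y', 'z']
actual_result = remove_index_keep_order(indexed_set(c) | indexed_set(d), c, d)
print(actual_result == expected_result)  # True
```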

Answer 4

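I think the test cases in the problem statement aren't sufficient. For example, suppose

a = [1,2,2,3,2,2,3]
b = [1,2,2,3,3,4,3,3,5]

Should we merge the two into [1, 2, 2, 2, 2, 3, 3, 4, 3, 3, 5] or [1, 2, 2, 3, 3, 4, 5]? The answer will certainly change the algorithm you need to implement.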
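To make the ambiguity concrete, here is a sketch of the two readings applied to that input. It assumes the first reading means comparing total counts per element (as the `Counter` union in Answer 1 does) and the second means comparing runs of adjacent equal items (as the `groupby`-based `indexed_set` in Answer 3 does on unsorted input); `run_indexed` is a hypothetical helper name. The Counter reading yields the same elements as the first merge above, just sorted.

```python
import itertools
from collections import Counter

a = [1, 2, 2, 3, 2, 2, 3]
b = [1, 2, 2, 3, 3, 4, 3, 3, 5]

# Reading 1: total count per element (Counter union = per-key max).
by_total = sorted((Counter(a) | Counter(b)).elements())


# Reading 2: runs of adjacent equal items, numbered within each run.
def run_indexed(lst):
    return {(item, i)
            for _key, run in itertools.groupby(lst)
            for i, item in enumerate(run)}


by_runs = sorted(item for item, _i in run_indexed(a) | run_indexed(b))

print(by_total)  # [1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 5]
print(by_runs)   # [1, 2, 2, 3, 3, 4, 5]
```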


4 answers

Answer 1
4 votes, 2014-05-31 00:29:08

Answer 2
1 vote, 2014-05-28 23:38:35

Answer 3
1 vote, accepted, 2014-05-29 03:05:51

Answer 4
0 votes, 2014-05-29 00:45:53
