Compare two 2D lists and print the differing rows, ignoring one column

Given the following:

gnucashumsaetze = [
 ['2020-11-27', 'Essen', '4.53'],
 ['2020-11-27', 'Essen', '10.67'],
 ['2020-11-30', 'Essen', '4.80'],
 ['2020-11-30', 'Lebensmittel', '2.78'],
 ['2020-11-30', 'Essen', '2.31'],
 ['2020-11-30', 'Kosmetik', '5.58'],
 ['2020-12-01', 'Essen', '11.23'],
]

onlineumsaetze = [
 ['2020-11-27', 'EDEKA ERNST HAUPTBAHNH  / MUENCHEN', '4.53'],
 ['2020-11-27', 'Netto Marken-Discount  / Ingolstadt', '10.67'],
 ['2020-11-30', 'MUELLER GMBH & CO.KG  / NUERNBERG', '4.80'],
 ['2020-11-30', 'Netto Marken-Discount  / Frankfurt', '2.31'],
 ['2020-11-30', 'Rossmann 2380  / Ingolstadt', '5.58'],
 ['2020-11-30', 'ALIEXPRESS.COM  / Luxembourg', '22.46'],
 ['2020-12-01', 'EDEKA BRAUN  / INGOLSTADT', '11.23'],
 ['2020-12-02', 'EDEKA ERNST HAUPTBAHNH  / MUENCHEN', '7.03'],
]

I would like to compare the two 2D lists and output the rows that differ. The second column (row[1]) should not be compared, though. Like this:

['2020-11-30', 'ALIEXPRESS.COM  / Luxembourg', '22.46']
['2020-12-01', 'EDEKA BRAUN  / INGOLSTADT', '11.23']
['2020-12-02', 'EDEKA ERNST HAUPTBAHNH  / MUENCHEN', '7.03']

What I have already tried is this; unfortunately it is a catastrophe:

fehlende_rows = (set((row[0] for row in onlineumsaetze),(row[2] for row in onlineumsaetze)) - set((row[0] for row in gnucashumsaetze),(row[2] for row in gnucashumsaetze)))
print(fehlende_rows)

I find it really helpful to write out the full loop first, and then condense it down to a list comprehension if possible.

Probably the best way to do this is to iterate over gnucashumsaetze and build a dictionary that maps each date (a string) to the set of amounts booked on that date.

gnucashumsaetze_dict = {}
for g in gnucashumsaetze:
    date, val = g[0], g[2]
    # Maybe you want to do val = float(g[2]) instead?
    if date not in gnucashumsaetze_dict:
        gnucashumsaetze_dict[date] = set()
    gnucashumsaetze_dict[date].add(val)

gnucashumsaetze_dict is now:

{'2020-11-27': {'10.67', '4.53'},
 '2020-11-30': {'2.31', '2.78', '4.80', '5.58'},
 '2020-12-01': {'11.23'}}
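The same grouping can be written a little more compactly with collections.defaultdict, which creates the empty set the first time a date is seen. This is only a sketch of an alternative, run here on a shortened copy of the sample data so that it is self-contained:

```python
from collections import defaultdict

# Shortened copy of the sample data from the question.
gnucashumsaetze = [
    ['2020-11-27', 'Essen', '4.53'],
    ['2020-11-27', 'Essen', '10.67'],
    ['2020-11-30', 'Essen', '4.80'],
    ['2020-12-01', 'Essen', '11.23'],
]

# A missing date starts out as an empty set, so no "if date not in ..." check is needed.
gnucashumsaetze_dict = defaultdict(set)
for date, _description, val in gnucashumsaetze:
    gnucashumsaetze_dict[date].add(val)

print(dict(gnucashumsaetze_dict))
```

Note that indexing a defaultdict with an unknown date inserts that date, whereas .get(date, set()) does not, so lookups with .get leave the dictionary unchanged.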

Then iterate over each row in onlineumsaetze and append it to the new list only if its amount is not among the amounts recorded for its date.

new_onlineumsaetze = []
for o in onlineumsaetze:
    date, val = o[0], o[2]
    # if date is not in gnucashumsaetze_dict, return empty set
    vals = gnucashumsaetze_dict.get(date, set()) 
    if val not in vals:
        new_onlineumsaetze.append(o)

new_onlineumsaetze is now:

[['2020-11-30', 'ALIEXPRESS.COM  / Luxembourg', '22.46'],
 ['2020-12-02', 'EDEKA ERNST HAUPTBAHNH  / MUENCHEN', '7.03']]

The ['2020-12-01', 'EDEKA BRAUN / INGOLSTADT', '11.23'] row is skipped because gnucashumsaetze has an entry ['2020-12-01', 'Essen', '11.23'] with the same date and amount.

Now that you've written it as a regular for loop, it's easier to condense it into a list comprehension.

new_onlineumsaetze = [o for o in onlineumsaetze if o[2] not in gnucashumsaetze_dict.get(o[0], set())]
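The amounts above are compared as strings, so '4.80' and '4.8' would count as different values. If the two sources might format amounts differently (an assumption, since the sample data is consistent), one option is to compare them as decimal.Decimal instead. A minimal sketch, where the helper name key and the '4.8' row are made up for illustration:

```python
from decimal import Decimal

gnucashumsaetze = [
    ['2020-11-30', 'Essen', '4.80'],
    ['2020-11-30', 'Kosmetik', '5.58'],
]
onlineumsaetze = [
    # Hypothetical row: same amount as '4.80', written without the trailing zero.
    ['2020-11-30', 'MUELLER GMBH & CO.KG  / NUERNBERG', '4.8'],
    ['2020-11-30', 'ALIEXPRESS.COM  / Luxembourg', '22.46'],
]

def key(row):
    """Return the (date, amount) pair used for comparison; column 1 is ignored."""
    return row[0], Decimal(row[2])

# Decimal('4.80') == Decimal('4.8'), and equal Decimals hash equally,
# so set membership treats both spellings as the same amount.
booked = {key(row) for row in gnucashumsaetze}
missing = [row for row in onlineumsaetze if key(row) not in booked]

print(missing)
```

Decimal avoids the rounding surprises that float can introduce when handling currency amounts.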

To solve this I will use list comprehensions.

First, create two sets using only column 0 and column 2:

gnucashumsaetze_set = set([(row[0], row[2]) for row in gnucashumsaetze])
onlineumsaetze_set = set([(row[0], row[2]) for row in onlineumsaetze])

Then take the difference of these two sets:

diff_ = onlineumsaetze_set.difference(gnucashumsaetze_set)

For the final result, keep the rows of onlineumsaetze whose (column 0, column 2) pair appears in that difference:

res = [row for row in onlineumsaetze if (row[0], row[2]) in diff_]

print(res)

The result:

[['2020-11-30', 'ALIEXPRESS.COM  / Luxembourg', '22.46'], ['2020-12-02', 'EDEKA ERNST HAUPTBAHNH  / MUENCHEN', '7.03']]
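One caveat with a set-based comparison is that duplicates collapse: two online transactions with the same date and amount are both treated as matched even if only one of them was booked. If that case matters (the sample data does not contain it, so this is an assumption), a collections.Counter can match entries one-for-one. A rough sketch with a made-up duplicate row:

```python
from collections import Counter

gnucashumsaetze = [
    ['2020-11-27', 'Essen', '4.53'],
]
onlineumsaetze = [
    ['2020-11-27', 'EDEKA ERNST HAUPTBAHNH  / MUENCHEN', '4.53'],
    # Hypothetical second purchase with the same date and amount.
    ['2020-11-27', 'EDEKA ERNST HAUPTBAHNH  / MUENCHEN', '4.53'],
]

# Count how often each (date, amount) pair was booked.
remaining = Counter((row[0], row[2]) for row in gnucashumsaetze)

res = []
for row in onlineumsaetze:
    pair = (row[0], row[2])
    if remaining[pair] > 0:
        remaining[pair] -= 1  # consume one booked entry for this online row
    else:
        res.append(row)       # no booked entry left to match against

print(res)
```

Here the first online row uses up the single booked entry, so only the second one is reported as missing.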
