How to compare 2 Excel columns using a DataFrame, then output the result to another Excel file?

Question

I have this Excel file. Here is the screenshot.

I want to compare the dataset column with the unique-pitch column, and then write the output back to the Excel file. The comparison covers these scenarios:

Find the intersection (values that appear in both the dataset column and the unique-pitch column).
Find the values that exist in dataset but not in unique-pitch (difference 1).
Find the values that exist in unique-pitch but not in dataset (difference 2).

I am using row no. 0 for this example, and the rule used in this comparison is the same throughout the data.

dataset = [0, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58]
unique-pitch = [0, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64]

# this is the expected output
Scenario 1 result = [0, 54, 55, 56, 57, 58]
Length of Scenario 1 result = 6

Scenario 2 result = [46, 47, 48, 49, 50, 51, 52, 53]
Length of Scenario 2 result = 8

Scenario 3 result = [59, 60, 61, 62, 63, 64]
Length of Scenario 3 result = 6

From what I know so far, I can read the Excel file into a DataFrame and find the values for the 3 scenarios.

import pandas as pd
import ast

df = pd.read_excel (r'C:\Users\014_twinkle_twinkle 300 0.0001 dataframe - python.xlsx')
datasets = df['dataset'].tolist()
unique_pitches = df['unique-pitch'].tolist()

i = 0
for dataset in datasets:
    print("Iteration:", i+1)
    dataset = ast.literal_eval(dataset)
    unique_pitch = ast.literal_eval(unique_pitches[i])

    # scenario 1
    scenario1_data = list(set(dataset) & set(unique_pitch))
    scenario1_len = len(scenario1_data)

    # scenario 2
    scenario2_data = list(set(dataset) - set(unique_pitch))
    scenario2_len = len(scenario2_data)

    # scenario 3
    scenario3_data =  list(set(unique_pitch) - set(dataset))
    scenario3_len = len(scenario3_data)
    
    print("Intersection\t\t: ", scenario1_data)
    print("Len Intersection\t: ", scenario1_len)
    print("Difference 1\t\t: ", scenario2_data)
    print("Len difference 1\t: ", scenario2_len)
    print("Difference 2\t\t: ", scenario3_data)
    print("Len difference 2\t: ", scenario3_len)
    print("-"*100)
    i += 1

# how to put those 6 new variables to df?

# to change df to excel
df.to_excel()

In my Excel output, I am expecting this kind of result.

My question is: how do I read and compare the data in each column of the DataFrame df, then write the expected result to an Excel file? I read in another post on Stack Overflow that I should not iterate over a DataFrame row by row because it is slow.

Answer 1

To start, I think it is generally a good idea to first make your code work, and then research faster methods.

For scenario 1:

intersection = []
for value in dataset:
    if value in unique_pitch:
        intersection.append(value)
print(intersection)
print(len(intersection))

Scenario 2:

not_in_unique_pitch = []
for value in dataset:
    if value not in unique_pitch:
        not_in_unique_pitch.append(value)
print(not_in_unique_pitch)
print(len(not_in_unique_pitch))

I know you already fixed scenario 3, but if you want it done the same way:

not_in_dataset = []
for value in unique_pitch:
    if value not in dataset:
        not_in_dataset.append(value)
print(not_in_dataset)
print(len(not_in_dataset))
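
As a quick check on the row 0 values from the question, the three loops above should give the same results as Python's set operators. One difference is worth noting: the loops keep the order of the source list, while a set has no guaranteed order, so the sketch below sorts the set results. The variable names for the set versions are illustrative only.

```python
dataset = [0, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58]
unique_pitch = [0, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64]

# Loop-style versions: the order follows the source list
intersection = [v for v in dataset if v in unique_pitch]
not_in_unique_pitch = [v for v in dataset if v not in unique_pitch]
not_in_dataset = [v for v in unique_pitch if v not in dataset]

# Set versions: sorted, because a set does not guarantee any order
set_intersection = sorted(set(dataset) & set(unique_pitch))
set_difference_1 = sorted(set(dataset) - set(unique_pitch))
set_difference_2 = sorted(set(unique_pitch) - set(dataset))

print(intersection, len(intersection))
print(not_in_unique_pitch, len(not_in_unique_pitch))
print(not_in_dataset, len(not_in_dataset))
```

For long lists the set versions are usually faster, since a membership test on a list scans the whole list each time.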

Edit, in answer to your question:

import pandas as pd
import ast

df = pd.read_excel('your.xlsx')
datasets = df['dataset'].tolist()
unique_pitches = df['unique-pitch'].tolist()  # the column is named 'unique-pitch' in the question

i = 0
for dataset in datasets:
    print("Iteration:", i+1)
    dataset = ast.literal_eval(dataset)
    unique_pitch = ast.literal_eval(unique_pitches[i])

    # scenario 1
    print(list(set(dataset) & set(unique_pitch)))
    print(len(list(set(dataset) & set(unique_pitch))))

    # scenario 2
    print(list(set(dataset) - set(unique_pitch)))
    print(len(list(set(dataset) - set(unique_pitch))))

    # scenario 3
    print(list(set(unique_pitch) - set(dataset)))
    print(len(list(set(unique_pitch) - set(dataset))))
    i += 1
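
The manual counter `i` in the loop above can be replaced by `zip` and `enumerate`, which pair the two columns row by row. This is only a sketch: the two lists below are made-up stand-ins for the column values, written as text because that is how the cells come back from the Excel file (hence `ast.literal_eval`).

```python
import ast

# Hypothetical cell values: each Excel cell holds a list written as text
datasets = ["[0, 46, 47, 54]", "[1, 2, 3]"]
unique_pitches = ["[0, 54, 59]", "[2, 3, 4]"]

results = []
# zip pairs the two columns row by row; enumerate supplies the iteration number
for iteration, (dataset_text, pitch_text) in enumerate(zip(datasets, unique_pitches), start=1):
    dataset = set(ast.literal_eval(dataset_text))
    unique_pitch = set(ast.literal_eval(pitch_text))
    row = (
        sorted(dataset & unique_pitch),  # scenario 1
        sorted(dataset - unique_pitch),  # scenario 2
        sorted(unique_pitch - dataset),  # scenario 3
    )
    results.append(row)
    print("Iteration:", iteration, row)
```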

After the edited question, with saving to an Excel file (.xlsx):

import pandas as pd
import ast

df = pd.read_excel('your.xlsx')
datasets = df['dataset'].tolist()
unique_pitches = df['unique-pitch'].tolist()  # the column is named 'unique-pitch' in the question

i = 0
scenario1_data = []
scenario2_data = []
scenario3_data = []
scenario1_len = []
scenario2_len = []
scenario3_len = []
for dataset in datasets:
    print("Iteration:", i+1)
    dataset = ast.literal_eval(dataset)
    unique_pitch = ast.literal_eval(unique_pitches[i])

    # scenario 1
    scenario1_data.append(list(set(dataset) & set(unique_pitch)))
    scenario1_len.append(len(scenario1_data[i]))

    # scenario 2
    scenario2_data.append(list(set(dataset) - set(unique_pitch)))
    scenario2_len.append(len(scenario2_data[i]))

    # scenario 3
    scenario3_data.append(list(set(unique_pitch) - set(dataset)))
    scenario3_len.append(len(scenario3_data[i]))
    
    i += 1

df['scenario 1 data'] = scenario1_data
df['scenario 2 data'] = scenario2_data
df['scenario 3 data'] = scenario3_data

df['len scenario 1 data'] = scenario1_len
df['len scenario 2 data'] = scenario2_len
df['len scenario 3 data'] = scenario3_len

df.to_excel('output.xlsx')
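
The six parallel lists above can also be folded into one helper that returns all new values for a single row. The sketch below is one possible arrangement, not the only one: `compare_row` is a hypothetical name, the dictionary keys reuse the column names from the code above, and the results are sorted so that the Excel output has a stable order.

```python
import ast


def compare_row(dataset_text, pitch_text):
    """Return the three comparisons and their lengths for one row."""
    dataset = set(ast.literal_eval(dataset_text))
    unique_pitch = set(ast.literal_eval(pitch_text))
    scenarios = {
        "scenario 1 data": sorted(dataset & unique_pitch),
        "scenario 2 data": sorted(dataset - unique_pitch),
        "scenario 3 data": sorted(unique_pitch - dataset),
    }
    # One length column per scenario, e.g. "len scenario 1 data"
    lengths = {f"len {name}": len(values) for name, values in scenarios.items()}
    return {**scenarios, **lengths}


# Row 0 from the question, written as the text stored in the Excel cells
row = compare_row(
    "[0, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58]",
    "[0, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64]",
)
for name, value in row.items():
    print(name, ":", value)
```

To use it with the DataFrame, the rows can be collected with `rows = [compare_row(d, p) for d, p in zip(df['dataset'], df['unique-pitch'])]` and attached with `df = df.join(pd.DataFrame(rows))`, assuming `df` still has its default index. Passing `index=False` to `to_excel` keeps the index out of the output file. This still loops in Python; since each cell holds a list stored as text, a plain loop is probably the practical choice here.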

Solution 1
1 Accepted 2021-04-20 10:15:31
