How to compare 2 Excel columns using DataFrame then output it to another Excel file?

I have this Excel file. Here is the screenshot: [image: Excel file]

I want to compare the dataset column with the unique-pitch column, and then write the output back to an Excel file. The comparison covers these scenarios:

  1. Search for the intersection (values that appear in both the dataset column and the unique-pitch column).
  2. Search for values that exist in dataset but not in unique-pitch (difference 1).
  3. Search for values that exist in unique-pitch but not in dataset (difference 2).

I am using row no. 0 for this example, and the rule used in this comparison is the same throughout the data.

dataset = [0, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58]
unique-pitch = [0, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64]

# this is the expected output
Scenario 1 result = [0, 54, 55, 56, 57, 58]
Length of Scenario 1 result = 6

Scenario 2 result = [46, 47, 48, 49, 50, 51, 52, 53]
Length of Scenario 2 result = 8

Scenario 3 result = [59, 60, 61, 62, 63, 64]
Length of Scenario 3 result = 6

From what I know now, I can read the Excel file using DataFrame and find the values for the 3 scenarios.

import pandas as pd
import ast

df = pd.read_excel (r'C:\Users\014_twinkle_twinkle 300 0.0001 dataframe - python.xlsx')
datasets = df['dataset'].tolist()
unique_pitches = df['unique-pitch'].tolist()

i = 0
for dataset in datasets:
    print("Iteration:", i+1)
    dataset = ast.literal_eval(dataset)
    unique_pitch = ast.literal_eval(unique_pitches[i])

    # scenario 1
    scenario1_data = list(set(dataset) & set(unique_pitch))
    scenario1_len = len(scenario1_data)

    # scenario 2
    scenario2_data = list(set(dataset) - set(unique_pitch))
    scenario2_len = len(scenario2_data)

    # scenario 3
    scenario3_data =  list(set(unique_pitch) - set(dataset))
    scenario3_len = len(scenario3_data)
    
    print("Intersection\t\t: ", scenario1_data)
    print("Len Intersection\t: ", scenario1_len)
    print("Difference 1\t\t: ", scenario2_data)
    print("Len difference 1\t: ", scenario2_len)
    print("Difference 2\t\t: ", scenario3_data)
    print("Len difference 2\t: ", scenario3_len)
    print("-"*100)
    i += 1

# how to put those 6 new variables to df?

# to change df to excel
df.to_excel()

In my Excel output, I am expecting this kind of result: [image: expected output]

My question is: how do I read and compare the data in each column of the DataFrame df, then write the expected result to an Excel file? I read in another post on Stack Overflow that I should not iterate over the DataFrame row by row because it is a slow process.

To start, I think it is generally a good idea to first make your code work, and then research faster methods.

For scenario 1:

intersection = []
for value in dataset:
    if value in unique_pitch:
        intersection.append(value)
print(intersection)
print(len(intersection))

Scenario 2:

not_in_unique_pitch = []
for value in dataset:
    if value not in unique_pitch:
        not_in_unique_pitch.append(value)
print(not_in_unique_pitch)
print(len(not_in_unique_pitch))

I know you already fixed scenario 3, but if you want it done the same way:

not_in_dataset = []
for value in unique_pitch:
    if value not in dataset:
        not_in_dataset.append(value)
print(not_in_dataset)
print(len(not_in_dataset))
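
The three loops above can also be written as list comprehensions. This is only a minimal sketch using the row 0 values from the question; the helper names `dataset_set` and `unique_pitch_set` are made up for illustration. Unlike a purely set-based version, it keeps the values in the order they appear in the original lists (and keeps any duplicates).

```python
# Row 0 values copied from the question.
dataset = [0, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58]
unique_pitch = [0, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64]

# Build each set once so the membership tests below are fast.
dataset_set = set(dataset)
unique_pitch_set = set(unique_pitch)

# Scenario 1: values present in both lists.
intersection = [v for v in dataset if v in unique_pitch_set]
# Scenario 2: values in dataset that are missing from unique_pitch.
not_in_unique_pitch = [v for v in dataset if v not in unique_pitch_set]
# Scenario 3: values in unique_pitch that are missing from dataset.
not_in_dataset = [v for v in unique_pitch if v not in dataset_set]

print(intersection, len(intersection))
print(not_in_unique_pitch, len(not_in_unique_pitch))
print(not_in_dataset, len(not_in_dataset))
```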

Edit: an answer to your question:

import pandas as pd
import ast

df = pd.read_excel('your.xlsx')
datasets = df['dataset'].tolist()
unique_pitches = df['unique-pitch'].tolist()  # column name as used in the question

i = 0
for dataset in datasets:
    print("Iteration:", i+1)
    dataset = ast.literal_eval(dataset)
    unique_pitch = ast.literal_eval(unique_pitches[i])

    # scenario 1
    print(list(set(dataset) & set(unique_pitch)))
    print(len(list(set(dataset) & set(unique_pitch))))

    # scenario 2
    print(list(set(dataset) - set(unique_pitch)))
    print(len(list(set(dataset) - set(unique_pitch))))

    # scenario 3
    print(list(set(unique_pitch) - set(dataset)))
    print(len(list(set(unique_pitch) - set(dataset))))
    i += 1

After the question was edited: saving the results to an Excel (.xlsx) file:

import pandas as pd
import ast

df = pd.read_excel('your.xlsx')
datasets = df['dataset'].tolist()
unique_pitches = df['unique-pitch'].tolist()  # column name as used in the question

i = 0
scenario1_data = []
scenario2_data = []
scenario3_data = []
scenario1_len = []
scenario2_len = []
scenario3_len = []
for dataset in datasets:
    print("Iteration:", i+1)
    dataset = ast.literal_eval(dataset)
    unique_pitch = ast.literal_eval(unique_pitches[i])

    # scenario 1
    scenario1_data.append(list(set(dataset) & set(unique_pitch)))
    scenario1_len.append(len(scenario1_data[i]))

    # scenario 2
    scenario2_data.append(list(set(dataset) - set(unique_pitch)))
    scenario2_len.append(len(scenario2_data[i]))

    # scenario 3
    scenario3_data.append(list(set(unique_pitch) - set(dataset)))
    scenario3_len.append(len(scenario3_data[i]))
    
    i += 1

df['scenario 1 data'] = scenario1_data
df['scenario 2 data'] = scenario2_data
df['scenario 3 data'] = scenario3_data

df['len scenario 1 data'] = scenario1_len
df['len scenario 2 data'] = scenario2_len
df['len scenario 3 data'] = scenario3_len

df.to_excel('output.xlsx')
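
If you want to drop the manual counter `i` and the six separate lists, one possible variation is to move the comparison into a small helper and pair the two columns with `zip`. This is a sketch under a few assumptions, not a definitive implementation: the helper name `compare` is hypothetical, the cells are assumed to hold list literals stored as strings (as the `ast.literal_eval` calls above imply), the column names `dataset` and `unique-pitch` are taken from the question, and a one-row in-memory DataFrame stands in for `pd.read_excel(...)`. The results are passed through `sorted` because sets do not guarantee any order.

```python
import ast

import pandas as pd


def compare(dataset, unique_pitch):
    """Return the three scenario results and their lengths for one row."""
    a, b = set(dataset), set(unique_pitch)
    results = {
        'scenario 1 data': sorted(a & b),  # intersection
        'scenario 2 data': sorted(a - b),  # in dataset only
        'scenario 3 data': sorted(b - a),  # in unique-pitch only
    }
    # Add one length entry per scenario, after the data entries.
    for name in list(results):
        results['len ' + name] = len(results[name])
    return results


# Stand-in for pd.read_excel(...): row 0 from the question, stored as strings.
df = pd.DataFrame({
    'dataset': ['[0, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58]'],
    'unique-pitch': ['[0, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64]'],
})

# One dict per row, built by walking both columns in parallel.
records = [
    compare(ast.literal_eval(d), ast.literal_eval(u))
    for d, u in zip(df['dataset'], df['unique-pitch'])
]

# Attach the six new columns to the original data.
result = df.join(pd.DataFrame(records, index=df.index))
print(result.iloc[0])

# Left commented out so the sketch runs without writing a file.
# result.to_excel('output.xlsx', index=False)
```

This still visits every row in Python, so it is a tidier loop rather than a vectorized one. Passing `index=False` to `to_excel` keeps the DataFrame index out of the output sheet.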
