How to compare two Excel columns using a DataFrame, then write the output to another Excel file?
I have this Excel file. Here is the screenshot.
I want to compare the dataset column with the unique-pitch column, and then write the output back to an Excel file. The comparison covers these scenarios:
1. Data that matches between the dataset column and the unique-pitch column (intersection).
2. Data in dataset that does not exist in unique-pitch (difference 1).
3. Data in unique-pitch that does not exist in dataset (difference 2).
I am using row no. 0 for this example, and the rule used in this comparison is the same throughout the data.
dataset = [0, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58]
unique-pitch = [0, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64]
# this is the expected output
Scenario 1 result = [0, 54, 55, 56, 57, 58]
Length of Scenario 1 result = 6
Scenario 2 result = [46, 47, 48, 49, 50, 51, 52, 53]
Length of Scenario 2 result = 8
Scenario 3 result = [59, 60, 61, 62, 63, 64]
Length of Scenario 3 result = 6
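The expected output above can be checked with plain Python set operations. This is only a sketch on the row 0 values; the variable name unique_pitch is an assumption, since unique-pitch is not a valid Python identifier. Sets are unordered, so sorted() is used here to get the lists back in the ascending order shown in the expected output.

```python
# Row 0 values copied from the example above.
dataset = [0, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58]
unique_pitch = [0, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64]

# Scenario 1: values present in both columns.
scenario1 = sorted(set(dataset) & set(unique_pitch))
# Scenario 2: values in dataset that are missing from unique-pitch.
scenario2 = sorted(set(dataset) - set(unique_pitch))
# Scenario 3: values in unique-pitch that are missing from dataset.
scenario3 = sorted(set(unique_pitch) - set(dataset))

print(scenario1, len(scenario1))
print(scenario2, len(scenario2))
print(scenario3, len(scenario3))
```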
From what I know so far, I can read the Excel file into a DataFrame and find the values for the 3 scenarios.
import pandas as pd
import ast

df = pd.read_excel(r'C:\Users\014_twinkle_twinkle 300 0.0001 dataframe - python.xlsx')
datasets = df['dataset'].tolist()
unique_pitches = df['unique-pitch'].tolist()
i = 0
for dataset in datasets:
    print("Iteration:", i+1)
    dataset = ast.literal_eval(dataset)
    unique_pitch = ast.literal_eval(unique_pitches[i])
    # scenario 1
    scenario1_data = list(set(dataset) & set(unique_pitch))
    scenario1_len = len(scenario1_data)
    # scenario 2
    scenario2_data = list(set(dataset) - set(unique_pitch))
    scenario2_len = len(scenario2_data)
    # scenario 3
    scenario3_data = list(set(unique_pitch) - set(dataset))
    scenario3_len = len(scenario3_data)
    print("Intersection\t\t: ", scenario1_data)
    print("Len Intersection\t: ", scenario1_len)
    print("Difference 1\t\t: ", scenario2_data)
    print("Len difference 1\t: ", scenario2_len)
    print("Difference 2\t\t: ", scenario3_data)
    print("Len difference 2\t: ", scenario3_len)
    print("-"*100)
    i += 1
    # how to put those 6 new variables to df?

# to change df to excel
df.to_excel()
In my Excel output, I am expecting this kind of result.
My question is: how do I read and compare the data in each column of the DataFrame df, then write the expected result to an Excel file? I read in another post on Stack Overflow that I should not iterate over the DataFrame row by row because it is slow.
To start, I think it is generally a good idea to first make your code work, and then look into faster methods.
For scenario 1:
intersection = []
for value in dataset:
    if value in unique_pitch:
        intersection.append(value)
print(intersection)
print(len(intersection))
For scenario 2:
not_in_unique_pitch = []
for value in dataset:
    if value not in unique_pitch:
        not_in_unique_pitch.append(value)
print(not_in_unique_pitch)
print(len(not_in_unique_pitch))
I know you have already solved scenario 3, but if you want it written the same way:
not_in_dataset = []
for value in unique_pitch:
    if value not in dataset:
        not_in_dataset.append(value)
print(not_in_dataset)
print(len(not_in_dataset))
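One possible refinement of the three loops above, as a sketch rather than a required change: testing `value in some_list` scans the list each time, so for long rows it may help to build a set once for the membership test while still iterating over the original list. This keeps the original order of the values, unlike converting the result back from a set. The names unique_pitch_lookup and dataset_lookup are made up for this example.

```python
# Row 0 values from the question.
dataset = [0, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58]
unique_pitch = [0, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64]

# Build the lookup sets once; membership tests on a set do not scan the list.
unique_pitch_lookup = set(unique_pitch)
dataset_lookup = set(dataset)

# Iterating over the original lists preserves their order.
intersection = [value for value in dataset if value in unique_pitch_lookup]
not_in_unique_pitch = [value for value in dataset if value not in unique_pitch_lookup]
not_in_dataset = [value for value in unique_pitch if value not in dataset_lookup]

print(intersection, len(intersection))
print(not_in_unique_pitch, len(not_in_unique_pitch))
print(not_in_dataset, len(not_in_dataset))
```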
Edit, in answer to your question:
import pandas as pd
import ast

df = pd.read_excel('your.xlsx')
datasets = df['dataset'].tolist()
# the column in the question is named 'unique-pitch' (with a hyphen)
unique_pitches = df['unique-pitch'].tolist()
i = 0
for dataset in datasets:
    print("Iteration:", i+1)
    # each cell holds a string such as "[0, 46, 47]"; parse it into a list
    dataset = ast.literal_eval(dataset)
    unique_pitch = ast.literal_eval(unique_pitches[i])
    # scenario 1
    print(list(set(dataset) & set(unique_pitch)))
    print(len(list(set(dataset) & set(unique_pitch))))
    # scenario 2
    print(list(set(dataset) - set(unique_pitch)))
    print(len(list(set(dataset) - set(unique_pitch))))
    # scenario 3
    print(list(set(unique_pitch) - set(dataset)))
    print(len(list(set(unique_pitch) - set(dataset))))
    i += 1
After the edited question, with saving to an Excel file (.xlsx):
import pandas as pd
import ast

df = pd.read_excel('your.xlsx')
datasets = df['dataset'].tolist()
# the column in the question is named 'unique-pitch' (with a hyphen)
unique_pitches = df['unique-pitch'].tolist()
i = 0
# one result list per new column; each gets one entry per row
scenario1_data = []
scenario2_data = []
scenario3_data = []
scenario1_len = []
scenario2_len = []
scenario3_len = []
for dataset in datasets:
    print("Iteration:", i+1)
    dataset = ast.literal_eval(dataset)
    unique_pitch = ast.literal_eval(unique_pitches[i])
    # scenario 1
    scenario1_data.append(list(set(dataset) & set(unique_pitch)))
    scenario1_len.append(len(scenario1_data[i]))
    # scenario 2
    scenario2_data.append(list(set(dataset) - set(unique_pitch)))
    scenario2_len.append(len(scenario2_data[i]))
    # scenario 3
    scenario3_data.append(list(set(unique_pitch) - set(dataset)))
    scenario3_len.append(len(scenario3_data[i]))
    i += 1

# add the results as new columns, then write everything to a new file
df['scenario 1 data'] = scenario1_data
df['scenario 2 data'] = scenario2_data
df['scenario 3 data'] = scenario3_data
df['len scenario 1 data'] = scenario1_len
df['len scenario 2 data'] = scenario2_len
df['len scenario 3 data'] = scenario3_len
df.to_excel('output.xlsx')
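On the concern about iterating per row: each cell holds a string that has to be parsed with ast.literal_eval, so some per-row Python work seems unavoidable here. What can be tidied is the manual counter, for example by zipping the two columns. The following is a sketch under assumptions, not the only way to do it: compare_cells and add_scenario_columns are hypothetical helper names, the small in-memory DataFrame stands in for pd.read_excel, and its second row is made-up data. Results are sorted so the lists come out in ascending order.

```python
import ast

import pandas as pd


def compare_cells(dataset_text, unique_pitch_text):
    """Parse two list-like strings; return (intersection, difference 1, difference 2)."""
    dataset = set(ast.literal_eval(dataset_text))
    unique_pitch = set(ast.literal_eval(unique_pitch_text))
    return (
        sorted(dataset & unique_pitch),
        sorted(dataset - unique_pitch),
        sorted(unique_pitch - dataset),
    )


def add_scenario_columns(df):
    """Return a copy of df with three data columns and three length columns added."""
    results = [compare_cells(d, u) for d, u in zip(df['dataset'], df['unique-pitch'])]
    out = df.copy()
    for n in range(3):
        out[f'scenario {n + 1} data'] = [row[n] for row in results]
    for n in range(3):
        out[f'len scenario {n + 1} data'] = [len(row[n]) for row in results]
    return out


# In-memory stand-in for pd.read_excel(...): row 0 is the example from the
# question, row 1 is made-up data for illustration only.
df = pd.DataFrame({
    'dataset': ['[0, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58]', '[1, 2, 3]'],
    'unique-pitch': ['[0, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64]', '[2, 3, 4]'],
})
result = add_scenario_columns(df)
print(result.iloc[0])

# To write the file; index=False leaves the DataFrame index out of the sheet.
# result.to_excel('output.xlsx', index=False)
```

With a real file, df would come from pd.read_excel as in the code above, and the commented to_excel line would be enabled.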