How to match two values in the same column of a pandas dataframe
I have two data frames, shown below:
PD 106352 00253 01-02-2018 0.73
PD 108181 00253 20-12-2017 13.91
PD 108222 00253 01-08-2017 -2,227.50
PD 108224 00253 01-08-2017 -4,455.00
PD 108848 00253 25-07-2017 -2,342.13
PD 108852 00253 25-06-2018 1,764.16
PD 108860 00253 12-07-2017 -3,144.81
PD 108871 00253 01-07-2017 -144.17
PD 109455 00253 01-07-2017 -271.25
PD 109472 00253 04-07-2017 -389.00
and
PV 73006 00253 01-09-2017 16,956.25
PV 73006 00253 01-09-2017 2,227.50
PV 73006 00253 01-09-2017 2,227.50
PV 75499 00253 01-07-2017 30,351.42
PV 75645 00253 03-07-2017 34,468.29
PV 82899 00253 12-12-2017 2,342.40
I tried building a list from the fifth column of each dataframe, comparing the two lists and, where a match was found, taking the index and using loc to set the result column, but it was not successful.
I want to compare the fifth column of both dataframes and match on the absolute value, ignoring the sign. If a 1:1 match is found, I want to add a column and mark the row as Nill. If 1:n matches are found, I want to mark only one 1:1 pair among them as Nill and leave the result column blank for the others.
I want to get something like the following:
PD 108222 00253 01-08-2017 -2,227.50 Nill
PV 73006 00253 01-09-2017 2,227.50 Nill
PV 73006 00253 01-09-2017 2,227.50
Please look at the code below. It is something I came up with quickly, and I think it should solve your problem.
import pandas as pd
# Load the data. thousands=',' lets pandas parse amounts such as -2,227.50 as numbers
# (this assumes those amounts are quoted in the CSV, because ',' is also the separator)
data_a = pd.read_csv('data_a.csv', sep=',', header=None, thousands=',')
data_b = pd.read_csv('data_b.csv', sep=',', header=None, thousands=',')
# Convert the absolute amounts to lists; the signed values in column 4 are left untouched
a = data_a[4].abs().tolist()
b = data_b[4].abs().tolist()
# Get the indices of matching values in the two datasets, keeping only the first
# occurrence of each value so you get 1:1 and not 1:N
abc = [i for i, item in enumerate(a) if item in b and item not in a[:i]]
deg = [i for i, item in enumerate(b) if item in a and item not in b[:i]]
# Create a blank new column
data_a[5] = ''
data_b[5] = ''
# Fill the matching locations with Nill
data_a.iloc[abc, 5] = 'Nill'
data_b.iloc[deg, 5] = 'Nill'
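If the same amount can repeat on both sides and each repeat should be paired off separately, one possible variation is to number the repeats within each frame and match on the pair (absolute amount, repeat number). The following is only a sketch under some assumptions: the helper name mark_one_to_one, the temporary column names and the 'result' column are made up for illustration, and the sample rows are a few values copied from the question with the amounts already parsed as numbers.

```python
import pandas as pd

def mark_one_to_one(df_a, df_b, col=4, label="Nill"):
    """Hypothetical helper: mark rows whose |amount| has a 1:1 partner in the other frame."""
    a, b = df_a.copy(), df_b.copy()
    for df in (a, b):
        df["_abs"] = df[col].abs()
        # Number repeated amounts 0, 1, 2, ... so that the k-th repeat on one side
        # can only pair with the k-th repeat on the other side.
        df["_k"] = df.groupby("_abs").cumcount()
    pairs_a = set(zip(a["_abs"], a["_k"]))
    pairs_b = set(zip(b["_abs"], b["_k"]))
    a["result"] = [label if key in pairs_b else "" for key in zip(a["_abs"], a["_k"])]
    b["result"] = [label if key in pairs_a else "" for key in zip(b["_abs"], b["_k"])]
    return a.drop(columns=["_abs", "_k"]), b.drop(columns=["_abs", "_k"])

# A few rows taken from the question (amounts already numeric)
data_a = pd.DataFrame([
    ["PD", 108222, "00253", "01-08-2017", -2227.50],
    ["PD", 108224, "00253", "01-08-2017", -4455.00],
    ["PD", 108848, "00253", "25-07-2017", -2342.13],
])
data_b = pd.DataFrame([
    ["PV", 73006, "00253", "01-09-2017", 16956.25],
    ["PV", 73006, "00253", "01-09-2017", 2227.50],
    ["PV", 73006, "00253", "01-09-2017", 2227.50],
    ["PV", 82899, "00253", "12-12-2017", 2342.40],
])

marked_a, marked_b = mark_one_to_one(data_a, data_b)
print(marked_a)
print(marked_b)
```

With these rows, only the -2,227.50 row and the first 2,227.50 row should end up marked Nill, and the second 2,227.50 row stays blank, which is the layout asked for in the question. Note that this compares the amounts for exact equality, so 2,342.13 and 2,342.40 are not treated as a match.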