Python pandas: search for a value in a df from another df
I've got two data frames:
Df1
Time V1 V2
02:00 D3F3 0041
02:01 DD34 0040
Df2
FileName V1 V2
1111.txt D3F3 0041
2222.txt 0000 0040
Basically I want to compare the V1 and V2 columns and, if they match, print the Time from the df1 row and the FileName from the df2 row. So far all I can find is isin(), which simply gives you a boolean output.
So the output would be:
1111.txt 02:00
I started using dataframes because I thought I could query the two dfs on the V1 / V2 values, but I can't see a way. Any pointers would be much appreciated.
Use merge on the dataframe columns that you want to have the same values. You can then drop the rows with NaN values, as those will not have matching values. From there, you can print the merged dataframe's values however you see fit.
import pandas as pd

df1 = pd.DataFrame({'Time': ['8a', '10p'], 'V1': [1, 2], 'V2': [3, 4]})
df2 = pd.DataFrame({'fn': ['8.txt', '10.txt'], 'V1': [3, 2], 'V2': [3, 4]})
# Outer-merge on V1 and V2, then drop rows that lack a match in either frame
df1.merge(df2, on=['V1', 'V2'], how='outer').dropna()
=== Output: ===
Time V1 V2 fn
1 10p 2 4 10.txt
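If dropping NaN rows feels too implicit, one possible variation (a sketch, not part of the original answer) is to pass indicator=True to merge, which adds a '_merge' column saying where each row came from. The variable names merged and matched are just illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Time': ['8a', '10p'], 'V1': [1, 2], 'V2': [3, 4]})
df2 = pd.DataFrame({'fn': ['8.txt', '10.txt'], 'V1': [3, 2], 'V2': [3, 4]})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
merged = df1.merge(df2, on=['V1', 'V2'], how='outer', indicator=True)

# Keep only the rows whose (V1, V2) pair was found in both frames
matched = merged[merged['_merge'] == 'both'].drop(columns='_merge')
print(matched)
```

This keeps the match test explicit and would still work if the frames had NaN values in other columns.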
The most intuitive solution is: 1) iterate over the V1 column in DF1; 2) for each item in this column, check whether it exists in the V1 column of DF2; 3) if it does, find the index of that item in DF2, and from that index you can find the file name.
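The steps above could be sketched roughly as follows. This is only an illustration: it uses the data from the question, stores the values as strings (an assumption, so that '0041' keeps its leading zeros), and also checks V2, since the question asks for both columns to match.

```python
import pandas as pd

df1 = pd.DataFrame({'Time': ['02:00', '02:01'],
                    'V1': ['D3F3', 'DD34'],
                    'V2': ['0041', '0040']})
df2 = pd.DataFrame({'FileName': ['1111.txt', '2222.txt'],
                    'V1': ['D3F3', '0000'],
                    'V2': ['0041', '0040']})

matches = []
for _, row1 in df1.iterrows():
    # Rows of df2 whose V1 and V2 both equal the current df1 row
    hits = df2[(df2['V1'] == row1['V1']) & (df2['V2'] == row1['V2'])]
    for file_name in hits['FileName']:
        matches.append((file_name, row1['Time']))

for file_name, time in matches:
    print(file_name, time)
```

A loop like this is easy to follow but scans df2 once per row of df1, so it will likely be slower than a merge on larger frames.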
You can try using pd.concat.
In this case it would be like:
pd.concat([df1, df2.reindex(df1.index)], axis=1)
It will create a new dataframe with all the values, but where the rows don't line up in both dataframes (concat aligns on the index), it'll return NaN. If you don't want this to happen you must use this:
pd.concat([df1, df2], axis=1, join='inner')
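One thing worth keeping in mind: with axis=1, pd.concat lines rows up by index label, not by comparing the V1 / V2 values. A small sketch of the difference between the default outer join and join='inner' (the frames left and right and their index values are made up for illustration):

```python
import pandas as pd

left = pd.DataFrame({'Time': ['02:00', '02:01', '02:02']}, index=[0, 1, 2])
right = pd.DataFrame({'FileName': ['1111.txt', '2222.txt']}, index=[1, 2])

# Default join='outer': every index label is kept, missing cells become NaN
outer = pd.concat([left, right], axis=1)

# join='inner': only index labels present in both frames are kept
inner = pd.concat([left, right], axis=1, join='inner')

print(outer)
print(inner)
```

So this approach only answers the question if matching rows already share the same index in both frames.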
If you wanna learn a bit more, see the pandas documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
You can use merge with an inner join:
df2.merge(df1,how="inner",on=["V1","V2"])[["FileName","Time"]]
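For reference, here is that one-liner applied to the data from the question, as a self-contained sketch. The values are stored as strings, which is an assumption made so that entries like '0041' keep their leading zeros.

```python
import pandas as pd

df1 = pd.DataFrame({'Time': ['02:00', '02:01'],
                    'V1': ['D3F3', 'DD34'],
                    'V2': ['0041', '0040']})
df2 = pd.DataFrame({'FileName': ['1111.txt', '2222.txt'],
                    'V1': ['D3F3', '0000'],
                    'V2': ['0041', '0040']})

# Inner join keeps only rows where both V1 and V2 match in the two frames
result = df2.merge(df1, how='inner', on=['V1', 'V2'])[['FileName', 'Time']]

# Print each matching pair without the index or header row
print(result.to_string(index=False, header=False))
```

Only the 1111.txt / 02:00 pair survives, because the second rows agree on V2 but not on V1.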
While I think Eric's solution is more Pythonic, if your only aim is to print the rows on which df1 and df2 have the same V1 and V2 values, and provided the two dataframes are of the same length, you can do the following:
for row in range(len(df1)):
    # Compare every column after the first (V1, V2) at the same row position
    if (df1.iloc[row, 1:] == df2.iloc[row, 1:]).all():
        print(df1.iloc[row], df2.iloc[row])
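The same position-by-position comparison can probably be done without an explicit loop. A sketch under the same assumption (both frames have the same length and row order), using the question's data stored as strings; cols and mask are illustrative names:

```python
import pandas as pd

df1 = pd.DataFrame({'Time': ['02:00', '02:01'],
                    'V1': ['D3F3', 'DD34'],
                    'V2': ['0041', '0040']})
df2 = pd.DataFrame({'FileName': ['1111.txt', '2222.txt'],
                    'V1': ['D3F3', '0000'],
                    'V2': ['0041', '0040']})

cols = ['V1', 'V2']
# True where row i of df1 equals row i of df2 on every compared column
mask = (df1[cols].to_numpy() == df2[cols].to_numpy()).all(axis=1)

print(df2.loc[mask, 'FileName'].tolist(), df1.loc[mask, 'Time'].tolist())
```

Like the loop, this only finds matches that sit at the same row position; a merge is needed if matching rows can appear at different positions.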
Try this:
import io

import boto3
import pandas as pd

client = boto3.client('s3')
obj = client.get_object(Bucket='', Key='')
data = obj['Body'].read()
df1 = pd.read_excel(io.BytesIO(data), sheet_name='0')
df2 = pd.read_excel(io.BytesIO(data), sheet_name='1')
head = df2.columns[0]
print(head)
data = df1.iloc[[8],[0]].values[0]
print(data)
print(df2)
df2.columns = df2.iloc[0]
df2 = df2.drop(labels=0, axis=0)
df2['Head'] = head
df2['ID'] = pd.Series([data,data])
print(df2)
df2.to_csv('test.csv',index=False)