
Python pandas: search for a value in a df from another df

I've got two data frames:

Df1

Time   V1    V2
02:00  D3F3  0041
02:01  DD34  0040

Df2

FileName  V1    V2
1111.txt  D3F3  0041
2222.txt  0000  0040

Basically I want to compare the V1 and V2 columns and, if they match, print the Time from the df1 row and the FileName from the df2 row. So far all I can find is isin(), which simply gives you a boolean output.

So the output would be:

1111.txt 02:00

I started using dataframes because I thought I could query the two dfs on the V1 / V2 values, but I can't see a way. Any pointers would be much appreciated.

Use merge on the dataframe columns that you want to have the same values. You can then drop the rows with NaN values, as those will not have matching values. From there, you can print the merged dataframe's values however you see fit.

import pandas as pd

df1 = pd.DataFrame({'Time': ['8a', '10p'], 'V1': [1, 2], 'V2': [3, 4]})
df2 = pd.DataFrame({'fn': ['8.txt', '10.txt'], 'V1': [3, 2], 'V2': [3, 4]})

df1.merge(df2, on=['V1', 'V2'], how='outer').dropna()

=== Output: ===

  Time  V1  V2      fn
1  10p   2   4  10.txt
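
One caveat with the outer merge plus dropna() is that it would also drop matched rows that happen to contain NaN in some other column. A minimal sketch of an alternative, assuming the same sample frames as above, uses the `indicator=True` option of merge to mark which rows were found in both frames (the names `merged` and `matched` are only illustrative):

```python
import pandas as pd

# Sample frames from the answer above (the column name 'fn' is the answer's own).
df1 = pd.DataFrame({'Time': ['8a', '10p'], 'V1': [1, 2], 'V2': [3, 4]})
df2 = pd.DataFrame({'fn': ['8.txt', '10.txt'], 'V1': [3, 2], 'V2': [3, 4]})

# indicator=True adds a '_merge' column: 'left_only', 'right_only' or 'both'.
merged = df1.merge(df2, on=['V1', 'V2'], how='outer', indicator=True)

# Keep only the rows whose (V1, V2) pair exists in both frames.
matched = merged[merged['_merge'] == 'both'].drop(columns='_merge')
print(matched)
```

The `_merge` column can also be inspected on its own to see which rows of either frame had no counterpart.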

The most intuitive solution is: 1) iterate the V1 column in DF1; 2) for each item in this column, check if this item exists in the V1 column of DF2; 3) if the item exists in DF2's V1, then find the index of that item in DF2, and from there you can find the file name.
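
The steps above could be sketched roughly as follows. This is only an illustration using the question's sample data; it checks both V1 and V2 (as the question asks) rather than V1 alone, and the `matches` list is a hypothetical helper, not part of the answer:

```python
import pandas as pd

# Sample data from the question; values are kept as strings so '0041' keeps its zeros.
df1 = pd.DataFrame({'Time': ['02:00', '02:01'],
                    'V1': ['D3F3', 'DD34'],
                    'V2': ['0041', '0040']})
df2 = pd.DataFrame({'FileName': ['1111.txt', '2222.txt'],
                    'V1': ['D3F3', '0000'],
                    'V2': ['0041', '0040']})

matches = []
for _, row in df1.iterrows():
    # Rows of df2 whose V1 and V2 both equal the current df1 row's values.
    hits = df2[(df2['V1'] == row['V1']) & (df2['V2'] == row['V2'])]
    for file_name in hits['FileName']:
        matches.append((file_name, row['Time']))

for file_name, time in matches:
    print(file_name, time)
```

A loop like this is easy to follow but scans df2 once per row of df1, so for large frames a merge is likely to be considerably faster.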

You can try using pd.concat.

In this case it would look like:

pd.concat([df1, df2.reindex(df1.index)], axis=1)

This creates a new dataframe with all the columns side by side. Note that concat aligns rows on the index rather than on column values, so wherever an index label is missing from one of the dataframes the result holds NaN. If you don't want that to happen, use an inner join instead:

pd.concat([df1, df2], axis=1, join='inner')

If you want to learn a bit more, see the pandas user guide: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
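
To make the index alignment concrete, here is a small sketch with two made-up frames whose index labels only partly overlap (the data and the names `outer` and `inner` are assumptions for illustration, not taken from the question):

```python
import pandas as pd

# Hypothetical frames: df1 has index labels 0 and 1, df2 has 1 and 2.
df1 = pd.DataFrame({'Time': ['02:00', '02:01']}, index=[0, 1])
df2 = pd.DataFrame({'FileName': ['1111.txt', '2222.txt']}, index=[1, 2])

# Default join='outer': union of the index labels, NaN where a label is missing.
outer = pd.concat([df1, df2], axis=1)

# join='inner': only index labels present in both frames survive.
inner = pd.concat([df1, df2], axis=1, join='inner')

print(outer)
print(inner)
```

Because the pairing depends only on index labels, concat does not by itself compare the V1 / V2 values of the two frames.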

You can use merge with an inner join:

    df2.merge(df1,how="inner",on=["V1","V2"])[["FileName","Time"]]
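
Applied to the sample data from the question, the inner merge might look like the sketch below. The frames are rebuilt by hand here with string values, which is an assumption about how the real data is loaded:

```python
import pandas as pd

# Sample data from the question, stored as strings to preserve values like '0041'.
df1 = pd.DataFrame({'Time': ['02:00', '02:01'],
                    'V1': ['D3F3', 'DD34'],
                    'V2': ['0041', '0040']})
df2 = pd.DataFrame({'FileName': ['1111.txt', '2222.txt'],
                    'V1': ['D3F3', '0000'],
                    'V2': ['0041', '0040']})

# Inner join keeps only the rows where both V1 and V2 agree in the two frames.
result = df2.merge(df1, how='inner', on=['V1', 'V2'])[['FileName', 'Time']]

# Print each match as "FileName Time".
for file_name, time in result.itertuples(index=False):
    print(file_name, time)
```

If the frames come from files, values such as 0041 may be parsed as integers on one side; passing `dtype=str` to a reader like `pd.read_csv` is one way to keep them comparable as text.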

While I think Eric's solution is more pythonic, if your only aim is to print the rows on which df1 and df2 have the same V1 and V2 values, and the two dataframes are the same length, you can compare them row by row, by position:

for row in range(len(df1)):
    # Compare every column after the first (V1, V2) at the same row position.
    if (df1.iloc[row, 1:] == df2.iloc[row, 1:]).all():
        print(df1.iloc[row], df2.iloc[row])
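
The same positional comparison can be written without an explicit loop. The following is a sketch on the question's sample data, assuming both frames have the same number of rows in the same order; `same` and `paired` are illustrative names:

```python
import pandas as pd

# Sample data from the question, kept as strings.
df1 = pd.DataFrame({'Time': ['02:00', '02:01'],
                    'V1': ['D3F3', 'DD34'],
                    'V2': ['0041', '0040']})
df2 = pd.DataFrame({'FileName': ['1111.txt', '2222.txt'],
                    'V1': ['D3F3', '0000'],
                    'V2': ['0041', '0040']})

# Compare V1/V2 position by position (row 0 with row 0, row 1 with row 1, ...).
same = (df1[['V1', 'V2']].to_numpy() == df2[['V1', 'V2']].to_numpy()).all(axis=1)

# Pair up FileName and Time for the positions where both columns agree.
paired = pd.DataFrame({'FileName': df2.loc[same, 'FileName'].to_numpy(),
                       'Time': df1.loc[same, 'Time'].to_numpy()})
print(paired)
```

Unlike a merge, this only finds matches that sit at the same row position in both frames.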

Try this:

import io

import boto3
import pandas as pd

client = boto3.client('s3')

obj = client.get_object(Bucket='', Key='')
data = obj['Body'].read()
df1 = pd.read_excel(io.BytesIO(data), sheet_name='0')
df2 = pd.read_excel(io.BytesIO(data), sheet_name='1')

head = df2.columns[0]
print(head)

data = df1.iloc[[8],[0]].values[0]
print(data)

print(df2)
df2.columns = df2.iloc[0]
df2 = df2.drop(labels=0, axis=0)
df2['Head'] = head
df2['ID'] = pd.Series([data,data])

print(df2)
df2.to_csv('test.csv',index=False)
