简体   繁体   中英

Python panda search for value in a df from another df

I've got two data frames:-

Df1

Time   V1    V2
02:00  D3F3  0041
02:01  DD34  0040

Df2

 FileName  V1    V2
   1111.txt  D3F3  0041
   2222.txt  0000  0040

Basically I want to compare the v1 v2 columns and if they match print the row time from df1 and the row from df2 filename. So far all i can find is the

isin()

, which simply gives you a boolean output.

So the output would be:

1111.txt 02:00

I started using dataframes because i though i could query the two df's on the V1 / V2 values but I can't see a way. Any pointers would be much appreciated

Use merge on the dataframe columns that you want to have the same values. You can then drop the rows with NaN values, as those will not have matching values. From there, you can print the merged dataframes values however you see fit.

df1 = pd.DataFrame({'Time': ['8a', '10p'], 'V1': [1, 2], 'V2': [3, 4]})
df2 = pd.DataFrame({'fn': ['8.txt', '10.txt'], 'V1': [3, 2], 'V2': [3, 4]})

df1.merge(df2, on=['V1', 'V2'], how='outer').dropna()

=== Output: ===

  Time  V1  V2      fn
1  10p   2   4  10.txt

The most intuitive solution is: 1) iterate the V1 column in DF1; 2) for each item in this column, check if this item exists in the V1 column of DF2; 3) if the item exists in DF2's V1, then find the index of that item in the DF2 and then you would be able to find the file name.

You can try using pd.concat .

On this case it would be like:

pd.concat([df1, df2.reindex(df1.index)], axis=1)

It will create a new dataframe with all the values, but in case there are some values that doesn't match in both dataframes, it'll return NaN . If you doesn't want this to happen you must use this:

pd.concat([df1, df4], axis=1, join='inner')

If you wanna learn a bit more, use pydata: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html

You can use merge option with inner join

    df2.merge(df1,how="inner",on=["V1","V2"])[["FileName","Time"]]

While I think Eric's solution is more pythonic, if your only aim is to print the rows on which df1 and df2 have v1 and v2 values the same, provided the two dataframes are of the same length, you can do the following:

for row in range(len(df1)):
    if (df1.iloc[row,1:] == df2.iloc[row,1:]).all() == True: 
        print(df1.iloc[row], df2.iloc[row])

Try this:

client = boto3.client('s3')

obj = client.get_object(Bucket='', Key='')
data = obj['Body'].read()
df1 = pd.read_excel(io.BytesIO(data), sheet_name='0')
df2 = pd.read_excel(io.BytesIO(data), sheet_name='1')

head = df2.columns[0]
print(head)

data = df1.iloc[[8],[0]].values[0]
print(data)

print(df2)
df2.columns = df2.iloc[0]
df2 = df2.drop(labels=0, axis=0)
df2['Head'] = head
df2['ID'] = pd.Series([data,data])

print(df2)
df2.to_csv('test.csv',index=False)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM