Cartesian product of two DataFrames in Python

I have two DataFrames. How can I remove the duplicate rows from their Cartesian product?

**DF1:**

    Index    Name
    0        xyz
    1        abc
    2        def

**DF2:**

    Index    Name
    0        xyz
    1        abc
    2        xyz

**Expected output:**

    (0,0), (0,2)
    (1,1)

I want to pair only the indexes whose Name values are the same, but I don't want to show repeated combinations. In the Cartesian product, (0,2) and (2,0) give the same result, so I want to show only one of those rows.
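A minimal sketch of the literal Cartesian product the question describes, assuming pandas 1.2 or later (where `merge(how="cross")` was added) and the sample data above: build every (DF1 row, DF2 row) pair, then keep only the pairs whose names match. The variable names are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["xyz", "abc", "def"]})
df2 = pd.DataFrame({"Name": ["xyz", "abc", "xyz"]})

# Full Cartesian product: every row of df1 paired with every row of df2
# (3 x 3 = 9 rows). how="cross" requires pandas >= 1.2.
product = df1.reset_index().merge(df2.reset_index(), how="cross")

# Keep only the pairs whose names match. Overlapping columns get the default
# suffixes: index_x/Name_x come from df1, index_y/Name_y from df2.
matches = product[product["Name_x"] == product["Name_y"]]
matched_pairs = list(matches[["index_x", "index_y"]].itertuples(index=False, name=None))
print(matched_pairs)  # [(0, 0), (0, 2), (1, 1)]
```

For two 100-row frames the full product already has 10,000 rows; an inner merge on `Name`, as in the answer below, produces only the matching rows without materializing the whole product.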

Update:

I already have the Cartesian DataFrame as input: (0,0), (0,2), (1,1), (2,0).

From this input DataFrame I want to remove the duplicate (2,0). The DataFrame has around 100 rows, so the solution needs to handle all of them, not just this one pair.
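One way to drop mirrored pairs such as (2,0) without an explicit Python loop is to put each pair in a canonical order (smaller index first) and then use `DataFrame.duplicated()`. This is a sketch that assumes the existing pairs live in two integer columns; the column names `left` and `right` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical input: the question's pairs in two columns named "left" and
# "right" (the column names are an assumption; substitute your own).
pairs = pd.DataFrame({"left": [0, 0, 1, 2], "right": [0, 2, 1, 0]})

# Sort the two values within each row so (2, 0) and (0, 2) both become (0, 2).
canonical = pd.DataFrame(
    np.sort(pairs[["left", "right"]].to_numpy(), axis=1),
    columns=["left", "right"],
    index=pairs.index,
)

# duplicated() flags every repeat of an unordered pair after its first
# occurrence; keep only the rows that are not flagged.
deduped = pairs[~canonical.duplicated()]
print(list(deduped.itertuples(index=False, name=None)))  # [(0, 0), (0, 2), (1, 1)]
```

The same code handles 100 rows (or many more) unchanged, because the sorting and duplicate check are vectorized over the whole frame.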

Assuming df1 and df2 have a single column "Name", that "Index" is the DataFrame index, and that you want a list of tuples of the matching indexes as they appear in the question, you can do:

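    import pandas as pd

    df1 = pd.DataFrame({'Name': ['xyz', 'abc', 'def']})
    df2 = pd.DataFrame({'Name': ['xyz', 'abc', 'xyz']})
    # reset_index() turns the index into a column named 'index';
    # merge then suffixes it as 'index_x' (df1) and 'index_y' (df2).
    df3 = df1.reset_index().merge(df2.reset_index(), on='Name', how='inner')
    # One plain (index_x, index_y) tuple per matching pair of rows.
    list_of_tuples = list(df3[['index_x', 'index_y']].itertuples(index=False, name=None))
    list_of_tuples
    # OUTPUT: [(0, 0), (0, 2), (1, 1)]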

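If "Index" is a regular column rather than the index, drop the reset_index() calls and select 'Index_x' and 'Index_y' instead, since merge adds the _x/_y suffixes to the overlapping "Index" column.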
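For the case where "Index" is an ordinary column, a sketch of the same merge could look like this (the column layout is an assumption based on the tables in the question):

```python
import pandas as pd

# Assumed layout: "Index" is an ordinary column, not the DataFrame index.
df1 = pd.DataFrame({"Index": [0, 1, 2], "Name": ["xyz", "abc", "def"]})
df2 = pd.DataFrame({"Index": [0, 1, 2], "Name": ["xyz", "abc", "xyz"]})

# No reset_index() needed; merge suffixes the overlapping "Index" column
# as "Index_x" (from df1) and "Index_y" (from df2).
df3 = df1.merge(df2, on="Name", how="inner")
list_of_tuples = list(df3[["Index_x", "Index_y"]].itertuples(index=False, name=None))
print(list_of_tuples)  # [(0, 0), (0, 2), (1, 1)]
```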
