How do I compare and merge three pandas Data Frames?

A little bit of background:

I have three DOORS Modules (A, B, & C) that trace to each other like so:

A --> B
A --> C

B --> C
B <-- A

C <-- A
C <-- B

I can easily capture this 'tracing' by exporting the IDs of the other modules that the current module traces to. For example, A's exported table might look like this:

# A Table

|   A   |   B   |   C   |
=========================
|  A_1  |  B_1  |  C_1  |
-------------------------
|  A_2  |       |  C_3  |
-------------------------
|  A_3  |  B_4  |       |
|       |  B_5  |       |
-------------------------

While B and C would look like this:

# B Table                       # C Table

|   A   |   B   |   C   |       |   A   |   B   |   C   |
=========================       =========================
|  A_1  |  B_1  |  C_1  |       |  A_1  |  B_1  |  C_1  |
-------------------------       -------------------------
|       |  B_2  |  C_3  |       |  A_2  |       |  C_3  |
-------------------------       |  A_4  |  B_2  |       |
|  A_3  |  B_4  |       |       -------------------------
-------------------------       
|  A_3  |  B_5  |       |       
-------------------------       

Because the tracing between modules might not be complete, I'm looking to find "gaps" in the tables. For example, A might trace to C and B might trace to C but not to each other.

The problem:

I've been able to capture each table in a pandas DataFrame. I'm looking to do two things:

  1. Identify missing traces:

    For example, Table A's A_2 has a trace to C_3. Table B's B_2 has a trace to C_3. However, A_2 and B_2 are not traced to each other. This is a missing trace.

  2. Merge these results into a single DataFrame instead of three.

I think the hardest part of your task is defining what a missing link is. It is worth spending some time working through the possible configurations, since it is not as straightforward as it might seem (or, on the contrary, it might turn out to be pretty simple).

For instance, if table A contains A1,B1, table B contains B1,C1, and table C contains A1,C1, how many missing links are there? Or are there none at all? How would the answer change if any one table contained A1,B1,C1?

Another example: [A1,B1], [B1,C2], [B2,C2]. How many missing links are there?

You can easily construct many other examples that are not so simple to answer.

Once you have rigorously defined what a missing link is, you can write (perhaps easily) an algorithm that finds them in your tables, however they are structured: in three tables or in a single one formed from the originals with a join, an append, or a side-by-side concatenation.
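As a rough illustration, here is a minimal sketch under one possible definition, which is an assumption and not the only reasonable one: two items from different modules are "missing a trace" when both trace to a common third item but not to each other. The sketch also assumes that a blank cell in a table's own column continues the item above it (as with A_3 in the A table), that traces are undirected, and that IDs have the form `MODULE_n`. The names `edges_from`, `merged`, `neighbours`, `module_of` and `missing` are made up for this example.

```python
from itertools import combinations

import pandas as pd

cols = ["A", "B", "C"]
# Rows copied from the question's tables; None marks an empty cell.
table_a = pd.DataFrame(
    [("A_1", "B_1", "C_1"), ("A_2", None, "C_3"),
     ("A_3", "B_4", None), (None, "B_5", None)], columns=cols)
table_b = pd.DataFrame(
    [("A_1", "B_1", "C_1"), (None, "B_2", "C_3"),
     ("A_3", "B_4", None), ("A_3", "B_5", None)], columns=cols)
table_c = pd.DataFrame(
    [("A_1", "B_1", "C_1"), ("A_2", None, "C_3"),
     ("A_4", "B_2", None)], columns=cols)


def edges_from(table: pd.DataFrame, own: str) -> pd.DataFrame:
    """Turn one module's export into (own item, traced item) pairs."""
    filled = table.copy()
    # Assumption: a blank own-column cell continues the item above it.
    filled[own] = filled[own].ffill()
    long = filled.melt(id_vars=own, var_name="module", value_name="other")
    long = long.dropna(subset=["other"])
    # Sort each pair so A_1/B_1 and B_1/A_1 count as the same trace.
    pairs = [tuple(sorted(p)) for p in zip(long[own], long["other"])]
    return pd.DataFrame(pairs, columns=["left", "right"]).assign(reported_by=own)


all_edges = pd.concat(
    [edges_from(table_a, "A"), edges_from(table_b, "B"), edges_from(table_c, "C")],
    ignore_index=True,
)

# One merged frame: each distinct trace plus the tables that report it.
merged = (
    all_edges.groupby(["left", "right"])["reported_by"]
    .agg(lambda s: ",".join(sorted(set(s))))
    .reset_index()
)

# Adjacency map: item -> set of items it is traced to.
neighbours = {}
for left, right in zip(merged["left"], merged["right"]):
    neighbours.setdefault(left, set()).add(right)
    neighbours.setdefault(right, set()).add(left)


def module_of(item: str) -> str:
    """Assumption: IDs look like 'A_1', so the module is the prefix."""
    return item.split("_")[0]


# Pairs from different modules that share a neighbour but have no direct trace.
missing = sorted(
    (x, y, hub)
    for hub, linked in neighbours.items()
    for x, y in combinations(sorted(linked), 2)
    if module_of(x) != module_of(y) and y not in neighbours[x]
)

print(merged)
print(missing)
```

Each tuple in `missing` reads as (item, item, shared neighbour). The `reported_by` column in `merged` also shows which traces appear in only one of the two tables that could report them, which may be another kind of gap worth checking. A different definition of a missing link would change only the final comprehension.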
