简体   繁体   中英

Derive Python Dataframe based on condition on Multiple columns

i am New to Python, any lead how to proceed would be very much helpful. thank you in advance I have a dataframe like this as input

sec pos J   NJ  SJ  Key           order 
S   5               202301201413    3
    5   1   0   1   202301201413    4
            0       202301201413    5
            0   1   202301201413    6
E   5   1   1   1   202301201413    7
    N       0       202301201413    8
S   3       1   1   202301201413    11
                1   202301201413    12
E   3   1   0   1   202301201413    13
S   4       1   1   202301201425    1
E   4   1   1   0   202301201425    2
S   2   0   1   1   202301201425    4
E   2   0           202301201425    5
S   5   1   1   1   202301201425    7
        0           202301201425    8
E   5   1   1       202301201425    9
    N   1   1   1   202301201436    1
S   2       1       202301201436    2
            1   1   202301201436    3
E   2   1   1   1   202301201436    4
S   3       1       202301201436    5
E   3   1   1       202301201436    6

description about the above table

  1. Sec column has 'S' - start and 'E' - end of testing
  2. Pos coumnn has position for which testing is done
  3. J, NJ, SJ columns have output for different parameter
  4. Key column has ID for which testing is done.

0 is Fault in above table and 1 is Good

I need to transform this data some thing like below table

Key             J   NJ  SJ  N   2   3   4   5
202301201413    1   0   1   2   -   2   -   2
202301201425    0   1   0   -   1   -   3   1
202301201436    1   1   1   0   0   0   -   -
  1. if there is any 0 for the key in J, NJ, SJ in input table it has to reflected here in output table column J, NJ, SJ

  2. Column [5~10] ie {N,1,2,3,4,5} are pos values in input table which need to be mapped with values based on below condition

    NO 0 and atleast 1 is available 0 J is 0 1 NJ is 0 2 SJ is 0 3 J+NJ is 0 4 NJ+SJ is 0 5 SJ+J is 0 6 ALL are 0 7 ALL Null 8

You could do something like the following:

def format_result(ser):
    if np.all(np.isnan(ser)):
        return np.nan

    if np.any(ser == 0):
        return 0
    else:
        return 1

def level_result(ser):
    if np.all(np.isnan(ser)):
        return 8
    elif np.all(ser != 0):
        return 0
    
    J, NJ, SJ = ser[["J", "NJ", "SJ"]] == 0
    idx = J + 2* NJ + 4* SJ
    idx = {3:4, 4:3, 5:6, 6:5}.get(idx, idx)

    return idx

other_columns = np.sort(df.pos.dropna().unique())

(
    df
    .groupby(["Key", "pos"])
    .agg({
        "J": format_result,
        "NJ": format_result,
        "SJ": format_result,
    })
    .assign(result=lambda df: df.apply(level_result, axis=1))
    .reset_index()
    .set_index(["Key", "J", "NJ", "SJ", "pos"])
    .unstack(level=-1)
    .droplevel(0, axis=1)
    .reset_index()
    .groupby("Key").agg({
        "J": format_result,
        "NJ": format_result,
        "SJ": format_result,
        **{k: "max" for k in other_columns}
    })
)
|          Key |   J |   NJ |   SJ |   2 |   3 |   4 |   5 |   N |
|-------------:|----:|-----:|-----:|----:|----:|----:|----:|----:|
| 202301201413 |   1 |    0 |    1 | nan |   2 | nan |   2 |   2 |
| 202301201425 |   0 |    1 |    0 |   1 | nan |   3 |   0 | nan |
| 202301201436 |   1 |    1 |    1 |   0 |   0 | nan | nan |   0 |

It's admittedly not the most performant, but it's late where I am located and I can't think of a shorter solution just now.

Also note that your column with index [202301201425, 5] is different compared to your example. For this group, the only values are [1, nan] in which case you would expect 0 according to your formula; however your example has a 1 there.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM