Need to use > or < to compare dataframe column in pandas

Question

I am trying to compare two values in one DataFrame, Z_Score_Raw:

    ENST00000547849 ENST00000587894
0   -1.3099506  21.56600492

to the numbers that correspond to the ENST headers in another DataFrame, Increased_Bio:

    ENST00000547849 High_Avg  ENST00000587894 High_Avg  ENST00000547849 Low_Avg  ENST00000587894 Low_Avg
    0.0026421609368421000     -0.0457525087368421       -0.040015074588235300    -0.04140853107142860

Basically, I need to compare the ENST00000547849 value (-1.3099506) with both the High_Avg and the Low_Avg of ENST00000547849, and do the same for the ENST00000587894 column. If High_Avg < Z_Score_Raw, I must return 1; if it is > Z_Score_Raw, I return 0.

How can this be done? The main part is comparing the one number to both of the scores and then returning a number after.

Here's what I'm working with so far:

 for x in Z_score_raw:
     Z_Score_List.append(list(Z_score_raw[x]))

 for x in single_z_score:
     for i in range(Z_Score_List):
         print(single_z_score[x].item())
         if (single_z_score[x].item() < Z_Score_List[i]):
             df_new[x+'avg'] = 1
         elif(single_z_score[x].item() > Z_Score_List[i]):
             df_new[x+'avg'] = 0

Answer 1

This code sets the data up as a single tidy dataframe and then calculates the operation in a vectorized way:

import numpy as np
import pandas as pd

Z_Score_Raw = pd.DataFrame({"ENST00000547849": [-1.3099506],
                            "ENST00000587894": [21.56600492]})

Increased_Bio = pd.DataFrame({"ENST00000547849 High_Avg": [0.0026421609368421000],
                              "ENST00000587894 High_Avg": [-0.0457525087368421],
                              "ENST00000547849 Low_Avg": [-0.040015074588235300],
                              "ENST00000587894 Low_Avg": [-0.04140853107142860]})

tidy_data = (
    # Tidying the data
    Increased_Bio.melt()
    .assign(id=lambda x: x['variable'].str.split(' ', expand=True)[0],
            variable=lambda x: x['variable'].str.split(' ', expand=True)[1])
    .pivot(index="id", columns="variable", values="value").reset_index()
    .merge(Z_Score_Raw.melt(var_name="id", value_name="raw_score"))
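    # Calculating the values of interest: 0 if the raw score lies strictly
    # between Low_Avg and High_Avg, 1 otherwise. The column names must match
    # the pivoted labels exactly ("Low_Avg" / "High_Avg", case-sensitive).
    .assign(
        out_of_range=lambda x: np.where(
            (x["raw_score"] > x["Low_Avg"]) & (x["raw_score"] < x["High_Avg"]), 0, 1)))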

The output will look like this:

                id  High_Avg   Low_Avg  raw_score  out_of_range
0  ENST00000547849  0.002642 -0.040015  -1.309951             1
1  ENST00000587894 -0.045753 -0.041409  21.566005             1
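
If what is wanted is the literal rule from the question (1 when an average is below the raw score, 0 when it is above) rather than an in-range check, a minimal sketch could look like the following. It assumes both frames have a single row, that every column in Increased_Bio is named "<id> <kind>" with one space, and that a tie counts as 0 (the question does not say). The names raw, ids, aligned_raw and flags are made up for this illustration.

```python
import numpy as np
import pandas as pd

Z_Score_Raw = pd.DataFrame({"ENST00000547849": [-1.3099506],
                            "ENST00000587894": [21.56600492]})

Increased_Bio = pd.DataFrame({"ENST00000547849 High_Avg": [0.0026421609368421000],
                              "ENST00000587894 High_Avg": [-0.0457525087368421],
                              "ENST00000547849 Low_Avg": [-0.040015074588235300],
                              "ENST00000587894 Low_Avg": [-0.04140853107142860]})

# The single row of raw scores as a Series indexed by transcript id
raw = Z_Score_Raw.iloc[0]

# Transcript id for each "<id> <kind>" column of Increased_Bio
ids = Increased_Bio.columns.str.split(" ").str[0]

# Raw score lined up with each column of Increased_Bio
aligned_raw = raw.loc[ids].to_numpy()

# 1 where the average is below the raw score, 0 otherwise (ties give 0: an assumption)
flags = pd.Series(
    (Increased_Bio.iloc[0].to_numpy() < aligned_raw).astype(int),
    index=Increased_Bio.columns,
    name="flag",
)

print(flags)
```

Here flags is indexed by the original Increased_Bio column names, so each High_Avg and Low_Avg column gets its own 0 or 1.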

1 answer

solution1
0 ACCEPTED 2021-06-22 21:00:53