How to correctly loop through range and list to create Pandas Dataframe?

I have a function get_differences whose output is a dictionary like the one below. The numbers are not relevant here; this is just an example of the output the function generates:

get_differences(column = 'column_A', percent = 10)

{'Feature': 'column_A',
 'Pos_obs_10%': -0.98,
 'Pos_obs_target': 1,
 'Pos_obs_-10%': -1.23}

To get a Pandas DataFrame covering all columns, I did this:

full_output = []

for col in df.columns: 
    output = get_differences(column = col, percent = 10) 
    full_output.append(output) 

df_output = pd.DataFrame(full_output)

When I execute this code, my results look like this:

     Feature       Pos_obs_-10% Pos_obs_target  Pos_obs_10%
0   column_A       -0.98         -1.96         -0.98
1   column_B       -0.23          0.00          0.55
2   column_C        1.55         -2.94          4.90
3   column_D       -0.23          0.98         -0.98

This is correct. But I would like to get the results of this function in a Pandas DataFrame for every column and for a range of percentages, for example 10, 50 and 100%.

My desired output is:

     Feature   Pos_obs_-100%  Pos_obs_-50%  Pos_obs_-10%  Pos_obs_target  Pos_obs_10%  Pos_obs_50%  Pos_obs_100%
0   column_A       -0.98         -1.96         -0.98       -0.98         -1.96         -0.98           -0.98
1   column_B       -0.23          0.00          0.55       -0.98         -1.96         -0.98           -0.98
2   column_C        1.55         -2.94          4.90       -0.98         -1.96         -0.98           -0.98
3   column_D       -0.23          0.98         -0.98       -0.98         -1.96         -0.98           -0.98

The numbers here are also random, just to show example output. When I tried a loop like this:

percentage = range(1,5)
full_output = []

for n in percentage:
    for col in df.columns:
        output = get_differences(column = col, percent = n)
        full_output.append(output)

df_output = pd.DataFrame(full_output)
         

I got a lot of NaN values in the DataFrame and the rows for each feature were repeated, something like this:

Feature            Pos_obs_-100%  Pos_obs_-50%  Pos_obs_-10%  Pos_obs_target  Pos_obs_10%  Pos_obs_50%  Pos_obs_100%
    0   column_A       0.00           NaN          -0.98       -0.98         -1.96         NaN             -0.98
    1   column_B       -2.96          NaN           0.55       -0.98         -1.96         NaN             -0.98
    2   column_C       0.00           NaN           4.90       -0.98         -1.96         NaN             -0.98
    3   column_D       -0.98          NaN          -0.98       -0.98         -1.96         NaN             -0.98
    4   column_A       -0.98          -0.12         NaN        -0.98         NaN           -0.98           -0.98
    5   column_B       -0.23          0.55          NaN        -0.98         NaN           -0.98           -0.98
    6   column_C        1.55           4.90         NaN        -0.98         NaN           -0.98           -0.98
    7   column_D       -0.23          -0.98         NaN        -0.98         NaN           -0.98           -0.98

Create a DataFrame for each percentage in the outer loop, append it to another list, and finally use concat:

percentage = range(1,5)
dfs = []
for n in percentage:
    L = []
    for col in df.columns:
        output = get_differences(column = col, percent = n)
        L.append(output)
    dfs.append(pd.DataFrame(L))

df_output = pd.concat(dfs, axis=1)
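
Note that with axis=1 each per-percentage frame carries its own Feature and Pos_obs_target columns, so those names show up once per percentage in the result. Below is a minimal sketch of one way to avoid that, assuming the dictionary shape shown in the question: index each frame by Feature before concatenating, then drop the repeated column names. The get_differences stub, the columns list and the percentages list are hypothetical stand-ins, since the real function and df are not shown in the post.

```python
import pandas as pd

# Hypothetical stand-in for the question's get_differences; the real function
# and its numbers are not shown, only the shape of the returned dictionary.
def get_differences(column, percent):
    return {
        'Feature': column,
        f'Pos_obs_{percent}%': float(percent),
        'Pos_obs_target': 1,
        f'Pos_obs_-{percent}%': -float(percent),
    }

columns = ['column_A', 'column_B']   # assumption: stands in for df.columns
percentages = [10, 50, 100]          # the percentages named in the question

dfs = []
for n in percentages:
    rows = [get_differences(column=col, percent=n) for col in columns]
    # Index by Feature so concat aligns rows by feature name, not by position
    dfs.append(pd.DataFrame(rows).set_index('Feature'))

df_output = pd.concat(dfs, axis=1)
# Pos_obs_target appears once per percentage; keep only the first copy
df_output = df_output.loc[:, ~df_output.columns.duplicated()].reset_index()
print(df_output)
```

This keeps one row per feature and one column per distinct name. It assumes Pos_obs_target does not depend on the percentage; if it does, the duplicates would need distinct names rather than being dropped.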
