
How to create new column based on iteration of existing columns in pandas?

I have a dataframe,

     foo   column1 column2 ..... column9999
0     5      0.8      0.01
1     10     0.9      0.01
2     15     0.2      1.2
3     8      0.12     0.5
4     74     0.78     0.7
.      ...     ...

Based on these existing columns, I want to create new columns.
If I did it one column at a time, it would look like this:

df["A1"] = df.foo[df["column1"] > 0.1].rank(ascending=False)
df["A1"] = df["A1"].fillna(value=0)
df['new_A1'] = (1+df['A1'])
df['log_A1'] = np.log(df['new_A1'])

But I don't want to write this out for every column (there are more than 900).
How can I iterate over the columns and create the new columns?
Thanks in advance!

Here's a cleaned up version of what I think you are trying to do:

# Include only variables with the "column" stub
cols = [c for c in df.columns if 'column' in c]

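for i, c in enumerate(cols):
    a = f"A{i+1}"
    # 1 + rank of foo among rows where this column exceeds 0.1 (NaN for the other rows)
    df[a] = 1 + df.loc[df[c] > 0.1, 'foo'].rank(ascending=False)
    # log of the above; rows that failed the filter become 0
    df[f'log_{a}'] = np.log(df[a]).fillna(value=0)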

I'm assuming that you didn't need the new_A# column itself and were only using it as an intermediate step for the log calculation.
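
With more than 900 source columns, adding columns to the frame one at a time can make pandas emit a PerformanceWarning about a fragmented DataFrame. A minimal sketch of one way around that, assuming the sample data from the question: collect the new columns in a dict and attach them with a single pd.concat. Using Series.where and np.log1p here is a substitution for the boolean indexing and the explicit 1 + ... step, not something the question or the answer above uses, and the names new_cols and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (only two of the "column" columns).
df = pd.DataFrame({
    "foo": [5, 10, 15, 8, 74],
    "column1": [0.8, 0.9, 0.2, 0.12, 0.78],
    "column2": [0.01, 0.01, 1.2, 0.5, 0.7],
})

cols = [c for c in df.columns if c.startswith("column")]

new_cols = {}  # illustrative name: holds every new column before attaching
for i, c in enumerate(cols, start=1):
    # Rank foo among the rows where this column exceeds 0.1;
    # rows that fail the filter get 0, as in the question's fillna step.
    rank = df["foo"].where(df[c] > 0.1).rank(ascending=False).fillna(0)
    new_cols[f"A{i}"] = rank
    # log(1 + rank); excluded rows give log(1) = 0
    new_cols[f"log_A{i}"] = np.log1p(rank)

# Attach all new columns in one step instead of one insert per column.
result = pd.concat([df, pd.DataFrame(new_cols)], axis=1)
print(result)
```

Here A# holds the plain rank (as in the question), while log_A# matches the log column produced by the loop above.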

You can iterate over the column names and perform the +1 and log operations inside the loop. df.columns gives you the column labels (as a pandas Index), so you can do something like this, for example:

for index, column in enumerate(df.columns):
  df['new_A' + str(index)] = (1+df[column])
  df['log_A' + str(index)] = np.log(df['new_A' + str(index)])

You can add the remaining operations (the filtering and ranking) inside the same loop too.
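
Put together, a loop that carries out every step from the question might look like the sketch below. add_rank_columns, value_col, and threshold are hypothetical names introduced for illustration, and the sample frame is the one from the question. Unlike the loop above, it skips foo itself and numbers the new columns from 1.

```python
import numpy as np
import pandas as pd


def add_rank_columns(df, value_col="foo", threshold=0.1):
    """Hypothetical helper: repeat the question's four steps for every column except value_col."""
    out = df.copy()
    source_cols = [c for c in df.columns if c != value_col]
    for index, column in enumerate(source_cols, start=1):
        a = f"A{index}"
        # Rank value_col among the rows where this column exceeds the threshold.
        out[a] = out.loc[out[column] > threshold, value_col].rank(ascending=False)
        # Rows that failed the filter have no rank; set them to 0.
        out[a] = out[a].fillna(0)
        out[f"new_{a}"] = 1 + out[a]
        out[f"log_{a}"] = np.log(out[f"new_{a}"])
    return out


# Sample data copied from the question.
df = pd.DataFrame({
    "foo": [5, 10, 15, 8, 74],
    "column1": [0.8, 0.9, 0.2, 0.12, 0.78],
    "column2": [0.01, 0.01, 1.2, 0.5, 0.7],
})
out = add_rank_columns(df)
print(out)
```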

Hope it helps

You can just do:

import pandas as pd
import numpy as np


df = pd.read_csv('something.csv')


# Every column except 'foo' gets its own A<n> name
b = [x for x in df.columns if x != 'foo']
a = ['A' + str(i) for i in range(1, len(b) + 1)]
for src, name in zip(b, a):
    df[name] = df.foo[df[src] > 0.1].rank(ascending=False)
    df['new_' + name] = 1 + df[name]
    df['log_' + name] = np.log(df['new_' + name])

print(df.fillna(value=0))

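which, for the sample data in the question, outputs the following (because NaN is filled only at the end, new_A# shows 0 rather than 1 for rows that fail the filter):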

   foo  column1  column2   A1  new_A1    log_A1   A2  new_A2    log_A2
0    5     0.80     0.01  5.0     6.0  1.791759  0.0     0.0  0.000000
1   10     0.90     0.01  3.0     4.0  1.386294  0.0     0.0  0.000000
2   15     0.20     1.20  2.0     3.0  1.098612  2.0     3.0  1.098612
3    8     0.12     0.50  4.0     5.0  1.609438  3.0     4.0  1.386294
4   74     0.78     0.70  1.0     2.0  0.693147  1.0     2.0  0.693147


 