How to create new columns based on iteration over existing columns in pandas?

I have a DataFrame:

     foo   column1 column2 ..... column9999
0     5      0.8      0.01
1     10     0.9      0.01
2     15     0.2      1.2
3     8      0.12     0.5
4     74     0.78     0.7
.      ...     ...

Based on these existing columns, I want to create new columns.
Doing it one column at a time, it would look like this:

import numpy as np

# Rank foo only where column1 > 0.1; the other rows come out as NaN
df["A1"] = df.foo[df["column1"] > 0.1].rank(ascending=False)
df["A1"] = df["A1"].fillna(value=0)
df['new_A1'] = (1+df['A1'])
df['log_A1'] = np.log(df['new_A1'])
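To make the numbers reproducible, here is a minimal runnable sketch of the one-column version above. It builds a DataFrame from the five sample rows in the question (only column1 and column2; the remaining columns are left out as an assumption) and follows the same rank, fill, +1, log steps.

```python
import numpy as np
import pandas as pd

# The five sample rows from the question (column3 ... column9999 omitted)
df = pd.DataFrame({
    "foo": [5, 10, 15, 8, 74],
    "column1": [0.8, 0.9, 0.2, 0.12, 0.78],
    "column2": [0.01, 0.01, 1.2, 0.5, 0.7],
})

# Rank foo (largest value gets rank 1) only for rows where column1 > 0.1;
# rows outside the filter become NaN when the result is aligned back onto df
df["A1"] = df.loc[df["column1"] > 0.1, "foo"].rank(ascending=False)
df["A1"] = df["A1"].fillna(0)
df["new_A1"] = 1 + df["A1"]
df["log_A1"] = np.log(df["new_A1"])

print(df[["foo", "A1", "new_A1", "log_A1"]])
```

Here every row passes the column1 > 0.1 test, so A1 is simply the descending rank of foo.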

But I don't want to write this out for every column (there are more than 900).
How can I iterate over the columns and create the new columns?
Thanks in advance!

Here's a cleaned-up version of what I think you are trying to do:

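import numpy as np

# Include only variables with the "column" stub
cols = [c for c in df.columns if 'column' in c]

for i, c in enumerate(cols):
    a = f"A{i+1}"
    # 1 + rank of foo where c > 0.1; the other rows are NaN
    df[a] = 1 + df.loc[df[c] > 0.1, 'foo'].rank(ascending=False)
    # NaN rows get 0, the same value log(1 + 0) gives in the one-by-one version
    df[f'log_{a}'] = np.log(df[a]).fillna(value=0)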

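I'm assuming that you didn't need the new_A# columns and were only using them as an intermediate step for the log calculation. Note that here A# already holds 1 + rank (what new_A# held before), and rows that fail the > 0.1 test stay NaN in A# instead of 0; only the log_A# columns are filled with 0.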
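A quick way to check that skipping new_A# does not change the log result is to compute it both ways and compare. This sketch reuses the question's five sample rows; the helper names `question_way` and `answer_way` are hypothetical, introduced only for this comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "foo": [5, 10, 15, 8, 74],
    "column1": [0.8, 0.9, 0.2, 0.12, 0.78],
    "column2": [0.01, 0.01, 1.2, 0.5, 0.7],
})

def question_way(df, c):
    # rank -> fill with 0 -> add 1 -> log (the one-by-one version)
    a = df.loc[df[c] > 0.1, "foo"].rank(ascending=False)
    a = a.reindex(df.index).fillna(0)
    return np.log(1 + a)

def answer_way(df, c):
    # add 1 to the rank, take the log, then fill the filtered-out rows with 0
    a = 1 + df.loc[df[c] > 0.1, "foo"].rank(ascending=False)
    return np.log(a).reindex(df.index).fillna(0)

for c in ["column1", "column2"]:
    # filtered-out rows: log(1 + 0) = 0 one way, NaN -> 0 the other way
    assert np.allclose(question_way(df, c), answer_way(df, c))
print("log columns match")
```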

You can iterate through the column names and perform the +1 and log operations. df.columns gives you the column labels (a pandas Index, which you can loop over like a list). So you can do something like this, for example:

import numpy as np

cols = [c for c in df.columns if c != 'foo']  # original columns, minus 'foo'
for index, column in enumerate(cols, start=1):
    df['new_A' + str(index)] = 1 + df[column]
    df['log_A' + str(index)] = np.log(df['new_A' + str(index)])

You can put the remaining operations (the rank and the fillna) inside the same loop as well.
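With 900+ source columns, a loop like this inserts thousands of columns one at a time, and recent pandas versions can warn that the DataFrame is highly fragmented (a PerformanceWarning). A common workaround, sketched here under the assumption that every non-foo column should be processed, is to collect the new Series in a dict and attach them with a single pd.concat. The function name `add_rank_logs` and its `threshold` parameter are hypothetical.

```python
import numpy as np
import pandas as pd

def add_rank_logs(df, threshold=0.1):
    """Return a copy of df with A#, new_A# and log_A# for every non-'foo' column."""
    new_cols = {}
    source = [c for c in df.columns if c != "foo"]
    for i, c in enumerate(source, start=1):
        # rank foo (largest = 1) where c > threshold; reindex gives NaN elsewhere
        rank = df.loc[df[c] > threshold, "foo"].rank(ascending=False)
        rank = rank.reindex(df.index).fillna(0)
        new_cols[f"A{i}"] = rank
        new_cols[f"new_A{i}"] = 1 + rank
        new_cols[f"log_A{i}"] = np.log(1 + rank)
    # one concat instead of thousands of single-column inserts
    return pd.concat([df, pd.DataFrame(new_cols, index=df.index)], axis=1)

df = pd.DataFrame({
    "foo": [5, 10, 15, 8, 74],
    "column1": [0.8, 0.9, 0.2, 0.12, 0.78],
    "column2": [0.01, 0.01, 1.2, 0.5, 0.7],
})
out = add_rank_logs(df)
print(out)
```

As in the question's code, the fill happens before adding 1, so new_A# is 1 for rows that fail the threshold.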

Hope it helps.

You can just do:

import pandas as pd
import numpy as np


df = pd.read_csv('something.csv')

# Every column except 'foo' is a source column: column1 -> A1, column2 -> A2, ...
b = [x for x in df.columns if x != 'foo']
a = ['A' + str(i) for i in range(1, len(b) + 1)]

for col, name in zip(b, a):
    # Rank foo (largest = 1) only where col > 0.1; the other rows become NaN
    df[name] = df.foo[df[col] > 0.1].rank(ascending=False)
    df['new_' + name] = 1 + df[name]
    df['log_' + name] = np.log(df['new_' + name])

# Replace the NaNs left by the filter with 0 and keep the result
df = df.fillna(value=0)
print(df)

which outputs (for a sample containing only column1 and column2):

   foo  column1  column2   A1  new_A1    log_A1   A2  new_A2    log_A2
0    5     0.80     0.01  5.0     6.0  1.791759  0.0     0.0  0.000000
1   10     0.90     0.01  3.0     4.0  1.386294  0.0     0.0  0.000000
2   15     0.20     1.20  2.0     3.0  1.098612  2.0     3.0  1.098612
3    8     0.12     0.50  4.0     5.0  1.609438  3.0     4.0  1.386294
4   74     0.78     0.70  1.0     2.0  0.693147  1.0     2.0  0.693147
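If you only need the A# and log_A# columns, the loop can be avoided entirely. The sketch below is an alternative technique, not what the answer above uses: it masks foo with np.where, ranks every column at once with DataFrame.rank (which works column by column and leaves NaN as NaN), and uses np.log1p, which computes log(1 + x). It assumes the same five sample rows and that every non-foo column is processed; the variable names are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "foo": [5, 10, 15, 8, 74],
    "column1": [0.8, 0.9, 0.2, 0.12, 0.78],
    "column2": [0.01, 0.01, 1.2, 0.5, 0.7],
})

cols = [c for c in df.columns if c != "foo"]
mask = df[cols] > 0.1  # boolean frame, one column per source column

# Broadcast foo across all source columns, keeping it only where mask is True
masked_foo = pd.DataFrame(
    np.where(mask, df["foo"].to_numpy()[:, None], np.nan),
    index=df.index,
    columns=cols,
)

# Rank each column (largest foo = 1); masked-out rows stay NaN, then become 0
ranks = masked_foo.rank(ascending=False).fillna(0)
ranks.columns = [f"A{i}" for i in range(1, len(cols) + 1)]

# log1p(x) == log(1 + x), so no new_A# step is needed
logs = np.log1p(ranks).add_prefix("log_")

result = pd.concat([df, ranks, logs], axis=1)
print(result)
```

On the sample rows this gives the same A# and log_A# values as the table above.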
