How to create a new column based on the values of other columns in a Pandas DataFrame

Question

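I'm new to programming and Pandas, so please don't judge too harshly.

In this table, I need to add a new column whose values are taken from the other columns.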

import pandas as pd

inp = [{'Date':2003, 'b1':5,'b2':0,'b3':4,'b4':3},{'Date':2003, 'b1':2,'b2':2,'b3':1,'b4':8},{'Date':2004, 'b1':2,'b2':3,'b3':1,'b4':1},{'Date':2004, 'b1':1,'b2':8,'b3':2,'b4':1},{'Date':2005, 'b1':2,'b2':1,'b3':6,'b4':2},{'Date':2006, 'b1':1,'b2':7,'b3':2,'b4':9}]
df = pd.DataFrame(inp)
print(df)

   Date  b1  b2  b3  b4
0  2003   5   0   4   3
1  2003   2   2   1   8
2  2004   2   3   1   1
3  2004   1   8   2   1
4  2005   2   1   6   2
5  2006   1   7   2   9

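The value depends on the Date. That is, if Date == 2003, I need the value from column b1; if Date == 2004, I need the value from column b2; if Date == 2005, from column b3; and so on. So the values of the new column should be: 5, 2, 3, 8, 6, 9.

I have a corresponding dictionary, something like: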

Corr_dict = {2003:'b1',2004:'b2',2005:'b4',2006:'b7'...}

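This is just an example. I have a large dataset, so I want to understand the mechanism.

Sorry for the poorly formatted question. I would be very grateful for any help.

Expected output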

   Date  b1  b2  b3  b4  vals
0  2003   5   0   4   3   5.0
1  2003   2   2   1   8   2.0
2  2004   2   3   1   1   3.0
3  2004   1   8   2   1   8.0
4  2005   2   1   6   2   6.0
5  2006   1   7   2   9   9.0

Answer 1

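I would use df.lookup (this only works on older pandas: DataFrame.lookup was deprecated in pandas 1.2 and removed in pandas 2.0):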

df['Correspond'] = df.lookup(df.index, df['Date'].map(dd))

MCVE:

import pandas as pd

import numpy as np

inp = [{'Date':2003, 'b1':5,'b2':0,'b3':4,'b4':3},{'Date':2003, 'b1':2,'b2':2,'b3':1,'b4':8},{'Date':2004, 'b1':2,'b2':3,'b3':1,'b4':1},{'Date':2004, 'b1':1,'b2':8,'b3':2,'b4':1},{'Date':2005, 'b1':2,'b2':1,'b3':6,'b4':2},{'Date':2006, 'b1':1,'b2':7,'b3':2,'b4':9}]
df = pd.DataFrame(inp)

dd = {2003:'b1', 2004:'b2', 2005:'b3', 2006:'b4'}

df['Correspond'] = df.lookup(df.index, df['Date'].map(dd))
print(df)

Output:

   Date  b1  b2  b3  b4  Correspond
0  2003   5   0   4   3           5
1  2003   2   2   1   8           2
2  2004   2   3   1   1           3
3  2004   1   8   2   1           8
4  2005   2   1   6   2           6
5  2006   1   7   2   9           9
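Because DataFrame.lookup no longer exists in pandas 2.x, here is a sketch of an equivalent built from pd.factorize and NumPy fancy indexing, along the lines of the replacement the pandas user guide suggests. It assumes the same dd mapping as the MCVE and that every Date has an entry in dd: an unmapped Date would become NaN, factorize would give it the code -1, and NumPy would then silently read the last column, so the sketch checks for that first.

```python
import numpy as np
import pandas as pd

inp = [{'Date':2003, 'b1':5,'b2':0,'b3':4,'b4':3},{'Date':2003, 'b1':2,'b2':2,'b3':1,'b4':8},{'Date':2004, 'b1':2,'b2':3,'b3':1,'b4':1},{'Date':2004, 'b1':1,'b2':8,'b3':2,'b4':1},{'Date':2005, 'b1':2,'b2':1,'b3':6,'b4':2},{'Date':2006, 'b1':1,'b2':7,'b3':2,'b4':9}]
df = pd.DataFrame(inp)

dd = {2003: 'b1', 2004: 'b2', 2005: 'b3', 2006: 'b4'}

# Column name each row should read from, e.g. 'b1' for Date 2003
col_names = df['Date'].map(dd)
if col_names.isna().any():
    raise ValueError("some Date values have no entry in dd")

# idx: integer code per row; cols: the distinct column names in order of first appearance
idx, cols = pd.factorize(col_names)

# Put the frame's columns in the order of `cols`, then take one cell per row
df['Correspond'] = df.reindex(columns=cols).to_numpy()[np.arange(len(df)), idx]
print(df)
```

It should produce the same Correspond column as the lookup version above.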

Answer 2

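IIUC, I would write a function for this:

def extract(df, year):
    # Assumes the columns are ordered Date, b1, b2, ... and the years are consecutive,
    # so the earliest year maps to b1, the next year to b2, and so on
    min_year = df['Date'].min()
    return df.loc[df['Date']==year, df.columns[year+1 - min_year]]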

extract(df, 2003)
# 0    5
# 1    2
# Name: b1, dtype: int64

And for all years as a single column:

pd.concat(extract(df, year).rename('new_col') for year in df['Date'].unique())

Output:

0    5
1    2
2    3
3    8
4    6
5    9
Name: new_col, dtype: int64
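The function above picks the column by position, which only works while the columns are ordered Date, b1, b2, ... and the years are consecutive. A sketch that instead drives the selection from a year-to-column dictionary and writes the result straight into df; corr is an assumed name, filled here with the mapping implied by the expected output.

```python
import numpy as np
import pandas as pd

inp = [{'Date':2003, 'b1':5,'b2':0,'b3':4,'b4':3},{'Date':2003, 'b1':2,'b2':2,'b3':1,'b4':8},{'Date':2004, 'b1':2,'b2':3,'b3':1,'b4':1},{'Date':2004, 'b1':1,'b2':8,'b3':2,'b4':1},{'Date':2005, 'b1':2,'b2':1,'b3':6,'b4':2},{'Date':2006, 'b1':1,'b2':7,'b3':2,'b4':9}]
df = pd.DataFrame(inp)

# Hypothetical mapping: year -> column to read from
corr = {2003: 'b1', 2004: 'b2', 2005: 'b3', 2006: 'b4'}

# Start with NaN; years with no mapping stay NaN, so the column is float
df['new_col'] = np.nan

for year, col in corr.items():
    mask = df['Date'].eq(year)
    # Copy the mapped column's values into the rows for this year
    df.loc[mask, 'new_col'] = df.loc[mask, col]

print(df)
```

Column order and gaps between years no longer matter; only the dictionary does.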

Answer 3

IIUC

s=df.set_index('Date').stack()
# d is the Date -> column mapping, e.g. Corr_dict from the question
df['New']=s[s.index.isin(list(d.items()))].values
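To see why this works: stack turns the frame into one Series indexed by (Date, column) pairs in row order, and isin with the dictionary's (year, column) tuples keeps exactly those pairs. A self-contained sketch, assuming d holds the mapping implied by the expected output; taking .values is only safe when exactly one pair survives per row, so the sketch checks the length first.

```python
import pandas as pd

inp = [{'Date':2003, 'b1':5,'b2':0,'b3':4,'b4':3},{'Date':2003, 'b1':2,'b2':2,'b3':1,'b4':8},{'Date':2004, 'b1':2,'b2':3,'b3':1,'b4':1},{'Date':2004, 'b1':1,'b2':8,'b3':2,'b4':1},{'Date':2005, 'b1':2,'b2':1,'b3':6,'b4':2},{'Date':2006, 'b1':1,'b2':7,'b3':2,'b4':9}]
df = pd.DataFrame(inp)

d = {2003: 'b1', 2004: 'b2', 2005: 'b3', 2006: 'b4'}

# Long Series indexed by (Date, column name), row by row:
# (2003, 'b1') -> 5, (2003, 'b2') -> 0, ..., (2006, 'b4') -> 9
s = df.set_index('Date').stack()

# Keep only the (Date, column) pairs that appear in the mapping
picked = s[s.index.isin(list(d.items()))]

# .values drops the index, so there must be exactly one surviving pair per row
if len(picked) != len(df):
    raise ValueError("each row must match exactly one (Date, column) pair")
df['New'] = picked.values
print(df)
```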

Answer 4

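One possibility is to use melt and then keep only the rows whose variable is the column that Corr_dict assigns to that row's Date. With Corr_dict = {2003: 'b1', 2004: 'b2', 2005: 'b3', 2006: 'b4'} (the mapping implied by the expected output):

melted = df.melt(id_vars='Date')
# Compare each row's variable with the column mapped to its Date
melted[melted['variable'] == melted['Date'].map(Corr_dict)]

    Date variable  value
0   2003       b1      5
1   2003       b1      2
8   2004       b2      3
9   2004       b2      8
16  2005       b3      6
23  2006       b4      9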
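The result above is still in long form. A sketch of getting it back into a column of df, assuming pandas 1.1 or later for melt's ignore_index parameter: with ignore_index=False each melted row keeps its original row label, so the filtered values line up with df when assigned.

```python
import pandas as pd

inp = [{'Date':2003, 'b1':5,'b2':0,'b3':4,'b4':3},{'Date':2003, 'b1':2,'b2':2,'b3':1,'b4':8},{'Date':2004, 'b1':2,'b2':3,'b3':1,'b4':1},{'Date':2004, 'b1':1,'b2':8,'b3':2,'b4':1},{'Date':2005, 'b1':2,'b2':1,'b3':6,'b4':2},{'Date':2006, 'b1':1,'b2':7,'b3':2,'b4':9}]
df = pd.DataFrame(inp)

# Assumed mapping, matching the expected output in the question
Corr_dict = {2003: 'b1', 2004: 'b2', 2005: 'b3', 2006: 'b4'}

# ignore_index=False keeps each value's original row label (0..5, repeated per column)
melted = df.melt(id_vars='Date', ignore_index=False)

# Keep the value whose variable is the column mapped to that row's Date
keep = melted['variable'] == melted['Date'].map(Corr_dict)

# One surviving value per row label, so assignment aligns on the index
df['vals'] = melted.loc[keep, 'value']
print(df)
```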

Answer 5

If your logic is more complex, another approach is to use np.select:

import numpy as np

col  = df['Date']

conditions = [(col.eq(2003)), (col.eq(2004)),(col.eq(2005)),(col.eq(2006))]

choices = [df['b1'],df['b2'],df['b3'],df['b4']]

df['vals'] = np.select(conditions,choices,default=np.nan)

print(df)


   Date  b1  b2  b3  b4  vals
0  2003   5   0   4   3   5.0
1  2003   2   2   1   8   2.0
2  2004   2   3   1   1   3.0
3  2004   1   8   2   1   8.0
4  2005   2   1   6   2   6.0
5  2006   1   7   2   9   9.0
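Writing the conditions and choices by hand gets unwieldy with many years. A sketch that builds both lists from a year-to-column dictionary (corr is an assumed name holding the mapping implied by the expected output), so the two lists always stay in the same order:

```python
import numpy as np
import pandas as pd

inp = [{'Date':2003, 'b1':5,'b2':0,'b3':4,'b4':3},{'Date':2003, 'b1':2,'b2':2,'b3':1,'b4':8},{'Date':2004, 'b1':2,'b2':3,'b3':1,'b4':1},{'Date':2004, 'b1':1,'b2':8,'b3':2,'b4':1},{'Date':2005, 'b1':2,'b2':1,'b3':6,'b4':2},{'Date':2006, 'b1':1,'b2':7,'b3':2,'b4':9}]
df = pd.DataFrame(inp)

corr = {2003: 'b1', 2004: 'b2', 2005: 'b3', 2006: 'b4'}

# One condition and one choice per dictionary entry, in the same order
conditions = [df['Date'].eq(year) for year in corr]
choices = [df[col] for col in corr.values()]

# Rows matching no condition get the default (NaN), so the result is float
df['vals'] = np.select(conditions, choices, default=np.nan)
print(df)
```

This should reproduce the vals column shown above.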

Answer 6

Here is a direct solution to your problem:

import numpy as np

# initialize the new column
df['b5'] = np.nan
df['b5'] = df['b5'].astype('Int64')

# modify your df in-place, row by row
for idx, row in df.iterrows():
    date = row['Date']
    value = Corr_dict[date]
    df.at[idx, 'b5'] = row[value]

Output

    Date    b1  b2  b3  b4  b5
0   2003    5   0   4   3   5
1   2003    2   2   1   8   2
2   2004    2   3   1   1   3
3   2004    1   8   2   1   8
4   2005    2   1   6   2   2
5   2006    1   7   2   9   2
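iterrows builds a new Series for every row, which is typically slow on a large dataset. A lighter sketch of the same per-row idea reads each cell directly with df.at inside a list comprehension; it assumes a unique row index and a mapping (corr, a hypothetical name) that covers every Date.

```python
import pandas as pd

inp = [{'Date':2003, 'b1':5,'b2':0,'b3':4,'b4':3},{'Date':2003, 'b1':2,'b2':2,'b3':1,'b4':8},{'Date':2004, 'b1':2,'b2':3,'b3':1,'b4':1},{'Date':2004, 'b1':1,'b2':8,'b3':2,'b4':1},{'Date':2005, 'b1':2,'b2':1,'b3':6,'b4':2},{'Date':2006, 'b1':1,'b2':7,'b3':2,'b4':9}]
df = pd.DataFrame(inp)

# Hypothetical mapping: year -> column to read from
corr = {2003: 'b1', 2004: 'b2', 2005: 'b3', 2006: 'b4'}

# For each (row label, Date) pair, read the single cell from the mapped column.
# A Date missing from corr raises KeyError instead of silently producing a value.
df['b5'] = [df.at[i, corr[year]] for i, year in df['Date'].items()]
print(df)
```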

Answer 7

Another way: use the map() method, which lets you do some computation or transformation on the data.

import pandas as pd

# named data rather than dict, so the built-in dict is not shadowed
data = {'a' : ['a1', 'a2', 'a3'], 'b' : ['b1', 'b2', 'b3']}
df = pd.DataFrame(data)

def third_column(param):
    # Here you can do whatever processing the new column's data needs.
    return param + "_created"

df['new_column'] = df['a'].map(third_column)

Bye.
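Applied to the question, map also accepts a dictionary instead of a function, which gives the name of the column each row should read from; Dates without an entry become NaN. A small sketch with an assumed mapping that deliberately leaves out 2006 to show that behaviour:

```python
import pandas as pd

inp = [{'Date':2003, 'b1':5,'b2':0,'b3':4,'b4':3},{'Date':2003, 'b1':2,'b2':2,'b3':1,'b4':8},{'Date':2004, 'b1':2,'b2':3,'b3':1,'b4':1},{'Date':2004, 'b1':1,'b2':8,'b3':2,'b4':1},{'Date':2005, 'b1':2,'b2':1,'b3':6,'b4':2},{'Date':2006, 'b1':1,'b2':7,'b3':2,'b4':9}]
df = pd.DataFrame(inp)

# Assumed mapping; 2006 is left out on purpose
Corr_dict = {2003: 'b1', 2004: 'b2', 2005: 'b3'}

# map with a dict: each Date is replaced by its entry, or NaN if it has none
df['source_col'] = df['Date'].map(Corr_dict)
print(df[['Date', 'source_col']])
```

Turning these names into values still needs a per-row selection, for example one of the lookup-style approaches in the other answers.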

7 answers (votes, posting time):

Answer 1: 6, accepted, 2020-04-09 14:20:36
Answer 2: 2, 2020-04-09 14:16:32
Answer 3: 2, 2020-04-09 14:17:57
Answer 4: 2, 2020-04-09 14:18:12
Answer 5: 1, 2020-04-09 14:20:05
Answer 6: 1, 2020-04-09 14:23:52
Answer 7: 1