How to create a new column based on the values of other columns in a Pandas DataFrame

Question

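I'm new to programming and to Pandas, so please don't judge too harshly.

In this table I need to add a new column whose values are taken from the other columns.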

import pandas as pd

inp = [{'Date':2003, 'b1':5,'b2':0,'b3':4,'b4':3},{'Date':2003, 'b1':2,'b2':2,'b3':1,'b4':8},{'Date':2004, 'b1':2,'b2':3,'b3':1,'b4':1},{'Date':2004, 'b1':1,'b2':8,'b3':2,'b4':1},{'Date':2005, 'b1':2,'b2':1,'b3':6,'b4':2},{'Date':2006, 'b1':1,'b2':7,'b3':2,'b4':9}]
df = pd.DataFrame(inp)
print(df)

   Date  b1  b2  b3  b4
0  2003   5   0   4   3
1  2003   2   2   1   8
2  2004   2   3   1   1
3  2004   1   8   2   1
4  2005   2   1   6   2
5  2006   1   7   2   9

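That is, the choice depends on the date: if "Date" == 2003, I need the value from column b1; if "Date" == 2004, the value from column b2; if "Date" == 2005, the value from column b3; and so on. So the values of the new column should be: 5, 2, 3, 8, 6, 9.

I have a corresponding dictionary, something like: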

Corr_dict = {2003:'b1',2004:'b2',2005:'b4',2006:'b7'...}

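This is just an example. I have a large dataset, so I want to understand the underlying mechanism.

Sorry for the poorly formatted question. I would be very grateful for any help.

Expected output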

   Date  b1  b2  b3  b4  vals
0  2003   5   0   4   3   5.0
1  2003   2   2   1   8   2.0
2  2004   2   3   1   1   3.0
3  2004   1   8   2   1   8.0
4  2005   2   1   6   2   6.0
5  2006   1   7   2   9   9.0

Answer 1

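I would use df.lookup (note that DataFrame.lookup was deprecated in pandas 1.2.0 and removed in pandas 2.0, so this only runs on older versions):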

df['Correspond'] = df.lookup(df.index, df['Date'].map(dd))

MCVE:

import pandas as pd

import numpy as np

inp = [{'Date':2003, 'b1':5,'b2':0,'b3':4,'b4':3},{'Date':2003, 'b1':2,'b2':2,'b3':1,'b4':8},{'Date':2004, 'b1':2,'b2':3,'b3':1,'b4':1},{'Date':2004, 'b1':1,'b2':8,'b3':2,'b4':1},{'Date':2005, 'b1':2,'b2':1,'b3':6,'b4':2},{'Date':2006, 'b1':1,'b2':7,'b3':2,'b4':9}]
df = pd.DataFrame(inp)

dd = {2003:'b1', 2004:'b2', 2005:'b3', 2006:'b4'}

df['Correspond'] = df.lookup(df.index, df['Date'].map(dd))
print(df)

Output:

   Date  b1  b2  b3  b4  Correspond
0  2003   5   0   4   3           5
1  2003   2   2   1   8           2
2  2004   2   3   1   1           3
3  2004   1   8   2   1           8
4  2005   2   1   6   2           6
5  2006   1   7   2   9           9
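Since DataFrame.lookup was removed in pandas 2.0, here is a minimal sketch of an equivalent built on the pd.factorize / reindex / NumPy fancy-indexing pattern that the pandas documentation suggests as a lookup replacement. It rebuilds the question's sample data and reuses the mapping dd from the MCVE above.

```python
import numpy as np
import pandas as pd

# The question's sample data
df = pd.DataFrame({'Date': [2003, 2003, 2004, 2004, 2005, 2006],
                   'b1': [5, 2, 2, 1, 2, 1],
                   'b2': [0, 2, 3, 8, 1, 7],
                   'b3': [4, 1, 1, 2, 6, 2],
                   'b4': [3, 8, 1, 1, 2, 9]})
dd = {2003: 'b1', 2004: 'b2', 2005: 'b3', 2006: 'b4'}

# Column label wanted for each row, e.g. 2003 -> 'b1'
labels = df['Date'].map(dd)

# factorize returns an integer code per row plus the unique labels (in order of appearance)
idx, cols = pd.factorize(labels)

# Reorder the columns to match the codes, then take one cell per row
df['Correspond'] = df.reindex(columns=cols).to_numpy()[np.arange(len(df)), idx]
print(df)
```

One caveat with this pattern: if some Date has no entry in dd, factorize gives that row the code -1, and NumPy's negative indexing would silently return the last column, so it is worth checking labels.isna().any() first.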

Answer 2

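IIUC, I would write a function for this:

def extract(df, year):
    # Assumes columns are ordered Date, b1, b2, ... and the years are consecutive,
    # so the earliest year maps to column position 1 (b1), the next to 2 (b2), etc.
    min_year = df['Date'].min()
    return df.loc[df['Date']==year, df.columns[year+1 - min_year]]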

extract(df, 2003)
# 0    5
# 1    2
# Name: b1, dtype: int64

And for all years as a single column:

pd.concat(extract(df, year).rename('new_col') for year in df['Date'].unique())

Output:

0    5
1    2
2    3
3    8
4    6
5    9
Name: new_col, dtype: int64
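The positional expression df.columns[year+1 - min_year] only works while the columns are ordered Date, b1, b2, ... and no year is skipped. A minimal sketch of the same idea driven by an explicit year-to-column dict instead; the dict name mapping and the helper extract_mapped are illustrative, not from the answer. It also assigns the concatenated Series back to df, which works because pd.concat keeps the original row labels.

```python
import pandas as pd

# The question's sample data
df = pd.DataFrame({'Date': [2003, 2003, 2004, 2004, 2005, 2006],
                   'b1': [5, 2, 2, 1, 2, 1],
                   'b2': [0, 2, 3, 8, 1, 7],
                   'b3': [4, 1, 1, 2, 6, 2],
                   'b4': [3, 8, 1, 1, 2, 9]})

# Hypothetical year -> column mapping (plays the role of Corr_dict)
mapping = {2003: 'b1', 2004: 'b2', 2005: 'b3', 2006: 'b4'}

def extract_mapped(df, year, mapping):
    # Rows for this year, column chosen by the dict rather than by position
    return df.loc[df['Date'] == year, mapping[year]]

new_col = pd.concat([extract_mapped(df, year, mapping).rename('new_col')
                     for year in df['Date'].unique()])

# Assignment aligns on the row labels, so the concatenation order does not matter
df['new_col'] = new_col
print(df)
```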

Answer 3

IIUC:

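# d is the year -> column mapping (e.g. Corr_dict from the question)
d = {2003: 'b1', 2004: 'b2', 2005: 'b3', 2006: 'b4'}
s=df.set_index('Date').stack()
df['New']=s[s.index.isin(list(d.items()))].values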
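A self-contained version of the stack approach with the intermediate steps spelled out, assuming d is the complete year-to-column mapping. It relies on two properties: stack() emits the cells row by row in the original row order, and each Date maps to exactly one existing column, so the mask keeps exactly one cell per row.

```python
import pandas as pd

# The question's sample data
df = pd.DataFrame({'Date': [2003, 2003, 2004, 2004, 2005, 2006],
                   'b1': [5, 2, 2, 1, 2, 1],
                   'b2': [0, 2, 3, 8, 1, 7],
                   'b3': [4, 1, 1, 2, 6, 2],
                   'b4': [3, 8, 1, 1, 2, 9]})

# Assumed complete year -> column mapping
d = {2003: 'b1', 2004: 'b2', 2005: 'b3', 2006: 'b4'}

# Long Series indexed by (Date, column label), one entry per cell, row by row
s = df.set_index('Date').stack()

# Keep the cells whose (Date, column) pair appears in the mapping
mask = s.index.isin(list(d.items()))
picked = s[mask]

# .values discards the index, so it only fits if exactly one cell per row survived
assert len(picked) == len(df)
df['New'] = picked.values
print(df)
```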

Answer 4

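One possibility is to use melt, then map each row's Date through Corr_dict and keep only the rows whose variable is the matching column:

# Corr_dict written out as a complete mapping consistent with the expected output
Corr_dict = {2003: 'b1', 2004: 'b2', 2005: 'b3', 2006: 'b4'}

melted = df.melt(id_vars='Date')
m = melted['variable'].eq(melted['Date'].map(Corr_dict))
melted.loc[m]

    Date variable  value
0   2003       b1      5
1   2003       b1      2
8   2004       b2      3
9   2004       b2      8
16  2005       b3      6
23  2006       b4      9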
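To put the result back onto df as a column, one option is melt(ignore_index=False) (available since pandas 1.1): it keeps each cell's original row label, so after filtering there is one value per original row and plain column assignment lines them up. A minimal sketch, assuming Corr_dict is the complete mapping that matches the expected output.

```python
import pandas as pd

# The question's sample data
df = pd.DataFrame({'Date': [2003, 2003, 2004, 2004, 2005, 2006],
                   'b1': [5, 2, 2, 1, 2, 1],
                   'b2': [0, 2, 3, 8, 1, 7],
                   'b3': [4, 1, 1, 2, 6, 2],
                   'b4': [3, 8, 1, 1, 2, 9]})

# Assumed complete mapping consistent with the expected output
Corr_dict = {2003: 'b1', 2004: 'b2', 2005: 'b3', 2006: 'b4'}

# Long format; ignore_index=False keeps the original row label of every cell
melted = df.melt(id_vars='Date', ignore_index=False)

# True where the cell's column is the one Corr_dict assigns to its Date
m = melted['variable'].eq(melted['Date'].map(Corr_dict))

# One value per original row label; assignment aligns on that label
df['vals'] = melted.loc[m.to_numpy(), 'value']
print(df)
```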

Answer 5

If your logic is more complex, another approach is to use np.select:

import numpy as np

col  = df['Date']

conditions = [(col.eq(2003)), (col.eq(2004)),(col.eq(2005)),(col.eq(2006))]

choices = [df['b1'],df['b2'],df['b3'],df['b4']]

df['vals'] = np.select(conditions,choices,default=np.nan)

print(df)


   Date  b1  b2  b3  b4  vals
0  2003   5   0   4   3   5.0
1  2003   2   2   1   8   2.0
2  2004   2   3   1   1   3.0
3  2004   1   8   2   1   8.0
4  2005   2   1   6   2   6.0
5  2006   1   7   2   9   9.0
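With many years, the conditions and choices lists can be generated from the mapping instead of being typed out. A minimal sketch, assuming a dict named mapping shaped like Corr_dict; np.select pairs the two lists positionally, takes the first true condition for each row, and falls back to default where none holds.

```python
import numpy as np
import pandas as pd

# The question's sample data
df = pd.DataFrame({'Date': [2003, 2003, 2004, 2004, 2005, 2006],
                   'b1': [5, 2, 2, 1, 2, 1],
                   'b2': [0, 2, 3, 8, 1, 7],
                   'b3': [4, 1, 1, 2, 6, 2],
                   'b4': [3, 8, 1, 1, 2, 9]})

# Assumed year -> column mapping
mapping = {2003: 'b1', 2004: 'b2', 2005: 'b3', 2006: 'b4'}

# One boolean condition and one candidate column per dict entry, in the same order
conditions = [df['Date'].eq(year) for year in mapping]
choices = [df[col] for col in mapping.values()]

# default=np.nan makes the result float, as in the output above
df['vals'] = np.select(conditions, choices, default=np.nan)
print(df)
```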

Answer 6

Here is a direct solution to your problem:

import numpy as np

# initialize the new column as a nullable integer column (all <NA>)
df['b5'] = np.nan
df['b5'] = df['b5'].astype('Int64')

# modify your df in-place row by row
for idx, row in df.iterrows():
    date = row['Date']
    value = Corr_dict[date]  # column name for this row's year
    df.at[idx, 'b5'] = row[value]

Output

    Date    b1  b2  b3  b4  b5
0   2003    5   0   4   3   5
1   2003    2   2   1   8   2
2   2004    2   3   1   1   3
3   2004    1   8   2   1   8
4   2005    2   1   6   2   2
5   2006    1   7   2   9   2
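The same per-row logic can also be written with DataFrame.apply(axis=1), which passes each row as a Series. It is still a Python-level loop, so on a large dataset the vectorized answers will usually be faster. A sketch, assuming a complete mapping named Corr_dict.

```python
import pandas as pd

# The question's sample data
df = pd.DataFrame({'Date': [2003, 2003, 2004, 2004, 2005, 2006],
                   'b1': [5, 2, 2, 1, 2, 1],
                   'b2': [0, 2, 3, 8, 1, 7],
                   'b3': [4, 1, 1, 2, 6, 2],
                   'b4': [3, 8, 1, 1, 2, 9]})

# Assumed complete year -> column mapping
Corr_dict = {2003: 'b1', 2004: 'b2', 2005: 'b3', 2006: 'b4'}

# For each row: look up its column label by Date, then return that row's value
df['b5'] = df.apply(lambda row: row[Corr_dict[row['Date']]], axis=1)
print(df)
```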

Answer 7

Another way: use the map() method, which lets you apply some computation or data transformation to each value.

import pandas as pd

# "data" rather than "dict", to avoid shadowing the built-in
data = {'a' : ['a1', 'a2', 'a3'], 'b' : ['b1', 'b2', 'b3']}
df = pd.DataFrame(data)

def third_column(param):
    # Here you can do some important things with your new column data.
    return param + "_created"

df['new_column'] = df['a'].map(third_column)
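Series.map also accepts a dict; values without a matching key become NaN. In this question that is the first step of Answer 1 (turning each Date into a column label), and it is a quick way to find dates the mapping does not cover. A minimal sketch; the year 2007 is a hypothetical unmapped value.

```python
import pandas as pd

dates = pd.Series([2003, 2004, 2005, 2006, 2007], name='Date')
mapping = {2003: 'b1', 2004: 'b2', 2005: 'b3', 2006: 'b4'}

# Per-element dict lookup; 2007 has no key, so it maps to NaN
labels = dates.map(mapping)

# Dates the mapping does not cover
unmapped = dates[labels.isna()].tolist()
print(labels.tolist())
print(unmapped)
```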

Bye.

Question

7 answers

Answer 1
6 (accepted) 2020-04-09 14:20:36

Answer 2
2 2020-04-09 14:16:32

Answer 3
2 2020-04-09 14:17:57

Answer 4
2 2020-04-09 14:18:12

Answer 5
1 2020-04-09 14:20:05

Answer 6
1 2020-04-09 14:23:52

Answer 7
1
