Python: pandas DataFrame new column based on other columns

Question

I have a df with 2 columns, as shown below:

        A       B
0  100-00     nan
1  200-00     nan
2   other  300-00
3  100-00    text
4   other     nan

I need to create a column C that applies the following logic:

If B is nan, then A
If B starts with a digit, then B
Otherwise, A

I have the code below, which works fine, but I believe there may be a better, more efficient way to do this:

C = []
for r in range(df.shape[0]):
    if df['B'].iloc[r] == 'nan':
        C.append(df['A'].iloc[r])
    elif df['B'].iloc[r][:3].isnumeric():
        C.append(df['B'].iloc[r])
    else:
        C.append(df['A'].iloc[r])
df['C'] = C

df
        A       B       C
0  100-00     nan  100-00
1  200-00     nan  200-00
2   other  300-00  300-00
3  100-00    text  100-00
4   other     nan   other

Thanks in advance for all your help.

Answer 1

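I think the logic can be simplified: take df.B when its first character is a digit, otherwise take df.A. The test uses Series.str.contains with a regular expression in which ^ anchors the start of the string and \d matches a digit, and numpy.where then selects between the two columns. Passing na=False makes missing values count as non-matches: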

import numpy as np

df['C'] = np.where(df['B'].str.contains(r'^\d', na=False), df['B'], df['A'])
# alternative
# df['C'] = df['B'].where(df['B'].str.contains(r'^\d', na=False), df['A'])
print(df)
        A       B       C
0  100-00     NaN  100-00
1  200-00     NaN  200-00
2   other  300-00  300-00
3  100-00    text  100-00
4   other     NaN   other
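
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample frame from the question and assumes the missing entries in B are real NaN values (np.nan), not the string 'nan'. The stricter pattern ^\d{3} is my own addition, meant to mirror the question's check on the first three characters; it is not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with real NaN for the missing entries (assumption)
df = pd.DataFrame({
    'A': ['100-00', '200-00', 'other', '100-00', 'other'],
    'B': [np.nan, np.nan, '300-00', 'text', np.nan],
})

# True where B starts with a digit; na=False turns missing values into False
starts_with_digit = df['B'].str.contains(r'^\d', na=False)
df['C'] = np.where(starts_with_digit, df['B'], df['A'])

# Stricter variant (assumption): require the first three characters to be digits
starts_with_3_digits = df['B'].str.contains(r'^\d{3}', na=False)
df['C_strict'] = np.where(starts_with_3_digits, df['B'], df['A'])

print(df)
```

On this sample both masks select only row 2, so C and C_strict come out identical. They would differ for a value such as '1x-00', which starts with a digit but not with three.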

Answer 2

Not necessarily more efficient, but a more Pythonic way to do it:

import pandas as pd

df = pd.DataFrame({'A': ['100-00', '200-00', 'other', '100-00', 'other'], 'B': ['nan', 'nan', '300-00', 'text', 'nan']})

def label_columnC(row):
    if row['B'] == 'nan':
        return row['A']
    elif row['B'][:3].isnumeric():
        return row['B']
    else:
        return row['A']

df['C'] = df.apply(label_columnC, axis=1)
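
Note that the function above compares B against the string 'nan'. If B held real NaN floats instead, the slice row['B'][:3] would raise a TypeError. Below is a minimal sketch of a variant that tolerates both cases; the name pick_c and the mixed sample data are hypothetical and only serve as illustration.

```python
import numpy as np
import pandas as pd

# Sample data mixing real NaN and the string 'nan' (assumption, for illustration)
df = pd.DataFrame({
    'A': ['100-00', '200-00', 'other', '100-00', 'other'],
    'B': [np.nan, 'nan', '300-00', 'text', np.nan],
})

def pick_c(row):
    """Return B when it is a string whose first three characters are numeric, else A."""
    b = row['B']
    # Non-strings (such as float NaN) and non-numeric strings both fall back to A
    if isinstance(b, str) and b[:3].isnumeric():
        return b
    return row['A']

df['C'] = df.apply(pick_c, axis=1)
print(df)
```

Because apply with axis=1 calls the Python function once per row, this is likely to be slower than a vectorized mask on large frames, which matches the "not necessarily more efficient" caveat.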

Answer 1
2 accepted 2020-04-27 09:48:31

Answer 2
1 2020-04-27 09:51:29
