Using apply and a lambda function on a Pandas dataframe

Question

This is a follow-up to this question: How to create new column based on substrings in other column in a pandas dataframe?

The dataframe has the following structure:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Other input': ['Text A', 'Text B', 'Text C', 'Text D', 'Text E'],
    # np.nan marks the missing entries (a bare NaN is not defined in Python)
    'Substance': ['(NPK) 20/10/6', np.nan, '46%N / O%P2O5 (Urea)', '46%N / O%P2O5 (Urea)', '(NPK) DAP Diammonphosphat; 18/46/0'],
    'value': [0.2, np.nan, 0.6, 0.8, 0.9]
})

  Other input                           Substance  value
0      Text A                       (NPK) 20/10/6    0.2
1      Text B                                 NaN    NaN
2      Text C                46%N / O%P2O5 (Urea)    0.6
3      Text D                46%N / O%P2O5 (Urea)    0.8
4      Text E  (NPK) DAP Diammonphosphat; 18/46/0    0.9

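It was created by merging two dataframes with a left join, and it turned out that some rows have neither a substance nor a value. I need to replace the substance with a short name, and the following code worked until the dataset contained missing values: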

test['Short Name'] = test['Substance'].apply(lambda x: 'Urea' if 'Urea' in x else 'DAP' if 'DAP' in x else '(NPK)')

How can I make this work with NaN (or 0, if that is easier)? Is there something equivalent to the na_action=None that is apparently available for applymap?

Answer 1

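If you want to skip the rows that contain NaN, just add a call to dropna() before the call to apply(). This creates a new temporary copy of the dataframe in which every row containing NaN in any column has been removed.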

test['Short Name'] = test.dropna()['Substance'].apply(lambda x: 'Urea' if 'Urea' in x else 'DAP' if 'DAP' in x else '(NPK)')

Output:

>>> test
  Other input                           Substance  value Short Name
0      Text A                       (NPK) 20/10/6    0.2      (NPK)
1      Text B                                 NaN    NaN        NaN
2      Text C                46%N / O%P2O5 (Urea)    0.6       Urea
3      Text D                46%N / O%P2O5 (Urea)    0.8       Urea
4      Text E  (NPK) DAP Diammonphosphat; 18/46/0    0.9        DAP

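This works because assigning a Series object to a DataFrame object aligns them on their indexes. You can see this if you check the return value of the apply() call after adding dropna():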

>>> test.dropna()['Substance'].apply(lambda x: 'Urea' if 'Urea' in x else 'DAP' if 'DAP' in x else '(NPK)')
0    (NPK)
2     Urea
3     Urea
4      DAP
Name: Substance, dtype: object

Notice how it jumps from 0 to 2. That is because the row at index 1 was dropped, but the index was not reset (which is what we want in this case).
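
A minimal sketch of the index alignment described above, on a small made-up frame (the names `demo`, `short_name` and `partial` are only for illustration and are not from the question). One variation worth noting: it calls dropna() on the Substance column alone. With default arguments, DataFrame.dropna() drops a row if any column is NaN, so a row with a valid substance but a missing value would be skipped as well. Whether that matters depends on your data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: row 1 has no Substance, row 2 has no value.
demo = pd.DataFrame({
    'Substance': ['(NPK) 20/10/6', np.nan, '46%N / O%P2O5 (Urea)'],
    'value': [0.2, np.nan, np.nan],
})

def short_name(x):
    # Same mapping as the lambda in the question.
    return 'Urea' if 'Urea' in x else 'DAP' if 'DAP' in x else '(NPK)'

# Drop NaN from the column only, so row 2 is still processed.
partial = demo['Substance'].dropna().apply(short_name)

# Assignment aligns on the index: labels missing from `partial` become NaN.
demo['Short Name'] = partial
```

Here `partial` only has the index labels 0 and 2, and after the assignment the row at index 1 is left as NaN in the new column.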

Answer 2

You can do:

df = df.assign(
    short_name = df.Substance.apply(
        lambda x:
            do_this_if_x_is_not_NaN(x) if pd.notna(x)
            else do_this_if_x_is_NaN(x)))

with the functions:

def do_this_if_x_is_not_NaN(x):
    return 'Urea' if 'Urea' in x else 'DAP' if 'DAP' in x else '(NPK)'

def do_this_if_x_is_NaN(x):
    return np.nan # keeping the NaN, or whatever you want to return if x is NaN

df = df.assign(col_name=...) is just another way of writing df['col_name'] = ...

Your df will become:

  Other input                           Substance  value short_name
0      Text A                       (NPK) 20/10/6    0.2      (NPK)
1      Text B                                 NaN    NaN        NaN
2      Text C                46%N / O%P2O5 (Urea)    0.6       Urea
3      Text D                46%N / O%P2O5 (Urea)    0.8       Urea
4      Text E  (NPK) DAP Diammonphosphat; 18/46/0    0.9        DAP
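
On the na_action part of the question, here is a sketch of one more option, not taken from either answer: Series.map accepts na_action='ignore', which propagates NaN without passing it to the function. (na_action=None is the default and means the function is still called on NaN.) The frame below is a cut-down copy of the question's data, and `short_name` is an illustrative helper name.

```python
import numpy as np
import pandas as pd

# Cut-down version of the question's data, with one missing Substance.
test = pd.DataFrame({
    'Substance': ['(NPK) 20/10/6', np.nan, '46%N / O%P2O5 (Urea)',
                  '(NPK) DAP Diammonphosphat; 18/46/0'],
})

def short_name(x):
    # Same mapping as the lambda in the question.
    return 'Urea' if 'Urea' in x else 'DAP' if 'DAP' in x else '(NPK)'

# na_action='ignore' passes NaN through without calling short_name on it.
test['Short Name'] = test['Substance'].map(short_name, na_action='ignore')
```

This keeps the mapping in one place and does not need a NaN branch inside the function.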

2 answers

Answer 1: 1 2021-12-11 18:18:03
Answer 2: 0 2021-12-11 18:18:31
