How to build nested conditional logic in Pandas DataFrame?

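I have a dataframe in which each row contains a list of strings. I have written a function that performs a Bernoulli-type trial on each string: with some probability (0.5 here), the trial succeeds and the word is deleted. See below: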

import numpy as np
import pandas as pd

def bernoulli_trial (sublist, prob = 0.5):

    # create mask of trial outcomes per each object in sublist
    mask = np.random.binomial(n=1, p=prob, size=len(sublist))

    # perform transformation on bernoulli successes
    transformed_sublist = [token for delete, token in zip(mask, sublist) if not delete]

    return transformed_sublist

This works as expected when I pass it each row of the dataframe, as follows:

df = pd.DataFrame(data={'store': [1,2,3], 'colours': [['red','blue','yellow','green','brown','pink'],
                                                      ['black','white'],
                                                      ['purple','orange','cyan','mauve']]})

df['colours'] = df['colours'].apply(bernoulli_trial)

Out: 
0      [red, green]
1           [black]
2    [orange, cyan]
Name: colours, dtype: object
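
Because the mask is drawn at random, the output above will differ from run to run. A minimal sketch of a reproducible variant follows, assuming NumPy's `Generator` API (`np.random.default_rng`) is acceptable in place of `np.random.binomial`. The name `bernoulli_trial_seeded`, the `rng` parameter and the seed value 42 are illustrative and not part of the original function.

```python
import numpy as np
import pandas as pd

def bernoulli_trial_seeded(sublist, prob=0.5, rng=None):
    # hypothetical variant of bernoulli_trial that accepts a NumPy Generator
    rng = np.random.default_rng() if rng is None else rng
    # one trial per token: 1 marks the token for deletion, 0 keeps it
    mask = rng.binomial(n=1, p=prob, size=len(sublist))
    return [token for delete, token in zip(mask, sublist) if not delete]

df = pd.DataFrame(data={'store': [1, 2, 3],
                        'colours': [['red', 'blue', 'yellow', 'green', 'brown', 'pink'],
                                    ['black', 'white'],
                                    ['purple', 'orange', 'cyan', 'mauve']]})

# a single seeded generator shared across rows (the seed is an arbitrary choice)
rng = np.random.default_rng(42)
result = df['colours'].apply(lambda s: bernoulli_trial_seeded(s, rng=rng))
print(result)
```

Re-creating the generator with the same seed before each run should give the same deletions every time, which makes the behaviour easier to check.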

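However, what I want to do now, instead of applying the function uniformly to every sublist and every string, is to apply conditions for (a) whether a given sublist is passed to the function at all (yes/no), and (b) which strings within that sublist the trial is applied to (i.e. by specifying that I only want to test certain colours).

I think I have a working solution for part (a) - by wrapping the Bernoulli function in a function that checks whether a given condition is met (i.e. is the length of the sublist greater than 2 objects?) - this works (see below), but I'm unsure whether there is a more efficient way to do this.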

def sublist_condition_check(sublist):
    if len(sublist) > 2:
        sublist = bernoulli_trial(sublist)
    else:
        sublist = sublist
    return sublist

Note that any sublist that does not meet the condition should remain unchanged.

df['colours'].apply(sublist_condition_check)

Out: 
0      [red, brown]
1    [black, white] # this sublist had only two elements so remains unchanged
2           [mauve]
Name: colours, dtype: object
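
For part (a), one possibly more compact option is a conditional expression. This is only a sketch: `apply_if_long` and `min_len` are hypothetical names, and the behaviour is intended to match `sublist_condition_check` above. It is shorter rather than faster, since `apply` on a column of lists still calls the function once per row in Python.

```python
import numpy as np
import pandas as pd

def bernoulli_trial(sublist, prob=0.5):
    # same logic as the function in the question
    mask = np.random.binomial(n=1, p=prob, size=len(sublist))
    return [token for delete, token in zip(mask, sublist) if not delete]

def apply_if_long(sublist, min_len=2):
    # hypothetical helper: run the trial only when the sublist has more
    # than min_len items, otherwise return the sublist unchanged
    return bernoulli_trial(sublist) if len(sublist) > min_len else sublist

df = pd.DataFrame(data={'store': [1, 2, 3],
                        'colours': [['red', 'blue', 'yellow', 'green', 'brown', 'pink'],
                                    ['black', 'white'],
                                    ['purple', 'orange', 'cyan', 'mauve']]})

result = df['colours'].apply(apply_if_long)
print(result)
```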

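However, I am completely stuck on how to apply conditional logic to each word. Say, for example, I only want to apply the trial to a pre-specified list of colours ['red','mauve','black'], provided the sublist has passed the sublist condition check. How would I do that?

Pseudocode for what I hope to achieve is below: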

for sublist in df['colours']:
    if len(sublist) > 2:     # check if sublist contains more than two objects
        for colour in sublist:     # cycle through each colour within the sublist
            if colour in ['red','mauve','black']:
                colour = bernoulli_trial (colour)     # only run bernoulli if colour in list
            else:
                colour = colour     # if colour not in list, colour remains unchanged
    else:
        sublist = sublist     # if len(sublist) <= 2, sublist remains unchanged

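I know that a literal interpretation of this won't work, because the original bernoulli_trial function takes a list rather than a single string. But hopefully it describes what I am trying to achieve.

I'm not sure about the etiquette of answering my own question, but I thought I would provide some details of a working solution I have settled on, in case anyone else runs into a similar situation.

I have extended the initial bernoulli function to include an additional if statement, based on whether each string meets the inclusion criteria.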

# internal function - bernoulli trial for each string in sublist
def bernoulli_trial (sublist, prob = 0.50):

    # set token criteria for performing bernoulli trial
    token_criteria = ['red','black','purple'] # perform trial only on these strings

    # create mask of trial outcomes per each word in sublist
    mask = np.random.binomial(n=1, p=prob, size=len(sublist))

    # perform transformation (deletion) on bernoulli successes
    transformed_sublist = []
    for token, delete in zip(sublist, mask):
        if token not in token_criteria: # strings outside the criteria are always kept
            transformed_sublist.append(token)
        else:
            if delete == 0: # retain only those strings not marked for deletion
                transformed_sublist.append(token)

    return transformed_sublist

Combined with the sublist_condition_check function described in the question, this now performs as expected.
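
As an illustration, the two conditions can also be folded into a single function, with the criteria passed as arguments rather than hard-coded. This is a sketch under stated assumptions, not the only way to do it: `conditional_bernoulli`, `targets` and `min_len` are hypothetical names, and the target colours are the ones used in the answer above.

```python
import numpy as np
import pandas as pd

def conditional_bernoulli(sublist, targets, prob=0.5, min_len=2):
    # (a) sublist-level condition: short sublists are returned unchanged
    if len(sublist) <= min_len:
        return sublist
    # one trial outcome per token; 1 marks the token for deletion
    mask = np.random.binomial(n=1, p=prob, size=len(sublist))
    # (b) token-level condition: only tokens in `targets` can be deleted
    return [token for token, delete in zip(sublist, mask)
            if token not in targets or not delete]

df = pd.DataFrame(data={'store': [1, 2, 3],
                        'colours': [['red', 'blue', 'yellow', 'green', 'brown', 'pink'],
                                    ['black', 'white'],
                                    ['purple', 'orange', 'cyan', 'mauve']]})

targets = {'red', 'black', 'purple'}  # a set gives fast membership tests
result = df['colours'].apply(lambda s: conditional_bernoulli(s, targets))
print(result)
```

Setting `prob=1.0` or `prob=0.0` gives a quick sanity check: every target colour in a long enough sublist is deleted, or nothing is deleted at all.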
