Compare each pandas row against a dictionary of lists and append new variables to the dataframe

Question

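I want to check each row of a string column in a pandas dataframe and append a new column that returns 1 if any element of a list in the dictionary is found in the text.

Example: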

import pandas as pd

# Data
df = pd.DataFrame({'id': [1, 2, 3],
                   'text': ['This sentence may contain reference.',
                'Orange, blue cow','Does the cow operate any heavy machinery?']},
                 columns=['id', 'text'])

# Rule dictionary
rule_dict = {'rule1': ['Does', 'the'],
             'rule2':['Sentence','contain'],
             'rule3': ['any', 'reference', 'words']}

# List of variable names to be appended to df
rule_list = ['has_rule1','has_rule2','has_rule3']

# Current for loop
for Key in rule_dict:
    for i in rule_list:
        df[i] = df.text.apply(lambda x: (
            1 if any(ele in x for ele in rule_dict[Key]) == 1 and (len(str(x)) >= 3) 
            else 0))

# Current output, looks to be returning a 1 if text is found in ANY of the lists
df = pd.DataFrame({'id': [1, 2, 3],
                       'text': ['This sentence may contain reference.',
                    'Orange, blue cow','Does the cow operate any heavy machinery?'],
                    'has_rule1': [1,1,1],
                    'has_rule2': [0,0,0],
                    'has_rule3': [1,1,1]},
                     columns=['id', 'text','has_rule1','has_rule2','has_rule3'])

# Anticipated output
df = pd.DataFrame({'id': [1, 2, 3],
                       'text': ['This sentence may contain reference.',
                    'Orange, blue cow','Does the cow operate any heavy machinery?'],
                    'has_rule1': [0,0,1],
                    'has_rule2': [1,0,0],
                    'has_rule3': [1,0,1]},
                     columns=['id', 'text','has_rule1','has_rule2','has_rule3'])

Answer 1

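Assuming you have fixed the dict comprehension issue mentioned in the comments, you should not use nested for loops. The inner loop rewrites every column in rule_list on each pass over the dictionary, so the columns are not paired one-to-one with the rules. Instead, use a single for loop with zip: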

for (k, v), n in zip(rule_dict.items(), rule_list):
    # Group the alternatives so that \b applies to every word, not only the first and last
    pat = rf'\b(?:{"|".join(v)})\b'
    df[n] = df.text.str.contains(pat).astype(int)

Output:

      id  text                                         has_rule1    has_rule2    has_rule3
--  ----  -----------------------------------------  -----------  -----------  -----------
 0     1  This sentence may contain reference.                 0            1            1
 1     2  Orange, blue cow                                     0            0            0
 2     3  Does the cow operate any heavy machinery?            1            0            1
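For anyone who wants to run the approach above end to end, here is a minimal self-contained sketch. The helper name `add_rule_flags` and its `prefix` parameter are my own illustration and do not come from the answer; the sketch also assumes the column names can be derived from the dictionary keys instead of a separate `rule_list`, and it adds `re.escape` in case a rule word contains regex metacharacters.

```python
import re

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3],
                   'text': ['This sentence may contain reference.',
                            'Orange, blue cow',
                            'Does the cow operate any heavy machinery?']})

rule_dict = {'rule1': ['Does', 'the'],
             'rule2': ['Sentence', 'contain'],
             'rule3': ['any', 'reference', 'words']}


def add_rule_flags(frame, rules, prefix='has_'):
    """Return a copy of frame with one 0/1 column per rule (hypothetical helper)."""
    out = frame.copy()
    for name, words in rules.items():
        # re.escape guards against rule words that contain regex metacharacters
        alternatives = '|'.join(re.escape(w) for w in words)
        # Non-capturing group so the word boundaries wrap every alternative
        pattern = r'\b(?:' + alternatives + r')\b'
        out[prefix + name] = out['text'].str.contains(pattern, regex=True).astype(int)
    return out


result = add_rule_flags(df, rule_dict)
print(result[['has_rule1', 'has_rule2', 'has_rule3']])
```

Note that the word boundaries make this a whole-word, case-sensitive match, which is stricter than the substring test (`ele in x`) in the question's loop. If substring matching is what you actually want, dropping the `\b` anchors should restore that behaviour.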

Solution 1
1 accepted 2020-07-16 19:59:49
