Pandas DataFrame - Conditional Column Creation

Question

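I'm trying to create a new column based on conditional logic applied to another column. I've searched, but couldn't find anything that solves my problem.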

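I imported a CSV into a pandas DataFrame, and its structure is shown below. I edited some of the descriptions for this post, but everything else is the same: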

#code used to load dataframe:
df = pd.read_csv(r"C:\filepath\filename.csv")

#output from print(type(df)):
#class 'pandas.core.frame.DataFrame'

#output from print(df.columns.values):
#['Type' 'Trans Date' 'Post Date' 'Description' 'Amount'] 

#output from print(df.columns):
#Index(['Type', 'Trans Date', 'Post Date', 'Description', 'Amount'], dtype='object')
#output from print

   Type  Trans Date   Post Date           Description  Amount
0  Sale  01/25/2018  01/25/2018                 DESC1  -13.95
1  Sale  01/25/2018  01/26/2018  AMAZON MKTPLACE PMTS   -6.99
2  Sale  01/24/2018  01/25/2018         SUMMIT BISTRO   -5.85
3  Sale  01/24/2018  01/25/2018                 DESC3   -9.13
4  Sale  01/24/2018  01/26/2018   DYNAMIC VENDING INC   -1.60

Then I wrote the following code:

def criteria(row):
    if row.Description.find('SUMMIT BISTRO')>0:
        return 'Lunch'
    elif row.Description.find('AMAZON MKTPLACE PMTS')>0:
        return 'Amazon'
    elif row.Description.find('Aldi')>0:
        return 'Groceries'
    else:
        return 'NotWorking'

df['Category'] = df.apply(criteria, axis=0)

Error:

Traceback (most recent call last):
  File "C:\Users\Test_BankReconcile2.py", line 44, in <module>
    df['Category'] = df.apply(criteria, axis=0)
  File "C:\Users\Anaconda3\lib\site-packages\pandas\core\frame.py", line 4262, in apply
    ignore_failures=ignore_failures)
  File "C:\Users\Anaconda3\lib\site-packages\pandas\core\frame.py", line 4358, in _apply_standard
    results[i] = func(v)
  File "C:\Users\OneDrive\Documents\finance\Test_BankReconcile2.py", line 35, in criteria
    if row.Description.find('SUMMIT BISTRO')>0:
  File "C:\Users\Anaconda3\lib\site-packages\pandas\core\generic.py", line 3081, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: ("'Series' object has no attribute 'Description'", 'occurred at index Type')

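I was able to run the same command successfully on a very similar CSV file from a different bank (this example is from my credit card), so I don't know what's going on. Do I perhaps need to define the DataFrame in some way that I'm not doing? Or is it something else, very obvious, that I'm not seeing? Thanks in advance to everyone for helping me solve this.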

Answer 1

Yes, your problem is that you need to pass axis=1 to .apply:

In [52]: df
Out[52]:
   Type  Trans Date   Post Date           Description  Amount
0  Sale  01/25/2018  01/25/2018                 DESC1  -13.95
1  Sale  01/25/2018  01/26/2018  AMAZON MKTPLACE PMTS   -6.99
2  Sale  01/24/2018  01/25/2018         SUMMIT BISTRO   -5.85
3  Sale  01/24/2018  01/25/2018                 DESC3   -9.13
4  Sale  01/24/2018  01/26/2018   DYNAMIC VENDING INC   -1.60

In [53]: def criteria(row):
    ...:     if row.Description.find('SUMMIT BISTRO')>0:
    ...:         return 'Lunch'
    ...:     elif row.Description.find('AMAZON MKTPLACE PMTS')>0:
    ...:         return 'Amazon'
    ...:     elif row.Description.find('Aldi')>0:
    ...:         return 'Groceries'
    ...:     else:
    ...:         return 'NotWorking'
    ...:

In [54]: df.apply(criteria, axis=1)
Out[54]:
0    NotWorking
1    NotWorking
2    NotWorking
3    NotWorking
4    NotWorking
dtype: object
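To see why axis matters, here is a minimal sketch, assuming a small hypothetical two-row stand-in for the question's CSV and a hypothetical helper named describe. With the default axis=0, DataFrame.apply calls the function once per column, and each Series is indexed by row labels, so there is no Description label to look up; the first column it tries is 'Type', which matches the "occurred at index Type" part of the traceback. With axis=1, each call receives one row indexed by the column names.

```python
import pandas as pd

# Hypothetical stand-in for the question's CSV: two rows, three columns.
df = pd.DataFrame({
    "Type": ["Sale", "Sale"],
    "Description": ["DESC1", "AMAZON MKTPLACE PMTS"],
    "Amount": [-13.95, -6.99],
})

def describe(s):
    # Hypothetical helper: report the Series' name and its index labels.
    return f"{s.name}: {s.index.tolist()}"

# axis=0 (the default): one call per column; each Series is indexed by row labels.
by_column = df.apply(describe, axis=0)
# axis=1: one call per row; each Series is indexed by column names.
by_row = df.apply(describe, axis=1)

print(by_column["Type"])  # Type: [0, 1]
print(by_row[0])          # 0: ['Type', 'Description', 'Amount']

# A column Series has no 'Description' label, so attribute access fails
# on the first column, just like in the question.
try:
    df.apply(lambda s: s.Description, axis=0)
except AttributeError as exc:
    print("axis=0:", exc)

# Row-wise, the same lookup works.
print(df.apply(lambda s: s.Description, axis=1).tolist())
# ['DESC1', 'AMAZON MKTPLACE PMTS']
```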

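The second problem is a logic error: instead of .find(x) > 0 you want .find(x) >= 0, or better yet, some_string in some_other_string. str.find returns -1 when there is no match and 0 when the match starts at the beginning of the string, so > 0 rejects every description that begins with the text you are searching for (which is why every row above comes back as NotWorking).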
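The difference between the comparisons can be checked in plain Python, with no pandas involved. This short sketch uses one of the descriptions from the question's sample data:

```python
desc = "AMAZON MKTPLACE PMTS"

# str.find returns the index of the first match, or -1 if there is none.
start = desc.find("AMAZON MKTPLACE PMTS")  # 0: the match begins at position 0
missing = desc.find("Aldi")                # -1: no match

print(start > 0)    # False: the original test rejects a match at position 0
print(start >= 0)   # True: the corrected comparison accepts it
print(missing >= 0) # False: a real miss is still rejected

# `in` expresses the same substring check directly.
print("AMAZON MKTPLACE PMTS" in desc)  # True
print("Aldi" in desc)                  # False
```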

Answer 2

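For a more general solution, drop the Description attribute lookup inside the function and instead call df['Description'].apply(criteria), i.e. Series.apply, which passes each value of that column to the function.

Also, use in to check for substrings.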

def criteria(row):
    if 'SUMMIT BISTRO' in row:
        return 'Lunch'
    elif 'AMAZON MKTPLACE PMTS' in row:
        return 'Amazon'
    elif 'Aldi' in row:
        return 'Groceries'
    else:
        return 'NotWorking'

df['Category'] = df['Description'].apply(criteria)
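A self-contained run of this approach might look like the sketch below. The DataFrame is rebuilt by hand from the sample rows printed in the question (the date columns are omitted as an assumption, since the logic does not use them), and the parameter is renamed from row to description because Series.apply hands the function a single cell value, here a str, rather than a row.

```python
import pandas as pd

# Rebuilt from the sample rows printed in the question (dates omitted).
df = pd.DataFrame({
    "Type": ["Sale"] * 5,
    "Description": ["DESC1", "AMAZON MKTPLACE PMTS", "SUMMIT BISTRO",
                    "DESC3", "DYNAMIC VENDING INC"],
    "Amount": [-13.95, -6.99, -5.85, -9.13, -1.60],
})

def criteria(description):
    # Series.apply passes one cell value (a str) per call, not a row.
    # Checks run top to bottom, so the first matching substring wins.
    if 'SUMMIT BISTRO' in description:
        return 'Lunch'
    elif 'AMAZON MKTPLACE PMTS' in description:
        return 'Amazon'
    elif 'Aldi' in description:
        return 'Groceries'
    else:
        return 'NotWorking'

df['Category'] = df['Description'].apply(criteria)
print(df[['Description', 'Category']])
```

One caveat: if the CSV contains blank Description cells, read_csv turns them into NaN (a float), and a test such as 'Aldi' in nan raises TypeError; calling .fillna('') on the column before applying criteria is one way to avoid that.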

Pandas DataFrame - Conditional Column Creation

Question

2 answers

Answer 1
2, accepted, 2018-01-28 20:59:32

Answer 2
1, 2018-01-28 21:00:16
