
Compare string against tuples in list of tuples - python

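I am trying to create a new column, Tax_Year, holding the appropriate tax year, by checking whether the datetime in the date column falls within the bounds formed by the two elements of each txYear_ tuple...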

import datetime as dt
import numpy as np
import pandas as pd

salesReport  = pd.DataFrame({'date': ['2017-07-02 09:00:00', '2017-07-03 15:00:00', '2018-04-05 15:00:00', 
                                    '2018-12-20 11:00:00', '2019-01-06 14:00:00', '2020-09-06 17:00:00'], 
                            'sales': [100, 339, 98, 1020, 630, 765]})

txYear_0304 = (dt.datetime(2003, 4, 6), dt.datetime(2004, 4, 5))
txYear_0405 = (dt.datetime(2004, 4, 6), dt.datetime(2005, 4, 5))
txYear_0506 = (dt.datetime(2005, 4, 6), dt.datetime(2006, 4, 5))
txYear_0607 = (dt.datetime(2006, 4, 6), dt.datetime(2007, 4, 5))
txYear_0708 = (dt.datetime(2007, 4, 6), dt.datetime(2008, 4, 5))
txYear_0809 = (dt.datetime(2008, 4, 6), dt.datetime(2009, 4, 5))
txYear_0910 = (dt.datetime(2009, 4, 6), dt.datetime(2010, 4, 5))
txYear_1011 = (dt.datetime(2010, 4, 6), dt.datetime(2011, 4, 5))
txYear_1112 = (dt.datetime(2011, 4, 6), dt.datetime(2012, 4, 5))
txYear_1213 = (dt.datetime(2012, 4, 6), dt.datetime(2013, 4, 5))
txYear_1314 = (dt.datetime(2013, 4, 6), dt.datetime(2014, 4, 5))
txYear_1415 = (dt.datetime(2014, 4, 6), dt.datetime(2015, 4, 5))
txYear_1516 = (dt.datetime(2015, 4, 6), dt.datetime(2016, 4, 5))
txYear_1617 = (dt.datetime(2016, 4, 6), dt.datetime(2017, 4, 5))
txYear_1718 = (dt.datetime(2017, 4, 6), dt.datetime(2018, 4, 5))
txYear_1819 = (dt.datetime(2018, 4, 6), dt.datetime(2019, 4, 5))
txYear_1920 = (dt.datetime(2019, 4, 6), dt.datetime(2020, 4, 5))
txYear_2021 = (dt.datetime(2020, 4, 6), dt.datetime(2021, 4, 5))

tax_year = [txYear_0304, txYear_0405, txYear_0506, txYear_0607, txYear_0708, txYear_0809, txYear_0910, txYear_1011, txYear_1112, 
            txYear_1213, txYear_1314, txYear_1415, txYear_1516, txYear_1617, txYear_1718, txYear_1819, txYear_1920,  txYear_2021]

When this condition is met, I would like the variable name to appear in the corresponding row of the new column.

For example:

                  date  sales      Tax_Year
0  2017-07-02 09:00:00    100   txYear_1718
1  2017-07-03 15:00:00    339   txYear_1718
2  2018-04-05 15:00:00     98   txYear_1718
3  2018-12-20 11:00:00   1020   txYear_1819
4  2019-01-06 14:00:00    630   txYear_1819
5  2020-09-06 17:00:00    765   txYear_2021

I have attempted this using np.where:

salesReport['Tax_Year'] = np.where(tax_year[0] <= salesReport['date'] and tax_year[1] >= salesReport['date'], tax_year, np.nan)

However, I cannot get past the error I receive...

TypeError: '>=' not supported between instances of 'str' and 'tuple'

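In addition, I am not sure how to get the variable name, because at the moment I would be returning the actual tuple contents, which is not what I want.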
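For context, here is a minimal sketch of where an error like this can come from. It assumes the date column still holds plain strings and that a whole tuple, rather than one of its two elements, ends up on the other side of the comparison. The names tx_year_1718, dates and in_year are illustrative only. Note also that `and` cannot combine two boolean Series; the element-wise operator `&` is needed instead.

```python
import datetime as dt
import pandas as pd

tx_year_1718 = (dt.datetime(2017, 4, 6), dt.datetime(2018, 4, 5))

# A raw date string cannot be ordered against a whole tuple.
try:
    "2017-07-02 09:00:00" >= tx_year_1718
except TypeError as exc:
    message = str(exc)

# Converting the strings to datetimes and comparing against each bound
# separately (combined with &, not `and`) gives a boolean mask.
dates = pd.to_datetime(pd.Series(["2017-07-02 09:00:00", "2018-12-20 11:00:00"]))
start, end = tx_year_1718
in_year = (dates >= start) & (dates <= end)

print(message)
print(in_year.tolist())
```

This only checks a single tax year; covering all of them still needs a loop or a lookup over the full list.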

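Disclaimer:

I am not well versed in Pandas. I would not be surprised if there is a better way to do this.

I have converted the tax_years list of tuples into a dictionary and defined a standalone function that gets the tax year for a given datetime object. I am actually not 100% sure at what time of day a tax year ends/begins, so the comparison is done on MM-DD-YY only, with the time dropped from the timestamps present in the dataframe.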

import pandas as pd
import numpy as np
import datetime

tax_years = {
    (datetime.datetime(2003, 4, 6), datetime.datetime(2004, 4, 5)): "TY0304",
    (datetime.datetime(2004, 4, 6), datetime.datetime(2005, 4, 5)): "TY0405",
    (datetime.datetime(2005, 4, 6), datetime.datetime(2006, 4, 5)): "TY0506",
    (datetime.datetime(2006, 4, 6), datetime.datetime(2007, 4, 5)): "TY0607",
    (datetime.datetime(2007, 4, 6), datetime.datetime(2008, 4, 5)): "TY0708",
    (datetime.datetime(2008, 4, 6), datetime.datetime(2009, 4, 5)): "TY0809",
    (datetime.datetime(2009, 4, 6), datetime.datetime(2010, 4, 5)): "TY0910",
    (datetime.datetime(2010, 4, 6), datetime.datetime(2011, 4, 5)): "TY1011",
    (datetime.datetime(2011, 4, 6), datetime.datetime(2012, 4, 5)): "TY1112",
    (datetime.datetime(2012, 4, 6), datetime.datetime(2013, 4, 5)): "TY1213",
    (datetime.datetime(2013, 4, 6), datetime.datetime(2014, 4, 5)): "TY1314",
    (datetime.datetime(2014, 4, 6), datetime.datetime(2015, 4, 5)): "TY1415",
    (datetime.datetime(2015, 4, 6), datetime.datetime(2016, 4, 5)): "TY1516",
    (datetime.datetime(2016, 4, 6), datetime.datetime(2017, 4, 5)): "TY1617",
    (datetime.datetime(2017, 4, 6), datetime.datetime(2018, 4, 5)): "TY1718",
    (datetime.datetime(2018, 4, 6), datetime.datetime(2019, 4, 5)): "TY1819",
    (datetime.datetime(2019, 4, 6), datetime.datetime(2020, 4, 5)): "TY1920",
    (datetime.datetime(2020, 4, 6), datetime.datetime(2021, 4, 5)): "TY2021"
}

salesReport  = pd.DataFrame({'date': ['2017-07-02 09:00:00',
                                      '2017-07-03 15:00:00',
                                      '2018-04-05 15:00:00',
                                      '2018-12-20 11:00:00',
                                      '2019-01-06 14:00:00',
                                      '2020-09-06 17:00:00'], 
                            'sales': [100, 339, 98, 1020, 630, 765]})

salesReport["date"] = pd.to_datetime(salesReport["date"])


def get_tax_year(date):
    for (start, end), tax_year in tax_years.items():
        if start.date() <= date.date() <= end.date():
            return tax_year
    return "null"


salesReport["tax_year"] = [get_tax_year(date) for date in salesReport["date"]]
print(salesReport)

And the output:

                 date  sales tax_year
0 2017-07-02 09:00:00    100   TY1718
1 2017-07-03 15:00:00    339   TY1718
2 2018-04-05 15:00:00     98   TY1718
3 2018-12-20 11:00:00   1020   TY1819
4 2019-01-06 14:00:00    630   TY1819
5 2020-09-06 17:00:00    765   TY2021
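
If every tax year really does run from 6 April to 5 April, as in the tuples above, the label could probably also be computed directly from the date without a lookup table. The following is only a sketch under that assumption; tax_year_label is a hypothetical helper name, not something from the question or from pandas.

```python
import pandas as pd


def tax_year_label(dates: pd.Series) -> pd.Series:
    """Return labels such as 'TY1718', assuming tax years run 6 April to 5 April."""
    dates = pd.to_datetime(dates)
    # True when the date is on or after 6 April of its calendar year.
    on_or_after_start = (dates.dt.month > 4) | (
        (dates.dt.month == 4) & (dates.dt.day >= 6)
    )
    # Dates before 6 April belong to the tax year that started the previous year.
    start_year = dates.dt.year - (~on_or_after_start).astype(int)
    end_year = start_year + 1
    # Keep the last two digits of each year, zero-padded (2003 -> '03').
    start_txt = (start_year % 100).astype(str).str.zfill(2)
    end_txt = (end_year % 100).astype(str).str.zfill(2)
    return "TY" + start_txt + end_txt


salesReport = pd.DataFrame({'date': ['2017-07-02 09:00:00',
                                     '2017-07-03 15:00:00',
                                     '2018-04-05 15:00:00',
                                     '2018-12-20 11:00:00',
                                     '2019-01-06 14:00:00',
                                     '2020-09-06 17:00:00'],
                            'sales': [100, 339, 98, 1020, 630, 765]})

salesReport["tax_year"] = tax_year_label(salesReport["date"])
print(salesReport)
```

This avoids the Python-level loop over the dictionary, at the cost of hard-coding the 6 April boundary. It also labels dates outside 2003-2021 instead of returning "null".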

