How do I iterate over a column in a Pandas.DataFrame and append the result of a function to the same row?
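I have a Pandas.DataFrame generated from the following CSV:

```
Category,Brand,Product Name,Price,Expiration Date, Package ID,Quantity
Cat1,Brand1,Product1,$1000,07/14/2020,XXXXXX,34
```

I'm trying to append a column to the CSV with an integer in each row that corresponds to how far away the expiration date is (e.g. `4` representing greater than 6 months, then `3` for the next range, and so on).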
My issue is that when I convert the `Expiration Date` column to datetime (using `pandas.to_datetime(df['Expiration Date'])`) and then apply my `classify_expiration()` function, either the types do not match what the function expects, or it tries to apply the function to `index 0`, which I think is the header (and therefore doesn't match the `%m/%d/%Y` format). I have tried converting the column to datetime both inside the classification function and before the `.apply()` call. I also tried using `timedelta` to compare the expiration date with today's date, but it doesn't work with `datetime.date.today()`.
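Because `pd.read_csv` treats the first line of the file as column names, `index 0` should be the first data row, not the header. A minimal sketch, assuming the data is loaded with `read_csv` from the two sample lines above (wrapped in `io.StringIO` so it runs without a file), converting the column once with an explicit `format`:

```python
import io

import pandas as pd

# Sample data from the question; note the leading space in " Package ID".
CSV_TEXT = """Category,Brand,Product Name,Price,Expiration Date, Package ID,Quantity
Cat1,Brand1,Product1,$1000,07/14/2020,XXXXXX,34
"""

# read_csv consumes the first line as the header, so row 0 is real data.
# skipinitialspace=True (an assumption) strips the space after each comma,
# so the column name becomes "Package ID".
df = pd.read_csv(io.StringIO(CSV_TEXT), skipinitialspace=True)

# Convert once, with the format from the question, and assign the result back.
df["Expiration Date"] = pd.to_datetime(df["Expiration Date"], format="%m/%d/%Y")

print(df)
```

Drop `skipinitialspace=True` if the real file has no space before `Package ID`.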
Here is the first approach I tried:
```python
from datetime import date, timedelta

def classify_expiration(row):
    one_week = timedelta(weeks=1, days=0, hours=0, minutes=0, seconds=0)
    if ((one_week * 0) <= (date.today() - row['Expiration Date']) <= (one_week * 4)):
        return 4
```
This approach gives me errors saying that the type at `index 0` is incorrect, or that the function cannot be applied to a Series.
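One likely reason the first approach fails is mixing types: `date.today()` returns a `datetime.date`, while the column holds either strings (if unconverted) or pandas `Timestamp`s. The subtraction is also reversed, since `today - expiration` is negative for future dates. A sketch of the same zero-to-four-week check that keeps both operands as `Timestamp`s; the `today` parameter and the sample dates are hypothetical additions so the reference date can be fixed:

```python
import pandas as pd

def classify_expiration(row, today=None):
    """Return 4 when the row's expiration is 0 to 4 weeks away, else None.

    `today` is a hypothetical parameter; it defaults to the current date at midnight.
    """
    if today is None:
        today = pd.Timestamp.today().normalize()
    one_week = pd.Timedelta(weeks=1)
    # Expiration minus today, so future dates give a positive Timedelta.
    remaining = row["Expiration Date"] - today
    if pd.Timedelta(0) <= remaining <= 4 * one_week:
        return 4
    return None

df = pd.DataFrame(
    {"Expiration Date": pd.to_datetime(["07/14/2020", "09/01/2020"], format="%m/%d/%Y")}
)
ref = pd.Timestamp("2020-07-01")  # assumed reference date
# Extra keyword arguments to DataFrame.apply are forwarded to the function.
df["Expiration Quartile"] = df.apply(classify_expiration, axis=1, today=ref)
print(df)
```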
Here is what I just tried, which gives me an `AssertionError`:
```python
from datetime import date, datetime

def days_between(date1, date2):
    """Calculates the number of days between two dates
    Keyword arguments:
    date1 -- The first date in the subtraction.
    date2 -- The second date in the subtraction.
    """
    date1 = datetime.strptime(date1, '%m/%d/%Y')
    date2 = datetime.strptime(date2, '%m/%d/%Y')
    return abs((date2 - date1).days)

def classify_expiration(row):
    """Calculate days/weeks to expiration. Assign quartile based on value.
    Keyword arguments:
    row -- row in a `pandas.core.frame.DataFrame` object. e.g. `df['A']`
    """
    date_today = datetime.strptime(
        date.today().strftime('%m/%d/%Y'), '%m/%d/%Y')
    if (days_between(row, date_today) <= 30):
        return 4
    if (31 <= days_between(row, date_today) <= 90):
        return 3
    if (91 <= days_between(row, date_today) <= 120):
        return 2
    if (days_between(row, date_today) >= 121):
        return 1
```
Here is where I try to apply the function:
```python
import pandas as pd

# Convert column to `datetime` if its current type is str
pd.to_datetime(product_sales['Expiration Date'])
# Applying the `classify_expiration()` function
product_sales['Expiration Quartile'] = product_sales.apply(
    lambda row: classify_expiration(row), axis=1
)
```
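Note that `pd.to_datetime` does not modify the DataFrame in place; it returns a new Series, so the bare call above leaves `Expiration Date` as strings. A small sketch with hypothetical sample data showing the difference:

```python
import pandas as pd

product_sales = pd.DataFrame({"Expiration Date": ["07/14/2020", "01/13/2020"]})

# The return value is discarded here, so the column is unchanged.
pd.to_datetime(product_sales["Expiration Date"], format="%m/%d/%Y")
still_strings = isinstance(product_sales["Expiration Date"].iloc[0], str)

# Assigning the result back actually replaces the column.
product_sales["Expiration Date"] = pd.to_datetime(
    product_sales["Expiration Date"], format="%m/%d/%Y"
)
print(still_strings, product_sales["Expiration Date"].dtype)
```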
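I expect the function to append a new column to the DataFrame containing the quartile generated for each row's expiration date. Instead I get errors ranging from `AssertionError` and `argument 1 must be str, not Series` to various others related to `index 0`.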
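The `argument 1 must be str, not Series` message fits what `DataFrame.apply(..., axis=1)` does: it passes each whole row as a `Series`, so `days_between(row, ...)` hands a `Series` to `datetime.strptime`. `Series.apply` on a single column passes one element at a time instead. A quick sketch (with made-up data) to check what each call receives:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Expiration Date": pd.to_datetime(["07/14/2020"], format="%m/%d/%Y"),
        "Quantity": [34],
    }
)

# With axis=1 the function receives one whole row as a Series.
row_types = df.apply(lambda row: type(row).__name__, axis=1)

# Series.apply on one column hands over one scalar (here a Timestamp) at a time.
cell_types = df["Expiration Date"].apply(lambda value: type(value).__name__)

print(row_types.iloc[0], cell_types.iloc[0])
```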
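You need to remove the conversion to datetime from the `days_between` function if you assign the converted column back with `product_sales['Expiration Date'] = pd.to_datetime(product_sales['Expiration Date'])`, and then use `product_sales['Expiration Date'].apply(classify_expiration)` to loop over scalars: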
```python
import pandas as pd

def days_between(date1, date2):
    """Calculates the number of days between two dates
    Keyword arguments:
    date1 -- The first date in the subtraction.
    date2 -- The second date in the subtraction.
    """
    return abs((date2 - date1).days)

# classify_expiration() is unchanged from the question
product_sales['Expiration Date'] = pd.to_datetime(product_sales['Expiration Date'])
product_sales['Expiration Quartile'] = (product_sales['Expiration Date']
                                           .apply(classify_expiration))
print(product_sales)
```
```
  Category   Brand Product Name  Price Expiration Date Package ID  Quantity  \
0     Cat1  Brand1     Product1  $1000      2020-07-14     XXXXXX        34

   Expiration Quartile
0                    1
```
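The output above depends on the date the code was run, since `classify_expiration` reads today's date. A self-contained sketch of the same `Series.apply` approach with the reference date passed in explicitly; the `today` parameter, the sample dates, and the reference date `2019-10-16` are assumptions for reproducibility, and the thresholds mirror the question's `classify_expiration`:

```python
import pandas as pd

def days_between(date1, date2):
    """Absolute number of whole days between two datetime-like values."""
    return abs((date2 - date1).days)

def classify_expiration(expiration, today):
    """Map days to expiration onto 4..1 using the question's thresholds.

    `today` is a hypothetical explicit parameter so results are reproducible.
    """
    days = days_between(expiration, today)
    if days <= 30:
        return 4
    if days <= 90:
        return 3
    if days <= 120:
        return 2
    return 1

product_sales = pd.DataFrame(
    {"Expiration Date": ["07/14/2020", "01/13/2020", "11/01/2019", "01/15/2020"]}
)
product_sales["Expiration Date"] = pd.to_datetime(
    product_sales["Expiration Date"], format="%m/%d/%Y"
)

reference_day = pd.Timestamp("2019-10-16")  # assumed "today"
# Series.apply calls the function once per scalar and forwards keyword arguments.
product_sales["Expiration Quartile"] = product_sales["Expiration Date"].apply(
    classify_expiration, today=reference_day
)
print(product_sales)
```

With live data, pass `pd.Timestamp.today().normalize()` as `today`.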
Pandas has a dedicated function for binning, so your function can be replaced with `cut`:
```python
import numpy as np
import pandas as pd

product_sales['Expiration Date'] = pd.to_datetime(product_sales['Expiration Date'])
product_sales['Expiration Quartile'] = (product_sales['Expiration Date']
                                           .apply(classify_expiration))

# Days from today (at midnight) until each expiration date
s = product_sales['Expiration Date'].sub(pd.to_datetime('today').floor('d')).dt.days
product_sales['Expiration Quartile1'] = pd.cut(s,
                                               bins=[0, 30, 90, 120, np.inf],
                                               labels=[4, 3, 2, 1])
print(product_sales)
```
```
  Category   Brand Product Name  Price Expiration Date Package ID  Quantity  \
0     Cat1  Brand1     Product1  $1000      2020-07-14     XXXXXX        34
1     Cat1  Brand1     Product1  $1000      2020-01-13     XXXXXX        34
2     Cat1  Brand1     Product1  $1000      2019-11-01     XXXXXX        34
3     Cat1  Brand1     Product1  $1000      2020-01-15     XXXXXX        34

   Expiration Quartile Expiration Quartile1
0                    1                    1
1                    3                    3
2                    4                    4
3                    2                    2
```
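One difference worth knowing: `pd.cut` uses right-closed intervals by default, so with `bins=[0, 30, 90, 120, np.inf]` a product expiring today (0 days) or already expired (negative days) gets `NaN`, whereas `classify_expiration` still returns a value for those because it takes `abs()` of the difference. A sketch with made-up day counts; extending the first edge to `-np.inf` is an assumption about how expired stock should be classified:

```python
import numpy as np
import pandas as pd

days_left = pd.Series([-5, 0, 1, 30, 31, 90, 91, 120, 121, 400])

# Default right=True: intervals are (0, 30], (30, 90], (90, 120], (120, inf].
quartile = pd.cut(days_left, bins=[0, 30, 90, 120, np.inf], labels=[4, 3, 2, 1])

# Assumption: treat "expires today or already expired" as the most urgent bin.
quartile_all = pd.cut(
    days_left, bins=[-np.inf, 30, 90, 120, np.inf], labels=[4, 3, 2, 1]
)

print(pd.DataFrame({"days_left": days_left, "cut": quartile, "cut_all": quartile_all}))
```

The result of `pd.cut` is categorical; use `.astype(int)` if plain integers are needed and no `NaN` values remain.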
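Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.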