This is my dataframe analytics, with columns glnumber, nom, Year, YearMonth, amount:
4020 Honoraires de consultation,,2018,201809,234294.31000
4020 Honoraires de consultation,,2018,201810,166337.95000
4020 Honoraires de consultation,,2018,201811,250590.67000
4020 Honoraires de consultation,,2018,201812,92206.82000
4020 Honoraires de consultation,,2019,201901,196868.71000
4020 Honoraires de consultation,,2019,201902,148145.20000
4020 Honoraires de consultation,,2019,201903,110973.24000
4020 Honoraires de consultation,,2019,201904,184858.18000
4020 Honoraires de consultation,,2019,201905,119166.87000
4020 Honoraires de consultation,,2019,201906,10428.10000
4020 Honoraires de consultation,,2019,201907,19927.05000
4020 Honoraires de consultation,,2019,201908,-22677.79000
4020 Honoraires de consultation,,2019,201909,-8560.00000
4020 Honoraires de consultation,,2020,202004,-26.25000
4020 Honoraires de consultation,,2020,202007,-0.02000
4020 Honoraires de consultation,,2021,202101,-105.00000
4020 Honoraires de consultation,,2021,202103,104.99000
4020 Honoraires de consultation,Aclient1,2020,202007,9000.00000
4020 Honoraires de consultation,Aclient1,2020,202008,14040.00000
4020 Honoraires de consultation,Aclient1,2020,202010,31185.00000
4020 Honoraires de consultation,Aclient1,2020,202011,14310.00000
4020 Honoraires de consultation,Aclient1,2020,202012,11160.00000
4020 Honoraires de consultation,Aclient1,2021,202101,14490.00000
4020 Honoraires de consultation,Aclient1,2021,202102,14670.00000
4020 Honoraires de consultation,Aclient2,2020,202003,21045.00000
4020 Honoraires de consultation,Aclient2,2020,202004,13340.00000
4020 Honoraires de consultation,Aclient2C,2020,202006,15640.00000
4020 Honoraires de consultation,Aclient2,2020,202008,54165.00000
4020 Honoraires de consultation,Aclient2,2020,202010,51750.00000
4020 Honoraires de consultation,Aclient2,2020,202011,23000.00000
4020 Honoraires de consultation,Aclient2,2020,202012,19550.00000
4020 Honoraires de consultation,Aclient2,2021,202101,21850.00000
4020 Honoraires de consultation,Aclient2,2021,202102,23000.00000
4020 Honoraires de consultation,Aclient3,2020,202001,937.50000
4020 Honoraires de consultation,Aclient2,2020,202003,437.50000
I want the difference in amount from the previous month, for the same GL number and the same client.
I tried this, but it does not work:
# check frequency by month by gl
analytics = q1.groupby(['glnumber','nom','Year','YearMonth'])[['amount']].sum().reset_index()
# order
#add previous sales to the next row
if analytics['glnumber'] == analytics['glnumber'].shift(1) and analytics['nom'] == analytics['nom'].shift(1):
    analytics['prev_$'] = 0
else:
    analytics['prev_$'] = analytics['amount'].shift(1)
# drop the null values and calculate the difference
analytics = analytics.dropna()
analytics['diff'] = (analytics['amount'] - analytics['prev_$'])
analytics = analytics.drop(['prev_$'], axis='columns')
analytics['Perc_diff'] = np.where(analytics['amount']==0,0,analytics['diff']/analytics['amount'])
My if condition fails with this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
You need to check for NaN first and then compare. You can do both in a single np.where condition, as follows:
import pandas as pd
import numpy as np
from io import StringIO
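c = ['glnumber','nom','Year', 'YearMonth','nom_amount']
# d is not defined in this snippet; it is assumed to hold the CSV rows from the question as one string
df = pd.read_csv(StringIO(d), sep = ',', header=None, names = c)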
df = df.sort_values(by=['glnumber','nom','YearMonth'])
print (df.iloc[:,1:])
df['diff'] = np.where((((df.glnumber.isnull()) | (df.glnumber.shift(1).isnull()) | (df.glnumber == df.glnumber.shift(1))) &
((df.nom.isnull()) | (df.nom.shift(1).isnull()) | (df.nom == df.nom.shift(1))) &
(df.YearMonth.diff() == 1)), df.nom_amount.diff(), 0)
print (df.iloc[:,1:])
I am checking whether glnumber or glnumber.shift(1) is null. If neither is, I compare the two values to make sure they are the same.
Similarly, for df.nom, I check whether df.nom or df.nom.shift(1) is null. If neither is, I compare the two and see whether they are the same.
Then I check that the difference between the months is 1, since you want the previous month only. If you would rather drop this check and treat the previous row as the previous month, that's OK too.
If the condition is met, the result is the difference in nom_amount between the two rows. If it is not met, the code above sets the value to 0. Alternatively, you can use np.nan as the else value.
The output of this will be:
          nom  Year  YearMonth  nom_amount        diff
17   Aclient1  2020     202007     9000.00        0.00
18   Aclient1  2020     202008    14040.00     5040.00
19   Aclient1  2020     202010    31185.00        0.00
20   Aclient1  2020     202011    14310.00   -16875.00
21   Aclient1  2020     202012    11160.00    -3150.00
22   Aclient1  2021     202101    14490.00        0.00
23   Aclient1  2021     202102    14670.00      180.00
24   Aclient2  2020     202003    21045.00        0.00
34   Aclient2  2020     202003      437.50        0.00
25   Aclient2  2020     202004    13340.00    12902.50
27   Aclient2  2020     202008    54165.00        0.00
28   Aclient2  2020     202010    51750.00        0.00
29   Aclient2  2020     202011    23000.00   -28750.00
30   Aclient2  2020     202012    19550.00    -3450.00
31   Aclient2  2021     202101    21850.00        0.00
32   Aclient2  2021     202102    23000.00     1150.00
26  Aclient2C  2020     202006    15640.00        0.00
33   Aclient3  2020     202001      937.50        0.00
0         NaN  2018     201809   234294.31        0.00
1         NaN  2018     201810   166337.95   -67956.36
2         NaN  2018     201811   250590.67    84252.72
3         NaN  2018     201812    92206.82  -158383.85
4         NaN  2019     201901   196868.71        0.00
5         NaN  2019     201902   148145.20   -48723.51
6         NaN  2019     201903   110973.24   -37171.96
7         NaN  2019     201904   184858.18    73884.94
8         NaN  2019     201905   119166.87   -65691.31
9         NaN  2019     201906    10428.10  -108738.77
10        NaN  2019     201907    19927.05     9498.95
11        NaN  2019     201908   -22677.79   -42604.84
12        NaN  2019     201909    -8560.00    14117.79
13        NaN  2020     202004      -26.25        0.00
14        NaN  2020     202007       -0.02        0.00
15        NaN  2021     202101     -105.00        0.00
16        NaN  2021     202103      104.99        0.00
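One thing visible in this output: rows 4, 22 and 31 (January) get 0.00 even though the row before each is December of the prior year, because on raw YYYYMM integers 201901 - 201812 is 89, not 1. If December to January should count as consecutive, one option is to convert YearMonth to a running month count before taking the difference. A minimal sketch, assuming YearMonth is always a YYYYMM integer; month_index is a hypothetical helper name, not something from the answer:

```python
import pandas as pd

def month_index(year_month: pd.Series) -> pd.Series:
    """Map YYYYMM integers to a running month count: year * 12 + (month - 1)."""
    ym = year_month.astype(int)
    return (ym // 100) * 12 + (ym % 100) - 1

ym = pd.Series([201811, 201812, 201901, 201903])

raw_gap = ym.diff()                  # difference on the raw YYYYMM integers
month_gap = month_index(ym).diff()   # difference in whole calendar months

print(pd.DataFrame({"YearMonth": ym, "raw_gap": raw_gap, "month_gap": month_gap}))
```

With this, the test (df.YearMonth.diff() == 1) could be replaced by a test on the month count, so that a December/January pair is treated like any other pair of adjacent months.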
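Note that because the isnull() checks let a NaN match any neighbouring value, rows where glnumber or nom is NaN can be compared against a row from a different group, which may result in a small problem at the group boundary. Alternatively, you can use groupby and do the same. Groupby ensures that glnumber and nom are the same for each comparison.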
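The groupby alternative could look roughly like the sketch below. It is only an illustration on a few rows copied from the question, not the answerer's code. The client_key column and the "(no client)" placeholder are assumptions introduced here so that rows with a missing nom form their own group (groupby drops NaN keys by default). The month test mirrors the raw YearMonth.diff() == 1 check used above.

```python
import numpy as np
import pandas as pd

gl = "4020 Honoraires de consultation"
df = pd.DataFrame({
    "glnumber": [gl] * 5,
    "nom": ["Aclient1", "Aclient1", "Aclient1", np.nan, np.nan],
    "YearMonth": [202007, 202008, 202010, 201809, 201810],
    "nom_amount": [9000.0, 14040.0, 31185.0, 234294.31, 166337.95],
})

# Hypothetical placeholder so that missing clients are grouped together
df["client_key"] = df["nom"].fillna("(no client)")
df = df.sort_values(["glnumber", "client_key", "YearMonth"])

grouped = df.groupby(["glnumber", "client_key"])
amount_change = grouped["nom_amount"].diff()  # change vs previous row of the same group
month_gap = grouped["YearMonth"].diff()       # NaN on the first row of each group

# Keep the change only when the previous row is exactly one month earlier
df["diff"] = np.where(month_gap == 1, amount_change, 0.0)
print(df[["nom", "YearMonth", "nom_amount", "diff"]])
```

Because the differences are taken inside each group, the first row of a group never picks up a value from the group before it, so no explicit shift(1) comparison on glnumber or nom is needed.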
Your error occurs because if statements in Python require a single True/False (or 0/1) condition. What you're passing is a pandas Series full of True/False values, which Python doesn't know how to reduce to one value. What I would do is apply the first step to the entire dataframe, and then select rows with .loc using your if statement's logic:
analytics = q1.groupby(['glnumber','nom','Year','YearMonth'])[['amount']].sum().reset_index()
analytics['prev_$'] = analytics['amount'].shift(1)
analytics.loc[(analytics['glnumber'] == analytics['glnumber'].shift(1)) & (analytics['nom'] == analytics['nom'].shift(1)),'prev_$'] = 0
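To make the point concrete, here is a small self-contained sketch on toy data (the frame, key and prev names are made up for illustration). It reproduces the ValueError and then uses the same boolean Series as a row mask instead. Note that the snippet above follows the question's logic and sets prev_$ to 0 where the keys match; the sketch below assumes the opposite intent, keeping the previous amount only where the key matches the previous row. Pick whichever matches what you actually need.

```python
import pandas as pd

frame = pd.DataFrame({"key": ["a", "a", "b"], "amount": [10.0, 15.0, 7.0]})

# Element-wise comparison: the result is a Series of booleans, one per row
same_as_prev = frame["key"] == frame["key"].shift(1)

message = ""
try:
    if same_as_prev:  # Python needs one truth value, but gets a whole Series
        pass
except ValueError as exc:
    message = str(exc)
print(message)

# Vectorised alternative: where() keeps values on rows where the mask is True
# and puts NaN elsewhere
frame["prev"] = frame["amount"].shift(1).where(same_as_prev)
print(frame)
```

The same mask can be passed to .loc for assignment, as in the answer above, and several conditions can be combined with & and | (each wrapped in parentheses) rather than and/or.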