I have to calculate a metric that requires me to combine the attributes of the same 'user' from multiple data frames. For example, I have the two data frames shown below:
calls_per_month.head(10)
user_id month call_date
0 1000 12 16
1 1001 8 27
2 1001 9 49
3 1001 10 65
4 1001 11 64
5 1001 12 56
6 1002 10 11
7 1002 11 55
8 1002 12 47
9 1003 12 149
internet_per_month.head(10)
user_id session_date mb_used
0 1000 12 2000.0
1 1001 8 7000.0
2 1001 9 14000.0
3 1001 10 23000.0
4 1001 11 19000.0
5 1001 12 20000.0
6 1002 10 7000.0
7 1002 11 20000.0
8 1002 12 15000.0
9 1003 12 28000.0
I want to calculate a metric for each user_id, for every month in which they used the internet or made a call. It would look something like `usage = mb_used + call_date`, and the resulting column would look like this (calculated by hand):
user_id month usage
0 1000 12 2016
1 1001 8 7027
2 1001 9 14049
3 1001 10 23065
4 1001 11 19064
5 1001 12 20056
6 1002 10 7011
7 1002 11 20055
8 1002 12 15047
9 1003 12 28149
The head shown above does not include them, but there are some users who used data in a particular month without making a call. I have to account for that: those users should not be dropped, and the missing value should be treated as 0.
Should I first do an outer join of the tables? Or is creating a new table not the correct way to do it? Any guidance is appreciated.
Thank you
You should merge or join these first, then do the operation. Here I'm doing a left join on `internet_per_month` (and a call to `fillna`); if it's possible that someone made calls but used no internet in a given month, an outer join would be preferable.
import pandas as pd

df = pd.merge(
    left=internet_per_month,
    right=calls_per_month,
    how="left",
    left_on=["user_id", "session_date"],
    right_on=["user_id", "month"],
)
# fillna returns a new frame, so assign the result back
df = df.fillna(0)
df["usage"] = df["mb_used"] + df["call_date"]
output:
user_id month call_date session_date mb_used usage
0 1000 12 16 12 2000.0 2016.0
1 1001 8 27 8 7000.0 7027.0
2 1001 9 49 9 14000.0 14049.0
3 1001 10 65 10 23000.0 23065.0
4 1001 11 64 11 19000.0 19064.0
5 1001 12 56 12 20000.0 20056.0
6 1002 10 11 10 7000.0 7011.0
7 1002 11 55 11 20000.0 20055.0
8 1002 12 47 12 15000.0 15047.0
9 1003 12 149 12 28000.0 28149.0
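For the outer-join case mentioned above, here is a minimal sketch. The rows are made up for illustration (users 1004 and 1005 are not from the question), and renaming `session_date` to `month` before merging is my own choice so that both frames share one key column; it is not something the original data requires.

```python
import pandas as pd

# Made-up sample rows: user 1004 only made calls, user 1005 only used data.
calls_per_month = pd.DataFrame({
    "user_id": [1000, 1001, 1004],
    "month": [12, 8, 5],
    "call_date": [16, 27, 30],
})
internet_per_month = pd.DataFrame({
    "user_id": [1000, 1001, 1005],
    "session_date": [12, 8, 6],
    "mb_used": [2000.0, 7000.0, 500.0],
})

# Give both frames the same key name so the merge yields a single month column.
internet = internet_per_month.rename(columns={"session_date": "month"})

# Outer join keeps every (user_id, month) pair present in either frame.
usage = pd.merge(calls_per_month, internet, how="outer", on=["user_id", "month"])

# Fill only the value columns, so a missing side counts as 0.
usage[["call_date", "mb_used"]] = usage[["call_date", "mb_used"]].fillna(0)
usage["usage"] = usage["mb_used"] + usage["call_date"]

usage = usage.sort_values(["user_id", "month"]).reset_index(drop=True)
print(usage[["user_id", "month", "usage"]])
```

With a shared `month` key, the rows that exist in only one frame still get a valid month, which would not be the case if `month` and `session_date` were kept as separate columns and filled with 0.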