I have a DataFrame like this:
from pandas import DataFrame, Series

d = {'buy': Series([1., 0., 1., 0., 1., 1., 0., 0., 1., 1., 1., 0., 1., 0.]),
     'id': Series([1., 2., 4., 2., 3., 4., 1., 1., 2., 1., 3., 3., 2., 3.]),
     'datetime': Series(['01.02.2015', '01.02.2015', '01.03.2015', '03.01.2015',
                         '06.02.2015', '01.09.2015', '18.03.2015', '02.02.2015',
                         '03.02.2015', '06.04.2015', '01.04.2015', '03.04.2015',
                         '02.04.2015', '20.03.2015'])}
df = DataFrame(d)
print(df)
    buy    datetime  id
0     1  01.02.2015   1
1     0  01.02.2015   2
2     1  01.03.2015   4
3     0  03.01.2015   2
4     0  06.02.2015   3
5     1  01.09.2015   4
6     0  18.03.2015   1
7     0  02.02.2015   1
8     1  03.02.2015   2
9     1  06.04.2015   1
10    1  01.04.2015   3
11    0  03.04.2015   3
12    1  02.04.2015   2
13    0  20.03.2015   3
First, I sort it by 'datetime' and keep only the row with the latest 'datetime' for each 'id':
df1 = df.sort_values(by='datetime').drop_duplicates(subset='id', keep='last')
print(df1)
    buy    datetime  id
5     1  01.09.2015   4
8     1  03.02.2015   2
6     0  18.03.2015   1
13    0  20.03.2015   3
Next, I need to sum 'buy' for every id and add the result to my DataFrame as a new column (which I named 'buy_count'). I have something like this:
buys = df.groupby(by='id')['buy'].sum()
print(buys)
id
1 2
2 2
3 1
4 2
But I can't insert 'buy_count' into the DataFrame:
df1['buys_count'] = buys
print(df1)
    buy    datetime  id  buys_count
5     1  01.09.2015   4         NaN
8     1  03.02.2015   2         NaN
6     0  18.03.2015   1         NaN
13    0  20.03.2015   3         NaN
I guess there is some trouble with the indexes. I tried changing the indexes and using loops, but nothing worked. How can I do this?
You can call map against the 'id' column of df1 and pass buys to perform a lookup:
In [270]:
df1['buy_count'] = df1['id'].map(buys)
df1
Out[270]:
    buy    datetime  id  buy_count
5     1  01.09.2015   4          2
8     1  03.02.2015   2          2
6     0  18.03.2015   1          2
13    0  20.03.2015   3          2
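The reason the plain assignment produced NaN is index alignment: assigning a Series to a column matches on index labels, and the labels of buys are the id values while the labels of df1 are the original row numbers. Here is a minimal sketch of the difference. It assumes a recent pandas version, leaves out the 'datetime' column for brevity, and rebuilds df1 by selecting the four row labels shown in the question rather than by sorting.

```python
import pandas as pd

# Data copied from the question's constructor call ('datetime' omitted).
df = pd.DataFrame({
    'buy': [1., 0., 1., 0., 1., 1., 0., 0., 1., 1., 1., 0., 1., 0.],
    'id': [1., 2., 4., 2., 3., 4., 1., 1., 2., 1., 3., 3., 2., 3.],
})

# Stand-in for df1: the four rows kept in the question (labels 5, 8, 6, 13).
df1 = df.loc[[5, 8, 6, 13]].copy()

# Per-id totals; the index of this Series holds the id values 1.0 to 4.0.
buys = df.groupby('id')['buy'].sum()

# Plain assignment aligns on index labels. df1 has labels 5, 8, 6, 13 and
# buys has labels 1.0 to 4.0, so nothing matches and every value is NaN.
df1['buys_count'] = buys

# map instead looks up each value of the 'id' column in the index of buys.
df1['buy_count'] = df1['id'].map(buys)

print(df1)
```

With this input the 'buys_count' column is all NaN, while 'buy_count' holds the per-id sums.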
By the way, I don't get the same output as you for buys: with the data as posted, every id sums to 2.
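Separately, it may be worth checking the "latest datetime" step: the 'datetime' values are strings, so sorting them orders the rows alphabetically rather than chronologically. The following is only a sketch, assuming the strings are in day.month.year format (values such as '18.03.2015' suggest so) and a recent pandas version. It parses the dates first, and it attaches the per-id sum with groupby(...).transform('sum') before dropping duplicates, which is an alternative to the map lookup rather than a replacement for it.

```python
import pandas as pd

# Data copied from the question's constructor call.
df = pd.DataFrame({
    'buy': [1., 0., 1., 0., 1., 1., 0., 0., 1., 1., 1., 0., 1., 0.],
    'id': [1., 2., 4., 2., 3., 4., 1., 1., 2., 1., 3., 3., 2., 3.],
    'datetime': ['01.02.2015', '01.02.2015', '01.03.2015', '03.01.2015',
                 '06.02.2015', '01.09.2015', '18.03.2015', '02.02.2015',
                 '03.02.2015', '06.04.2015', '01.04.2015', '03.04.2015',
                 '02.04.2015', '20.03.2015'],
})

# Assumption: the strings are day.month.year, so parse them explicitly.
df['datetime'] = pd.to_datetime(df['datetime'], format='%d.%m.%Y')

# Per-id total of 'buy', repeated on every row that shares the id.
df['buy_count'] = df.groupby('id')['buy'].transform('sum')

# Sort chronologically and keep the last (latest) row for each id.
latest = df.sort_values('datetime').drop_duplicates(subset='id', keep='last')

print(latest)
```

With the posted data, the chronological sort keeps rows 12, 11, 9 and 5 instead of rows 5, 8, 6 and 13, so the two approaches select different rows for ids 1, 2 and 3.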