I have a dataframe that looks something like this:
df = pd.DataFrame({'Name':['a','a','a','a','b','b','b'], 'Year':[1999,1999,1999,2000,1999,2000,2000], 'Name_id':[1,1,1,1,2,2,2]})
  Name  Name_id  Year
0    a        1  1999
1    a        1  1999
2    a        1  1999
3    a        1  2000
4    b        2  1999
5    b        2  2000
6    b        2  2000
What I'd like to have is a new column 'yr_name_id' that increases for each unique Name_id-Year combination and then begins anew with each new Name_id.
  Name  Name_id  Year  yr_name_id
0    a        1  1999           1
1    a        1  1999           1
2    a        1  1999           1
3    a        1  2000           2
4    b        2  1999           1
5    b        2  2000           2
6    b        2  2000           2
I've tried a variety of things and looked at several related questions, as well as a few posts on groupby and enumerate.
At first I tried creating a unique dictionary after combining Name_id and Year and then using map to assign values, but when I try to combine Name_id and Year as strings via:
df['yr_name_id'] = str(df['Name_id']) + str(df['Year'])
Every row of the new column then holds the same non-unique string, 0 0 1\n1 1\n2 1\n3 1\n4 2\n5 2..., which I don't really understand.
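A short sketch of why that happens, using a smaller hypothetical frame (the three-row data and the variable names `whole` and `combined` are made up for illustration): `str()` applied to a whole Series returns a single string, the printed form of the Series, whereas `astype(str)` converts each element so that `+` concatenates row by row.

```python
import pandas as pd

# Small illustrative frame (not the full data from the question).
df = pd.DataFrame({'Name_id': [1, 1, 2], 'Year': [1999, 2000, 1999]})

# str() on a Series gives ONE Python string: the printed representation,
# including the index and the newlines between rows.
whole = str(df['Name_id'])
print(type(whole))

# astype(str) converts element by element, so + works row-wise.
combined = df['Name_id'].astype(str) + '-' + df['Year'].astype(str)
print(combined.tolist())  # ['1-1999', '1-2000', '2-1999']
```

Assigning a single string to a column broadcasts it to every row, which is why the same text showed up everywhere.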
A more promising approach, where I think I just need help with the lambda, is to use groupby:
df['yr_name_id'] = df.groupby(['Name_id', 'Year'])['Name_id'].transform(lambda x: )#unsure from this point
I am very unfamiliar with lambdas, so any guidance on how I might do this would be greatly appreciated.
IIUC you can do it this way:
In [99]: df['yr_name_id'] = pd.Categorical(pd.factorize(df['Name_id'].astype(str) + '-' + df['Year'].astype(str))[0] + 1)
In [100]: df
Out[100]:
  Name  Name_id  Year yr_name_id
0    a        1  1999          1
1    a        1  1999          1
2    a        1  1999          1
3    a        1  2000          2
4    b        2  1999          3
5    b        2  2000          4
6    b        2  2000          4
In [101]: df.dtypes
Out[101]:
Name            object
Name_id          int64
Year             int64
yr_name_id    category
dtype: object
But looking at your desired DF, it looks like you want to categorize just the Year column, not a combination of Name_id + Year:
In [102]: df['yr_name_id'] = pd.Categorical(pd.factorize(df.Year)[0] + 1)
In [103]: df
Out[103]:
  Name  Name_id  Year yr_name_id
0    a        1  1999          1
1    a        1  1999          1
2    a        1  1999          1
3    a        1  2000          2
4    b        2  1999          1
5    b        2  2000          2
6    b        2  2000          2
In [104]: df.dtypes
Out[104]:
Name            object
Name_id          int64
Year             int64
yr_name_id    category
dtype: object
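Note that factorizing Year on its own numbers the years across the whole frame, so it matches the desired output only when every Name_id covers the same years. A minimal sketch of a variant that restarts the numbering inside each Name_id is shown here; the changed years for name b (2001, 2002) are a hypothetical tweak to the sample data, chosen to make the difference visible, and this is one possible approach rather than the only one.

```python
import pandas as pd

# Sample data from the question, except that name 'b' uses different
# years (an assumption made here to show the per-group restart).
df = pd.DataFrame({
    'Name': ['a', 'a', 'a', 'a', 'b', 'b', 'b'],
    'Year': [1999, 1999, 1999, 2000, 2001, 2002, 2002],
    'Name_id': [1, 1, 1, 1, 2, 2, 2],
})

# Factorize Year separately within each Name_id group; codes start at 0,
# so add 1 to begin the numbering at 1.
df['yr_name_id'] = (
    df.groupby('Name_id')['Year']
      .transform(lambda s: pd.factorize(s)[0] + 1)
)
print(df['yr_name_id'].tolist())  # [1, 1, 1, 2, 1, 2, 2]
```

Here `pd.factorize` numbers years in order of first appearance within each group; if the numbering should follow the sorted year values instead, `groupby('Name_id')['Year'].rank(method='dense')` is another option to consider.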
Use itertools.count:
from itertools import count
counter = count(1)
df['yr_name_id'] = (df.groupby(['Name_id', 'Year'])['Name_id']
.transform(lambda x: next(counter)))
Output:
  Name  Name_id  Year  yr_name_id
0    a        1  1999           1
1    a        1  1999           1
2    a        1  1999           1
3    a        1  2000           2
4    b        2  1999           3
5    b        2  2000           4
6    b        2  2000           4
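The counter approach depends on the lambda being called exactly once per group, in group order, which is an assumption about how transform behaves internally. As a hedged alternative sketch, `GroupBy.ngroup()` appears to give the same running group number without a side-effecting lambda. Like the counter, it numbers all Name_id-Year combinations consecutively and does not restart for each Name_id.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'Name': ['a', 'a', 'a', 'a', 'b', 'b', 'b'],
    'Year': [1999, 1999, 1999, 2000, 1999, 2000, 2000],
    'Name_id': [1, 1, 1, 1, 2, 2, 2],
})

# ngroup() labels each (Name_id, Year) group 0, 1, 2, ... in sorted key
# order; add 1 so the labels start at 1.
df['yr_name_id'] = df.groupby(['Name_id', 'Year']).ngroup() + 1
print(df['yr_name_id'].tolist())  # [1, 1, 1, 2, 3, 4, 4]
```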