I have constructed a pretty basic dataframe where the column names are years:
import pandas as pd
column_names = [x for x in range(2000,2005)]
df = pd.DataFrame(columns=column_names)
Which of course gives me a dataframe where the column names are years and currently has no entries.
2000 2001 2002 2003 2004
I also have a different dataframe where one column has specific dates and the second column has the corresponding year. I call this dataframe set0 because I will continously add more sets which are then numbered (set1,set2,etc.).
data = {'Date': ['2001-06-08', '2002-05-23', '2002-05-24', '2003-06-23'],
'Year': [2001, 2002, 2002, 2003]}
df2 = pd.DataFrame(data)
Date Year
0 2001-06-08 2001
1 2002-05-23 2002
2 2002-05-24 2002
3 2003-06-23 2003
Now what I want to do is to create something like this: It takes the first dataframe, adds a first column which has a name of a certain dataset, in this case set0. I will then group this dataset by years and if I have an entry for a year I can create this:
set_name 2000 2001 2002 2003 2004
set0 0 1 2 1 0
I have found nothing similar on the web. I have done the grouping but then wasn't able to add the entries in the corresponding columns. Any help or hint is much appreciated!
Does this answer your question?
import pandas as pd
column_names = [x for x in range(2000, 2005)]
df = pd.DataFrame(index=column_names)
data = {
'Date': ['2001-06-08', '2002-05-23', '2002-05-24', '2003-06-23'],
'Year': [2001, 2002, 2002, 2003]
}
df2 = pd.DataFrame(data)
df2_grouped = df2.groupby('Year').count()['Date']
df['set0'] = df2_grouped
df = df.fillna(0).reset_index(names='set_name').pivot_table(columns='set_name')
print(df)
Result:
set_name 2000 2001 2002 2003 2004
set0 0.0 1.0 2.0 1.0 0.0
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.