I have two dataframes:
date sitename Auto_name AutoCount
2012-05-01 chess.com Autobiographer 8
2012-05-05 chess.com Autobiographer 1
2012-05-15 chess.com Autobiographer 3
And
date sitename Stu_name StudentCount
2012-05-01 chess.com Student 4
2012-05-02 chess.com Student 2
The output should look like this:
date sitename Autoname AutoCount Stu_name Stu_count
2012-05-01 chess.com Autobiographer 8 Student 4
2012-05-02 chess.com Autobiographer 0 Student 2
2012-05-05 chess.com Autobiographer 1 Student 0
2012-05-15 chess.com Autobiographer 3 Student 0
I want to insert the student name and student count from the second dataframe into the first, matching rows on the date column. It doesn't look that difficult, but I can't figure it out.
You can, for example, use the merge function (see the docs on merging dataframes: http://pandas.pydata.org/pandas-docs/stable/merging.html ). Assuming your dataframes are called df1 and df2:
In [13]: df = pd.merge(df1, df2, how='outer')
In [14]: df
Out[14]:
date sitename Auto_name AutoCount Stu_name StudentCount
0 2012-05-01 chess.com Autobiographer 8 Student 4
1 2012-05-05 chess.com Autobiographer 1 NaN NaN
2 2012-05-15 chess.com Autobiographer 3 NaN NaN
3 2012-05-02 chess.com NaN NaN Student 2
By default, merge joins on the columns the two dataframes have in common (in this case date and sitename), but you can also specify the columns explicitly with the on keyword (see the docs).
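As a minimal sketch of passing the join columns explicitly, the snippet below rebuilds the two example frames by hand and merges them with on. The frame contents are copied from the question; the variable name merged and the column name StudentCount (taken from the output shown above) are assumptions made for illustration. Row order after an outer merge can differ between pandas versions, so no printed output is shown here.

```python
import pandas as pd

# Example frames reconstructed from the question (dates kept as strings).
df1 = pd.DataFrame({
    "date": ["2012-05-01", "2012-05-05", "2012-05-15"],
    "sitename": ["chess.com"] * 3,
    "Auto_name": ["Autobiographer"] * 3,
    "AutoCount": [8, 1, 3],
})
df2 = pd.DataFrame({
    "date": ["2012-05-01", "2012-05-02"],
    "sitename": ["chess.com"] * 2,
    "Stu_name": ["Student"] * 2,
    "StudentCount": [4, 2],
})

# Name the key columns explicitly instead of relying on the shared-column default.
# how="outer" keeps dates that appear in only one of the two frames.
merged = pd.merge(df1, df2, on=["date", "sitename"], how="outer")
print(merged)
```

Naming the keys explicitly is mostly a safeguard: if the two frames later gain another column with the same name, the default would silently start joining on it as well.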
As a next step, you can fill the NaN values as you like. Following your example output, this could be:
In [15]: df.fillna({'Auto_name':'Autobiographer', 'AutoCount':0, 'Stu_name':'Student', 'StudentCount':0})
Out[15]:
date sitename Auto_name AutoCount Stu_name StudentCount
0 2012-05-01 chess.com Autobiographer 8 Student 4
1 2012-05-05 chess.com Autobiographer 1 Student 0
2 2012-05-15 chess.com Autobiographer 3 Student 0
3 2012-05-02 chess.com Autobiographer 0 Student 2
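Note that fillna returns a new dataframe rather than modifying df in place, so the result has to be assigned to keep it. The sketch below puts the steps together and also sorts by date and converts the counts back to integers, to match the desired output in the question. The sorting and the integer cast are additions beyond the steps above, and the names merged and result are hypothetical.

```python
import pandas as pd

# Example frames reconstructed from the question (dates kept as ISO strings).
df1 = pd.DataFrame({
    "date": ["2012-05-01", "2012-05-05", "2012-05-15"],
    "sitename": ["chess.com"] * 3,
    "Auto_name": ["Autobiographer"] * 3,
    "AutoCount": [8, 1, 3],
})
df2 = pd.DataFrame({
    "date": ["2012-05-01", "2012-05-02"],
    "sitename": ["chess.com"] * 2,
    "Stu_name": ["Student"] * 2,
    "StudentCount": [4, 2],
})

merged = pd.merge(df1, df2, how="outer")

result = (
    merged
    # Fill the gaps left by the outer merge, one default per column.
    .fillna({"Auto_name": "Autobiographer", "AutoCount": 0,
             "Stu_name": "Student", "StudentCount": 0})
    # The NaNs turned the count columns into floats; cast them back.
    .astype({"AutoCount": int, "StudentCount": int})
    # ISO-formatted date strings sort chronologically as plain strings.
    .sort_values("date")
    .reset_index(drop=True)
)
print(result)
```

If the date column holds real datetime values instead of strings, the same sort_values call still orders the rows chronologically.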