
Adding/inserting values in pandas dataframe based on 1 or more columns

I have 2 dataframes:

date       sitename  Auto_name                  AutoCount                         
2012-05-01 chess.com Autobiographer               8
2012-05-05 chess.com Autobiographer               1
2012-05-15 chess.com Autobiographer               3

And

date       sitename  Stu_name      Student count
2012-05-01 chess.com Student        4
2012-05-02 chess.com Student        2

This is how the output should look:

date       sitename    Autoname                 AutoCount     Stu_name    Stu_count                     
2012-05-01 chess.com Autobiographer               8            Student       4
2012-05-02 chess.com Autobiographer               0            Student       2
2012-05-05 chess.com Autobiographer               1            Student       0
2012-05-15 chess.com Autobiographer               3            Student       0

I want to insert the name and student count from the second dataframe into the first, matched on the date column. It doesn't look that difficult, but I am not able to figure this one out.

You can, for example, use the merge function (see the docs on merging dataframes: http://pandas.pydata.org/pandas-docs/stable/merging.html ). Assuming your dataframes are called df1 and df2:

In [13]: df = pd.merge(df1, df2, how='outer')

In [14]: df
Out[14]: 
         date   sitename       Auto_name  AutoCount Stu_name  StudentCount
0  2012-05-01  chess.com  Autobiographer          8  Student             4
1  2012-05-05  chess.com  Autobiographer          1      NaN           NaN
2  2012-05-15  chess.com  Autobiographer          3      NaN           NaN
3  2012-05-02  chess.com             NaN        NaN  Student             2

By default, merge joins on the columns the two dataframes have in common (in this case date and sitename), but you can also specify the columns explicitly with the on keyword (see the docs).
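As a minimal sketch of the explicit form, the snippet below rebuilds the two dataframes from the question and passes the join columns through on. The column name StudentCount is an assumption taken from the output shown above (the question's own table labels it differently), and the variable name merged is only illustrative.

```python
import pandas as pd

# Sample data rebuilt from the question's tables.
# Assumption: the count column in df2 is named StudentCount, as in the output above.
df1 = pd.DataFrame({
    "date": ["2012-05-01", "2012-05-05", "2012-05-15"],
    "sitename": ["chess.com"] * 3,
    "Auto_name": ["Autobiographer"] * 3,
    "AutoCount": [8, 1, 3],
})
df2 = pd.DataFrame({
    "date": ["2012-05-01", "2012-05-02"],
    "sitename": ["chess.com"] * 2,
    "Stu_name": ["Student"] * 2,
    "StudentCount": [4, 2],
})

# Outer join on explicitly named key columns; rows present in only one
# frame are kept, with NaN in the columns coming from the other frame.
merged = pd.merge(df1, df2, how="outer", on=["date", "sitename"])
print(merged)
```

Naming the keys explicitly can be safer than relying on the default, since an unrelated column that happens to share a name in both frames would otherwise be used as a join key as well.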

In the next step you can fill the NaN values as you like. Following your example output, this can be:

In [15]: df.fillna({'Auto_name':'Autobiographer', 'AutoCount':0, 'Stu_name':'Student', 'StudentCount':0})
Out[15]: 
         date   sitename       Auto_name  AutoCount Stu_name  StudentCount
0  2012-05-01  chess.com  Autobiographer          8  Student             4
1  2012-05-05  chess.com  Autobiographer          1  Student             0
2  2012-05-15  chess.com  Autobiographer          3  Student             0
3  2012-05-02  chess.com  Autobiographer          0  Student             2
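Note that fillna returns a new dataframe rather than modifying df, and that the rows above are not in date order, while the desired output in the question is. The following sketch chains the steps together and adds a sort; it assumes the dates are ISO-formatted strings (so they sort correctly as text) and that the count column is named StudentCount. The integer cast and the name result are additions for illustration, not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question's tables.
# Assumption: the count column in df2 is named StudentCount.
df1 = pd.DataFrame({
    "date": ["2012-05-01", "2012-05-05", "2012-05-15"],
    "sitename": ["chess.com"] * 3,
    "Auto_name": ["Autobiographer"] * 3,
    "AutoCount": [8, 1, 3],
})
df2 = pd.DataFrame({
    "date": ["2012-05-01", "2012-05-02"],
    "sitename": ["chess.com"] * 2,
    "Stu_name": ["Student"] * 2,
    "StudentCount": [4, 2],
})

result = (
    pd.merge(df1, df2, how="outer", on=["date", "sitename"])
    # Fill each column with its own default value.
    .fillna({"Auto_name": "Autobiographer", "AutoCount": 0,
             "Stu_name": "Student", "StudentCount": 0})
    # The NaNs introduced by the merge turn the counts into floats; cast back.
    .astype({"AutoCount": int, "StudentCount": int})
    # ISO date strings sort chronologically when sorted as text.
    .sort_values("date")
    .reset_index(drop=True)
)
print(result)
```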


 