Getting unique rows conditioned on year pandas python dataframe

I have a dataframe of this form. However, in my final dataframe, I'd like to keep only rows with unique values per year.

     Name                    Org         Year
4    New York University     doclist[1]  2004
5    Babson College          doclist[2]  2008
6    Babson College          doclist[5]  2008

So ideally, my dataframe will look like this instead:

4    New York University     doclist[1]  2004
5    Babson College          doclist[2]  2008

Here is what I've done so far. I grouped by year with groupby, and I seem to be able to get the unique names per year. However, I am stuck because I lose all the other information, such as the "Org" column. Advice appreciated!

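import numpy as np

# How to get unique rows per year?
# z is the dataframe shown above.
q = z.groupby(['Year'])

# print q.head()
# q.reset_index(level=0, drop=True)

# Unique names within each year; every other column is lost here.
q.Name.apply(lambda x: np.unique(x))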

This gives the following output. How do I include the other columns and also remove the secondary index (e.g. 6, 68, 66, 72)?

Year                                          
2008  6                                        Babson College
      68               European Economic And Social Committee
      66                                       European Union
      72                     Ewing Marion Kauffman Foundation

If all you want to do is keep the first entry for each name, you can use drop_duplicates. Note that this keeps the first entry according to however your data is sorted, so you may want to sort first if you want to keep a specific entry.

In [98]: q.drop_duplicates(subset='Name')
Out[98]: 
                      Name         Org  Year
0      New York University  doclist[1]  2004
1           Babson College  doclist[2]  2008
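
Since the question asks for uniqueness per year rather than across the whole frame, one possible variation is to sort first and then pass both columns as the subset. This is only a sketch under assumptions: the frame name df, the sample rows (modeled on the question), and the choice to sort by "Org" as the tie-breaker are all illustrative, not taken from the original answer.

```python
import pandas as pd

# Hypothetical sample data modeled on the rows shown in the question.
df = pd.DataFrame(
    {
        "Name": ["New York University", "Babson College", "Babson College"],
        "Org": ["doclist[1]", "doclist[2]", "doclist[5]"],
        "Year": [2004, 2008, 2008],
    },
    index=[4, 5, 6],
)

# Sort so that "first" is well defined (assumed tie-breaker: Org),
# then keep one row per (Year, Name) pair. All other columns survive.
unique_per_year = (
    df.sort_values(["Year", "Name", "Org"])
      .drop_duplicates(subset=["Year", "Name"], keep="first")
)

print(unique_per_year)
```

With subset=["Year", "Name"], the same name can still appear once in each different year, which subset='Name' alone would not allow. The original row labels are kept; calling reset_index(drop=True) on the result would renumber them from 0 if that is preferred.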
