
Which is the most efficient way of flattening a pandas dataframe?

I have a large pandas dataframe with 8 columns and several NaN values:

0   1   2   3   4   5   6   7   8
1   Google, Inc. (Date 11/07/2016)  NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
2   Apple Inc. (Date 07/01/2016)    Amazon (Date 11/01/2016)    NaN     NaN     NaN     NaN     NaN     NaN     NaN
3   IBM, Inc. (Date 11/08/2016)     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
4   Microsoft (Date 11/10/2016)     Google, Inc. (Date 11/10/1990)  Google, Inc. (Date 11/07/2016)  Samsung (Date 05/02/2016)   NaN     NaN     NaN     NaN     NaN

How can I flatten it into a single column, like this:

0   companies
1   Google, Inc. (Date 11/07/2016)
2   Apple Inc. (Date 07/01/2016)
3   Amazon (Date 11/01/2016)
4   IBM, Inc. (Date 11/08/2016)
5   Microsoft (Date 11/10/2016)
6   Google, Inc. (Date 11/10/1990)
7   Google, Inc. (Date 11/07/2016)
8   Samsung (Date 05/02/2016)

I read the docs and tried:

df.iloc[:,0]

The problem is that this keeps only the first column, so I lose the values in the other columns and their order. Any idea how to flatten the dataframe without losing the data in the other cells or the order?

You can stack the columns and optionally reset the index. By default, stack drops NaNs.

df.stack()
Out: 
0  0    Google, Inc. (Date 11/07/2016) 
1  0      Apple Inc. (Date 07/01/2016) 
   1          Amazon (Date 11/01/2016) 
2  0       IBM, Inc. (Date 11/08/2016) 
3  0       Microsoft (Date 11/10/2016) 
   1    Google, Inc. (Date 11/10/1990) 
   2    Google, Inc. (Date 11/07/2016) 
   3         Samsung (Date 05/02/2016) 
dtype: object

df.stack().reset_index(drop=True)
Out: 
0    Google, Inc. (Date 11/07/2016) 
1      Apple Inc. (Date 07/01/2016) 
2          Amazon (Date 11/01/2016) 
3       IBM, Inc. (Date 11/08/2016) 
4       Microsoft (Date 11/10/2016) 
5    Google, Inc. (Date 11/10/1990) 
6    Google, Inc. (Date 11/07/2016) 
7         Samsung (Date 05/02/2016) 
dtype: object
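
For reference, here is a self-contained sketch of the same approach on a small frame modeled on the question's data. The variable names and the `companies` column label are assumptions taken from the desired output, not part of the answer above. The explicit `dropna()` is a precaution: depending on the pandas version, `stack` may keep NaN cells that were already present in the input, and dropping them explicitly gives the same result either way.

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's frame (only three columns, for brevity).
df = pd.DataFrame([
    ["Google, Inc. (Date 11/07/2016)", np.nan, np.nan],
    ["Apple Inc. (Date 07/01/2016)", "Amazon (Date 11/01/2016)", np.nan],
    ["IBM, Inc. (Date 11/08/2016)", np.nan, np.nan],
])

flat = (
    df.stack()                 # walk the cells row by row, left to right
      .dropna()                # no-op if stack already dropped the NaNs
      .reset_index(drop=True)  # replace the (row, column) index with 0..n-1
      .to_frame("companies")   # name the single resulting column
)
print(flat)
```

Because `stack` goes through each row before moving to the next, the original left-to-right, top-to-bottom order of the cells is preserved.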

This will probably do the trick:

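import pandas as pd

df = pd.DataFrame([
        ["Google, Inc. (Date 11/07/2016)", float("NaN")],
        ["Apple Inc. (Date 07/01/2016)", "Amazon (Date 11/01/2016)"]])
# Transpose, then unstack: the result is a Series ordered row by row.
unstacked = df.T.unstack()
unstacked.dropna(inplace=True)                  # drop the NaN cells
unstacked.reset_index(drop=True, inplace=True)  # renumber from 0
unstacked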

Output:

0    Google, Inc. (Date 11/07/2016)
1      Apple Inc. (Date 07/01/2016)
2          Amazon (Date 11/01/2016)
dtype: object
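
The transpose matters for the ordering. As a minimal sketch with placeholder values (the letters and variable names are hypothetical, chosen only to make the order visible), calling `unstack` directly on the frame walks it column by column, while transposing first walks it row by row, which is the order the question asks for:

```python
import pandas as pd

# Placeholder 2x2 frame:
#    0  1
# 0  a  b
# 1  c  d
df = pd.DataFrame([["a", "b"], ["c", "d"]])

row_major = df.T.unstack().tolist()  # transpose first: row by row
col_major = df.unstack().tolist()    # no transpose: column by column

print(row_major)  # ['a', 'b', 'c', 'd']
print(col_major)  # ['a', 'c', 'b', 'd']
```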

P.S. Please take a look at this question on how to provide good pandas examples in questions.
