How to filter this Python DataFrame

Question

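Greetings. I'm trying to cut the combined DataFrame down to the minimum size that contains only valid rows (rows with no missing values):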

import pandas as pd
import random

# Base frame: 30 rows of zeros in columns x0, y0
columns = ['x0', 'y0']
df_ = pd.DataFrame(0, index=range(0, 30), columns=columns)

# First data frame: 11 rows of random values in x1, y1
columns1 = ['x1', 'y1']
df = pd.DataFrame(index=range(0, 11), columns=columns1)

for index, row in df.iterrows():
    df.loc[index, "x1"] = random.randint(1, 100)
    df.loc[index, "y1"] = random.randint(1, 100)

# Rows 11-29 get NaN in x1, y1
df_ = df_.combine_first(df)

# Second data frame: 17 rows of random values in x2, y2
columns2 = ['x2', 'y2']
df = pd.DataFrame(index=range(0, 17), columns=columns2)

for index, row in df.iterrows():
    df.loc[index, "x2"] = random.randint(1, 100)
    df.loc[index, "y2"] = random.randint(1, 100)

# Rows 17-29 get NaN in x2, y2
df_ = df_.combine_first(df)

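For this example, the result should contain rows 0 to 10, with the rest filtered out. I've thought about keeping a counter to track the minimum number of rows, or using pandasql, or whether there's a trick to get this from the size of the DataFrame.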

In practice, I will be appending 500+ files of various sizes and running some analysis on the result, so performance is a consideration.

- A Python student

Answer 1

If you want to drop the rows that contain NaN, use dropna (here that leaves the first eleven rows, 0 through 10):

In [11]: df_.dropna()
Out[11]:
    x0  x1  x2  y0  y1  y2
0    0  49  58   0  68   2
1    0   2  37   0  19  71
2    0  26  95   0  12  17
3    0  87   5   0  70  69
4    0  84  77   0  70  92
5    0  71  98   0  22   5
6    0  28  95   0  70  15
7    0  31  19   0  24  31
8    0   9  37   0  55  29
9    0  30  53   0  15  45
10   0   8  61   0  74  41
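To check this without the randomness, here is a minimal sketch that rebuilds the question's layout (frames of 30, 11 and 17 rows) with deterministic values. The names base, part1 and part2 and the arange data are illustrative assumptions. It also shows that, because every frame starts at index 0, taking the first m rows (m being the shortest length) gives the same rows as dropna():

```python
import numpy as np
import pandas as pd

# Deterministic stand-ins for the question's frames; only which rows
# are filled matters here, not the values themselves.
base = pd.DataFrame(0, index=range(30), columns=["x0", "y0"])
part1 = pd.DataFrame(np.arange(22).reshape(11, 2), columns=["x1", "y1"])
part2 = pd.DataFrame(np.arange(34).reshape(17, 2), columns=["x2", "y2"])

combined = base.combine_first(part1).combine_first(part2)

# dropna() removes every row with at least one NaN (how="any" is the default).
valid = combined.dropna()
print(valid.index.tolist())  # rows 0 through 10

# Every frame starts at index 0, so the shortest frame decides how many
# rows survive; slicing by that length selects the same rows.
m = min(len(d) for d in (base, part1, part2))
print(valid.equals(combined.head(m)))  # True
```

The head(m) variant only holds under the stated assumption of 0-based, gap-free indexes; dropna() does not depend on it.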

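However, a cleaner, more efficient and faster way to do the whole thing is to only ever fill in the first m rows, where m is the length of the shortest DataFrame (I'm assuming the random integers are just how you're generating some example DataFrames).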

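Let's store the DataFrames in a list (this assumes numpy has been imported as np and, on Python 3, that reduce has been imported from functools):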

In [21]: df1 = pd.DataFrame([[1, 2], [np.nan, 4]], columns=['a', 'b'])

In [22]: df2 = pd.DataFrame([[1, 2], [5, 6], [7, 8]], columns=['a', 'c'])

In [23]: dfs = [df1, df2]

Take the minimum length:

In [24]: m = min(len(df) for df in dfs)

First, create an empty DataFrame with the required rows and columns:

In [25]: columns = reduce(lambda x, y: y.columns.union(x), dfs, [])

In [26]: res = pd.DataFrame(index=np.arange(m), columns=columns)

To do this efficiently, we'll use update, which makes these changes in place, on this one DataFrame only*:

In [27]: for df in dfs:
             res.update(df)

In [28]: res
Out[28]:
   a  b  c
0  1  2  2
1  5  4  6

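*If we don't do this, and use combine_first or something similar instead, there will most likely be a lot of copying (a new DataFrame is created at every step), which slows things down.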
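If copying is the main concern, one alternative (not the method used in this answer) is a single pd.concat(..., axis=1, join="inner") over all the frames. It builds the result once and keeps only the index labels present in every frame. This is a sketch assuming each frame has a default 0-based index and that column names do not overlap across frames; overlapping names would be duplicated rather than merged, unlike update or combine_first. The frames list is a hypothetical stand-in for the loaded files:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the loaded files: default RangeIndex,
# non-overlapping column names.
frames = [
    pd.DataFrame(0, index=range(30), columns=["x0", "y0"]),
    pd.DataFrame(np.ones((11, 2)), columns=["x1", "y1"]),
    pd.DataFrame(np.ones((17, 2)), columns=["x2", "y2"]),
]

# One concat along the columns; join="inner" intersects the row indexes,
# which here means rows 0 .. (shortest length - 1).
result = pd.concat(frames, axis=1, join="inner")
print(result.shape)  # (11, 6)
```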

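Note: combine_first doesn't offer an inplace flag... You could use combine, but that is also more complicated (and less efficient). It's also quite straightforward to use where and do the update manually, which, IIRC, is what combine does under the hood.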
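A minimal sketch of the manual approach with where, under the assumption that it should mirror update's behaviour (align on the target's index, never overwrite with NaN). manual_update is a hypothetical helper name, not a pandas API:

```python
import numpy as np
import pandas as pd


def manual_update(target: pd.DataFrame, other: pd.DataFrame) -> None:
    """Hypothetical helper: copy other's non-NaN values into target.

    Mutates target by reassigning each column the two frames share.
    """
    # Align other to target's rows; extra rows in other are dropped.
    aligned = other.reindex(target.index)
    for col in target.columns.intersection(aligned.columns):
        new = aligned[col]
        # Keep the new value where it is not NaN, else keep the old one.
        target[col] = new.where(new.notna(), target[col])


df1 = pd.DataFrame([[1, 2], [np.nan, 4]], columns=["a", "b"])
df2 = pd.DataFrame([[1, 2], [5, 6], [7, 8]], columns=["a", "c"])

res = pd.DataFrame(index=np.arange(2), columns=["a", "b", "c"])
for df in (df1, df2):
    manual_update(res, df)

print(res)
```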

Answer 1 (score 0), posted 2015-02-07 06:31:31