Problem in selecting rows from pandas DataFrame

I ran into a problem when I tried to extract the rows of a DataFrame that match a condition.
The code I used is very simple:

    for c in classes:
        print(X[y == c])

where X holds the features of the samples in a DataFrame, y holds the classes of the samples in a DataFrame, and classes is a list of class labels (c is each label in turn).
I used this code on two data sets. It works with one of them but not with the other, although both are formatted in the same way.

For the data set that gave the problem, I printed y == c with:

     print(y == c)

and it returned this:

           Classes
     0     True
     1     True
     2     True
           ...
     4572  False
     4573  False
     4574  False

Therefore, I assume that the condition matching itself is working properly.
However, when I print X[y == c] with:

    print(X[y == c])

the result looks like this:

            0   1   2
     0    NaN NaN NaN
     1    NaN NaN NaN
     2    NaN NaN NaN
     3    NaN NaN NaN
           ...
     4574 NaN NaN NaN
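
A plausible explanation for the all-NaN result, sketched with small made-up data (the values and the three-row size are assumptions, not the asker's data): when the mask is itself a DataFrame, pandas treats `X[mask]` like `X.where(mask)` and aligns the mask on column labels as well as on the index. The mask's only column is `Classes`, which matches none of X's columns `0`, `1`, `2`, so every cell ends up masked.

```python
import pandas as pd

# Made-up stand-ins for the asker's data (assumption: 3 rows, 3 feature columns)
X = pd.DataFrame([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
y_frame = pd.DataFrame({"Classes": [0, 0, 2]})  # y as a one-column DataFrame
y_series = y_frame["Classes"]                   # y as a Series

# DataFrame mask: aligned on column labels too, and "Classes" is not in X.columns
masked = X[y_frame == 0]
print(masked)    # same shape as X, every cell is NaN

# Series mask: used as a row filter
selected = X[y_series == 0]
print(selected)  # only the rows whose class is 0
```

The second form is the row selection the loop in the question is after.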

Note that X and y are ordinary DataFrames that look like this:

X

                     0           1          2
     0       -3.786900    9.411757  -2.246594
     1      742.632101  -74.001353  -0.567936
     2     2019.854074  102.077111 -23.776775
     3      -93.048341    3.008569  -1.043599
           ...
     4754  -247.754953   -6.851270  -0.984777

y

            Classes
     0      0
     1      0
     2      0
            ...
     4572   2
     4573   2
     4574   2

Can this problem be solved?

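If both DataFrames have the same length and the same index values, use DataFrame.eq with the column Classes along axis 0 to get a boolean DataFrame, then check for at least one True per row with DataFrame.any and filter by boolean indexing:

    df = X[X.eq(y['Classes'], axis=0).any(axis=1)]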
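
To see what this expression evaluates, here is a small sketch with made-up integer data (all values are assumptions chosen so that some cells match). `X.eq(..., axis=0)` compares every column of X, row by row, against the `Classes` Series, and `.any(axis=1)` keeps the rows where at least one cell equals that row's label.

```python
import pandas as pd

# Made-up data: some feature values happen to equal the class label of their row
X = pd.DataFrame([[0, 5], [1, 1], [7, 8]])
y = pd.DataFrame({"Classes": [0, 2, 8]})

# Row-wise comparison of each column of X with the Classes column
matches = X.eq(y["Classes"], axis=0)
print(matches)

# Keep the rows with at least one matching cell
kept = X[matches.any(axis=1)]
print(kept)
```

Note that this compares feature values with labels, which is a different operation from selecting the rows that belong to a given class `c`.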

I figured out the cause of the problem.
X and y are the same length. I had the above problem only with Data1, not with Data2.

I checked the type of y in Data1 and Data2, and found that y was

  • <class 'pandas.core.series.Series'> with Data2 (working)
  • <class 'pandas.core.frame.DataFrame'> with Data1 (not working)
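
The post does not say how y was built in each data set. One common way to end up with the two different types, shown here as an assumption with a hypothetical `data` table, is selecting the label column with a list of names versus a single name:

```python
import pandas as pd

# Hypothetical table holding a feature and the labels together (made-up values)
data = pd.DataFrame({"f0": [0.1, 0.2, 0.3], "Classes": [0, 1, 2]})

y_as_frame = data[["Classes"]]  # list of column names -> one-column DataFrame
y_as_series = data["Classes"]   # single column name   -> Series

print(type(y_as_frame))
print(type(y_as_series))
print(y_as_frame.shape, y_as_series.shape)
```

Checking `type(y)` or `y.shape` before building the mask makes the difference visible early.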


I converted y in Data1 into a Series with:

    # .ix was removed in pandas 1.0; .iloc selects the first column by position
    y = y.iloc[:, 0]

After that, the extraction of rows worked properly.
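
A minimal end-to-end sketch of the fix, using made-up data (the values, the four-row size and the `subsets` dictionary are assumptions for illustration): once y is a Series, `y == c` is a row mask and the loop from the question selects the rows of each class.

```python
import pandas as pd

# Made-up stand-ins for X and y
X = pd.DataFrame([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]])
y = pd.DataFrame({"Classes": [0, 0, 1, 2]})

# Convert the one-column DataFrame to a Series
# (y["Classes"] or y.squeeze("columns") would do the same here)
y = y.iloc[:, 0]

classes = [0, 1, 2]
# Collect the rows of each class; y == c is now a boolean Series
subsets = {c: X[y == c] for c in classes}
for c, rows in subsets.items():
    print(c, len(rows))
```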
