
Compare values in two columns of a data frame

I have the following two columns in a pandas data frame:

     256   Z
0     2    2
1     2    3
2     4    4
3     4    9

There are around 1594 rows. '256' and 'Z' are the column headers, whereas 0, 1, 2, 3 and so on are the row numbers (the first column above). I want to print the row numbers where the value in column '256' is not equal to the value in column 'Z'. Thus the output in the above case will be 1, 3. How can this comparison be made in pandas? I would be very grateful for any help. Thanks.

Create the data frame:

import pandas as pd
df = pd.DataFrame({"256":[2,2,4,4], "Z": [2,3,4,9]})

Output:

    256 Z
0   2   2
1   2   3
2   4   4
3   4   9

After subsetting your data frame with a boolean mask, use the index of the result to get the labels of the rows in the subset:

row_ids = df[df["256"] != df.Z].index

gives

Int64Index([1, 3], dtype='int64')
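
To see why this works, it may help to split the one-liner into its steps: the comparison produces a boolean Series (a mask), and indexing with that mask keeps only the rows where it is True. A minimal sketch using the example frame from above; the names mask and mismatched are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({"256": [2, 2, 4, 4], "Z": [2, 3, 4, 9]})

# Step 1: the element-wise comparison gives a boolean Series aligned on the index.
mask = df["256"] != df["Z"]

# Step 2: boolean indexing keeps only the rows where the mask is True.
mismatched = df[mask]

# Step 3: the index of the filtered frame holds the row labels.
row_ids = mismatched.index
print(list(row_ids))  # [1, 3]
```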

Another way is to use the .loc indexer of pandas.DataFrame, which selects the rows for which the boolean mask is True; their labels are again available through .index:

df.loc[(df['256'] != df['Z'])].index

with an output of:

Int64Index([1, 3], dtype='int64')
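
A related variant, not taken from the answers here, is to apply the boolean array to the row labels directly, which skips building the intermediate filtered frame. The sketch below assumes the same example frame and compares it with the .loc form:

```python
import pandas as pd

df = pd.DataFrame({"256": [2, 2, 4, 4], "Z": [2, 3, 4, 9]})

mask = df["256"] != df["Z"]

# Select through .loc, as in the answer above.
via_loc = df.loc[mask].index

# Variant: index the row labels directly with the boolean array.
via_index = df.index[mask.to_numpy()]

print(via_loc.tolist())    # [1, 3]
print(via_index.tolist())  # [1, 3]
```

Both forms return the same labels; which one reads better is largely a matter of taste.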

This happens to be the quickest of the listed implementations, although only by a few microseconds, as can be seen in an IPython notebook:

import pandas as pd
import numpy as np

df = pd.DataFrame({"256":np.random.randint(0,10,1594), "Z": np.random.randint(0,10,1594)})

%timeit df.loc[(df['256'] != df['Z'])].index
%timeit row_ids = df[df["256"] != df.Z].index
%timeit rows = list(df[df['256'] != df.Z].index)
%timeit df[df['256'] != df['Z']].index

with an output of:

1000 loops, best of 3: 352 µs per loop
1000 loops, best of 3: 358 µs per loop
1000 loops, best of 3: 611 µs per loop
1000 loops, best of 3: 355 µs per loop

However, a difference of 5-10 microseconds is not significant. If you later work with a very large data set, timing and efficiency may become a much more important issue. For your relatively small data set of 1594 rows, I would go with the solution that looks the most elegant and is the most readable.
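
The %timeit magic only works inside IPython. Outside a notebook, roughly the same comparison can be sketched with the standard-library timeit module. The dictionary name candidates, the seed, and the loop counts below are arbitrary choices for illustration, and the numbers will differ from machine to machine:

```python
import timeit

import numpy as np
import pandas as pd

# Random test frame of the same size as in the question (seed chosen arbitrarily).
rng = np.random.default_rng(0)
df = pd.DataFrame({"256": rng.integers(0, 10, 1594), "Z": rng.integers(0, 10, 1594)})

# The implementations being compared, wrapped as zero-argument callables.
candidates = {
    "loc": lambda: df.loc[df["256"] != df["Z"]].index,
    "getitem": lambda: df[df["256"] != df["Z"]].index,
    "list": lambda: list(df[df["256"] != df["Z"]].index),
}

number = 200  # calls per timing run
for name, fn in candidates.items():
    # repeat() returns the total seconds for each run; keep the best run.
    best = min(timeit.repeat(fn, number=number, repeat=3))
    print(f"{name}: {best / number * 1e6:.1f} us per loop")
```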

You can try this:

# Assuming your DataFrame is named "frame"
rows = list(frame[frame['256'] != frame.Z].index)

rows will now be a list containing the row numbers for which those two column values are not equal. So with your data:

>>> frame
   256  Z
0    2  2
1    2  3
2    4  4
3    4  9

[4 rows x 2 columns]
>>> rows = list(frame[frame['256'] != frame.Z].index)
>>> print(rows)
[1, 3]
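
Since the question asks for the output in the form 1, 3, a small follow-up sketch: the index also has a tolist() method, which can be used instead of list(...), and the resulting list can be joined into a single line for printing. The frame is rebuilt here so the snippet runs on its own:

```python
import pandas as pd

frame = pd.DataFrame({"256": [2, 2, 4, 4], "Z": [2, 3, 4, 9]})

# tolist() converts the index of the filtered frame to a plain Python list.
rows = frame[frame["256"] != frame["Z"]].index.tolist()

# Join the row numbers into the comma-separated form used in the question.
line = ", ".join(str(r) for r in rows)
print(line)  # 1, 3
```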

Assuming df is your data frame, this should do it:

df[df['256'] != df['Z']].index

yielding:

Int64Index([1, 3], dtype='int64')
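
One caveat, not raised in the original question: if the columns can contain missing values, NaN != NaN evaluates to True, so a row with NaN in both columns is reported as a mismatch. The sketch below uses a made-up frame with a missing value to show the effect and one possible way to treat such rows as equal; whether that is the desired behaviour depends on your data:

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 is missing in both columns, row 2 genuinely differs.
df = pd.DataFrame({"256": [2.0, np.nan, 4.0], "Z": [2.0, np.nan, 9.0]})

# Plain comparison: the all-NaN row counts as a mismatch.
plain = df[df["256"] != df["Z"]].index.tolist()

# Treat rows where both values are missing as equal.
both_missing = df["256"].isna() & df["Z"].isna()
strict = df[(df["256"] != df["Z"]) & ~both_missing].index.tolist()

print(plain)   # [1, 2]
print(strict)  # [2]
```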
