How to compare each value of a row in a dataframe with each value in the previous row in Python?

I have a dataframe that looks something like this (the number of columns and rows can differ):

                0         1         2
2015-01-02    ISIN1     ISIN2     ISIN3
2015-05-04    ISIN4     ISIN2     ISIN5
2015-09-01    ISIN4     ISIN5     ISIN6
2016-01-04    ISIN7     ISIN8     ISIN2
2016-05-02    ISIN9     ISIN7     ISIN10
2016-09-01    ISIN11    ISIN12    ISIN13
2017-01-02    ISIN11    ISIN12    ISIN14
2017-05-02    ISIN12    ISIN11    ISIN15
2017-09-01    ISIN12    ISIN16    ISIN17
2018-01-02    ISIN16    ISIN11    ISIN18
2018-05-02    ISIN4     ISIN8     ISIN7
2018-09-03    ISIN12    ISIN7     ISIN19
2019-01-02    ISIN20    ISIN21    ISIN22
2019-05-02    ISIN13    ISIN7     ISIN8
2019-09-02    ISIN23    ISIN24    ISIN15
2020-01-02    ISIN25    ISIN23    ISIN24
2020-05-04    ISIN24    ISIN26    ISIN4

My task is to compare each value of each row with every value of the previous row, to find out whether that value also occurs in the previous row. As a result I want two dataframes.

  1. Keep the values which are not in the previous row:

                     0         1         2
     2015-01-02    ISIN1     ISIN2     ISIN3
     2015-05-04    ISIN4               ISIN5
     2015-09-01                        ISIN6
     2016-01-04    ISIN7     ISIN8     ISIN2
     2016-05-02    ISIN9               ISIN10
     2016-09-01    ISIN11    ISIN12    ISIN13
     2017-01-02                        ISIN14
     2017-05-02                        ISIN15
     2017-09-01              ISIN16    ISIN17
     2018-01-02              ISIN11    ISIN18
     2018-05-02    ISIN4     ISIN8     ISIN7
     2018-09-03    ISIN12              ISIN19
     2019-01-02    ISIN20    ISIN21    ISIN22
     2019-05-02    ISIN13    ISIN7     ISIN8
     2019-09-02    ISIN23    ISIN24    ISIN15
     2020-01-02    ISIN25
     2020-05-04              ISIN26    ISIN4
  2. Keep the values which are in the previous row:

                     0         1         2
     2015-01-02
     2015-05-04              ISIN2
     2015-09-01    ISIN4     ISIN5
     2016-01-04
     2016-05-02              ISIN7
     2016-09-01
     2017-01-02    ISIN11    ISIN12
     2017-05-02    ISIN12    ISIN11
     2017-09-01    ISIN12
     2018-01-02    ISIN16
     2018-05-02
     2018-09-03              ISIN7
     2019-01-02
     2019-05-02
     2019-09-02
     2020-01-02              ISIN23    ISIN24
     2020-05-04    ISIN24

What I've explored so far:

for i in range(len(df)):
    print(np.isin(df.values[i, :], df.shift().values[i, :]))

This prints:

[False False False]
[False  True False]
[ True  True False]
[False False False]
[False  True False]
[False False False]
[ True  True False]
[ True  True False]
[ True False False]
[ True False False]
[False False False]
[False  True False]
[False False False]
[False False False]
[False False False]
[False  True  True]
[ True False False]

By appending these values to a list I would be able to create a new dataframe. But I think there must be a better way.

Does anyone have a clue how to do it without iterating through the dataframe?

Thank you very much!

Best regards, nepy

Here is a way to replace each value that repeats the value directly above it (same column, previous row) with NaN:

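import numpy as np
import pandas as pd

df = pd.DataFrame(dict(a=[1, 1, 2, 2, 4], b=[0, 5, 6, 6, 8]), index=np.arange(5) + 100)
# mask is True where a value equals the value directly above it; the first row stays False
mask = np.full_like(df, False, dtype=bool)
mask[1:] = df.iloc[1:].reset_index(drop=True) == df.iloc[:-1].reset_index(drop=True)
df[mask] = None  # the flagged values become NaN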

The reset_index calls are needed because pandas matches rows by index label rather than by position when it evaluates ==, and the two slices carry different labels (101..104 versus 100..103). Resetting both to 0..3 lines them up row by row.

Original DataFrame:

     a  b
100  1  0
101  1  5
102  2  6
103  2  6
104  4  8

After:

       a    b
100  1.0  0.0
101  NaN  5.0
102  2.0  6.0
103  NaN  NaN
104  4.0  8.0

For the reverse, invert the mask before applying it:

mask = np.logical_not(mask)
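
The same column-wise mask can also be sketched with `DataFrame.shift`, which moves the values down one row while keeping the index labels, so no `reset_index` is needed. This is only an alternative under the same assumption as the answer above (only repeats within the same column matter); the names `repeated`, `without_repeats` and `only_repeats` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(a=[1, 1, 2, 2, 4], b=[0, 5, 6, 6, 8]), index=np.arange(5) + 100)

# shift() moves every value down one row but keeps the labels 100..104,
# so eq() compares each cell with the cell directly above it.
repeated = df.eq(df.shift())

# mask() blanks the cells where the condition is True; where() keeps only those cells.
without_repeats = df.mask(repeated)  # repeated values become NaN
only_repeats = df.where(repeated)    # everything else becomes NaN

print(without_repeats)
print(only_repeats)
```

Both results keep the original index, and neither modifies `df` in place.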

Hey, maybe you are looking for something like:

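import pandas as pd

data = {'first': ['ok', 'none', 'ok', 'ok', 'ok', 'ok', 'ok', 'ok', 'none', 'ok'],
        'second': [1, 3, 4, 7, 8, 2, 4, 9, 6, 9]}
df = pd.DataFrame(data, columns=['first', 'second'])

# True where a value equals the value directly above it in the same column
df_results = df.eq(df.shift())
# Keep True at the repeated positions and show the original value everywhere else
print(df_results.where(df_results, df))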

Hope it helps.
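
One point worth checking before adopting this: `eq` combined with `shift` compares each cell only with the cell directly above it, whereas the question asks whether a value occurs anywhere in the previous row. The following minimal sketch uses two consecutive rows from the question's sample data to show the difference; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Two consecutive rows taken from the question's sample data.
df = pd.DataFrame([["ISIN4", "ISIN2", "ISIN5"],
                   ["ISIN4", "ISIN5", "ISIN6"]])

# Column-wise check: is the value equal to the one directly above it?
same_column = df.eq(df.shift())

# Row-wise membership check: does the value occur anywhere in the previous row?
in_previous_row = np.isin(df.iloc[1].to_numpy(), df.iloc[0].to_numpy())

print(same_column.iloc[1].tolist())  # [True, False, False]
print(in_previous_row.tolist())      # [True, True, False]
```

ISIN5 moved from column 2 to column 1, so only the membership check flags it.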

I dug a bit deeper. My solution is now:

import pandas as pd
import numpy as np

# Despite the names, each array holds one column of the sample data
row_0 = np.array(['ISIN1', 'ISIN4', 'ISIN4', 'ISIN7', 'ISIN9', 'ISIN11', 'ISIN11', 'ISIN12', 'ISIN12', 'ISIN16', 'ISIN4', 'ISIN12', 'ISIN20', 'ISIN13', 'ISIN23', 'ISIN25', 'ISIN24'])
row_1 = np.array(['ISIN2', 'ISIN2', 'ISIN5', 'ISIN8', 'ISIN7', 'ISIN12', 'ISIN12', 'ISIN11', 'ISIN16', 'ISIN11', 'ISIN8', 'ISIN7', 'ISIN21', 'ISIN7', 'ISIN24', 'ISIN23', 'ISIN26'])
row_2 = np.array(['ISIN3', 'ISIN5', 'ISIN6', 'ISIN2', 'ISIN10', 'ISIN13', 'ISIN14', 'ISIN15', 'ISIN17', 'ISIN18', 'ISIN7', 'ISIN19', 'ISIN22', 'ISIN8', 'ISIN15', 'ISIN24', 'ISIN4'])

data = {0: row_0, 1: row_1, 2: row_2}

df = pd.DataFrame(data)
print(df)

# mask[i, j] is True when df.iloc[i, j] also occurs anywhere in row i - 1
previous = df.shift().values  # computed once instead of once per row
mask = np.array([np.isin(df.values[i, :], previous[i, :]) for i in range(len(df))])

# where() keeps the values at True positions and fills the rest with NaN
df_in_row_before = df.where(mask)
print(df_in_row_before)
df_not_in_row_before = df.where(~mask)
print(df_not_in_row_before)

This does exactly what I needed. But if anyone has a better solution, I'm happy to look at it.
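
For the original request to avoid iterating over the rows, one possible approach is NumPy broadcasting: compare every value of a row with every value of the previous row in a single array operation. The sketch below is not from the thread. It assumes all rows have the same number of columns, and that the intermediate array of shape (rows - 1, columns, columns) fits in memory. The names `vals`, `pairwise` and `mask` are illustrative.

```python
import numpy as np
import pandas as pd

# First rows of the question's sample data, with the dates as index.
df = pd.DataFrame(
    [["ISIN1", "ISIN2", "ISIN3"],
     ["ISIN4", "ISIN2", "ISIN5"],
     ["ISIN4", "ISIN5", "ISIN6"],
     ["ISIN7", "ISIN8", "ISIN2"]],
    index=pd.to_datetime(["2015-01-02", "2015-05-04", "2015-09-01", "2016-01-04"]),
)

vals = df.to_numpy()

# pairwise[i, j, k] is True when value j of row i + 1 equals value k of row i.
pairwise = vals[1:, :, None] == vals[:-1, None, :]

# The first row has no predecessor, so its mask stays all False.
mask = np.zeros(vals.shape, dtype=bool)
mask[1:] = pairwise.any(axis=2)

df_in_row_before = df.where(mask)       # values also present in the previous row
df_not_in_row_before = df.where(~mask)  # values new compared with the previous row

print(df_in_row_before)
print(df_not_in_row_before)
```

Because `where` is given a plain NumPy array, the mask is applied by position, so this also works when the dataframe has a date index as in the question.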

