Find any cell values >= x in a Pandas dataframe and return the cell value, column header, row and neighbouring cell value

I realise this is quite a lengthy ask, but I have been trying to solve this for days now with no success and wondered if anyone might have some ideas.

Consider a spreadsheet like so:

        apple1  grape1  apple2  grape2  apple3  grape3
1          0       4     -0.2     2       0       4
2          0       4       0      6       0       3
3        -0.1      2       0      4       0       4
4        -0.5      5       0      6     -0.2      5
5        -0.4      4       0      5       0       2
6          0       6     -0.1     5       0       3

I would like to search my dataframe for any cell with a value less than -0.1 and write out the value, column header, row number and neighbouring value.

At the start, I thought it might be as simple as something along the lines of:

Newlist()

if df >= -0.1:
   Newlist.append(cell.value)
   Newlist.append(row.value)
   Newlist.append(column.value)
   Newlist.append(cell.value.shift(1))

I fully realise the above makes no sense, but I hope it conveys the idea of what I've been trying to do.

Next, I could convert the df to a list and work from there (using an "if not >= -0.1" to delete objects?), but this seems inelegant and far from ideal. I am, however, open to this if anyone can get it to work.

I must have looked at every Stack Exchange question ever posted on this without getting anywhere, so apologies if I've overlooked something very obvious.

Thanks!

First, to filter your dataframe, you can use boolean indexing like this:

df[df >= -0.1]

This way, every value that is not greater than or equal to -0.1 is replaced with NaN; you can then use pandas.isnull() to identify those cells.
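
A minimal sketch of that filtering step, assuming the spreadsheet has been loaded into a DataFrame named df (rebuilt by hand below from the table in the question, with the row numbers 1 to 6 as the index). It shows that df[df >= -0.1] turns every failing cell into NaN, that df.where(mask) is an equivalent spelling, and that pd.isnull() then flags exactly the filtered-out cells:

```python
import pandas as pd

# Rebuild the example table from the question (index = row numbers 1-6)
df = pd.DataFrame(
    {
        "apple1": [0, 0, -0.1, -0.5, -0.4, 0],
        "grape1": [4, 4, 2, 5, 4, 6],
        "apple2": [-0.2, 0, 0, 0, 0, -0.1],
        "grape2": [2, 6, 4, 6, 5, 5],
        "apple3": [0, 0, 0, -0.2, 0, 0],
        "grape3": [4, 3, 4, 5, 2, 3],
    },
    index=[1, 2, 3, 4, 5, 6],
)

threshold = -0.1
mask = df >= threshold      # boolean DataFrame with the same shape as df
filtered = df[mask]         # cells where mask is False become NaN

# Boolean indexing with a DataFrame mask is equivalent to df.where(mask)
assert filtered.equals(df.where(mask))

# pd.isnull() is True exactly for the cells that were filtered out
dropped = pd.isnull(filtered)
print(int(dropped.to_numpy().sum()))  # → 4 (the -0.2, -0.5, -0.4 and -0.2 cells)
```

Note that -0.1 itself passes the >= test, so the -0.1 cells in rows 3 and 6 are kept.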

To get the row and column of each value you want, you could turn your dataframe into a NumPy array with df.to_numpy() and iterate over the rows and columns with enumerate, which keeps track of the index of the row/column you are currently iterating through:

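import pandas as pd

# Cells failing the condition become NaN; convert the result to a NumPy array
my_data = df[df >= -0.1].to_numpy()
for idrow, row in enumerate(my_data):
    for idcol, col in enumerate(row):
        # Skip the cells that were masked out (NaN)
        if not pd.isnull(col):
            print("Value :" + str(col) + " column:" + str(idcol) + " row:" + str(idrow))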

This prints output like the following (first few lines shown):

Value :0.0 column:0 row:0
Value :4.0 column:1 row:0
Value :2.0 column:3 row:0
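
If you would rather avoid the explicit double loop, one vectorized alternative (not part of the answer above, just a sketch assuming the same df and the same >= -0.1 condition) is to let np.nonzero return the coordinates of every True cell in the mask and then look up the values, column headers and row labels in one go. Unlike the loop, this reports the DataFrame's own row labels (1 to 6 here) instead of 0-based positions:

```python
import numpy as np
import pandas as pd

# Example table from the question (index = row numbers 1-6)
df = pd.DataFrame(
    {
        "apple1": [0, 0, -0.1, -0.5, -0.4, 0],
        "grape1": [4, 4, 2, 5, 4, 6],
        "apple2": [-0.2, 0, 0, 0, 0, -0.1],
        "grape2": [2, 6, 4, 6, 5, 5],
        "apple3": [0, 0, 0, -0.2, 0, 0],
        "grape3": [4, 3, 4, 5, 2, 3],
    },
    index=[1, 2, 3, 4, 5, 6],
)

mask = (df >= -0.1).to_numpy()
# np.nonzero gives the row positions and column positions of every True cell,
# always in row-major order (row 0 left to right, then row 1, ...)
rows, cols = np.nonzero(mask)

hits = pd.DataFrame(
    {
        "value": df.to_numpy()[rows, cols],  # the matching cell values
        "column": df.columns[cols],          # column headers
        "row": df.index[rows],               # row labels, not positions
    }
)
print(hits.head(3))
```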

You can get the column name by using this inside the loop:

df.columns[idcol]
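
Putting that into the loop, here is a sketch that collects (value, column header, row label) tuples into a list, roughly what the Newlist in the question was aiming for. The list name results is hypothetical, and looking up df.index[idrow] to recover the original row label (rather than the 0-based position) is an addition to the answer:

```python
import pandas as pd

# Example table from the question (index = row numbers 1-6)
df = pd.DataFrame(
    {
        "apple1": [0, 0, -0.1, -0.5, -0.4, 0],
        "grape1": [4, 4, 2, 5, 4, 6],
        "apple2": [-0.2, 0, 0, 0, 0, -0.1],
        "grape2": [2, 6, 4, 6, 5, 5],
        "apple3": [0, 0, 0, -0.2, 0, 0],
        "grape3": [4, 3, 4, 5, 2, 3],
    },
    index=[1, 2, 3, 4, 5, 6],
)

my_data = df[df >= -0.1].to_numpy()
results = []  # hypothetical name; holds (value, column header, row label) tuples
for idrow, row in enumerate(my_data):
    for idcol, col in enumerate(row):
        if not pd.isnull(col):
            # Translate the positions back into the DataFrame's own labels
            results.append((col, df.columns[idcol], df.index[idrow]))

print(results[:3])
```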

Once you have those indices, you can get the neighbouring values by direct access, i.e.:

my_data[x][y]

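Just remember to add a bounds check so that you never access an index outside the array! With NumPy, a negative index such as -1 silently wraps around to the last element, while an index past the end raises an IndexError.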
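
A sketch of that bounds check, assuming "neighbouring value" means the cell immediately to the right in the same row (for example the grape column next to each apple column). The helper right_neighbour is hypothetical, and the offsets are easy to change for left, above or below neighbours. It reads neighbours from the unfiltered array so that a neighbour which failed the condition is still reported:

```python
import pandas as pd

# Example table from the question (index = row numbers 1-6)
df = pd.DataFrame(
    {
        "apple1": [0, 0, -0.1, -0.5, -0.4, 0],
        "grape1": [4, 4, 2, 5, 4, 6],
        "apple2": [-0.2, 0, 0, 0, 0, -0.1],
        "grape2": [2, 6, 4, 6, 5, 5],
        "apple3": [0, 0, 0, -0.2, 0, 0],
        "grape3": [4, 3, 4, 5, 2, 3],
    },
    index=[1, 2, 3, 4, 5, 6],
)

values = df.to_numpy()                  # unfiltered values, used for neighbours
my_data = df[df >= -0.1].to_numpy()     # filtered values, as in the answer


def right_neighbour(arr, idrow, idcol):
    """Hypothetical helper: return arr[idrow, idcol + 1], or None if out of bounds.

    The explicit check matters because negative indices would wrap around
    and indices past the end would raise IndexError.
    """
    n_rows, n_cols = arr.shape
    r, c = idrow, idcol + 1
    if 0 <= r < n_rows and 0 <= c < n_cols:
        return arr[r, c]
    return None


found = []  # (value, column header, row label, right-hand neighbour)
for idrow, row in enumerate(my_data):
    for idcol, col in enumerate(row):
        if not pd.isnull(col):
            found.append(
                (col, df.columns[idcol], df.index[idrow],
                 right_neighbour(values, idrow, idcol))
            )

print(found[:3])
```

Cells in the last column (grape3) have no right-hand neighbour, so they get None instead of an error.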
