How to select rows of dataframe based on index then on condition?

I am running a blob detection routine on lung nodule images. For each image, the blob detector returns an array of detected blobs along with their coordinates and radii. I then check whether the ground-truth lung nodule coordinates fall within any of the blobs and return True or False (this then becomes the training label for the next stage of the process).

The problem I am having is that for some images the nodule lies within more than one blob, so the image has two (or more) true positive detections rather than one. For these cases I would like to find the blob that is closest to the ground truth and mark that single blob as the true positive.

However, I am struggling to slice the dataframe so that only the positive detections within each image are compared. The dataframe I have looks like this:

                    Blob_Y  Blob_X     Blob_R  True_X  True_Y  Label
JPCLN001.npy 0       840.0   220.0  16.970563   817.0   346.0      0
             1       832.0   496.0  16.970563   817.0   346.0      0
             2       496.0   872.0  69.767869   805.0   483.5      1
             3       480.0   796.0  16.970563   805.0   483.5      1
             4       820.0   888.0  56.568542   817.0   346.0      0
JPCLN002.npy 5       840.0   220.0  16.970563   817.0   346.0      0
             6       832.0   496.0  16.970563   817.0   346.0      1
             7       824.0   256.0  30.169889   817.0   346.0      0
             8       824.0   172.0  16.970563   817.0   346.0      0
             9       820.0   888.0  56.568542   817.0   346.0      0

For image JPCLN001.npy I want to select the rows where Label equals 1, then calculate the Euclidean distance between (True_X, True_Y) and (Blob_X, Blob_Y) for rows 2 and 3. The blob closest to the true coordinates should keep a label of 1; the other is assumed to be a false positive and should be labelled 0.

There are four images in the dataframe that need this action performed.

I have tried doing this by selecting the relevant rows for each image, assigning them to a new dataframe, doing the distance calculation and then reinserting these rows into the original dataframe, like so:

df = blobs.loc['JPCLN061.npy']
df = df[df['Label'] == 1]

df = df.assign(dist = np.sqrt((df['Blob_X']-df['True_X'])**2 + (df['Blob_Y']-df['True_Y'])**2))
df['Label'][df['dist'] == df['dist'].max()] = 0

df.drop(['dist'], inplace = True, axis = 1)

blobs.update(df)

blobs.update(df) does not update the original dataframe (which I think is due to a mismatch between the indices of the two dataframes). My method also seems rather cumbersome, so if someone could show me how to do this it would be really appreciated, as I've been working on this most of the day!
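The suspected index mismatch can be checked directly. The following is a minimal sketch, assuming `blobs` is indexed by (image name, blob number) as the printout suggests; the three-row frame is made up for illustration and is not the real data. Selecting with the image name alone drops the outer index level, so the labels of the slice no longer line up with those of `blobs`, whereas `xs(..., drop_level=False)` keeps both levels.

```python
import pandas as pd

# Hypothetical miniature of `blobs`: two index levels (image name, blob number).
index = pd.MultiIndex.from_tuples(
    [("JPCLN001.npy", 2), ("JPCLN001.npy", 3), ("JPCLN002.npy", 6)]
)
blobs = pd.DataFrame({"Label": [1, 1, 1]}, index=index)

# Selecting by the outer label alone drops that level from the result.
dropped = blobs.loc["JPCLN001.npy"]
print(dropped.index.nlevels)  # 1 level: blob numbers only

# xs(..., drop_level=False) keeps both levels, so labels match `blobs`.
kept = blobs.xs("JPCLN001.npy", level=0, drop_level=False)
print(kept.index.nlevels)  # 2 levels: (image name, blob number)
```

A slice that keeps both levels can be written back by label, since every row still carries its full (image, blob) key.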

As a quick working example, how about:

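df = blobs.loc['JPCLN061.npy']
df = df[df['Label'] == 1]
# Distance from each positive blob to the ground-truth coordinates
df = df.assign(dist=np.sqrt((df['Blob_X'] - df['True_X'])**2 + (df['Blob_Y'] - df['True_Y'])**2))
# Farthest blob first, then relabel that blob in the original dataframe
df = df.sort_values('dist', ascending=False)
blobs.loc[('JPCLN061.npy', df.index[0]), 'Label'] = 0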

If you post code to create the df, I am happy to help come up with a more efficient approach!
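Without the original construction code, one possible way to handle every image at once is sketched below. It rebuilds the ten rows from the question's printout and assumes the index level names `image` and `blob` (the real frame may not name its levels). It also swaps in `groupby(...).idxmin()` in place of sorting, so that only the closest positive blob per image keeps its label, however many positives an image has.

```python
import numpy as np
import pandas as pd

# Rows copied from the question's printout; level names are assumptions.
rows = [
    ("JPCLN001.npy", 0, 840.0, 220.0, 16.970563, 817.0, 346.0, 0),
    ("JPCLN001.npy", 1, 832.0, 496.0, 16.970563, 817.0, 346.0, 0),
    ("JPCLN001.npy", 2, 496.0, 872.0, 69.767869, 805.0, 483.5, 1),
    ("JPCLN001.npy", 3, 480.0, 796.0, 16.970563, 805.0, 483.5, 1),
    ("JPCLN001.npy", 4, 820.0, 888.0, 56.568542, 817.0, 346.0, 0),
    ("JPCLN002.npy", 5, 840.0, 220.0, 16.970563, 817.0, 346.0, 0),
    ("JPCLN002.npy", 6, 832.0, 496.0, 16.970563, 817.0, 346.0, 1),
    ("JPCLN002.npy", 7, 824.0, 256.0, 30.169889, 817.0, 346.0, 0),
    ("JPCLN002.npy", 8, 824.0, 172.0, 16.970563, 817.0, 346.0, 0),
    ("JPCLN002.npy", 9, 820.0, 888.0, 56.568542, 817.0, 346.0, 0),
]
cols = ["image", "blob", "Blob_Y", "Blob_X", "Blob_R", "True_X", "True_Y", "Label"]
blobs = pd.DataFrame(rows, columns=cols).set_index(["image", "blob"])

# Euclidean distance from every blob centre to the ground-truth point.
dist = np.hypot(blobs["Blob_X"] - blobs["True_X"],
                blobs["Blob_Y"] - blobs["True_Y"])

# Among the positive detections only, find the closest blob per image.
positives = dist[blobs["Label"] == 1]
closest = positives.groupby(level=0).idxmin()  # (image, blob) tuples

# Keep Label 1 for those rows and set every other row to 0.
blobs["Label"] = blobs.index.isin(list(closest)).astype(int)
print(blobs["Label"].tolist())  # blobs 3 and 6 stay labelled 1
```

Because the mask is built from the full (image, blob) keys, the assignment happens on `blobs` itself and no `update` call is needed. If two positive blobs in one image were exactly equidistant, `idxmin` would keep only the first of them.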
