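How can I calculate the F1-score and other classification metrics from a faster-RCNN? (object detection in PyTorch)
I'm trying to wrap my head around this, but I'm struggling to understand how to compute the F1 score in an object detection task.
Ideally, I would like to know the false positives, true positives, false negatives and true negatives for every target in the image (it's a binary problem, with the object in the image as one class and the background as the other class).
Eventually I would also like to extract the false-positive bounding boxes from the images. I'm not sure whether this is a good approach, but I would save the image names and bbox predictions, along with whether they are false positives and so on, to a numpy file.
I currently have this set up with a batch size of 1 so that I can apply a non-maximum suppression algorithm per image: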
import torch
import torchvision

def apply_nms(orig_prediction, iou_thresh=0.3):
    # torchvision.ops.nms returns the indices of the bboxes to keep
    keep = torchvision.ops.nms(orig_prediction['boxes'], orig_prediction['scores'], iou_thresh)
    # shallow copy so the caller's dict is not modified in place
    final_prediction = dict(orig_prediction)
    final_prediction['boxes'] = final_prediction['boxes'][keep]
    final_prediction['scores'] = final_prediction['scores'][keep]
    final_prediction['labels'] = final_prediction['labels'][keep]
    return final_prediction

cpu_device = torch.device("cpu")
model.eval()
with torch.no_grad():
    for images, targets in valid_data_loader:
        images = list(img.to(device) for img in images)
        outputs = model(images)
        outputs = [{k: v.to(cpu_device) for k, v in t.items()} for t in outputs]
        # batch size is 1, so outputs[0] is the prediction for the only image
        predictions = apply_nms(outputs[0], iou_thresh=0.3)
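Any ideas on how to determine the classification metrics above and the F1 score?
I came across this line in the evaluation code provided by torchvision and wondered whether it would help me going forward: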
res = {target["image_id"].item(): output for target, output in zip(targets, outputs)}
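The use of the terms precision, recall and F1 score in object detection is slightly confusing, because these metrics were originally used for binary evaluation tasks (e.g. classification). In any case, in object detection they have slightly different meanings:
Let:
TP - the set of predicted objects that are successfully matched to a ground-truth object (above the IoU threshold for whatever dataset you are using, generally 0.5 or 0.7)
FP - the set of predicted objects that were not successfully matched to a ground-truth object
FN - the set of ground-truth objects that were not successfully matched to a predicted object
In the formulas below, TP, FP and FN stand for the sizes of these sets.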
Precision: TP / (TP + FP)
Recall: TP / (TP + FN)
F1: 2*Precision*Recall /(Precision + Recall)
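As a minimal sketch of the three formulas above, assuming TP, FP and FN have already been reduced to plain counts: the helper name `detection_metrics` is made up for illustration, and returning 0.0 when a denominator is zero is a convention chosen here, not something the formulas prescribe.

```python
def detection_metrics(tp, fp, fn):
    """Return (precision, recall, f1) from integer counts of TP, FP and FN."""
    # Guard each denominator: with no detections (or no ground truth)
    # the ratio is undefined, so this sketch falls back to 0.0.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, 8 matched detections, 2 unmatched detections and 2 missed objects give a precision and recall of 0.8 each, so F1 is also 0.8.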
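You can find many implementations of the matching step (matching ground-truth and predicted objects), generally provided with a dataset for evaluation, or you can implement it yourself. I'd suggest the py-motmetrics repository.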
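If you do implement the matching step yourself, one common approach is greedy matching in order of confidence. The sketch below is one possible version for a single image and a single class, not the method used by py-motmetrics or by any particular dataset's toolkit. The names `box_iou_xyxy` and `match_detections` and the default threshold of 0.5 are assumptions made for illustration, and boxes are assumed to be (x1, y1, x2, y2).

```python
def box_iou_xyxy(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(pred_boxes, pred_scores, gt_boxes, iou_threshold=0.5):
    """Greedy matching for one image and one class.

    Returns (tp, fp, fn): indices of matched predictions, indices of
    unmatched predictions, and indices of unmatched ground-truth boxes.
    """
    # Visit predictions from most to least confident.
    order = sorted(range(len(pred_boxes)), key=lambda i: pred_scores[i], reverse=True)
    matched_gt = set()
    tp, fp = [], []
    for i in order:
        # Find the ground-truth box that overlaps this prediction the most.
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(gt_boxes):
            overlap = box_iou_xyxy(pred_boxes[i], gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        # A prediction is a TP only if its best ground truth is still free.
        if best_j is not None and best_iou > iou_threshold and best_j not in matched_gt:
            matched_gt.add(best_j)
            tp.append(i)
        else:
            fp.append(i)
    fn = [j for j in range(len(gt_boxes)) if j not in matched_gt]
    return tp, fp, fn
```

Because each ground-truth box can be claimed only once, a second, lower-scoring detection of the same object counts as a false positive, and any ground-truth box left unclaimed counts as a false negative.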
A simple implementation of the IoU calculation might look like:
def iou(a, b):
    """
    Description
    -----------
    Calculates intersection over union for a single pair of boxes a and b

    Parameters
    ----------
    a : sequence or tensor of size [4]
        bounding box as (x1, y1, x2, y2)
    b : sequence or tensor of size [4]
        bounding box as (x1, y1, x2, y2)

    Returns
    -------
    iou - float between [0,1]
        iou of a and b
    """
    area_a = (a[2]-a[0]) * (a[3]-a[1])
    area_b = (b[2]-b[0]) * (b[3]-b[1])

    # corners of the intersection rectangle
    minx = max(a[0], b[0])
    maxx = min(a[2], b[2])
    miny = max(a[1], b[1])
    maxy = min(a[3], b[3])

    # width and height are clamped at 0 so disjoint boxes give 0 intersection
    intersection = max(0, maxx-minx) * max(0, maxy-miny)
    union = area_a + area_b - intersection
    iou = intersection/union
    return iou
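The arithmetic above covers one pair of boxes, while matching usually needs the IoU between every prediction and every ground-truth box. A pure-Python sketch of that pairwise table follows. The name `pairwise_iou` is hypothetical, and the zero-union guard is an addition made here. `torchvision.ops.box_iou` returns the same kind of N x M table as a tensor.

```python
def pairwise_iou(boxes_a, boxes_b):
    """Return a len(boxes_a) x len(boxes_b) nested list of IoU values.

    Boxes are (x1, y1, x2, y2) with x2 >= x1 and y2 >= y1.
    """
    table = []
    for a in boxes_a:
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        row = []
        for b in boxes_b:
            area_b = (b[2] - b[0]) * (b[3] - b[1])
            inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
            inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
            intersection = inter_w * inter_h
            union = area_a + area_b - intersection
            # Two zero-area boxes give union == 0; report 0.0 instead of dividing by zero.
            row.append(intersection / union if union > 0 else 0.0)
        table.append(row)
    return table
```

As a hand check, boxes (0, 0, 2, 2) and (1, 1, 3, 3) overlap in a 1 x 1 square, so the intersection is 1, the union is 4 + 4 - 1 = 7, and the IoU is 1/7.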
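So I have implemented the F1 score to be calculated globally, that is, over the entire dataset.
The implementation below gives an example of determining the F1 score for a validation set.
The outputs of the model are in dictionary format, so we need to rearrange them into the following format:
predicted_boxes (list): [[train_index, class_prediction, prob_score, x1, y1, x2, y2],[],...[]]
train_index: index of the image that the specific bbox comes from
class_prediction: integer value representing the class prediction
prob_score: the objectness score output for the bbox
x1, y1, x2, y2: the (x1, y1) and (x2, y2) bbox coordinates
gt_boxes (list): [[train_index, class_prediction, prob_score, x1, y1, x2, y2],[],...[]]
where prob_score is simply 1 for the ground-truth input (as long as that dimension is specified and filled in, it can actually be anything).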
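To make the row layout concrete, here is a small sketch of a helper that flattens one image's boxes into that format. The name `to_rows` and its signature are assumptions made for illustration, and it takes plain Python sequences rather than tensors.

```python
def to_rows(image_id, boxes, labels, scores=None):
    """Flatten one image's boxes into [image_id, label, score, x1, y1, x2, y2] rows.

    Passing scores=None marks the boxes as ground truth, in which case the
    score column is filled with the placeholder value 1.
    """
    if scores is None:
        scores = [1] * len(boxes)
    return [
        [image_id, int(label), float(score)] + [float(v) for v in box]
        for box, label, score in zip(boxes, labels, scores)
    ]
```

With a torchvision detection output, the tensors can be converted first, for example `to_rows(image_id, pred['boxes'].tolist(), pred['labels'].tolist(), pred['scores'].tolist())`.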
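IoU is also implemented in torchvision (torchvision.ops.box_iou), which makes everything a lot easier.
I hope this helps others, as I couldn't find another implementation of the F1 score for object detection anywhere else.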
import math
from collections import Counter

# Assumed values, not given in the original post:
num_classes = 2      # background + one object class, as described in the question
iou_threshold = 0.5  # a common choice; use the threshold your dataset calls for

model_test.eval()
with torch.no_grad():
    global_tp = []
    global_fp = []
    global_gt = []
    # get_unique and valid_df come from the author's own code; the result is not used below
    valid_df_unique = get_unique(valid_df['image_id'])
    for images, targets in valid_data_loader:
        images = list(img.to(device) for img in images)
        outputs = model_test(images)
        outputs = [{k: v.to(cpu_device) for k, v in t.items()} for t in outputs]
        predictions = apply_nms(outputs[0], iou_thresh=0.1)
        # looping through each class
        for c in range(num_classes):
            # detections (list): predicted_boxes that are class c
            detections = []
            # ground_truths (list): gt_boxes that are class c
            ground_truths = []
            for b, la, s in zip(predictions['boxes'], predictions['labels'], predictions['scores']):
                updated_detection_array = [targets[0]['image_id'].item(), la.item(), s.item(), b[0].item(), b[1].item(), b[2].item(), b[3].item()]
                if la.item() == c:
                    detections.append(updated_detection_array)
            for b, la in zip(targets[0]['boxes'], targets[0]['labels']):
                updated_gt_array = [targets[0]['image_id'].item(), la.item(), 1, b[0].item(), b[1].item(), b[2].item(), b[3].item()]
                if la.item() == c:
                    ground_truths.append(updated_gt_array)
                    global_gt.append(updated_gt_array)
            # use Counter to create a dictionary where key is image # and value
            # is the # of bboxes in the given image
            amount_bboxes = Counter([gt[0] for gt in ground_truths])
            # goal: keep track of the gt bboxes we have already "detected" with prior predicted bboxes
            # key: image #
            # value: tensor of 0's (size is equal to # of bboxes in the given image)
            for key, value in amount_bboxes.items():
                amount_bboxes[key] = torch.zeros(value)
            # sort over the probability scores of the detections
            detections.sort(key=lambda x: x[2], reverse=True)
            true_Positives = torch.zeros(len(detections))
            false_Positives = torch.zeros(len(detections))
            total_gt_bboxes = len(ground_truths)
            false_positives_frame = []
            true_positives_frame = []
            # iterate through all detections in given class c
            for detection_index, detection in enumerate(detections):
                # detection[0] indicates image #
                # ground_truth_image: the gt bbox's that are in same image as detection
                ground_truth_image = [bbox for bbox in ground_truths if bbox[0] == detection[0]]
                # num_gt_boxes: number of ground truth boxes in given image
                num_gt_boxes = len(ground_truth_image)
                best_iou = 0
                best_gt_index = 0
                for index, gt in enumerate(ground_truth_image):
                    # box_iou returns a 1x1 tensor here; .item() turns it into a float
                    iou = torchvision.ops.box_iou(torch.tensor(detection[3:]).unsqueeze(0),
                                                  torch.tensor(gt[3:]).unsqueeze(0)).item()
                    if iou > best_iou:
                        best_iou = iou
                        best_gt_index = index
                if best_iou > iou_threshold:
                    # check if gt_bbox with best_iou was already covered by previous detection with higher confidence score
                    # amount_bboxes[detection[0]][best_gt_index] == 0 if not discovered yet, 1 otherwise
                    if amount_bboxes[detection[0]][best_gt_index] == 0:
                        true_Positives[detection_index] = 1
                        # mark this gt box as detected (the original used == here, which never set the flag)
                        amount_bboxes[detection[0]][best_gt_index] = 1
                        true_positives_frame.append(detection)
                        global_tp.append(detection)
                    else:
                        false_Positives[detection_index] = 1
                        false_positives_frame.append(detection)
                        global_fp.append(detection)
                else:
                    false_Positives[detection_index] = 1
                    false_positives_frame.append(detection)
                    global_fp.append(detection)

# remove nan values from ground truth list as list contains every mitosis image row entry (including images with no targets)
global_gt_updated = []
for gt in global_gt:
    if not math.isnan(gt[3]):
        global_gt_updated.append(gt)

global_fn = len(global_gt_updated) - len(global_tp)
# fall back to 0.0 when a denominator is zero (no detections or no ground truth)
precision = len(global_tp) / (len(global_tp) + len(global_fp)) if (global_tp or global_fp) else 0.0
recall = len(global_tp) / (len(global_tp) + global_fn) if (len(global_tp) + global_fn) > 0 else 0.0
f1_score = 2 * (precision * recall) / (precision + recall) if (precision + recall) > 0 else 0.0

print(len(global_tp))
print(recall)
print(precision)
print(f1_score)