Bounding Box Regression
What is bounding box regression?
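Given a proposal window $P = (P_x, P_y, P_w, P_h)$ (center coordinates, width and height) and its ground-truth window $G = (G_x, G_y, G_w, G_h)$, bounding box regression finds a mapping $f$ such that $f(P_x, P_y, P_w, P_h) = (\hat{G}_x, \hat{G}_y, \hat{G}_w, \hat{G}_h) \approx (G_x, G_y, G_w, G_h)$. (Figure 1)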
Why do we need it?
Region proposals rarely line up exactly with the object, so we want to learn a transformation that maps a proposed box $P$ to its ground-truth box $G$. If we don't define an objective to optimize, we cannot achieve that goal. (Figure 2)
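What is IoU (Intersection over Union)?
IoU measures how much two boxes overlap: the area of their intersection divided by the area of their union, $\mathrm{IoU}(A, B) = \frac{|A \cap B|}{|A \cup B|}$. It ranges from 0 (no overlap) to 1 (identical boxes). Notice the green and red boxes below. (Figure 3)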
We use bounding box regression to adjust the red window so that it moves closer to the green window. (Figure 4)
```python
# get IoU according to box parameters
from collections import namedtuple

# define the `Detection` object: an image path plus ground-truth and predicted boxes
Detection = namedtuple("Detection", ["image_path", "gt", "pred"])

def bb_intersection_over_union(boxA, boxB):
    # boxes are (x1, y1, x2, y2): top-left and bottom-right corners
    # determine the (x, y)-coordinates of the intersection rectangle
    xA = max(boxA[0], boxB[0])
    yA = max(boxA[1], boxB[1])
    xB = min(boxA[2], boxB[2])
    yB = min(boxA[3], boxB[3])

    # compute the area of the intersection rectangle (0 if the boxes do not overlap)
    interArea = max(0, xB - xA) * max(0, yB - yA)

    # compute the area of both the prediction and ground-truth rectangles
    boxAArea = (boxA[2] - boxA[0]) * (boxA[3] - boxA[1])
    boxBArea = (boxB[2] - boxB[0]) * (boxB[3] - boxB[1])

    # compute the intersection over union by taking the intersection
    # area and dividing it by the sum of prediction + ground-truth
    # areas - the intersection area
    iou = interArea / float(boxAArea + boxBArea - interArea)

    # return the intersection over union value
    return iou
```
As you can see, computing IoU in Python is straightforward.
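As a hand check of the IoU formula, take two hypothetical 10×10 boxes in (x1, y1, x2, y2) form, $A = (0, 0, 10, 10)$ and $B = (5, 5, 15, 15)$, which overlap in one corner:

```latex
\begin{aligned}
\text{intersection} &= \bigl(\min(10, 15) - \max(0, 5)\bigr) \times \bigl(\min(10, 15) - \max(0, 5)\bigr) = 5 \times 5 = 25 \\
\text{union} &= 100 + 100 - 25 = 175 \\
\mathrm{IoU}(A, B) &= \tfrac{25}{175} = \tfrac{1}{7} \approx 0.143
\end{aligned}
```

Even though each box covers a quarter of the other, the IoU is well below a threshold such as 0.6.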
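How do we find that function $f$?
Target mapping:
How do we map $P$ to $G$ in Figure 1? A simple approach is to translate the center and then scale the width and height in log space:
$$\hat{G}_x = P_w d_x(P) + P_x, \quad \hat{G}_y = P_h d_y(P) + P_y, \quad \hat{G}_w = P_w \exp(d_w(P)), \quad \hat{G}_h = P_h \exp(d_h(P))$$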
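The translate-then-scale mapping can be sketched in a few lines of Python. This is a minimal illustration, not R-CNN's own code: it assumes boxes are stored as (center_x, center_y, width, height) tuples, and the function name `apply_deltas` is hypothetical. It computes $\hat{G}_x = P_w d_x(P) + P_x$, $\hat{G}_y = P_h d_y(P) + P_y$, $\hat{G}_w = P_w \exp(d_w(P))$ and $\hat{G}_h = P_h \exp(d_h(P))$.

```python
import math

def apply_deltas(box, deltas):
    """Apply regression outputs (d_x, d_y, d_w, d_h) to a proposal box.

    box    -- proposal P as (center_x, center_y, width, height)  [assumed layout]
    deltas -- (d_x, d_y, d_w, d_h) predicted for that proposal
    Returns the predicted box G_hat in the same (cx, cy, w, h) layout.
    """
    px, py, pw, ph = box
    dx, dy, dw, dh = deltas
    gx = pw * dx + px          # shift center x by a fraction of the proposal width
    gy = ph * dy + py          # shift center y by a fraction of the proposal height
    gw = pw * math.exp(dw)     # scale width in log space (result is always positive)
    gh = ph * math.exp(dh)     # scale height in log space (result is always positive)
    return (gx, gy, gw, gh)

if __name__ == "__main__":
    proposal = (50.0, 40.0, 20.0, 10.0)   # made-up proposal
    print(apply_deltas(proposal, (0.1, -0.2, math.log(1.5), 0.0)))
```

Scaling through `exp` keeps the predicted width and height positive whatever the regressor outputs, and all-zero deltas return the proposal unchanged.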
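In other words, bounding box regression learns the four functions $d_x(P)$, $d_y(P)$, $d_w(P)$, $d_h(P)$. Each one maps $\phi_5(P)$, the pool5 features of proposal $P$, to a single number, which is a simple linear regression problem: $d_*(P) = \mathbf{w}_*^{T}\phi_5(P)$, where $*$ stands for one of $x, y, w, h$ and $\mathbf{w}_*$ is a learned weight vector.
$\hat{G}$ is only a predicted value, but we need it to match $G$. So we still need to measure the difference between $\hat{G}$ and $G$.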
What is the difference between $\hat{G}$ and $G$?
We want $\hat{G}$ to be as close to $G$ as possible. That means we need a bounding box regressor whose input is the feature vector of the proposal, $\phi_5(P)$ (the CNN pool5 output), and whose output is $d_x(P), d_y(P), d_w(P), d_h(P)$. Then we can map $P$ to $\hat{G}$.
How do we use the regressor output to get $G$?
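Applying $d_*(P)$ to $P$ gives the prediction $\hat{G}$, not the ground truth $G$ itself. The translation and scaling that actually take $P$ to $G$ are:
$$t_x = \frac{G_x - P_x}{P_w}, \quad t_y = \frac{G_y - P_y}{P_h}, \quad t_w = \log\frac{G_w}{P_w}, \quad t_h = \log\frac{G_h}{P_h}$$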
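The targets $t_x, t_y, t_w, t_h$ can be computed directly from a proposal and its ground-truth box. Below is a minimal sketch, assuming boxes are (center_x, center_y, width, height) tuples; `encode_targets` is a hypothetical helper name and the example boxes are made up.

```python
import math

def encode_targets(proposal, gt):
    """Compute regression targets (t_x, t_y, t_w, t_h) for one proposal.

    proposal -- P as (center_x, center_y, width, height)  [assumed layout]
    gt       -- G in the same layout
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    tx = (gx - px) / pw        # center offset, normalized by the proposal width
    ty = (gy - py) / ph        # center offset, normalized by the proposal height
    tw = math.log(gw / pw)     # log of the width ratio
    th = math.log(gh / ph)     # log of the height ratio
    return (tx, ty, tw, th)

if __name__ == "__main__":
    print(encode_targets((50.0, 40.0, 20.0, 10.0), (52.0, 38.0, 30.0, 10.0)))
```

Normalizing the offsets by $P_w$ and $P_h$ makes the targets independent of the box size, and since $P_w t_x + P_x = G_x$ and $P_w e^{t_w} = G_w$ (likewise for $y$ and $h$), a regressor that outputs exactly these values recovers $G$.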
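So if we reduce the error between $d_*(P)$ and $t_*$, we really can map $P$ to $G$, because the regressor learns the true $t_*$ for us.
We can minimize the loss $\mathrm{Loss} = \sum_{i}^{N} \left(t_*^i - \hat{\mathbf{w}}_*^{T}\phi_5(P^i)\right)^2$ to accomplish this goal. Here $\phi_5(P^i)$ is the input to the regressor, i.e. the pool5 features of the $i$-th proposal.
To regularize the loss function, we can use
$$\mathbf{w}_* = \arg\min_{\hat{\mathbf{w}}_*} \sum_{i}^{N} \left(t_*^i - \hat{\mathbf{w}}_*^{T}\phi_5(P^i)\right)^2 + \lambda \lVert \hat{\mathbf{w}}_* \rVert^2$$
and solve it with gradient descent to get $\mathbf{w}_*$.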
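The regularized objective can be minimized with plain gradient descent. The NumPy sketch below fits one regressor (a single coordinate, e.g. $t_x$) and compares the result with the closed-form ridge solution $(X^T X + \lambda I)^{-1} X^T t$. Assumptions: the rows of `X` stand in for $\phi_5(P^i)$ and are synthetic random data, not real pool5 features; the step size, iteration count and $\lambda$ are illustrative, not R-CNN's settings; and both function names are hypothetical.

```python
import numpy as np

def fit_ridge_gd(X, t, lam=1.0, steps=1000):
    """Minimize sum_i (t_i - w^T x_i)^2 + lam * ||w||^2 by gradient descent.

    X -- (N, D) feature matrix, one row per proposal (stand-in for phi_5(P))
    t -- (N,) targets for one coordinate (t_x, t_y, t_w or t_h)
    """
    d = X.shape[1]
    w = np.zeros(d)
    # The gradient is Lipschitz with constant 2 * (sigma_max(X)^2 + lam);
    # a step size of 1/L converges for this convex quadratic loss.
    lipschitz = 2.0 * (np.linalg.norm(X, 2) ** 2 + lam)
    lr = 1.0 / lipschitz
    for _ in range(steps):
        residual = t - X @ w
        grad = -2.0 * X.T @ residual + 2.0 * lam * w
        w -= lr * grad
    return w

def fit_ridge_closed_form(X, t, lam=1.0):
    """Exact minimizer (X^T X + lam * I)^{-1} X^T t, used as a reference."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ t)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))            # synthetic features, not real pool5 output
    true_w = np.array([0.5, -1.0, 0.0, 2.0, 0.3])
    t = X @ true_w + 0.01 * rng.normal(size=200)
    w_gd = fit_ridge_gd(X, t)
    w_cf = fit_ridge_closed_form(X, t)
    print("max |w_gd - w_closed_form|:", np.max(np.abs(w_gd - w_cf)))
```

In this setup you would train four such regressors, one each for $t_x$, $t_y$, $t_w$ and $t_h$; the closed form is only there to confirm that gradient descent reaches the same minimizer.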
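Why can we use linear regression here?
When the IoU between $P$ and $G$ is high (R-CNN only trains the regressor on proposals with IoU > 0.6), we can treat the transformation as linear and use a linear function to adjust the window. The width target shows why: $t_w = \log\frac{G_w}{P_w} = \log\left(1 + \frac{G_w - P_w}{P_w}\right)$, and $\log(1 + x) \approx x$ as $x \to 0$. So when $G_w \approx P_w$, $t_w \approx \frac{G_w - P_w}{P_w}$, which is linear; the same holds for $t_h$. When the boxes are far apart, the approximation breaks down and a linear model no longer fits well.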
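A quick numeric check of the approximation, as a minimal sketch with made-up widths (the helper names are hypothetical): it compares the log-space width target with its linear approximation for a proposal width of 100.

```python
import math

def width_target(gw, pw):
    """Log-space width target t_w = log(G_w / P_w)."""
    return math.log(gw / pw)

def linear_width_target(gw, pw):
    """First-order approximation (G_w - P_w) / P_w, accurate when G_w is close to P_w."""
    return (gw - pw) / pw

if __name__ == "__main__":
    pw = 100.0
    for gw in (105.0, 120.0, 200.0):
        exact = width_target(gw, pw)
        approx = linear_width_target(gw, pw)
        print(f"G_w={gw:6.1f}  log target={exact:.4f}  linear={approx:.4f}  gap={approx - exact:.4f}")
```

The gap is about 0.001 when the widths differ by 5% but about 0.3 when $G_w$ is twice $P_w$, which is why the linear model is only trusted for well-overlapping proposals.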