I am working on using opencv to get the total number of people in a video stream. The problem is that my code only captures the number of people in frame without taking into account all of the frames in the stream. I thought about extracting all the faces detected from a video or webcam and then comparing them. My question here is how do I get the exact count of people by comparing the faces which are extracted? Or is there any other approach to get the total count?
This is the function which detects faces and gives gender and count (but only for the current frame):
def start_webcam(model_gender, window_size, window_name='live', update_time=50):
    cv2.namedWindow(window_name, cv2.WINDOW_NORMAL)
    video_feed = cv2.VideoCapture(0)
    if window_size:
        width, height = window_size
        cv2.resizeWindow(window_name, width, height)
        video_feed.set(cv2.CAP_PROP_FRAME_WIDTH, width)    # property 3
        video_feed.set(cv2.CAP_PROP_FRAME_HEIGHT, height)  # property 4
    read_value, webcam_image = video_feed.read()
    delay = 0
    init = True
    while read_value:
        read_value, webcam_image = video_feed.read()
        if not read_value:  # last read failed; avoid flipping a None frame
            break
        webcam_image = cv2.flip(webcam_image, 1, 0)
        faces = face_cascade.detectMultiScale(webcam_image)
        for normalized_face, (x, y, w, h) in find_faces(webcam_image):
            if init or delay == 0:  # only re-run the model every 20 frames
                init = False
                gender_prediction = model_gender.predict(normalized_face)
            if gender_prediction[0] == 0:
                cv2.rectangle(webcam_image, (x, y), (x + w, y + h), (0, 0, 255), 2)
                cv2.putText(webcam_image, 'female', (x, y - 10),
                            cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
            else:
                cv2.rectangle(webcam_image, (x, y), (x + w, y + h), (255, 0, 0), 2)
                cv2.putText(webcam_image, 'male', (x, y - 10),
                            cv2.FONT_HERSHEY_SIMPLEX, 1.5, (255, 0, 0), 2)
        delay += 1
        delay %= 20
        cv2.putText(webcam_image, "Number of faces detected: " + str(len(faces)),
                    (0, webcam_image.shape[0] - 10),
                    cv2.FONT_HERSHEY_TRIPLEX, 0.7, (255, 255, 255), 1)
        cv2.imshow(window_name, webcam_image)
        key = cv2.waitKey(update_time)
        if key == 27:  # ESC
            break
    cv2.destroyWindow(window_name)
If I understood your question, in a nutshell your problem can be divided into the following pieces:
0 - detection: for each frame, detect zero or more faces. The output of this step is a sequence of "events". Each event is a face plus the coordinates of the region where the face was detected in the image:
evts = {{face0, (x0,y0,w0,h0)}, {face1, (x1,y1,w1,h1)}, ..., {faceN, (xN,yN,wN,hN)}}
for N + 1 detected faces.
1 - identification: the objective of this step is to provide an ID for each event (face/region) detected in the previous step. So, for each face in evts, either: I. the face is a "new face", so a new ID is generated and assigned to it; or II. the face is the same face detected in one of the previous frames, so it should be assigned that same previous ID. The output of this step is a collection of assigned IDs:
ids = {id0, id1, id2, ..., idM}
2 - count: repeat steps 0 and 1 up to the last frame. The size of the ids collection is the count of distinct faces in the video stream.
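Steps 1 and 2 can be sketched generically; here `same_face` is a placeholder for whatever comparison you end up choosing (face-recognition distance, region overlap, a mix of both):

```python
import itertools

def assign_ids(frames_events, same_face):
    """frames_events: iterable of per-frame event lists [(face, rect), ...].
    same_face(face_a, face_b): True if both crops show the same person.
    Returns the set of assigned IDs; its size is the distinct-face count."""
    next_id = itertools.count()
    known = []   # (id, representative face crop) for every face seen so far
    ids = set()
    for events in frames_events:
        for face, rect in events:
            for known_id, rep in known:
                if same_face(rep, face):       # case II: seen before
                    ids.add(known_id)
                    break
            else:                              # case I: a new face
                new_id = next(next_id)
                known.append((new_id, face))
                ids.add(new_id)
    return ids
```

The total count is then `len(assign_ids(frames_events, same_face))`.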
The real problem
The real problem is: how do you determine whether an event (a face, in this case) in frame X is the "same" face in frame Y? Yes, this is the key problem. In your case, you should use a mix of approaches:
Given the issues with pose changes, lighting and occlusion, using the position of the previously detected face to identify the current one can be more robust than a face recognition algorithm. This depends on your video and scenes.
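A minimal sketch of that position-based matching, using intersection-over-union of the detection boxes (both helpers are hypothetical names, not part of OpenCV):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x, y, w, h) boxes, in [0, 1]."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def same_position(prev_box, cur_box, threshold=0.5):
    # Faces move little between consecutive frames, so a large overlap
    # is strong evidence that both boxes contain the same person.
    return iou(prev_box, cur_box) >= threshold
```

The threshold is a tuning knob: higher values reduce false merges but will split a fast-moving face into two IDs.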
There are several operational problems in implementing a robust solution to this, plus all sorts of CV-related challenges. So, be ready to handle false positive/negative rates.
I wrote so much. I hope I have actually understood your question.
Try hashing all of the faces detected in each frame. Then store each hash in a set and take its size to get the number of distinct faces in the video feed.