
How to get the exact count of people by face detection?

I am working on using OpenCV to get the total number of people in a video stream. The problem is that my code only counts the people visible in the current frame, without taking into account all of the frames in the stream. I thought about extracting all the faces detected from a video or webcam and then comparing them. My question is: how do I get the exact count of people by comparing the extracted faces? Or is there any other approach to get the total count?

This is the function which detects faces and reports gender and a count (but the count is only for the current frame):

import cv2

ESC = 27  # key code used to exit the loop

# face_cascade, find_faces and model_gender are defined elsewhere in the project

def start_webcam(model_gender, window_size, window_name='live', update_time=50):
    cv2.namedWindow(window_name, cv2.WINDOW_NORMAL)
    width, height = None, None
    if window_size:
        width, height = window_size
        cv2.resizeWindow(window_name, width, height)

    video_feed = cv2.VideoCapture(0)
    if width and height:
        video_feed.set(cv2.CAP_PROP_FRAME_WIDTH, width)
        video_feed.set(cv2.CAP_PROP_FRAME_HEIGHT, height)
    read_value, webcam_image = video_feed.read()

    delay = 0
    init = True
    while read_value:
        read_value, webcam_image = video_feed.read()
        if not read_value:
            break
        webcam_image = cv2.flip(webcam_image, 1)  # mirror the frame horizontally
        faces = face_cascade.detectMultiScale(webcam_image)
        for normalized_face, (x, y, w, h) in find_faces(webcam_image):
            # the gender prediction is only re-run every 20 frames to save time;
            # in between, the last prediction is reused for the faces in the frame
            if init or delay == 0:
                init = False
                gender_prediction = model_gender.predict(normalized_face)
            if gender_prediction[0] == 0:
                cv2.rectangle(webcam_image, (x, y), (x + w, y + h), (0, 0, 255), 2)
                cv2.putText(webcam_image, 'female', (x, y - 10),
                            cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
            else:
                cv2.rectangle(webcam_image, (x, y), (x + w, y + h), (255, 0, 0), 2)
                cv2.putText(webcam_image, 'male', (x, y - 10),
                            cv2.FONT_HERSHEY_SIMPLEX, 1.5, (255, 0, 0), 2)

        delay += 1
        delay %= 20

        # this count only reflects the faces visible in the current frame
        cv2.putText(webcam_image, "Number of faces detected: " + str(len(faces)),
                    (0, webcam_image.shape[0] - 10),
                    cv2.FONT_HERSHEY_TRIPLEX, 0.7, (255, 255, 255), 1)
        cv2.imshow(window_name, webcam_image)
        key = cv2.waitKey(update_time)
        if key == ESC:
            break

    video_feed.release()
    cv2.destroyWindow(window_name)

If I understood your question correctly, in a nutshell your problem can be divided into the following pieces:

0 - detection: for each frame, detect zero or more faces. The output of this step is a sequence of "events". Each event is a face and the coordinates of the region where the face was detected in the image:

evts = {{face0, (x0,y0,w0,h0)}, {face1, (x1,y1,w1,h1)}, ..., {faceN, (xN,yN,wN,hN)}} 

for N + 1 detected faces.
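In the code from the question, find_faces(frame) already yields exactly these pairs, so the per-frame event list could be built like this (a small sketch reusing the question's find_faces helper):

def detect_events(frame):
    """Step 0: one event per detected face in this frame,
    as a (normalized_face, (x, y, w, h)) tuple."""
    return [(face, (x, y, w, h)) for face, (x, y, w, h) in find_faces(frame)]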

1 - identification: the objective of this step is to provide an ID for each event (face/region) detected in the previous step. So, for each face in evts, either: I. the face is a "new face", so a new ID is generated and assigned to it, or II. the face is the same face detected in one of the previous frames, so it should be assigned the same ID as before. The output of this step is the collection of assigned IDs:

ids = {id0, id1, id2, ..., idM}

2 - count: repeat steps 0 and 1 up to the last frame. The size of the ids collection is the count of different faces in the video stream (a sketch of steps 1 and 2 follows).
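Here is a sketch of the identification and counting steps, assuming a same_face(face_a, face_b) predicate is available; how to implement that predicate is exactly the "real problem" discussed below:

from itertools import count

id_generator = count()          # produces 0, 1, 2, ...
known = []                      # list of (id, representative_face) pairs

def assign_ids(frame_events, same_face):
    """Step 1: give every event in this frame an ID, reusing the ID of a
    previously seen face when same_face says they match."""
    ids = []
    for face, box in frame_events:
        match = next((fid for fid, seen in known if same_face(face, seen)), None)
        if match is None:
            match = next(id_generator)
            known.append((match, face))
        ids.append(match)
    return ids

# Step 2: after processing every frame, the number of distinct IDs is the count
# total_people = len(known)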

The real problem

The real problem is: how to determine whether an event (a face, in this case) in frame X is the "same" face as one in frame Y? Yes, this is the key problem. In your case, you should use a mix of approaches:

  • perform FACE RECOGNITION (face recognition is a different thing from face detection). Luckily, there have been a lot of improvements in this field in recent years, and you can more or less easily use openface or similar APIs in your code to achieve your needs. Don't waste your time trying Viola-Jones-based algorithms for face recognition (they were introduced in 2001 and may not be accurate enough for practical needs today). A recognition-based sketch follows this list.
  • take into consideration the spatial and temporal locality principle and maximise the likelihood of finding the same face in a neighbouring region across successive frames
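A minimal sketch of recognition-based identification, assuming the face_recognition library (a wrapper around dlib's face embeddings; openface would be used analogously). The 0.6 distance threshold is the library's usual default, not something from the question:

import face_recognition

known_encodings = []   # one 128-d embedding per distinct person seen so far

def identify(frame_rgb):
    """Return the IDs of the faces found in this RGB frame, creating a
    new ID when no stored embedding is close enough."""
    ids = []
    boxes = face_recognition.face_locations(frame_rgb)
    encodings = face_recognition.face_encodings(frame_rgb, boxes)
    for enc in encodings:
        if known_encodings:
            distances = face_recognition.face_distance(known_encodings, enc)
            best = distances.argmin()
            if distances[best] < 0.6:       # usual threshold for "same person"
                ids.append(best)
                continue
        known_encodings.append(enc)         # unseen face -> new ID
        ids.append(len(known_encodings) - 1)
    return ids

Every face whose embedding is not close enough to any stored one is treated as a new person, so len(known_encodings) at the end of the stream is the total count.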

Given the issues of pose changes, lighting and occlusion, using the position of the previously detected face to identify the current one can be more robust than a face recognition algorithm. This depends on your video and scenes. A position-based matching sketch follows.
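For example, spatial locality can be exploited by matching each detection in the current frame against the boxes from the previous frame by intersection over union (IoU); this is a minimal sketch, and the 0.3 threshold is an arbitrary assumption to be tuned:

def iou(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    ix, iy = max(ax, bx), max(ay, by)
    iw = max(0, min(ax + aw, bx + bw) - ix)
    ih = max(0, min(ay + ah, by + bh) - iy)
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0

def match_to_previous(current_boxes, previous_boxes, threshold=0.3):
    """For each current box, return the index of the previous box it overlaps
    most with, or None if nothing overlaps enough (likely a new face)."""
    matches = []
    for box in current_boxes:
        scored = [(iou(box, prev), i) for i, prev in enumerate(previous_boxes)]
        best = max(scored, default=(0.0, None))
        matches.append(best[1] if best[0] >= threshold else None)
    return matches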

There are several operational problems in implementing a robust solution:

  • Same faces in different poses
  • Faces in different scales
  • Occlusion (always a problem)
  • Realtime requirements

And all sorts of CV-related challenges. So, be ready to handle false positive/negative rates.

Tips:

  • Try your solution against several different videos to avoid overfitting.
  • If the faces in the videos are moving, a Kalman estimator can be useful (see the sketch after this list).
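A minimal sketch of an OpenCV Kalman filter tracking a face centre, assuming the matched detection box (x, y, w, h) for the tracked face is available each frame; constants such as the process noise are illustrative:

import cv2
import numpy as np

def make_face_tracker():
    """Kalman filter with state [x, y, dx, dy] and measurement [x, y]."""
    kf = cv2.KalmanFilter(4, 2)
    kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                    [0, 1, 0, 1],
                                    [0, 0, 1, 0],
                                    [0, 0, 0, 1]], np.float32)
    kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                     [0, 1, 0, 0]], np.float32)
    kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-2
    return kf

def track_step(kf, detection_box=None):
    """Predict the face centre for this frame; if a matched detection
    (x, y, w, h) is available, correct the filter with its centre."""
    predicted = kf.predict()                  # column vector [x, y, dx, dy]
    if detection_box is not None:
        x, y, w, h = detection_box
        centre = np.array([[np.float32(x + w / 2)],
                           [np.float32(y + h / 2)]])
        kf.correct(centre)
    return float(predicted[0]), float(predicted[1])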

I have written a lot. I hope I have actually understood your question.

Try hashing all of the faces in each frame. Then store each hash in a set and take its size to get the number of distinct faces in the video feed. A sketch of that idea follows.
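A minimal sketch of the set-of-hashes idea, hashing a size-normalised face crop; note that exact pixel hashes will rarely repeat for the same person across frames, so in practice the thing being hashed (or compared) would need to be something invariant, such as the face embedding from the previous answer:

import hashlib
import cv2

unique_faces = set()

def add_face(face_img):
    """Hash a normalised face crop and add it to the set of seen faces."""
    face = cv2.resize(face_img, (64, 64))               # normalise size first
    digest = hashlib.sha1(face.tobytes()).hexdigest()
    unique_faces.add(digest)

# after the stream ends:
# print("people seen:", len(unique_faces))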
