1. Face detection (locating the five facial landmarks, used for face alignment; facial keypoint detection)
2. Face tracking: tracking the position of a face in video
3. Face verification: 1:1, deciding whether two images show the same person
4. Face recognition: 1:n, given a face image, deciding which person in the database it belongs to
5. Face clustering: given a batch of face images, automatically grouping those belonging to the same person
Google engineers Florian Schroff, Dmitry Kalenichenko, and James Philbin proposed the FaceNet face recognition model. Rather than learning classification through a traditional softmax layer, it takes an intermediate layer as the feature and learns an encoding from images into Euclidean space; face recognition, face verification, and face clustering are then performed on top of this encoding. FaceNet therefore learns and outputs a feature representation of the face: different people have different feature representations, and faces are distinguished by computing the distance between two feature vectors. When the distance between two face features is below 1.06, they can be treated as the same person. FaceNet is a general-purpose system: a CNN encodes a color or grayscale face image into a 128- or 512-dimensional embedding that characterizes the face and maps it into Euclidean space, and the Euclidean distance between two portraits measures how similar they are. You can either use an already-trained model or train your own.
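As a minimal sketch of this comparison (the two embedding vectors here are random placeholders standing in for real FaceNet outputs):

import numpy as np

# Placeholders for the 128-D embeddings FaceNet would produce for two faces
emb1 = np.random.rand(128)
emb2 = np.random.rand(128)

# Euclidean distance between the two face feature vectors
dist = np.sqrt(np.sum(np.square(emb1 - emb2)))

# The threshold used throughout this post: below 1.06 means same person
print('same person' if dist < 1.06 else 'different person')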
Face alignment uses the deep-learning-based MTCNN face detector. In 2016, Kaipeng Zhang, Zhanpeng Zhang, Zhifeng Li, and Yu Qiao proposed the MTCNN (Multi-task Cascaded Convolutional Networks) face detection model. The MTCNN pipeline cascades three networks: P-Net proposes candidate face windows over an image pyramid, R-Net rejects false candidates and refines the boxes, and O-Net outputs the final bounding boxes together with five facial landmarks.
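A minimal sketch of invoking this detector through the detect_face module shipped with the facenet repo (the image path is illustrative; passing None as the model path makes create_mtcnn load the bundled det1/det2/det3.npy weights from the module's own directory):

import tensorflow as tf
import cv2
import detect_face  # facenet/src/align/detect_face.py

with tf.Graph().as_default():
    sess = tf.Session()
    with sess.as_default():
        # Builds P-Net, R-Net and O-Net and loads their weights
        pnet, rnet, onet = detect_face.create_mtcnn(sess, None)

img = cv2.cvtColor(cv2.imread('1.jpg'), cv2.COLOR_BGR2RGB)
# min face size, the three stages' score thresholds, and the pyramid scale factor
bounding_boxes, landmarks = detect_face.detect_face(
    img, 20, pnet, rnet, onet, [0.6, 0.7, 0.7], 0.709)
print('Detected faces: %d' % bounding_boxes.shape[0])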
Using the open-source face recognition project facenet, we can transfer-train our own model:
The authors provide two pre-trained models, trained on the CASIA-WebFace and VGGFace2 face databases respectively. Downloading them requires access to Google Drive (which needs a VPN from mainland China). I downloaded one of them, the 20180402-114759 model; I'll share a network-drive mirror later~
The dependencies are as follows: tensorflow==1.7, scipy, scikit-learn, opencv-python, h5py, matplotlib, pillow, requests, psutil.
facenet computes the Euclidean distance between the embeddings its network produces for the two faces in the images.
1. cd into the facenet/src directory (the directory containing compare.py) to compare 1.jpg and 2.jpg.
2. Run the compare.py file:

python compare.py 20180402-114759 1.jpg 2.jpg

3. The result is as follows:
As explained above, FaceNet outputs a feature representation of each face, so the printed distance tells you directly whether 1.jpg and 2.jpg show the same person: a feature distance below 1.06 can be treated as a match.
1. To train your own model, you need to collect your own data, and the data you prepare must follow this format: one folder per person, and inside each folder the images are named with the person's name plus an index, e.g. the first folder contains chenzewu_0000.jpg, chenzewu_0001.jpg, and so on. I placed the data under facenet/data/mydatasets/ (a quick sanity-check sketch follows).
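To verify the data follows this layout, a small sketch like the following walks the dataset directory (path as above) and prints the image count per person:

import os

datadir = 'facenet/data/mydatasets/'
# One subdirectory per person; each file inside is <name>_<index>.jpg
for person in sorted(os.listdir(datadir)):
    person_dir = os.path.join(datadir, person)
    if os.path.isdir(person_dir):
        print('%s: %d images' % (person, len(os.listdir(person_dir))))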
2. Face alignment
Align the data: the faces in the images I prepared sit at fairly random positions within the frame, so we run the MTCNN algorithm here to detect the head and scale it to a size suitable for FaceNet training, e.g. 160×160 pixels. Written as an align.sh script:

export PYTHONPATH=$PYTHONPATH:/<your project location>/facenet/src
python align_dataset_mtcnn.py /<your project location>/facenet/data/mydatasets/ /<your project location>/facenet/data/mydatasets_160/ --image_size 160 --margin 44

If you follow other people's write-ups you may hit errors about incorrect file locations; moving the align_dataset_mtcnn.py file into the src directory and running it from there fixes this.
The final results are saved in the folder facenet/data/mydatasets_160.
For each video frame, MTCNN first extracts the face bounding boxes; each face crop is then fed into facenet for feature extraction, and the extracted embedding is compared against all faces in the database. Among the database faces whose distance falls below the threshold, the label of the image with the smallest Euclidean distance is taken as this face's label. If all distances exceed the threshold, the face is judged to belong to someone outside the database, as sketched below.
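A minimal sketch of that matching step, assuming db_embeddings, db_labels, and a query embedding have already been computed with the FaceNet model (all names here are illustrative, and the random arrays stand in for real embeddings):

import numpy as np

def match_face(query_emb, db_embeddings, db_labels, threshold=1.06):
    # Euclidean distance from the query embedding to every database embedding
    dists = np.linalg.norm(db_embeddings - query_emb, axis=1)
    best = int(np.argmin(dists))
    # If even the nearest face exceeds the threshold, the person is not in the database
    if dists[best] > threshold:
        return 'unknown'
    return db_labels[best]

# Illustrative usage with random placeholder embeddings
db_embeddings = np.random.rand(10, 512)
db_labels = ['person_%d' % k for k in range(10)]
print(match_face(np.random.rand(512), db_embeddings, db_labels))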
Extract features for the face images in the database and train a classifier (adapted from https://github.com/bearsprogrammer/real-time-deep-face-recognition):
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
import numpy as np
import argparse
import facenet
import detect_face
import os
import sys
import math
import pickle
from sklearn.svm import SVC

with tf.Graph().as_default():
    with tf.Session() as sess:
        # Load the aligned training data
        datadir = '/..Path to align face data../'
        dataset = facenet.get_dataset(datadir)
        paths, labels = facenet.get_image_paths_and_labels(dataset)
        print('Number of classes: %d' % len(dataset))
        print('Number of images: %d' % len(paths))

        print('Loading feature extraction model')
        modeldir = '/..Path to Pre-trained model../20170512-110547/20170512-110547.pb'
        facenet.load_model(modeldir)

        images_placeholder = tf.get_default_graph().get_tensor_by_name("input:0")
        embeddings = tf.get_default_graph().get_tensor_by_name("embeddings:0")
        phase_train_placeholder = tf.get_default_graph().get_tensor_by_name("phase_train:0")
        embedding_size = embeddings.get_shape()[1]

        # Run forward pass to calculate embeddings
        print('Calculating features for images')
        batch_size = 1000
        image_size = 160
        nrof_images = len(paths)
        nrof_batches_per_epoch = int(math.ceil(1.0 * nrof_images / batch_size))
        emb_array = np.zeros((nrof_images, embedding_size))
        for i in range(nrof_batches_per_epoch):
            start_index = i * batch_size
            end_index = min((i + 1) * batch_size, nrof_images)
            paths_batch = paths[start_index:end_index]
            images = facenet.load_data(paths_batch, False, False, image_size)
            feed_dict = {images_placeholder: images, phase_train_placeholder: False}
            emb_array[start_index:end_index, :] = sess.run(embeddings, feed_dict=feed_dict)

        classifier_filename = '/..Path to save classifier../my_classifier.pkl'
        classifier_filename_exp = os.path.expanduser(classifier_filename)

        # Train classifier
        print('Training classifier')
        model = SVC(kernel='linear', probability=True)
        model.fit(emb_array, labels)

        # Create a list of class names
        class_names = [cls.name.replace('_', ' ') for cls in dataset]

        # Saving classifier model
        with open(classifier_filename_exp, 'wb') as outfile:
            pickle.dump((model, class_names), outfile)
        print('Saved classifier model to file "%s"' % classifier_filename_exp)
        print('Goodluck')

Real-time face recognition:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf
from scipy import misc
import cv2
import matplotlib.pyplot as plt
import numpy as np
import argparse
import facenet
import detect_face
import os
from os.path import join as pjoin
import sys
import time
import copy
import math
import pickle
from sklearn.svm import SVC
from sklearn.externals import joblib

print('Creating networks and loading parameters')
with tf.Graph().as_default():
    gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.6)
    sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options, log_device_placement=False))
    with sess.as_default():
        pnet, rnet, onet = detect_face.create_mtcnn(sess, './Path to det1.npy,..')

        minsize = 20  # minimum size of face
        threshold = [0.6, 0.7, 0.7]  # three steps's threshold
        factor = 0.709  # scale factor
        margin = 44
        frame_interval = 3
        batch_size = 1000
        image_size = 182
        input_image_size = 160

        HumanNames = ['Human_a', 'Human_b', 'Human_c', '...', 'Human_h']  # train human name

        print('Loading feature extraction model')
        modeldir = '/..Path to pre-trained model../20170512-110547/20170512-110547.pb'
        facenet.load_model(modeldir)

        images_placeholder = tf.get_default_graph().get_tensor_by_name("input:0")
        embeddings = tf.get_default_graph().get_tensor_by_name("embeddings:0")
        phase_train_placeholder = tf.get_default_graph().get_tensor_by_name("phase_train:0")
        embedding_size = embeddings.get_shape()[1]

        classifier_filename = '/..Path to classifier model../my_classifier.pkl'
        classifier_filename_exp = os.path.expanduser(classifier_filename)
        with open(classifier_filename_exp, 'rb') as infile:
            (model, class_names) = pickle.load(infile)
            print('load classifier file-> %s' % classifier_filename_exp)

        video_capture = cv2.VideoCapture(0)
        c = 0

        # # video writer
        # fourcc = cv2.VideoWriter_fourcc(*'DIVX')
        # out = cv2.VideoWriter('3F_0726.avi', fourcc, fps=30, frameSize=(640, 480))

        print('Start Recognition!')
        prevTime = 0
        while True:
            ret, frame = video_capture.read()

            frame = cv2.resize(frame, (0, 0), fx=0.5, fy=0.5)  # resize frame (optional)

            curTime = time.time()  # calc fps
            timeF = frame_interval

            if (c % timeF == 0):
                find_results = []

                if frame.ndim == 2:
                    frame = facenet.to_rgb(frame)
                frame = frame[:, :, 0:3]
                bounding_boxes, _ = detect_face.detect_face(frame, minsize, pnet, rnet, onet, threshold, factor)
                nrof_faces = bounding_boxes.shape[0]
                print('Detected_FaceNum: %d' % nrof_faces)

                if nrof_faces > 0:
                    det = bounding_boxes[:, 0:4]
                    img_size = np.asarray(frame.shape)[0:2]

                    cropped = []
                    scaled = []
                    scaled_reshape = []
                    bb = np.zeros((nrof_faces, 4), dtype=np.int32)

                    for i in range(nrof_faces):
                        emb_array = np.zeros((1, embedding_size))

                        bb[i][0] = det[i][0]
                        bb[i][1] = det[i][1]
                        bb[i][2] = det[i][2]
                        bb[i][3] = det[i][3]

                        # inner exception: skip boxes that fall outside the frame
                        if bb[i][0] <= 0 or bb[i][1] <= 0 or bb[i][2] >= len(frame[0]) or bb[i][3] >= len(frame):
                            print('face is inner of range!')
                            continue

                        # index with i (not 0) so each detected face uses its own crop
                        cropped.append(frame[bb[i][1]:bb[i][3], bb[i][0]:bb[i][2], :])
                        cropped[i] = facenet.flip(cropped[i], False)
                        scaled.append(misc.imresize(cropped[i], (image_size, image_size), interp='bilinear'))
                        scaled[i] = cv2.resize(scaled[i], (input_image_size, input_image_size),
                                               interpolation=cv2.INTER_CUBIC)
                        scaled[i] = facenet.prewhiten(scaled[i])
                        scaled_reshape.append(scaled[i].reshape(-1, input_image_size, input_image_size, 3))
                        feed_dict = {images_placeholder: scaled_reshape[i], phase_train_placeholder: False}
                        emb_array[0, :] = sess.run(embeddings, feed_dict=feed_dict)
                        predictions = model.predict_proba(emb_array)
                        best_class_indices = np.argmax(predictions, axis=1)
                        best_class_probabilities = predictions[np.arange(len(best_class_indices)), best_class_indices]
                        cv2.rectangle(frame, (bb[i][0], bb[i][1]), (bb[i][2], bb[i][3]), (0, 255, 0), 2)  # boxing face

                        # plot result idx under box
                        text_x = bb[i][0]
                        text_y = bb[i][3] + 20
                        # print('result: ', best_class_indices[0])
                        for H_i in HumanNames:
                            if HumanNames[best_class_indices[0]] == H_i:
                                result_names = HumanNames[best_class_indices[0]]
                                cv2.putText(frame, result_names, (text_x, text_y),
                                            cv2.FONT_HERSHEY_COMPLEX_SMALL, 1, (0, 0, 255),
                                            thickness=1, lineType=2)
                else:
                    print('Unable to align')

            sec = curTime - prevTime
            prevTime = curTime
            fps = 1 / (sec)
            fps_str = 'FPS: %2.3f' % fps
            text_fps_x = len(frame[0]) - 150
            text_fps_y = 20
            cv2.putText(frame, fps_str, (text_fps_x, text_fps_y), cv2.FONT_HERSHEY_COMPLEX_SMALL, 1, (0, 0, 0),
                        thickness=1, lineType=2)
            # c += 1

            cv2.imshow('Video', frame)

            if cv2.waitKey(1) & 0xFF == ord('q'):
                break

        video_capture.release()
        # # video writer
        # out.release()
        cv2.destroyAllWindows()

Train the SVM classifier on the MTCNN-aligned face data above to produce a .pkl file, and save the generated .pkl file in the current directory, i.e. the src folder:
cd facenet/src/
python classifier.py TRAIN ../data/mydatasets_160 20180402-114759 my_classifier.pkl

Find face.py in facenet/contributed, copy face.py and real_time_face_recognition.py into the src directory, then open face.py and modify the model file path and the .pkl file path in it.
Run the code to perform real-time detection:
python real_time_face_recognition.py

Some errors may come up while running the steps above, as follows:
module 'scipy.misc' has no attribute 'imread': in the facenet.py file, change misc.imread to imageio.imread.
module 'scipy.misc' has no attribute 'imresize': in the face.py file, change misc.imresize to cv2.resize and drop the 'bilinear' argument.
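For reference, the replacements look like this (a sketch; cv2.resize uses bilinear interpolation by default, so dropping the interp argument keeps the behavior):

import imageio
import cv2

# facenet.py: misc.imread('1.jpg') becomes
img = imageio.imread('1.jpg')

# face.py: misc.imresize(img, (160, 160), interp='bilinear') becomes
scaled = cv2.resize(img, (160, 160))
print(scaled.shape)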