Smartphone Facial Recognition
Revision as of 23:34, 21 October 2022

By Kenneth Wu

Note: This page is incomplete.

Facial recognition systems are computer programs that match faces against a database.[1] Though the task is trivial for humans, computers have achieved high accuracy at it only recently.[1] Deep learning [2] with convolutional neural networks [3] currently dominates the facial recognition field.[2] However, deep learning demands far more memory, disk storage, and computation than traditional computer vision, posing significant challenges for facial recognition on the limited hardware of smartphones.[3] Accordingly, smartphone manufacturers have adopted processors with dedicated neural engines for deep learning tasks [4] and have created simpler, more compact models that mimic the behavior of more complex ones.[3]

Model

Facial recognition systems accomplish their task by detecting the presence of a face, analyzing its features, and confirming the identity of the person.[5] Training data is fed into a face detection algorithm; the two most popular detection methods are the Viola-Jones algorithm and convolutional neural networks.[6]

Viola-Jones Algorithm

The Viola-Jones algorithm was the first real-time object detection framework. It converts images to grayscale and scans for edges that signify the presence of human features.[7] While highly accurate on well-lit, front-facing faces and relatively light on memory, it is slower than deep-learning-based methods, including the now industry-standard convolutional neural network (CNN).[6]
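The key speed trick behind Viola-Jones is the integral image, which lets the sum of any rectangle (and therefore any Haar-like edge feature) be computed with four array lookups, regardless of rectangle size. A minimal NumPy sketch of that core idea (the toy image and feature placement here are illustrative, not from the original paper):

```python
import numpy as np

def integral_image(img):
    # Cumulative sum over rows then columns, padded with a zero row and
    # column so rect_sum can use inclusive-exclusive indexing.
    ii = img.cumsum(axis=0).cumsum(axis=1)
    return np.pad(ii, ((1, 0), (1, 0)))

def rect_sum(ii, r, c, h, w):
    # Sum of the h x w rectangle with top-left corner (r, c),
    # computed with four lookups no matter how large the rectangle is.
    return ii[r + h, c + w] - ii[r, c + w] - ii[r + h, c] + ii[r, c]

def haar_vertical_edge(ii, r, c, h, w):
    # Two-rectangle Haar-like feature: bright left half minus dark right
    # half responds strongly to a vertical edge.
    half = w // 2
    return rect_sum(ii, r, c, h, half) - rect_sum(ii, r, c + half, h, half)

# Toy image: bright left half, dark right half -> strong edge response.
img = np.hstack([np.ones((4, 4)), np.zeros((4, 4))])
ii = integral_image(img)
print(haar_vertical_edge(ii, 0, 0, 4, 8))  # 16.0
```

The full framework evaluates thousands of such features in a boosted cascade, rejecting non-face windows early; the constant-time rectangle sums are what made real-time detection feasible on 2001-era hardware.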

Convolutional Neural Networks

Convolutional neural networks are closely related to artificial neural networks (ANNs) [4]. Unlike traditional ANNs, CNNs arrange neurons in three dimensions - width, height, and depth - and each neuron connects only to a small subset of the preceding layer.[8] Their architecture is sparse [5], topographic, and feed-forward,[6][9] featuring an input and output layer along with three types of hidden layers.[10]

The first hidden layer type is convolutional: a filter of size n x n sweeps across a larger matrix at a pre-determined stride, and the dot product at each position is written to a feature map.[8] This presents a significant advantage over ANNs by greatly reducing the amount of information stored [8]. Because convolution is purely linear matrix multiplication, another operation is needed to introduce non-linearity; the most popular is the Rectified Linear Unit (ReLU), which replaces negative values with zero.[10]

The second hidden layer type is pooling, which further reduces the size of the representation and the required computational power.[11] The most common pooling method sweeps a window, usually 2 x 2 with a stride of 2, over the activation map and passes only the largest value in each window to the next layer.[11] The final hidden layer type is the fully-connected layer, in which neurons connect to every neuron in the two adjacent layers, as in an ANN.[8] CNNs are usually configured in one of two ways: the first stacks convolutional layers that then feed into a stack of pooling layers; the second alternates between stacks of convolutional layers and pooling layers.[8]
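The three operations described above - convolution, ReLU, and max pooling - can be sketched in plain NumPy. This is a toy illustration of the mechanics, not a production implementation; the 6 x 6 image and 3 x 3 filter values are arbitrary:

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    # Slide the filter across the image, taking the dot product at each
    # position and writing it to the feature map.
    k = kernel.shape[0]
    out = (image.shape[0] - k) // stride + 1
    fmap = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            patch = image[i*stride:i*stride+k, j*stride:j*stride+k]
            fmap[i, j] = np.sum(patch * kernel)
    return fmap

def relu(x):
    # Introduce non-linearity by zeroing negative activations.
    return np.maximum(x, 0)

def max_pool(fmap, size=2, stride=2):
    # Keep only the largest value in each size x size window, shrinking
    # the map and the computation needed downstream.
    out = (fmap.shape[0] - size) // stride + 1
    pooled = np.zeros((out, out))
    for i in range(out):
        for j in range(out):
            pooled[i, j] = fmap[i*stride:i*stride+size,
                                j*stride:j*stride+size].max()
    return pooled

image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[-1, 0, 1]] * 3, dtype=float)  # crude vertical-edge filter
pooled = max_pool(relu(conv2d(image, kernel)))
print(pooled.shape)  # (2, 2)
```

Note how a 6 x 6 input shrinks to a 4 x 4 feature map after convolution and to 2 x 2 after pooling - the size reduction that makes deep stacks of such layers tractable.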

Deep CNNs are the go-to method for supervised training and are even capable of unsupervised classification given a large enough training data set.[12] Training results in learned weights - data patterns or rules extracted from the provided images.[13] The trained filter values pick out the visual features of an input image, which the system can then compare against its database for a match.[13] Once trained, a model can be retrained to recognize faces absent from the original training set, a process known as transfer learning.[14] Through this process, the weights for feature extraction - finding the features in an image - are retained, while the weights for classification are changed.[14] In this way, smartphones can learn new faces after the model has already been trained.
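The split between frozen feature-extraction weights and retrainable classification weights can be sketched with a toy NumPy model. Here a fixed random projection stands in for a pre-trained convolutional stack, and only the classification head is updated by gradient descent - all names and sizes are illustrative assumptions, not part of any real phone's pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen feature-extraction weights, standing in for a pre-trained
# convolutional stack; in a real system these come from prior training.
W_feat = rng.normal(size=(8, 4))

def extract_features(x):
    # Feature extraction is reused as-is: these weights never change.
    return np.maximum(x @ W_feat, 0)

# New classification head, re-initialized for the new set of faces.
W_clf = np.zeros((4, 2))

def train_head(X, y, lr=0.1, epochs=200):
    # Only the classifier weights are updated - the "transfer" step.
    global W_clf
    F = extract_features(X)
    for _ in range(epochs):
        logits = F @ W_clf
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)          # softmax probabilities
        onehot = np.eye(2)[y]
        W_clf -= lr * F.T @ (p - onehot) / len(X)  # cross-entropy gradient

X = rng.normal(size=(20, 8))          # stand-in for new face images
y = (X[:, 0] > 0).astype(int)         # stand-in identity labels
train_head(X, y)
```

Because `W_feat` is untouched by `train_head`, the expensive general-purpose feature learning is done once, while enrolling a new face only costs the cheap head retraining.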

Optimizations for Mobile Devices

Many smartphone operations that require neural networks have traditionally run in the cloud, so services like Apple's Siri are unavailable offline.[15] However, several software and hardware options exist for improving the performance of on-device neural networks on mobile phones.

Software

Weight Pruning

A trained neural network contains weights that contribute minimally to the activation of any neuron yet still consume computational power.[16] Pruning removes connections with the lowest-magnitude weights, which speeds up computation while having minimal impact on accuracy.[16] One such pruning technique has been experimentally shown to allow real-time neural network inference on the large ImageNet data set using the large VGG-16 deep neural network.[16]
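The simplest form of this idea, magnitude-based pruning, can be sketched in a few lines of NumPy. This is a generic illustration of the principle, not the specific pattern-based scheme of the cited PCONV paper:

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    # Zero out the fraction `sparsity` of weights with the smallest
    # absolute values; the surviving weights are left untouched.
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask

W = np.array([[0.9, -0.05, 0.4],
              [0.01, -0.8, 0.1]])
print(prune_by_magnitude(W, 0.5))  # keeps 0.9, 0.4 and -0.8; rest become 0
```

The zeroed connections can then be skipped entirely at inference time, which is where the speedup on mobile hardware comes from; practical schemes add structure to the sparsity so that the skipping maps well onto SIMD and cache behavior.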

Applications

Apple / FaceID

References

  1. Brownlee, J. (2019, July 5). A gentle introduction to deep learning for face recognition. Machine Learning Mastery. Retrieved November 14, 2022, from https://machinelearningmastery.com/introduction-to-deep-learning-for-face-recognition/
  2. Almabdy, S., & Elrefaei, L. (2019). Deep convolutional neural network-based approaches for face recognition. Applied Sciences, 9(20), 4397. https://doi.org/10.3390/app9204397
  3. 3.0 3.1 Computer Vision Machine Learning Team. (2017, November). An on-device deep neural network for face detection. Apple Machine Learning Research. Retrieved November 14, 2022, from https://machinelearning.apple.com/research/face-detection#1
  4. Samsung. (2018). Exynos 9810: Mobile Processor. Samsung Semiconductor Global. Retrieved November 14, 2022, from https://semiconductor.samsung.com/processor/mobile-processor/exynos-9-series-9810/
  5. Klosowski, T. (2020, July 15). Facial recognition is everywhere. Here's what we can do about it. The New York Times. Retrieved November 14, 2022, from https://www.nytimes.com/wirecutter/blog/how-facial-recognition-works/
  6. 6.0 6.1 Enriquez, K. (2018, May 15). (thesis). Faster face detection using Convolutional Neural Networks & the Viola-Jones algorithm. California State University Stanislaus. Retrieved November 14, 2022, from https://www.csustan.edu/sites/default/files/groups/University%20Honors%20Program/Journals/01_enriquez.pdf.
  7. Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of simple features. Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001. https://doi.org/10.1109/cvpr.2001.990517
  8. 8.0 8.1 8.2 8.3 8.4 O'Shea, K., & Nash, R. (2015). An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458.
  9. Gurucharan, M. (2022, July 28). Basic CNN architecture: Explaining 5 layers of Convolutional Neural Network. upGrad. Retrieved November 14, 2022, from https://www.upgrad.com/blog/basic-cnn-architecture/#:~:text=other%20advanced%20tasks.-,What%20is%20the%20architecture%20of%20CNN%3F,the%20main%20responsibility%20for%20computation.
  10. 10.0 10.1 Mishra, M. (2020, August 26). Convolutional neural networks, explained. Towards Data Science. Retrieved November 14, 2022, from https://towardsdatascience.com/convolutional-neural-networks-explained-9cc5188c4939
  11. 11.0 11.1 Saha, S. (2018, December 15). A comprehensive guide to convolutional neural network. Toward Data Science. Retrieved November 14, 2022, from https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
  12. Guérin, J., Gibaru, O., Thiery, S., & Nyiri, E. (2018). CNN features are also great at unsupervised classification. Computer Science & Information Technology. https://doi.org/10.5121/csit.2018.80308
  13. 13.0 13.1 Khandelwal, R. (2020, May 18). Convolutional Neural Network: Feature map and filter visualization. Toward Data Science. Retrieved November 14, 2022, from https://towardsdatascience.com/convolutional-neural-network-feature-map-and-filter-visualization-f75012a5a49c
  14. 14.0 14.1 Tammina, S. (2019). Transfer learning using VGG-16 with deep convolutional neural network for classifying images. International Journal of Scientific and Research Publications (IJSRP), 9(10), 143–150. https://doi.org/10.29322/ijsrp.9.10.2019.p9420
  15. (Reference named "Apple" was cited in the text but its full citation was not provided in the original page.)
  16. 16.0 16.1 16.2 Ma, X., Guo, F. M., Niu, W., Lin, X., Tang, J., Ma, K., ... & Wang, Y. (2020, April). Pconv: The missing but desirable sparsity in dnn weight pruning for real-time execution on mobile devices. In Proceedings of the AAAI Conference on Artificial Intelligence (Vol. 34, No. 04, pp. 5117-5124).