Difference between revisions of "Smartphone Facial Recognition"

From Psyc 40 Wiki
Revision as of 23:34, 21 October 2022

By Kenneth Wu

Note: This page is incomplete.

Facial recognition systems are computer programs that match faces against a database. Although recognizing faces is trivial for humans, computers struggled to achieve high accuracy until recently.[1] Deep learning with convolutional neural networks currently dominates the facial recognition field.[2] However, deep learning uses much more memory, disk storage, and computational resources than traditional computer vision, which poses a significant challenge for facial recognition on the limited hardware of smartphones.[3] Accordingly, smartphone manufacturers have adopted processors with dedicated neural engines for deep learning tasks[4] and have created simpler, more compact models that mimic the behavior of more complex ones.[3]

Model

Facial recognition systems work in three steps: detecting the presence of a face, analyzing its features, and confirming the identity of the person.[5] Training data is fed into a face detection algorithm; the two most popular approaches are the Viola-Jones algorithm and convolutional neural networks.[6]

Viola-Jones Algorithm

The Viola-Jones algorithm was the first real-time object detection framework. It works by converting images to grayscale and looking for edges that signify the presence of human facial features.[7] Although it is highly accurate at detecting well-lit, front-facing faces and requires relatively little memory, it is slower than deep-learning-based methods, including the now industry-standard convolutional neural networks (CNNs).[6]
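The grayscale-and-edges idea can be illustrated with a small sketch. It covers only the feature computation from the Viola-Jones paper (an integral image plus one two-rectangle Haar-like feature), not the boosted cascade that makes the final decision. The function names and the luminance weights used for grayscale conversion are assumptions chosen for illustration, not taken from the sources cited here.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) pixels to rows of luminance values.

    The 0.299/0.587/0.114 weights are a common convention, assumed here.
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def integral_image(gray):
    """Build a padded table where ii[y][x] is the sum of gray[0..y-1][0..x-1]."""
    height, width = len(gray), len(gray[0])
    ii = [[0.0] * (width + 1) for _ in range(height + 1)]
    for y in range(height):
        row_sum = 0.0
        for x in range(width):
            row_sum += gray[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of any rectangle in constant time using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def vertical_edge_feature(ii, top, left, height, width):
    """Two-rectangle Haar-like feature: right half minus left half.

    A large absolute value suggests a vertical edge inside the window.
    """
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return right_sum - left_sum
```

For a 4 x 4 grayscale patch whose left half is 0 and whose right half is 10, `vertical_edge_feature` over the whole patch returns 80, while a uniform patch returns 0. The integral image is what keeps each feature cheap: every rectangle sum costs four lookups regardless of its size.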

Convolutional Neural Networks

Convolutional neural networks are closely related to artificial neural networks (ANNs). Unlike in traditional ANNs, the neurons in a CNN layer are arranged in three dimensions (width, height, and depth), and each neuron connects only to a small region of the preceding layer.[8] Their architecture is sparse, topographic, and feed-forward,[9] featuring an input and output layer along with three types of hidden layers.[10]

The first hidden layer type is convolutional. A filter of size n x n, whose values are learned during training, sweeps across a larger matrix at a fixed stride, and the dot product at each position is written to a feature map.[8] Because the same small filter is reused across the whole input, a convolutional layer stores far fewer parameters than a fully connected ANN layer.[8] Convolution is a linear operation, so another step is needed to introduce non-linearity; the most popular is the rectified linear unit (ReLU), which replaces negative values with zero.[10]

The next hidden layer type is pooling, which further reduces the size of the data and the computational power required.[11] The most common method, max pooling, slides a window (usually 2 x 2 with a stride of 2) over the activation map and writes the largest value in each window to the next activation map.[11] The final hidden layer type is the fully connected layer, where neurons are connected to every neuron in the two adjacent layers, as in an ANN.[8]

CNNs are usually configured in one of two ways: the first stacks convolutional layers that then pass to pooling layers; the second places two convolutional layers before each pooling layer.[8]
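The three operations described above (convolution, ReLU, and max pooling) can be sketched in plain Python. This is a minimal illustration on a single-channel image, not how a production framework implements these layers. The filter values are hand-picked to respond to a dark-to-bright vertical edge; in a trained CNN they would be learned. All function names are assumptions.

```python
def conv2d(image, kernel, stride=1):
    """Slide the kernel over the image; each dot product becomes one feature-map cell.

    As in most CNN libraries, the kernel is not flipped (cross-correlation).
    """
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(k):
                for dj in range(k):
                    total += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


def relu(feature_map):
    """Replace negative values with zero."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2, stride=2):
    """Keep the largest value in each size x size window."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [[max(feature_map[i * stride + di][j * stride + dj]
                 for di in range(size) for dj in range(size))
             for j in range(out_w)]
            for i in range(out_h)]


# A 5 x 5 image with a bright vertical band in columns 2 and 3.
image = [[0, 0, 1, 1, 0] for _ in range(5)]
edge_filter = [[-1, 1],
               [-1, 1]]

features = conv2d(image, edge_filter)   # 4 x 4, each row is [0, 2, 0, -2]
activated = relu(features)              # each row becomes [0, 2, 0, 0]
pooled = max_pool(activated)            # [[2, 0], [2, 0]]
print(pooled)
```

The 2 x 2 filter has only four values no matter how large the image is, which is the storage saving over a fully connected layer. Pooling then shrinks the 4 x 4 map to 2 x 2 while keeping the strongest edge response.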

Deep CNNs are the go-to method for supervised training and are even capable of unsupervised classification given a large enough training data set.[12] Training produces learned weights, which encode patterns or rules extracted from the provided images.[13] The trained filter values let the network determine the visual features of an input image, which the system can then compare against its existing database for a match.[13] Once trained, a model can be retrained to recognize faces that were not in the original training set, a process known as transfer learning.[14] In this process, the weights for feature extraction (finding the features in an image) are retained, while the weights for classification are changed.[14] In this way, smartphones can learn new faces after they have already been trained.
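The split between retained and retrained weights can be shown with a toy sketch. The feature extractor below is a fixed set of filters that is never modified, and only a small classification head is fitted to new examples. The perceptron update used for the head is a simple stand-in chosen for illustration; it is not the training procedure from the cited paper, and every name and value here is hypothetical.

```python
def extract_features(image, frozen_filters):
    """Feature extraction with frozen weights: one dot product per filter, then ReLU."""
    flat = [pixel for row in image for pixel in row]
    return [max(0.0, sum(w * p for w, p in zip(f, flat))) for f in frozen_filters]


def train_classifier(samples, labels, frozen_filters, epochs=20, lr=0.1):
    """Fit only the classification weights; frozen_filters is read but never changed."""
    weights, bias = [0.0] * len(frozen_filters), 0.0
    for _ in range(epochs):
        for image, label in zip(samples, labels):
            feats = extract_features(image, frozen_filters)
            score = sum(w * f for w, f in zip(weights, feats)) + bias
            predicted = 1 if score > 0 else 0
            error = label - predicted  # 0 when the prediction is already right
            weights = [w + lr * error * f for w, f in zip(weights, feats)]
            bias += lr * error
    return weights, bias


def predict(image, frozen_filters, weights, bias):
    """Return 1 if the image is classified as the newly enrolled face, else 0."""
    feats = extract_features(image, frozen_filters)
    score = sum(w * f for w, f in zip(weights, feats)) + bias
    return 1 if score > 0 else 0


# Hypothetical pre-trained filters over flattened 2 x 2 "images".
frozen_filters = [[1, 0, 0, 0],
                  [0, 0, 0, 1]]

new_face = [[1, 0], [0, 0]]      # label 1: the face being enrolled
other_face = [[0, 0], [0, 1]]    # label 0: anyone else

weights, bias = train_classifier([new_face, other_face], [1, 0], frozen_filters)
print(predict(new_face, frozen_filters, weights, bias))    # 1
print(predict(other_face, frozen_filters, weights, bias))  # 0
```

Because only the small head is trained, enrolling a new face needs far less data and computation than training the whole network, which suits the limited hardware of a phone.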

History

Applications

References

  1. Brownlee, J. (2019, July 5). A gentle introduction to deep learning for face recognition. Machine Learning Mastery. Retrieved November 14, 2022, from https://machinelearningmastery.com/introduction-to-deep-learning-for-face-recognition/
  2. Almabdy, S., & Elrefaei, L. (2019). Deep convolutional neural network-based approaches for face recognition. Applied Sciences, 9(20), 4397. https://doi.org/10.3390/app9204397
  3. Computer Vision Machine Learning Team. (2017, November). An on-device deep neural network for face detection. Apple Machine Learning Research. Retrieved November 14, 2022, from https://machinelearning.apple.com/research/face-detection#1
  4. Samsung. (2018). Exynos 9810: Mobile Processor. Samsung Semiconductor Global. Retrieved November 14, 2022, from https://semiconductor.samsung.com/processor/mobile-processor/exynos-9-series-9810/
  5. Klosowski, T. (2020, July 15). Facial recognition is everywhere. here's what we can do about it. The New York Times. Retrieved November 14, 2022, from https://www.nytimes.com/wirecutter/blog/how-facial-recognition-works/
  6. Enriquez, K. (2018, May 15). (thesis). Faster face detection using Convolutional Neural Networks & the Viola-Jones algorithm. California State University Stanislaus. Retrieved November 14, 2022, from https://www.csustan.edu/sites/default/files/groups/University%20Honors%20Program/Journals/01_enriquez.pdf.
  7. Viola, P., & Jones, M. (2001). Rapid object detection using a boosted cascade of Simple features. Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001. https://doi.org/10.1109/cvpr.2001.990517
  8. O'Shea, K., & Nash, R. (2015). An introduction to convolutional neural networks. arXiv preprint arXiv:1511.08458.
  9. Gurucharan, M. (2022, July 28). Basic CNN architecture: Explaining 5 layers of Convolutional Neural Network. upGrad. Retrieved November 14, 2022, from https://www.upgrad.com/blog/basic-cnn-architecture/#:~:text=other%20advanced%20tasks.-,What%20is%20the%20architecture%20of%20CNN%3F,the%20main%20responsibility%20for%20computation.
  10. Mishra, M. (2020, August 26). Convolutional neural networks, explained. Towards Data Science. Retrieved November 14, 2022, from https://towardsdatascience.com/convolutional-neural-networks-explained-9cc5188c4939
  11. Saha, S. (2018, December 15). A comprehensive guide to convolutional neural network. Towards Data Science. Retrieved November 14, 2022, from https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53
  12. Guérin, J., Gibaru, O., Thiery, S., & Nyiri, E. (2018). CNN features are also great at unsupervised classification. Computer Science & Information Technology. https://doi.org/10.5121/csit.2018.80308
  13. Khandelwal, R. (2020, May 18). Convolutional Neural Network: Feature map and filter visualization. Towards Data Science. Retrieved November 14, 2022, from https://towardsdatascience.com/convolutional-neural-network-feature-map-and-filter-visualization-f75012a5a49c
  14. 14.0 14.1 Tammina, S. (2019). Transfer learning using VGG-16 with deep convolutional neural network for classifying images. International Journal of Scientific and Research Publications (IJSRP), 9(10), 143–150. https://doi.org/10.29322/ijsrp.9.10.2019.p9420