Machine learning is now ubiquitous in many aspects of modern society and is increasingly built into popular products such as cameras and smartphones. Its uses range from web search and recommendation systems to identifying objects in images, among many others. Most of these applications are based on what is called deep learning. Deep learning relies on neural networks, a family of algorithms designed to solve problems that can be represented through examples.
Suppose you want to build a system that classifies images as either cats or cars. The first step is to collect a large dataset of images of cats and cars and label each one with the correct category. Then, during training, the system is shown an image and produces an output vector of scores, one per category. An objective function measures the error between these output scores and the expected scores, and the system learns from that error by adjusting its parameters to reduce it. This cycle is repeated over many images.
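The training cycle described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how any real vision system is implemented: each image is replaced by two made-up numeric features, the model is a simple linear classifier with a softmax output, the objective function is cross-entropy, and the names `train`, `predict`, `samples`, and `labels` are hypothetical.

```python
import math
import random

def softmax(scores):
    """Turn raw category scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def score(weights, biases, x):
    """Produce one raw score per category for a single example."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def train(samples, labels, n_classes=2, lr=0.5, epochs=200, seed=0):
    """Repeat the cycle: score, measure the error, adjust the parameters."""
    rng = random.Random(seed)
    n_features = len(samples[0])
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
               for _ in range(n_classes)]
    biases = [0.0] * n_classes
    losses = []
    for _ in range(epochs):
        epoch_loss = 0.0
        for x, y in zip(samples, labels):
            probs = softmax(score(weights, biases, x))
            # Objective function: cross-entropy between output and label.
            epoch_loss += -math.log(probs[y])
            # Adjust every parameter a little in the direction that
            # reduces the error for this example.
            for c in range(n_classes):
                error = probs[c] - (1.0 if c == y else 0.0)
                for j in range(n_features):
                    weights[c][j] -= lr * error * x[j]
                biases[c] -= lr * error
        losses.append(epoch_loss / len(samples))
    return weights, biases, losses

def predict(weights, biases, x):
    """Return the index of the category with the highest score."""
    scores = score(weights, biases, x)
    return max(range(len(scores)), key=scores.__getitem__)

# Made-up stand-ins for labelled images: 0 = cat, 1 = car.
samples = [[0.9, 0.1], [0.8, 0.3], [0.7, 0.2],
           [0.2, 0.9], [0.1, 0.8], [0.3, 0.7]]
labels = [0, 0, 0, 1, 1, 1]

weights, biases, losses = train(samples, labels)
print("average error, first pass:", round(losses[0], 3))
print("average error, last pass:", round(losses[-1], 3))
```

The average error printed for the last pass should be lower than for the first, which is the whole point of the cycle. Everything the model "knows" about cats and cars comes from the six labelled examples it was given.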
Many applications of deep neural networks involve image recognition. In fact, the first neural machine, the Mark 1 Perceptron, was built to be a vision machine. It should be noted that the goal of the Perceptron was not merely to recognize simple shapes, but to learn how to recognize shapes by using statistical calculations. As Rosenblatt wrote in 1958, “The theory has been developed for a hypothetical nervous system, or machine called a perceptron. The perceptron is designed to illustrate some of the fundamental properties of intelligent systems in general.”
The perceptron was the starting point of the field now commonly known as Computer Vision. In the summer of 1966, Marvin Minsky assigned the “Summer Vision Project” to an undergraduate student, with the task of building a system that could analyze a scene and identify the objects in it. It soon became clear that making computers able to see would take more than a single summer and that the project was far bigger than anticipated. More than fifty years later, Computer Vision is a prominent field of A.I. that aims to give computers a visual understanding of the world and to automate human visual tasks. Its ambition is not only to see but also to extract valuable information from what is observed.
Training is a crucial step for deep neural networks. Sets of labelled images are constantly used to develop contemporary computer vision systems. The selection of the training dataset is perhaps one of the most critical and vulnerable parts of the design of a neural network. Neural networks are trained to recognize patterns in training images in order to recognize the same patterns in future images. These systems are built on a foundation of arguable and unstable epistemological and metaphysical assumptions about the nature of images, labels, categorization, and representation. As Bowker and Star mention, “categorization is a powerful semantic and political intervention”. Deciding which categories exist and what belongs in each one means that someone has to choose how to implement them. All of these decisions are powerful claims about who gets to decide how things are supposed to be. Categories say a lot about how a problem is approached, and there is always a particular perspective and a specific interest behind them.
Contrary to the good old-fashioned belief in the autonomy of A.I., many steps in the design of neural networks are still shaped by human intervention. Even though we would like to assume that most of the systems already working among us were trained on well-balanced datasets, this assumption is not necessarily correct. Indeed, there are many examples of racial, gender, or class bias being reflected and amplified by a neural network. For example, an HP facial recognition system that was trained on a database of white people’s faces failed to recognize black people. This problem is known as “overfitting”. Given many examples of the same kind, a neural network tends to learn too much about them and ends up focusing on a very specific pattern.
Related to overfitting is another phenomenon called “apophenia”. Google DeepDream is a great example of this. Apophenia shows, as Hito Steyerl mentions, “that pattern recognition also exists where there is no pattern but a form is detected nevertheless.” Apophenia is like finding patterns in a noisy background. Overfitting and apophenia expose one of the major issues with training datasets: the limitations of how categories are constructed in neural networks. Since training datasets might reflect just a small sample of the real world, we should ask how such a sample relates to it. The goal of a neural network is to generalize to unknown data, yet this generalization is only possible because of the “homogeneity between training and test dataset.” Working towards a well-balanced training set should be one of the goals when designing an algorithmic system. In this way, the risks arising from overfitting and apophenia would be minimized.
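One modest, practical starting point is simply to count what a dataset contains. The sketch below is an illustration under stated assumptions rather than an established auditing method: the function names and the cat and car label lists are hypothetical, and the gap between training and test data is measured with the total variation distance between their label distributions. It looks only at labels, so it says nothing about how varied the images within each category are.

```python
from collections import Counter

def label_distribution(labels):
    """Return the share of each category in a list of labels."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

def imbalance_ratio(labels):
    """Largest category count divided by the smallest (1.0 = balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

def distribution_gap(train_labels, test_labels):
    """Total variation distance between two label distributions.

    0.0 means the two sets have the same category shares,
    1.0 means they share no categories at all.
    """
    train = label_distribution(train_labels)
    test = label_distribution(test_labels)
    categories = set(train) | set(test)
    return 0.5 * sum(abs(train.get(c, 0.0) - test.get(c, 0.0))
                     for c in categories)

# Hypothetical label lists: a skewed training set, a balanced test set.
train_labels = ["cat"] * 90 + ["car"] * 10
test_labels = ["cat"] * 50 + ["car"] * 50

print("training shares:", label_distribution(train_labels))
print("imbalance ratio:", imbalance_ratio(train_labels))
print("train/test gap:", round(distribution_gap(train_labels, test_labels), 3))
```

A large imbalance ratio or a large gap does not prove that a system will behave unfairly, but it is an early signal that the sample and the world it is meant to represent have drifted apart.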
In daily life today, more and more processes rely on algorithms that automate different tasks. Many of these systems are put into the world with the belief that they will make decisions in a more neutral way than a human would. Nevertheless, “algorithms are built and embedded into the lived world, at the level of institutional practice, individual behavior, and human experience”. We train and model algorithmic systems based on our visions of the world, and with a clear outcome in mind. At the same time, that outcome is influenced by social, cultural, and economic interests and agendas. As Beer points out, “algorithms should not be understood as an object that exists outside of those social processes”.