Artificial Intelligence and The Transparency Challenge
30 July 2020
A POWERFUL new general-purpose Artificial Intelligence (AI) technology has emerged from the computer science community. GPT-3 contains over 100 times more parameters (the values a model learns during training) than its predecessor, GPT-2, writes Pauline Norstrom, resulting in a quantum leap in capability.
The beneficial application of this new AI engine could improve the analysis of millions of lines of text in legal documents, medical documents and other academic papers in order to extract meaning and, in doing so, accelerate human decision-making. This could lead to a faster route to a cure for infectious diseases such as COVID-19 or otherwise help to recommend patient-specific treatments.
Natural language processing (NLP) may seem fairly straightforward when compared with the AIs needed in the security domain. The structure of language is generally well defined so “all” the algorithm has to do is interpret the form of the characters (using convolutional neural networks/image processing) and assimilate words, assuming the data is already in a readable format. It’s the context which presents the greatest challenge to any neural network.
NLP has commonly used recurrent neural networks, although GPT-3 itself is built on a different architecture, the transformer. Every time you use predictive text on your phone, you’re interacting with a form of NLP. Is NLP always right, though? Usually, you have to decide which option is most appropriate. The human, knowing the intended meaning, makes the final decision.
With a brief from a human, GPT-3 may be able to create factual text indistinguishable from human writing. This may alarm those who enjoy spending hours assimilating research data. In stark contrast, replicating the human traits of ideation and creative flair, which arise from the unique way in which we each experience life, is thought to be beyond the capability of such technology.
AI terminology may sound complex to those unfamiliar with the nuances of computer science. Nevertheless, business leaders are still accountable for the ethical application of AI.
Beneficial AI in the security domain
In the security space, things are not so “easy” for AI as they seem to be with NLP. This is due to the inconsistent and changing nature of the environments involved, the random activities of malevolent actors and the multitude of available data sources from which meaning may be extracted.
The smart city, the smart building and the smart public place all generate billions of bytes of data which can be analysed automatically by AI. Autonomous Internet of Things building sensor technologies are proliferating and, although legacy systems create drag in the convergence process, there’s an increasing ability to convert protocols and transcode unstructured data resulting in a harmonised data pool suitable for AI analysis. After all this manipulation for accessibility, it’s quite possible that the AIs may be analysing data which is no longer in its original format.
The purpose of AI in the security domain is simple. It’s to enable better human decision-making, thereby resulting in fast and relevant action to protect people and property and reduce loss.
Transparency is one of the building blocks of trust. In the context of security decisions, we need to understand and trust how the AI reached its recommendation. Adoption then follows trust. In a constantly moving environment, it’s somewhat difficult to test the replication of an AI decision when detecting a one-off threat. Where did the AI’s training data come from in this case?
The use of AI must be tested in the context of the use case because scenarios that the developers could not anticipate, owing to their narrow experience, may arise and will be new to the AI. This may seem obvious, but you wouldn’t drop an untrained person into a security and risk environment without contextual and professional training.
Millions of classified images
On the subject of training, for video analytics to work effectively across a range of scenarios, the underlying AI has to learn from millions of classified images gathered from many sources. People are invited to contribute to open source projects by looking at images and describing what they see. Emerging AIs can now build on what they have already learned, which means that they can recognise familiar objects in a changing scene over time. This new technology reduces the need to label every instance of a type of object in an image, and also reduces the anomalies which creep into AI if the background is inadvertently classified.
For example, if a boat and a blue sky become associated because all classified images of boats also have a blue sky in the background, only a boat with a blue sky will be recognised. Boats without blue sky will not be recognised as boats. Some AIs are really that raw and fundamentally untrained.
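The boat and blue sky effect can be sketched in a few lines of Python. This is a toy illustration only: real video analytics use neural networks trained on pixels, whereas this sketch swaps in a very simple rule learner that keeps whichever hand-made features appear in every training example. The feature names and the training data are invented for the purpose.

```python
# Toy illustration only: each "image" is reduced to a set of invented feature
# labels. A real system learns from pixels, not from labels like these.

def learn_required_features(positive_examples):
    """Keep only the features that appear in every positive example."""
    required = None
    for features in positive_examples:
        required = set(features) if required is None else required & set(features)
    return required or set()

def looks_like_boat(image_features, required):
    """Recognise a boat only if every learned feature is present."""
    return required <= set(image_features)

# Every training boat happens to have been photographed under a blue sky.
narrow_training_set = [
    {"hull", "mast", "blue_sky"},
    {"hull", "blue_sky", "water"},
    {"hull", "blue_sky"},
]
narrow_rule = learn_required_features(narrow_training_set)

boat_under_grey_sky = {"hull", "water", "grey_sky"}
print(sorted(narrow_rule))                                # ['blue_sky', 'hull']
print(looks_like_boat(boat_under_grey_sky, narrow_rule))  # False

# Widening the training data removes the accidental association.
wider_training_set = narrow_training_set + [{"hull", "grey_sky", "water"}]
wider_rule = learn_required_features(wider_training_set)
print(sorted(wider_rule))                                 # ['hull']
print(looks_like_boat(boat_under_grey_sky, wider_rule))   # True
```

The learner was never told that the sky matters. It simply had no evidence that the sky does not matter, which is why the make-up of the training data deserves as much scrutiny as the AI engine itself.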
It’s important to understand the distinction between the AI engine and what it does, and the context of the application (in essence, what it’s being used for). If leaders were not using AI, they would define the goal and create a process journey from start to finish, usually with traceability throughout.
Transparency is one of the fundamental elements of supply chain management. Component traceability in electronics manufacture is vital for quality control and warranty. Every piece of data is tagged in some way to identify it and its source. Blockchain is emerging in supply chain verification, but doesn’t provide the answer in the security intelligence environment due to the risk of false data being introduced which then becomes the “truth”.
If you can trace the components in a manufactured electronic device back to their source, it’s logical that an AI decision should also be traceable back to the original data sources. It should, therefore, be possible to determine whether the source data, which may be the prima facie evidence in a legal case, has been altered. This may be beyond current technology, but when a data problem’s clearly defined, it follows that a solution is usually found.
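One long-established building block for detecting alteration is the cryptographic fingerprint. The sketch below, using Python’s standard hashlib module, records a SHA-256 fingerprint of a piece of source data and later checks whether the data still matches it. The function names and the sample bytes are hypothetical, and the approach assumes that the recorded fingerprint is itself stored somewhere trustworthy. It can show that data has changed since the fingerprint was taken, but it cannot show that the data was truthful in the first place.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

def is_unaltered(data: bytes, recorded_fingerprint: str) -> bool:
    """True if the data still matches the fingerprint recorded at source."""
    return fingerprint(data) == recorded_fingerprint

# Placeholder bytes standing in for a captured video frame (invented content).
original_frame = b"camera-07 frame 000123 raw bytes"
recorded = fingerprint(original_frame)  # stored when the data is first captured

# Changing even one character of the data produces a different fingerprint.
tampered_frame = original_frame.replace(b"000123", b"000124")

print(is_unaltered(original_frame, recorded))  # True
print(is_unaltered(tampered_frame, recorded))  # False
```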
It could be argued that generic open source AIs released to the world for experimentation become prone to bias because the specific applications are not defined and, as a result, the training data sources are uncontrolled. This happened in practice recently when automatic facial recognition technology was released into law enforcement use in the USA. It was discovered that the AI had been developed using a narrow demographic, and it started to show inaccuracy and bias in its recommendations. This can be corrected by widening the data set. If the problem had not been highlighted through this use case, some of the endemic biases in the organisations might have been left unchallenged, too. In that sense, AI may help to remove bias.
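One simple way to surface this kind of problem before deployment is to measure accuracy separately for each demographic group rather than as a single overall figure. The following is a minimal sketch of that idea. The group names and results are invented purely for illustration and do not represent any real system or data set.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Compute accuracy per group from (group, predicted, actual) records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        correct[group] += int(predicted == actual)
    return {group: correct[group] / total[group] for group in total}

# Invented test results: (group, did the AI report a match?, was it a real match?)
results = [
    ("group_a", True, True), ("group_a", False, False),
    ("group_a", True, True), ("group_a", False, False),
    ("group_b", True, False), ("group_b", False, False),
    ("group_b", True, True), ("group_b", True, False),
]

rates = accuracy_by_group(results)
gap = max(rates.values()) - min(rates.values())

print(rates)  # {'group_a': 1.0, 'group_b': 0.5}
print(gap)    # 0.5
```

Taken across all eight records, the invented system is right 75% of the time, which hides the fact that it is right only half the time for one group. A large gap between groups is a prompt to widen the training data before the system is relied upon.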
Transparency and explainability: why so important?
The security industry generates more data than it currently uses. Even so, AI is focused on generating data about the data. This is because the unstructured data produced by video surveillance cameras is impossible to analyse alongside other relevant data sources without some kind of change of format from the original.
Integrations which store alarm data in the image header aid the text search for video associated with an alarm, but don’t tell the analyst what appeared in the video. Every image has to be reviewed by a human. This is time-consuming and, many would suggest, inefficient.
AIs in the form of video analytics (convolutional neural networks) provide effective filter and search functions for video data. The search results can trigger the start point for another analytical AI process such as picking up a related social media tweet. Sadly, not only do those who mean harm tweet their intent, but members of the public seem to believe that tweeting about a serious incident may be a good enough substitute for calling the Emergency Services, which is the socially responsible thing to do. What’s buzzing on the social media networks could be highly relevant to an incident captured on a video surveillance camera.
The source data and the resulting decision can become far removed from each other, making the decision much harder to trace back to the original. The digital evidence bag may contain multiple data formats gathered from disparate sources and might also be full of untraceable data, potentially reducing its value in the criminal justice system.
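To make the idea of traceability concrete, the sketch below models an evidence bag in which every derived item records the item it was produced from, so that a final alert can be walked back to the original recording. The structure, field names and item identifiers are all hypothetical. This is one possible way of recording provenance under those assumptions, not a description of any existing evidence standard or product.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class EvidenceItem:
    item_id: str
    description: str
    content_hash: str                   # SHA-256 of the item's bytes
    derived_from: Optional[str] = None  # parent item_id, None for source data

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def trace_to_source(item_id: str, bag: Dict[str, EvidenceItem]) -> List[str]:
    """Follow derived_from links from an item back to its original source."""
    chain: List[str] = []
    current: Optional[str] = item_id
    while current is not None:
        if current in chain:
            raise ValueError(f"circular provenance at {current}")
        if current not in bag:
            raise KeyError(f"untraceable item: {current}")
        chain.append(current)
        current = bag[current].derived_from
    return chain

# Invented items: a raw recording, its transcoded copy, detections and an alert.
items = [
    EvidenceItem("raw-video-1", "original camera recording", sha256_hex(b"raw")),
    EvidenceItem("transcoded-1", "converted copy", sha256_hex(b"mp4"), "raw-video-1"),
    EvidenceItem("detections-1", "video analytics output", sha256_hex(b"json"), "transcoded-1"),
    EvidenceItem("alert-1", "recommendation to operator", sha256_hex(b"alert"), "detections-1"),
    # A social media post whose origin was never recorded in the bag.
    EvidenceItem("tweet-9", "related social media post", sha256_hex(b"tweet"), "unknown-feed"),
]
bag = {item.item_id: item for item in items}

print(trace_to_source("alert-1", bag))
# ['alert-1', 'detections-1', 'transcoded-1', 'raw-video-1']

try:
    trace_to_source("tweet-9", bag)
except KeyError as error:
    print(error)  # the missing link is reported rather than silently ignored
```

The alert can be followed back to the original recording, while the social media post cannot, which is exactly the kind of gap that may reduce the value of the evidence.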
Verifying the validity of digital forensic evidence is becoming an ever-increasing problem, even more so now than it was in the 2000s when digital recording took off as a means of storing video.
Without challenging the roots of the AIs embedded in security systems, we may inadvertently propagate flaws right through to the final decisions. There’s a school of thought in leadership which suggests that any decision is better than no decision. However, when a decision concerns life safety, security and respect for the privacy and dignity of the individual, and is based on a biased process, it’s worse than no decision because of the potential harm it may cause.
Decisions based on bias
Decisions based on bias still occur every day in society, but with the increased level of awareness, now is a good time to scrutinise the AIs used in security decision-making and ensure that there’s continuing education as well as a drive to make things better.
To challenge AI, leaders have to gain a deeper understanding of how these technologies work. This is not a matter for the Data Science Department in isolation. Rather, it’s a Board-level consideration.
Robust measures need to be put in place to enable traceability back to the source to verify whether the data has been altered along the way. It’s only a matter of time before a deep fake video or audio file carries weight in a prosecution. Let’s all determine to ensure that AI doesn’t magnify injustice and, instead, propagates fairness.
We need AI which is free from biased training data. We need AI whose decisions can be replicated, and we need the type and use of AI to be understood and transparent. Where the source data and the decision become dissociated in some way, let’s be mindful of this and apply AI in a way that allows the source data to be identified and verified.
Most of all, AI should be applied to achieve the maximum benefit on an ethical level, within the legal, regulatory and standards frameworks and without causing harm. This is precisely why we need to challenge the transparency of AI.
Pauline Norstrom is CEO and Founder of Anekanta Consulting