Digital Forensics Principles and Analysis: Artificial Intelligence and Machine Learning | CyberSecRad

Our technology has evolved and made our lives easier, and it will continue to do so. We are currently in the fifth technological revolution, the so-called Convergence Revolution. It involves emerging technologies such as blockchain and quantum computing, yet at the center of it all is Artificial Intelligence (AI). In parallel with these technological advances, crime has also evolved into new, digital forms, giving rise to cybercrime as we know it today.
PRINCIPLES OF DIGITAL FORENSICS
Digital forensics focuses on the digital evidence involved in a crime or incident. It differs from traditional forensics, which deals mostly with physical evidence; as technology continues to advance, cases involving only physical evidence are becoming increasingly rare. Because of this difference, digital forensics has its own principles governing the entire process. They are broken down below:
- Identification and Acquisition – This is the process of identifying and acquiring only the data relevant to the case or crime. Over-acquisition of data may raise ethical dilemmas such as privacy concerns and lack of consent.
- Preservation – Once the relevant data has been gathered, it is crucial to preserve it so that its integrity is maintained. This involves creating copies of the data (disk cloning, backup creation) so that analysis does not alter the original, and keeping a proper chain-of-custody record.
- Analysis and Interpretation – This is the process in which digital forensics professionals analyze the gathered and preserved data and interpret it, brainstorming and developing multiple possible scenarios for the case.
- Presentation – This is the process of presenting the outcome of the investigation as reliable evidence in court.
Following these principles is a major factor in conducting a digital forensics investigation. They should be strictly applied, because any violation of them may lead to an unsuccessful outcome.
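To make the Preservation principle concrete, here is a minimal sketch of one common integrity check: hash the original and every copy with SHA-256 and compare the digests, logging each handling step. The function names and the fields of the custody record are hypothetical, not a standard chain-of-custody format; real investigations typically rely on dedicated imaging tools and write blockers.

```python
import hashlib
from datetime import datetime, timezone


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large disk images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_copy(original_path, copy_path):
    """True if the working copy has the same SHA-256 digest as the original."""
    return sha256_of_file(original_path) == sha256_of_file(copy_path)


def custody_entry(item, action, handler, sha256):
    """One record in a hypothetical chain-of-custody log."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "item": item,
        "action": action,
        "handler": handler,
        "sha256": sha256,
    }
```

For example, an analyst might record `custody_entry("disk01.img", "acquired", "Analyst A", sha256_of_file("disk01.img"))` at acquisition (file and handler names are made up) and call `verify_copy` before analyzing any clone. Because even a single changed byte produces a different digest, any alteration of a copy shows up as a mismatch.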
MACHINE LEARNING IN DIGITAL FORENSICS
Put simply, AI is the ability of a computer system to collect data, analyze it, and then learn from that data (Machine Learning, or ML) to produce reliable solutions to a specific problem or set of problems. AI is the whole pipeline, and ML is a subset of it.
Looking at this definition, AI closely resembles digital forensics, which involves gathering relevant data, preserving, validating, analyzing, and interpreting it, and then presenting the findings in court as reliable evidence. This similarity is a major reason AI and ML are being applied to modern digital forensics.
Because technology is advancing so quickly, investigations conducted by humans alone, without any help from AI, are increasingly likely to fail; the work is becoming very difficult and challenging for investigators. Take Figure 1 below as an example. In the past, it was fairly easy to confirm that a video like this was fake just by analyzing changes in illumination, compression, frames, and so on.
Figure 1. Putin’s address to the nation sparks confusion – AI-generated video dubbed “Fake Leader”
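The older, manual style of check mentioned above can be partly automated. This toy sketch flags frames whose average brightness jumps abruptly compared with the rest of the clip, using the median and median absolute deviation of frame-to-frame changes. It assumes the per-frame mean brightness values have already been extracted from the video (for example with a video-decoding library, not shown). The function name and the parameters `k` and `min_spread` are hypothetical, illustrative choices. Modern AI-generated videos are usually consistent enough to pass a check like this, which is why stronger, ML-based analysis is needed.

```python
from statistics import median


def flag_illumination_jumps(brightness, k=5.0, min_spread=1.0):
    """Return indices of frames whose brightness change from the previous
    frame is an outlier relative to the typical frame-to-frame change.

    brightness: mean brightness per frame (e.g. 0-255), assumed precomputed.
    k: how many 'spreads' above the median change counts as suspicious.
    min_spread: floor on the spread so perfectly steady clips are not over-flagged.
    """
    diffs = [abs(b - a) for a, b in zip(brightness, brightness[1:])]
    if not diffs:
        return []
    med = median(diffs)
    mad = median([abs(d - med) for d in diffs])  # median absolute deviation
    threshold = med + k * max(mad, min_spread)
    # diffs[i] is the change into frame i + 1
    return [i + 1 for i, d in enumerate(diffs) if d > threshold]


if __name__ == "__main__":
    brightness = [120.0, 121.5, 120.8, 122.0, 121.2, 160.4, 121.0, 120.5]
    print(flag_illumination_jumps(brightness))  # [5, 6]
```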
Today, it is getting harder and harder to detect fake information online through human observation and investigation alone. Analyzing such content takes more time and computational effort than humans can realistically provide, which is why we need the help of AI and ML techniques. That said, human judgement remains a crucial factor in any investigation, and we cannot rely on AI all the time. The goal is to create harmony between humans and systems to get the best possible outcome.
AI-generated images and videos are being used to spread fake news and propaganda to influence events such as elections, wars, and the financial economy. In the next section, we will explore several case studies and discuss the role and implications of AI and ML in each scenario.
CASE STUDY INTEGRATION
In this section, we’ll explore several scenarios in which AI and ML are used to help digital forensics professionals.
Fake News Detection
Fake news is not a new subject, but it has evolved into something very concerning in our daily lives. The Internet and social media have made it very easy for anyone to post something online and make people believe it is true or legitimate. This is a major problem because, by nature, people tend to believe what they see and read online without fact-checking it first.
Take a look at Figure 2 below. Seeing this post for the first time, how would you know whether it is true? You might say: by reading the comments and checking whether most of them support the post’s legitimacy. According to researchers, this approach is possible in theory, but in practice it can be heavily biased. CNN is one of the leading and most trusted names in the news industry, so most people who saw the post reacted positively and complimented it. Yet it was later debunked as fake by the community.
Figure 2. CNN report about a famous actor involved in a war – faked and debunked
Identifying fake news or information is also very challenging for professionals, because reaching a reliable conclusion can involve a lot of data. A promising solution to this problem is to harness AI technology and ML techniques.
To further understand how AI and ML are used in these kinds of scenarios, let us study how researchers develop ideas and AI tools to help them. Note that these tools are still in development and are not freely available.
X-AI: Explainable Fact-checking Methodology
Figure 3. X-AI explainable fact-checking methodology
As an example, researchers and developers came up with an AI approach called X-AI to help detect fake news. Its idea and process are outlined below.
- Identify the Claim – The process starts with a claim, for example one that someone has posted on social media.
- Gather Existing Evidence – Using AI and ML techniques, the tool automatically gathers evidence available online or in its database.
- Decompose the Claim into a Series of Questions – The tool automatically breaks the claim down into a series of questions relevant to it.
- Compare the Answers – The tool then compares the answers implied by the claim with the answers found in the gathered evidence.
- Produce a Reliable Conclusion – Based on the comparison, the tool concludes whether the evidence supports the claim as true or shows it to be false.
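The five steps above can be sketched as a small pipeline. This is a hypothetical toy, not the researchers’ X-AI implementation: the claim is assumed to be already decomposed into sub-questions (a real system would use an ML model for that step), the evidence store is a plain dictionary of question-answer pairs standing in for automatic retrieval, and the names `SubQuestion`, `answer_from_evidence`, and `fact_check` are invented for illustration. Returning a per-question explanation alongside the verdict mirrors the “explainable” part of the methodology.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class SubQuestion:
    text: str            # a question derived from the claim
    claimed_answer: str  # the answer the claim implies


def answer_from_evidence(question: str, evidence: Dict[str, str]) -> Optional[str]:
    """Toy evidence lookup (question -> answer). A real system would retrieve
    documents and use an ML model to answer the question."""
    return evidence.get(question)


def fact_check(questions: List[SubQuestion], evidence: Dict[str, str]) -> Tuple[str, List[str]]:
    """Compare the claim's implied answers with the evidence and return a
    verdict plus a per-question explanation."""
    explanation = []
    contradicted = unanswered = 0
    for q in questions:
        found = answer_from_evidence(q.text, evidence)
        if found is None:
            unanswered += 1
            explanation.append(f"{q.text} -> no evidence found")
        elif found.strip().lower() == q.claimed_answer.strip().lower():
            explanation.append(f"{q.text} -> evidence agrees ('{found}')")
        else:
            contradicted += 1
            explanation.append(
                f"{q.text} -> evidence says '{found}', claim implies '{q.claimed_answer}'"
            )
    if contradicted:
        verdict = "REFUTED"
    elif unanswered:
        verdict = "NOT ENOUGH EVIDENCE"
    else:
        verdict = "SUPPORTED"
    return verdict, explanation


if __name__ == "__main__":
    # Hypothetical claim: "Actor X is fighting in the war, as reported by a major outlet."
    questions = [
        SubQuestion("Did a major news outlet publish this report?", "yes"),
        SubQuestion("Is Actor X currently in the war zone?", "yes"),
    ]
    evidence = {"Did a major news outlet publish this report?": "no"}
    verdict, why = fact_check(questions, evidence)
    print(verdict)  # REFUTED
    for line in why:
        print(line)
```

A single contradicted sub-question is enough to refute the claim in this sketch, while missing evidence only downgrades the verdict; that ordering is a design assumption, not something the methodology specifies.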
Fact-checking tools of this kind are still in their infancy and face many challenges, such as a lack of collaboration with big companies like the Big 9, the major players in social media (Facebook, Twitter (X), Google, etc.). These companies possess the data, and each has its own terms for dealing with it.
Person Re-identification
Person re-identification (re-ID) is crucial in digital forensics investigations. It applies when a real-life incident occurs and images or videos captured by CCTV or other surveillance cameras around the scene are collected and analyzed to identify a suspect.
A very good example is the 2013 Boston Marathon bombing. Digital forensics professionals investigated it by first gathering the available footage of the scene and then combining the different vantage points to identify a possible suspect.
Figure 4. Boston Marathon Bombing, 2013
To further understand the process and methodology of utilizing AI tools for this specific case, let us uncover its main idea.
First, they must answer the question: “How can we find the same person across different cameras without any annotation?” No annotation means the tool is not told about pre-defined characteristics (e.g., black hair, carrying a blue bag). This is a very challenging task for digital forensics professionals because it involves analyzing a huge volume of footage from many vantage points. That is why they leverage AI tools powered by ML to help them.
Figure 5. Self-supervised learning pipeline methodology
The idea is to apply self-supervised learning, in which the model generates its own training signal from unlabeled images instead of relying on human annotations. As seen in Figure 5 above, it involves a series of steps, explained below.
- Step 1, Feature Extraction and Distance Computation – The AI tool automatically extracts features from the gathered images and computes the distances between them to identify each image’s neighbouring images.
- Step 2, Ensemble-based Clustering – The tool groups the images into clusters based on these distances, so that each cluster ideally contains images of the same person.
- Step 3, Proxy Selection per Cluster – The tool automatically selects highly confident positive images from each cluster to act as its representatives, or proxies.
- Step 4, Backbone Updating – The tool updates its backbone (the network that extracts the features) using the selected proxies, which refines the selection; the results include both correct cases and failure cases.
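Steps 1 to 3 can be sketched in plain Python on toy data. Several simplifications here are assumptions, not the researchers’ method: the feature vectors are given directly (in practice a neural-network backbone extracts them from the images), a single distance-threshold clustering (connected components) stands in for ensemble-based clustering, and each cluster’s proxy is simply the image closest to the cluster mean. Step 4 appears only as a comment, because updating a backbone requires training a neural network. All function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(features):
    """Step 1: pairwise distances between all image features."""
    return [[euclidean(f, g) for g in features] for f in features]


def nearest_neighbours(dist, i, k):
    """Step 1: indices of the k images closest to image i."""
    others = [j for j in range(len(dist)) if j != i]
    return sorted(others, key=lambda j: dist[i][j])[:k]


def cluster_by_threshold(dist, threshold):
    """Step 2 (simplified stand-in for ensemble clustering): images closer than
    `threshold` are linked, and each connected group becomes one cluster."""
    n = len(dist)
    labels = [-1] * n
    next_label = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and dist[i][j] < threshold:
                    labels[j] = next_label
                    stack.append(j)
        next_label += 1
    return labels


def select_proxies(features, labels):
    """Step 3 (simplified): for each cluster, pick the image nearest the cluster mean."""
    proxies = {}
    dim = len(features[0])
    for c in sorted(set(labels)):
        members = [i for i, lab in enumerate(labels) if lab == c]
        centroid = [sum(features[i][d] for i in members) / len(members) for d in range(dim)]
        proxies[c] = min(members, key=lambda i: euclidean(features[i], centroid))
    return proxies

# Step 4 (not shown): retrain the backbone using the proxies as pseudo-labels,
# re-extract features, and repeat from Step 1.


if __name__ == "__main__":
    # Toy 2-D "features" for six images of two visually distinct people.
    features = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
    dist = distance_matrix(features)
    labels = cluster_by_threshold(dist, threshold=1.0)
    print(labels)                            # [0, 0, 0, 1, 1, 1]
    print(select_proxies(features, labels))  # {0: 0, 1: 3}
```

A fixed distance threshold is the simplest clustering to read, but it is sensitive to the threshold value; combining several clusterings, as the ensemble step in the methodology does, is one way to make the groups more robust.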
Figure 6. Person re-ID using X-AI methodology
These AI-driven tools, once fully developed and in use, promise to be a very helpful aid for digital forensics investigations.
CURRENT CHALLENGES
AI in digital forensics, and the practitioners and researchers who work with it, face ever-evolving challenges that grow more complicated as technology progresses. Thankfully, efforts are being made to develop tools that address these challenges. Some of the challenges that the rise of AI poses for digital forensics are listed below.
- Deepfakes 2.0 – Deepfakes have evolved from simple face-swapping in apps like Snapchat to very realistic AI-generated images and videos. It is now very difficult to identify fake media through mere observation and simple media analysis.
- Hallucinations – AI models can produce irrelevant or unreliable information that still looks plausible, for example because of flaws or inconsistencies in the model or its training.
- Biases – AI models are trained by humans using specific datasets. This means that if the people training a model are biased and feed it biased data, the model will be biased as well.
- Continuous Learning – Digital forensics professionals and researchers face the challenge of continuous learning, as technology keeps changing and older techniques quickly become outdated.
- Lack of Data and Missing Variables – When gathering data, some sensors and sources are limited, which results in insufficient data and in missing variables needed to progress with the analysis.
- Lack of Network Unification in Fact-Checking – The Big 9 companies (Facebook, Twitter (X), Amazon, etc.), the major players in social media, are well placed to analyze the flow of information and fact-check whether content is fake news, simply because they already have the data. Unfortunately, each of these companies has its own terms for handling that data.
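One simple way to surface the kind of bias described in the list above is to measure a model’s accuracy separately for each group in an evaluation set, rather than only overall. This minimal sketch assumes you already have the true labels, the model’s predictions, and a group attribute for each sample; the function names and the example data are hypothetical, and a large gap between groups is a warning sign worth investigating rather than proof of bias.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Accuracy computed separately for each group label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def max_accuracy_gap(per_group):
    """Difference between the best- and worst-served groups."""
    return max(per_group.values()) - min(per_group.values())


if __name__ == "__main__":
    # Made-up evaluation data: the model is right on every "A" sample
    # but on only one of four "B" samples.
    y_true = [1, 0, 1, 1, 0, 1, 0, 0]
    y_pred = [1, 0, 1, 1, 1, 0, 1, 0]
    groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
    acc = accuracy_by_group(y_true, y_pred, groups)
    print(acc)                    # {'A': 1.0, 'B': 0.25}
    print(max_accuracy_gap(acc))  # 0.75
```

Overall accuracy on this data is 62.5%, which hides the fact that one group is served far worse than the other; breaking results down by group is what makes the disparity visible.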
Indeed, every new innovation brings challenges in harnessing it, and this is especially true when applying AI and ML techniques to digital forensics.
In conclusion, the application of AI and ML techniques in digital forensics is revolutionary. These techniques are helping professionals uncover hidden facts and produce reliable information as evidence to solve cybercrimes. As covered in this report, using such technology also involves many challenges in preserving the very principles of digital forensics. These challenges are here to stay, just as innovation is unstoppable. It is our job as humans to build a coherent relationship with these machines, one in which neither eliminates the need for the other. Human intelligence and judgement, combined with the capabilities of AI, can be very valuable for solving complex crimes, especially in today’s cyber world.