ChatGPT Has Eyes on Your Data: Can You Escape Its Stare?
By Shailendra Shyam Sahasrabudhe, Country Manager, India, UAE and South East Asia, Cymulate Ltd.
When Samsung allowed engineers at its semiconductor division to use ChatGPT to fix problems with source code, little did the consumer electronics behemoth know that it was opening the gateway to unexpected data leakage. In three separate instances within a month's span, the company's employees accidentally leaked confidential data while interacting with the chatbot.
At a time when generative artificial intelligence (AI) has polarised
conversations, this instance gave more heft to those who have been advocating
caution about its usage. And people are sitting up and paying attention to what
they are saying.
This is not because data leakage is a new discussion in technology circles. However, the rising instances of critical data reaching the wrong hands are sounding alarm bells. From the ransomware attack on Fullerton India, which allegedly compromised 600GB of data, to ICICI Bank's leakage of sensitive information, including financial figures and client details, it is once again clear that these risks are multiplying.
What is worrying industry stakeholders is the absence of a comprehensive
data protection law in India. According to a new IMF working paper titled ‘Stacking
up the Benefits: Lessons from India’s Digital Journey’, this places the privacy
and other digital rights of users at risk. “A robust data protection framework
is essential to protect citizens’ privacy, prevent companies and governments
from indiscriminately collecting data, and holding companies and governments
accountable for data breaches to incentivize appropriate data handling and
adequate investments in cybersecurity.”
The biggest issue with data leakage is the difficulty of stemming its flow and tracking the course it charts. Once the data is outside the organization's purview and in the control of unscrupulous actors, it can be misused in unimaginable ways.
There are numerous cases of cybercriminals misusing Aadhaar and PAN card details to procure loans without the knowledge of the actual users. This underlines what Cymulate's Annual Usage Report highlights: attempts to limit the exfiltration of data in its many forms have been unsuccessful and have, in fact, grown worse over the last three years.
Enter ChatGPT, Exit Cybersecurity?
India already bears the ignominy of more than 80 million users affected by data breaches in 2021, according to various reports. Throw OpenAI's news-making generative AI platform, ChatGPT, into this mix and you have a cauldron of boiling issues.
But could such an innocent-sounding technology really be the harbinger of catastrophe? Let's first understand its composition to see if it has the makings of a villainous character.
ChatGPT, from Sam Altman-led OpenAI, is a text-generating AI-based chatbot capable of answering complex questions quickly and accurately, with the ability to learn from every interaction. This means it gets better with use, providing users with better replies as its information bank grows.
It has caught the imagination of people worldwide because it offers a conversational interface. Students can use it to write their essays, while professionals can use it to automate tasks, be it replying to emails, editing blog posts, or doing spreadsheet grunt work.
But this also means it opens a backdoor for data leakage. As employees feed ChatGPT new information daily, they could inadvertently include classified company data without realising the implications.
Consider this scenario. An employee uses ChatGPT to summarise notes from a board meeting where another company's takeover is discussed, or from an investor call where details of a prospective IPO are shared. The technology now has access to data from these closed-door discussions.
Now, imagine the impact it could have when these confidential details
become public.
This is not because ChatGPT has malicious intent. Given its inherent nature, it consumes content, makes it part of its data lake, and could potentially share this information while answering questions posed by an unrelated individual.
Since it is an evolving technology, OpenAI has warned users not to share sensitive information, which also safeguards the company from potential litigation after data breaches. It has also introduced new privacy features for the AI language model to give users more control over their data. Users can now turn off chat history in ChatGPT, which means conversations started with chat history disabled will not be used to train and improve the AI model or appear in the history sidebar.
Getting A Grip
While OpenAI is doing its bit to assure users that they can trust the platform with their sensitive information, the onus ultimately rests on organizations, which need to build guardrails to protect their critical data. While this sounds like a reasonable ask, it can be extremely challenging to implement.
The easiest route is to block ChatGPT-related websites on company networks. However, since this means filtering domains and IP addresses on the firewall or proxy, it still leaves a window open for data leakage.
Since users often rely on their own personal devices or networks, this is not a foolproof modus operandi, as IPs can change periodically. Every time the cybersecurity team shuts down one address, another could open, leaving technocrats busy locking one door as another gets unlocked.
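As a rough illustration of why domain blocking is brittle, here is a minimal sketch of a proxy-style blocklist check written in Python. The domains and example hostnames are hypothetical, and a real deployment would enforce such rules on the firewall or proxy itself rather than in application code.

```python
# Minimal sketch of a proxy-style domain blocklist check.
# The domains below are illustrative examples only; a real blocklist lives
# on the firewall or proxy and is maintained by the security team.

BLOCKED_DOMAINS = {"chat.openai.com", "openai.com"}

def is_blocked(hostname: str) -> bool:
    """Return True if the hostname matches a blocked domain or one of its subdomains."""
    hostname = hostname.lower().rstrip(".")
    return any(
        hostname == domain or hostname.endswith("." + domain)
        for domain in BLOCKED_DOMAINS
    )

if __name__ == "__main__":
    print(is_blocked("chat.openai.com"))     # True: listed domain is blocked
    print(is_blocked("api.openai.com"))      # True: subdomain of a listed domain
    print(is_blocked("example-chatbot.ai"))  # False: a new, unlisted service slips through
```

The last check shows the gap: any new or unlisted chatbot domain passes straight through until someone adds it to the list, which is exactly the door-locking race described above.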
Companies could consider Advanced Data Loss Prevention (DLP) and Cloud Access Security Broker (CASB) systems, which can monitor all outgoing communications from organizational networks. However, this is easier said than done, because monitoring communications over TLS-encrypted (SSL) website sessions requires inspecting that traffic, which could open a company up to privacy concerns.
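To give a flavour of the kind of check a DLP layer performs, here is a minimal sketch, assuming a simple regex-based scan of outbound text for PAN- and Aadhaar-style identifiers. The patterns and the sample message are illustrative only; real DLP and CASB products combine far richer detectors with checksums, context, and policy rules.

```python
import re

# Minimal sketch of a DLP-style content check on outbound text.
# Simplified patterns: a PAN is five letters, four digits and a letter;
# an Aadhaar number is twelve digits, shown here with optional spaces.
PAN_PATTERN = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")
AADHAAR_PATTERN = re.compile(r"\b\d{4}\s?\d{4}\s?\d{4}\b")

def contains_sensitive_data(text: str) -> bool:
    """Return True if the outgoing text appears to contain PAN or Aadhaar-like identifiers."""
    return bool(PAN_PATTERN.search(text) or AADHAAR_PATTERN.search(text))

# Hypothetical outbound message an employee might paste into a chatbot.
outgoing = "Please summarise this KYC note: PAN ABCDE1234F, Aadhaar 1234 5678 9012."
if contains_sensitive_data(outgoing):
    print("Blocked: message appears to contain sensitive identifiers.")
else:
    print("Allowed.")
```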
If the legal and technical teams give the company the go-ahead, it can work with providers like Cymulate and leverage solutions such as their Advanced Scenarios module to run these testing simulations.
While tech tools are essential in plugging any potential data leakage, the most important part of the cybersecurity journey is user education. Companies can put strong access controls in place so that only authorized personnel can access specific information. Further, this data exchange should be regularly monitored so that any suspicious behavior can be identified and proactively addressed.
Organizations should also educate people within the corporate structure about the importance of the data they are responsible for, and the liabilities the company could face. Once employees realise its significance and the risks of sharing data with AI chatbots, they are likely to be more careful about data integrity.