ChatGPT Has Eyes on Your Data: Can You Escape Its Stare?
By Shailendra Shyam Sahasrabudhe, Country Manager, India, UAE and South East Asia, Cymulate Ltd.
When Samsung allowed engineers at its semiconductor division to use ChatGPT to fix problems with source code, little did the consumer electronics behemoth know that it was opening the gateway to unexpected data leakage. In three separate instances within a month's span, the company's employees accidentally leaked confidential data while interacting with the chatbot.
At a time when generative artificial intelligence (AI) has polarised
conversations, this instance gave more heft to those who have been advocating
caution about its usage. And people are sitting up and paying attention to what
they are saying.
This is not because data leakage is a new discussion in technology circles. However, the rising instances of critical data reaching the wrong hands are sounding alarm bells. From the ransomware attack on Fullerton India, which allegedly compromised 600GB of data, to ICICI Bank's leakage of sensitive information, including financial figures and client details, it is once again clear that these risks are multiplying.
What is worrying industry stakeholders is the absence of a comprehensive
data protection law in India. According to a new IMF working paper titled ‘Stacking
up the Benefits: Lessons from India’s Digital Journey’, this places the privacy
and other digital rights of users at risk. “A robust data protection framework
is essential to protect citizens’ privacy, prevent companies and governments
from indiscriminately collecting data, and holding companies and governments
accountable for data breaches to incentivize appropriate data handling and
adequate investments in cybersecurity.”
The biggest issue with data leakage is the difficulty of stemming its flow and tracking the course it charts. Once the data is outside the organization's purview and in the control of unscrupulous actors, it can be misused in unimaginable ways.
There are numerous cases of cybercriminals misusing Aadhaar and PAN card details to procure loans without the knowledge of the actual users. This underlines what Cymulate's Annual Usage Report highlights: attempts to limit the exfiltration of data in its many forms have been unsuccessful and have, in fact, grown worse over the last three years.
Enter ChatGPT, Exit Cybersecurity?
India already bears the ignominy of more than 80 million users affected by data breaches in 2021, according to various reports. Throw OpenAI's news-making generative AI platform, ChatGPT, into this mix and you have a cauldron of boiling issues.
But could such an innocent-sounding technology really be the harbinger of catastrophe? Let's first understand its composition to see if it has the makings of a villainous character.
ChatGPT, from Sam Altman-led OpenAI, is a text-generating AI-based chatbot capable of answering complex questions quickly and accurately, with the ability to learn from every interaction. This means it gets better with use, providing users with better replies as its information bank grows.
It has caught the imagination of people worldwide because it offers a conversational interface. Students can use it to write their essays, while professionals can use it to automate tasks, be it replying to emails, editing blog posts, or doing spreadsheet grunt work.
But this also means it opens a backdoor for data leakage. As employees feed ChatGPT new information daily, they could inadvertently include classified company data without realising the implications.
Consider this scenario. An employee uses ChatGPT to summarise notes from a board meeting where another company's takeover is discussed, or from an investor call where details of a prospective IPO are shared. The technology now has access to data from these closed-door discussions.
Now, imagine the impact it could have when these confidential details
become public.
This is not because ChatGPT has malicious intent. Given its inherent nature, it consumes content, makes it part of its data lake, and could potentially share this information while answering questions posed by an unrelated individual.
Since it is an evolving technology, OpenAI has warned users not to share sensitive information, which also safeguards the company from potential litigation after data breaches. It has also introduced new privacy features for the AI language model to give users more control over their data. Users can now turn off chat history in ChatGPT, which means conversations started with chat history disabled will not be used to train and improve the AI model or appear in the history sidebar.
Getting A Grip
While OpenAI is doing its bit to assure users that they can trust the platform with their sensitive information, the onus ultimately rests on organizations, which need to build guardrails to protect their critical data. While this sounds like a reasonable ask, it can be extremely challenging to implement.
The easiest route is to block ChatGPT-related websites on company networks. However, since this means filtering domains and IP addresses on the firewall or proxy, it still leaves a window open for data leakage.
Since users often rely on their own personal devices or networks, this is not a foolproof modus operandi, as IPs can change periodically. Every time the cybersecurity team shuts down one address, another could open, leaving technocrats busy locking one door as another gets unlocked.
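As a rough illustration of why domain blocking is brittle, here is a minimal sketch of a proxy-style blocklist check written in Python. The domains and example hostnames are hypothetical, and a real deployment would enforce such rules on the firewall or proxy itself rather than in application code.

```python
# Minimal sketch of a proxy-style domain blocklist check.
# The domains below are illustrative examples only; a real blocklist lives
# on the firewall or proxy and is maintained by the security team.

BLOCKED_DOMAINS = {"chat.openai.com", "openai.com"}

def is_blocked(hostname: str) -> bool:
    """Return True if the hostname matches a blocked domain or one of its subdomains."""
    hostname = hostname.lower().rstrip(".")
    return any(
        hostname == domain or hostname.endswith("." + domain)
        for domain in BLOCKED_DOMAINS
    )

if __name__ == "__main__":
    print(is_blocked("chat.openai.com"))     # True: listed domain is blocked
    print(is_blocked("api.openai.com"))      # True: subdomain of a listed domain
    print(is_blocked("example-chatbot.ai"))  # False: a new, unlisted service slips through
```

The last check shows the gap: any new or unlisted chatbot domain passes straight through until someone adds it to the list, which is exactly the door-locking race described above.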
Companies could consider Advanced Data Loss Prevention (DLP) and Cloud Access Security Broker (CASB) systems, which can monitor all outgoing communications from organizational networks. However, this is easier said than done, because monitoring communications over TLS-encrypted (SSL) website sessions requires inspecting that traffic, which could open a company up to privacy concerns.
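To give a flavour of the kind of check a DLP layer performs, here is a minimal sketch, assuming a simple regex-based scan of outbound text for PAN- and Aadhaar-style identifiers. The patterns and the sample message are illustrative only; real DLP and CASB products combine far richer detectors with checksums, context, and policy rules.

```python
import re

# Minimal sketch of a DLP-style content check on outbound text.
# Simplified patterns: a PAN is five letters, four digits and a letter;
# an Aadhaar number is twelve digits, shown here with optional spaces.
PAN_PATTERN = re.compile(r"\b[A-Z]{5}[0-9]{4}[A-Z]\b")
AADHAAR_PATTERN = re.compile(r"\b\d{4}\s?\d{4}\s?\d{4}\b")

def contains_sensitive_data(text: str) -> bool:
    """Return True if the outgoing text appears to contain PAN or Aadhaar-like identifiers."""
    return bool(PAN_PATTERN.search(text) or AADHAAR_PATTERN.search(text))

# Hypothetical outbound message an employee might paste into a chatbot.
outgoing = "Please summarise this KYC note: PAN ABCDE1234F, Aadhaar 1234 5678 9012."
if contains_sensitive_data(outgoing):
    print("Blocked: message appears to contain sensitive identifiers.")
else:
    print("Allowed.")
```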
If the legal and technical teams give the company the go-ahead, it can work with providers like Cymulate and leverage solutions such as their Advanced Scenarios module to run these testing simulations.
While tech tools are essential in plugging any potential data leakage, the most important part of the cybersecurity journey is user education. Companies can put strong access controls in place so that only authorized personnel can access specific information. Further, this data exchange should be regularly monitored so that any suspicious behavior can be identified and proactively addressed.
Organizations should also educate people within the corporate structure about the importance of the data they are responsible for, and the liabilities the company could face. Once employees realise its significance and the risks of sharing data with AI chatbots, they are likely to be more careful about data integrity.