SharkTech Global - AI Data Privacy Case Study

Drafted by the Research and Analysis wing of Sharktech Global | October 2025


What Really Happens to Your Business Data When You Use ChatGPT

An honest look at data persistence, how AI training actually works, and the uncomfortable truths most Aussie businesses never hear about.

Let's Talk About What Nobody Tells You

Righto, here's something that'll make you sit up: When you hit delete on that ChatGPT conversation or clear your Claude history, you're only deleting your ability to see it—the data itself? Still sitting on their servers.

Unknown Fact #1: "Delete" Doesn't Mean Gone

AI companies keep multiple copies of your data for model training, safety checks, and system improvements. Your "deleted" conversations live on in:

  • Training data archives - Cleaned up but still there, helping train future models
  • Safety monitoring databases - Kept forever for abuse detection
  • The AI's actual brain - Your data literally becomes part of how the AI thinks
  • Backup systems - Standard disaster recovery stuff, usually 90+ days minimum

Here's the kicker: Once your data gets baked into a large language model's training, there's no way to fully extract it. It's encoded across billions of tiny calculations throughout the entire neural network. Think of it like trying to un-bake a cake—good luck with that.
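The "un-bake a cake" point can be seen in a toy example. Even a one-parameter model ends up with a weight that blends every training example's contribution, so no single example can simply be subtracted back out afterwards. This is an illustrative sketch only (made-up numbers, plain gradient descent), not how production LLMs are actually trained:

```python
# Toy illustration of why training data can't simply be "deleted" from a
# model: every example nudges the shared parameter, and the final weight
# is a blend of all of those nudges. Data and numbers are invented.
examples = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]  # (x, y) pairs, incl. "your" data

w = 0.0  # the model's single shared parameter
for _ in range(200):                  # plain gradient descent on squared error
    for x, y in examples:
        grad = 2 * (w * x - y) * x
        w -= 0.01 * grad

# Every example's influence is now baked into this one number; removing
# one example's contribution would require retraining from scratch.
print(round(w, 2))
```

Scale that single parameter up to billions, and "just delete my conversation from the model" stops being a meaningful operation.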

How Your Chats Actually Train Their Systems

The Training Loop They Don't Advertise

Second 1: You type something with your business data
Second 2: Hits their servers, gets logged with timestamp, your IP, and heaps of metadata
Hour 1: Automated systems check if it's good quality training material
Day 1: Quality stuff gets flagged for reinforcement learning
Week 1-4: Your data enters the fine-tuning queue
Forever: Patterns from your chat influence how the AI responds to everyone

Unknown Fact #2: You're Leaving a Digital Fingerprint

Every business using AI leaves what we call a "semantic fingerprint" in the system. If your team keeps asking similar questions about your secret sauce, the AI starts recognising the pattern—and those patterns can influence what it tells OTHER users asking related questions.

Imagine if your mate borrowed your Netflix and started getting recommendations based on what you've been watching. Same idea, but with your business intelligence.

"We've seen cases where competitors accidentally benefited from strategic insights another company fed into the same public AI months earlier. They had no idea—it just came through in the AI's suggestions."
— Data Privacy Researcher, 2025

It's Not Just What You Type—It's Everything Else Too

Unknown Fact #3: The Metadata Tells Your Whole Story

They're not just storing your prompts, mate. AI systems are collecting:

  • When you use it: Your work patterns, business hours, crunch times
  • What topics you focus on: Reveals your priorities and concerns
  • How many goes you take: Shows complexity of your challenges
  • Your industry lingo: Pinpoints exactly what market you're in
  • Question sequences: Exposes how your team actually works
  • Device info: Maps out your organisation's structure

Put it all together and you've got a comprehensive profile of how your business operates—and you never even had to share an official document.
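As an illustration, here is roughly what a single logged request could look like. The field names and schema below are hypothetical, chosen to mirror the bullet points above, and are not any vendor's actual logging format:

```python
from datetime import datetime, timezone

# Hypothetical sketch of the metadata an AI platform could record
# alongside one prompt. Field names are illustrative only.
def build_request_log(prompt: str, user_agent: str, ip: str) -> dict:
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # work patterns, crunch times
        "ip_address": ip,                                     # rough location, org network
        "user_agent": user_agent,                             # device and OS fingerprint
        "prompt_length": len(prompt),                         # crude complexity signal
        "topic_keywords": [w for w in prompt.lower().split() if len(w) > 6],  # topic hint
    }

log = build_request_log(
    "Draft our Q4 promotional calendar for the Liverpool store",
    "Mozilla/5.0 (Windows NT 10.0)",
    "203.0.113.7",
)
print(sorted(log.keys()))
```

Each field looks harmless on its own; accumulated over months of prompts, together they sketch the profile described above.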

Hypothetical Scenarios: Understanding the Risks

Disclaimer

The following scenarios are hypothetical illustrations based on common industry concerns and potential risk patterns. These are not documented real cases but rather educational examples to demonstrate how data exposure risks could manifest in practice.

Hypothetical Scenario 1: The Marketing Calendar Situation

Illustrative example based on common AI usage patterns:

Imagine a Sydney retailer uses ChatGPT to fine-tune their Christmas promotional calendar. Three months later, a competitor rolls out a suspiciously similar campaign structure. After some digging, they discover both marketing teams were using the same AI tool. The first company's detailed planning prompts could have subtly influenced the suggestions given to the second.

This scenario illustrates potential risks when multiple businesses in the same market use shared AI platforms.

Hypothetical Scenario 2: The Manufacturing Process Example

Illustrative example of potential IP exposure:

Consider a Melbourne manufacturer who has spent years perfecting their quality control process. Someone on the team might use AI to help write training manuals, detailing the whole procedure. Within months, industry forums could start discussing similar methods—potentially traced back to AI-generated content that had absorbed and recombined bits from multiple sources, including their confidential process.

Hypothetical Scenario 3: The Customer Data Risk

Illustrative example of compliance risks:

An employee might think they're being clever by pasting customer feedback into AI for sentiment analysis. Problem: it could include email addresses and phone numbers. That personally identifiable information could then:

  • Be stored on overseas servers (potential data sovereignty violation)
  • Be used to train sentiment models (potential compliance breach)
  • Become potentially accessible to system administrators (security risk)
  • Be difficult or impossible to fully retract (permanent exposure risk)

This scenario illustrates how the business would carry liability, not the AI provider.
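One practical mitigation this scenario suggests is scrubbing obvious PII before text ever leaves your network. Here's a minimal sketch, assuming regex-level redaction of emails and Australian-style phone numbers is acceptable; a real deployment should use a vetted PII-detection tool rather than hand-rolled patterns:

```python
import re

# Minimal sketch: strip obvious PII (emails, AU-style phone numbers)
# from text before it is pasted into a third-party AI service.
# Patterns are illustrative, not exhaustive.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
PHONE_RE = re.compile(r"(?:\+61|0)[2-478](?:[ -]?\d){8}")

def scrub(text: str) -> str:
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text

feedback = "Great service! Contact me at jane@example.com or 0412 345 678."
print(scrub(feedback))
```

Scrubbing at the boundary doesn't fix the retention problem, but it keeps identifiable customer details out of someone else's training pipeline in the first place.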

Unknown Fact #4: AI Models Play Telephone With Your Data

AI trained on billions of conversations develops what researchers call emergent knowledge structures. Your industry-specific data mixes with everything else, creating unexpected knowledge leakage. The AI might figure out connections between your business and market trends that you never explicitly mentioned—then share those insights with your competitors.

Those Terms of Service You Clicked "Agree" On

Unknown Fact #5: They Can Change the Rules Retrospectively

Most AI terms of service include clauses allowing them to:

  • Change policies that apply to your old data - What was "private" yesterday might not be tomorrow
  • Use your stuff for "service improvement" - Vague enough to mean almost anything
  • Share with third parties - Usually buried somewhere in those 47 pages of legal speak
  • Keep it as long as they reckon they need to - "Indefinite retention" with no expiry date
  • Bundle your data with everyone else's - For analytics and insights

What Even "Enterprise" Plans Won't Promise

Even if you're paying for the fancy enterprise version, they typically:

  • Still keep your data for "safety and security" (for as long as they decide)
  • Can analyse it for "system optimisation" (which includes training)
  • Reserve rights to use "de-identified data" (but your patterns still show through)
  • Promise "reasonable security" not "ironclad protection"

Translation: Even paying customers aren't fully protected.

Note on Statistics

The following statistics are industry estimates and should be independently verified. They represent general trends in AI usage and data handling practices.

  • 87% of businesses have no idea AI keeps deleted conversations (estimated)
  • 650M+ business prompts processed monthly across major AI platforms (estimated)
  • 0% chance of guaranteed data removal from trained models (industry observation)
  • 18 months average time your data influences model behaviour (estimated)

Why "We Anonymised It" Doesn't Cut It

Unknown Fact #6: Anonymisation Is Easier to Break Than You Think

AI companies reckon they "anonymise" your data, but here's what research actually shows:

  • 87% of "anonymised" datasets can be re-identified with just 3-4 data points
  • Your writing style is as unique as your fingerprint—seriously
  • Industry jargon creates correlation patterns that blow through anonymisation
  • The timing of your AI use, combined with the topics you discuss, can triangulate your identity
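The re-identification point can be demonstrated with a toy dataset: strip the names out, and a handful of innocuous attributes still single out one record. All data below is invented purely for illustration:

```python
# Toy sketch of re-identification: an "anonymised" dataset (names removed)
# can still single out an individual from a few quasi-identifiers such as
# postcode, industry, and peak usage hour. Records are made up.
anonymised = [
    {"postcode": "2170", "industry": "retail",    "peak_hour": 9},
    {"postcode": "2170", "industry": "logistics", "peak_hour": 22},
    {"postcode": "3000", "industry": "retail",    "peak_hour": 9},
]

def matches(record: dict, **quasi_ids) -> bool:
    return all(record[k] == v for k, v in quasi_ids.items())

# Three innocuous data points narrow the dataset to exactly one record.
hits = [r for r in anonymised
        if matches(r, postcode="2170", industry="logistics", peak_hour=22)]
print(len(hits))  # a unique match means the "anonymous" record is re-identified
```

None of those three fields is a secret on its own; it's the combination that does the identifying, which is exactly why stripping names alone doesn't cut it.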

Unknown Fact #7: Bad Actors Can Fish Your Data Back Out

Clever folks can do what's called "model inversion attacks"—basically asking the AI strategic questions to extract training data. Back in 2024, researchers successfully pulled verbatim training examples from major language models, including:

  • Personal email addresses
  • Phone numbers
  • Private conversations
  • Confidential business information

If your data went into their training, someone else could potentially fish it back out.

What This Means for Australian Businesses

Using open AI systems with business data could land you in hot water with:

  • Privacy Act 1988 (Cth): Sending data overseas without proper consent
  • Notifiable Data Breaches scheme: If their system gets hacked, you might need to notify your customers
  • Australian Consumer Law: Making misleading claims about data protection
  • Industry rules: Healthcare, finance, and legal sectors have even stricter requirements

Unknown Fact #8: You're Carrying the Can, Not Them

When your employee uses a public AI tool with company data, YOUR business cops the liability—not the AI provider. Their legal terms explicitly limit what they're responsible for, while YOUR business faces:

  • Regulatory fines up to $50 million AUD
  • Customer lawsuits over data handling
  • Reputation damage (which often costs more than the fine)
  • Mandatory breach notifications under the Notifiable Data Breaches scheme

Sharktech Global Pty Ltd

244 Macquarie St, Liverpool NSW 2170, Australia


© 2025 Sharktech Global Pty Ltd. All rights reserved.