AI Data Privacy for Business: What Your Team Actually Needs to Know

Most businesses are already sending sensitive data to AI services without a privacy framework in place. Here's what's at risk and what to do about it.

May 1, 2025 · ~7 min read · Auxot Team

AI privacy · data privacy · AI compliance · enterprise AI security · GDPR · HIPAA

Most businesses are sending regulated and confidential data to AI services without a privacy framework in place — and most employees doing it haven’t thought about it at all. BlackFog research found that 34.8% of data pasted into ChatGPT is sensitive business data, and IBM’s 2024 data breach report found that breaches involving AI tools add an average of $670K to incident costs. The risk is not theoretical; it is already showing up in breach costs and regulatory enforcement.

What this article covers:

What data types your team is likely sending to AI services right now
What HIPAA, GDPR, SOC 2, and financial services regulations actually require for AI use
Three realistic compliance options — from banning tools to self-hosted gateways
A five-step governance framework you can start implementing this week

What data are your employees sending to AI services right now?

Let’s be specific about what your employees are sending to AI tools:

Sales and marketing teams are pasting customer data, deal information, prospect lists, and competitive strategy into AI tools to get help with outreach, analysis, and planning.

Finance teams are pasting financial models, budget projections, and vendor contracts to get analysis and summaries.

Legal and HR teams are pasting employment agreements, performance reviews, and legal documents to get drafts and analysis.

Engineering teams are pasting source code — sometimes including API keys, credentials, and proprietary algorithms — to get help with debugging.

Every one of those document types has privacy, regulatory, or competitive sensitivity. And in most cases, the employees doing the pasting haven’t thought carefully about where that data goes.
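One way to catch the most obvious leaks, such as credentials in pasted source code, is to scan text before it leaves your network. The sketch below is a minimal illustration, not a complete detector: the pattern names and regular expressions are assumptions chosen for the example, and real secret formats vary by provider.

```python
import re

# Illustrative patterns only (assumptions for this sketch); real secret
# formats vary by provider and this list is far from exhaustive.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_header": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "generic_api_key": re.compile(
        r"\b(?:api[_-]?key|secret|token)\b\s*[:=]\s*['\"]?[A-Za-z0-9_\-]{16,}",
        re.IGNORECASE,
    ),
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def find_sensitive(text: str) -> list[str]:
    """Return the sorted names of every pattern that matches the text."""
    return sorted(
        name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)
    )


def redact(text: str) -> str:
    """Replace each match with a [REDACTED:<pattern name>] placeholder."""
    for name, pattern in SECRET_PATTERNS.items():
        text = pattern.sub(f"[REDACTED:{name}]", text)
    return text


if __name__ == "__main__":
    snippet = 'client = Client(api_key="sk_live_abcdefghij0123456789")'
    print(find_sensitive(snippet))
    print(redact(snippet))
```

A check like this only reduces accidental exposure. It will miss anything that doesn’t match a known pattern, so it complements training and policy rather than replacing them.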

Where does your data go when employees use consumer AI tools?

The terms of service vary by product and tier, but the general pattern for consumer AI tools:

Your input may be used for model training. Most consumer AI products default to using your conversations to improve their models. You can usually opt out, but the default is on. If your employees are using personal ChatGPT accounts (not the enterprise tier), their inputs may be training data.

Your data is stored on the provider’s servers. Conversations are retained for some period — this varies by provider and product tier. That retention creates a data store that, if breached, exposes your information.

The provider’s security posture, not yours, determines the risk. You’re trusting someone else’s security controls, patching cadence, and incident response procedures with your data.

Enterprise tiers are better, but not a complete solution. ChatGPT Enterprise, Claude for Enterprise, and similar products offer stronger protections, including no training on your data and stronger contractual commitments. But your data still transits their infrastructure and is processed there.

HIPAA

HIPAA regulates Protected Health Information (PHI). If your team works in healthcare, or handles any data that could identify a patient’s medical condition, treatment, or payment for treatment, HIPAA applies.

To use a third-party AI service with PHI legally, you need a Business Associate Agreement (BAA) with the AI provider. Some enterprise AI tiers offer BAAs; most consumer products do not.

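If your clinical staff are pasting patient notes into ChatGPT without a BAA in place, that’s a HIPAA violation. The fines are real: civil penalties are tiered by culpability and range from $100 to $50,000 per violation, with an annual cap of $1.9 million per category. These amounts are adjusted periodically for inflation.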

GDPR

GDPR covers personal data of EU residents. If you have customers, employees, or partners in the EU, GDPR applies to their data regardless of where your company is located.

Key requirements that affect AI use:

Data processing basis: You need a legal basis for processing personal data. Using personal data in an AI tool requires consent, legitimate interest, or another valid basis.
Data transfers: GDPR restricts transferring personal data to countries outside the EU/EEA unless certain conditions are met. Sending EU personal data to a US-based AI service requires a valid transfer mechanism, such as Standard Contractual Clauses or the provider’s certification under the EU-US Data Privacy Framework.
Data subject rights: Individuals have the right to access their data and to have it deleted. If that data has ended up in an AI provider’s training set, fulfilling a deletion request becomes complicated.

SOC 2 and enterprise security requirements

If your customers are enterprises, many of them will ask in their vendor security questionnaires whether you have controls around AI usage. “Our employees use whatever AI tools they want” is not a good answer.

SOC 2 doesn’t explicitly cover AI, but its Trust Services Criteria (especially confidentiality) extend naturally to how you handle data processed by third-party AI services.

Financial services regulations

FINRA, the SEC, and their international equivalents have begun explicitly addressing AI usage. The general principle: if you’re subject to communication retention requirements, conversations with AI tools about client matters may need to be retained.

What are your realistic options for compliant AI use?

There are three realistic options:

Option 1: Ban AI tools

Some organizations ban consumer AI tools outright. This is defensible from a compliance standpoint but increasingly difficult to enforce and costly in productivity. Employees who are barred from AI tools at work often use them anyway on personal devices, where you have no visibility at all, which is worse.

Blanket bans also put you at a competitive disadvantage as the tools become table stakes.

Option 2: Approve specific cloud AI enterprise tiers

Enterprise tiers of major AI products (ChatGPT Enterprise, Claude for Enterprise) offer stronger data handling commitments — no training on your data, retention controls, BAA availability for healthcare.

This is a reasonable middle ground for many teams. The gaps:

You’re still trusting the provider’s infrastructure
You don’t have granular control over what each team member can do
You still need a governance framework for which data types can be used with AI

Option 3: Deploy a self-hosted AI gateway

A self-hosted AI gateway runs on your infrastructure and gives you control over the governance layer — access control, logging, agent configuration — while still calling cloud AI providers for model inference.

The data handling improvement: the routing, logging, and agent orchestration happen on your servers. The only data that leaves your perimeter is the prompt itself, which goes directly from your server to the AI provider for inference.

This approach:

Keeps the management layer under your control
Gives you audit logs on your infrastructure
Lets you control which employees can access which AI capabilities
Lets you build agents that have the right context to answer questions without requiring employees to paste sensitive documents manually

For teams that want to also keep inference on-premise, self-hosted gateways that support local models (like running Llama or Mistral on your own GPU) eliminate the external call entirely for sensitive workloads.
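The core decision a gateway makes for each request can be sketched in a few lines: check whether the caller may use the requested agent, then choose where inference runs. This is a hypothetical illustration, not Auxot’s API or configuration format. The role names, agent names, the `Request` type, and the `route` function are all invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    user: str
    role: str
    agent: str
    sensitive: bool  # assumed to be set upstream, e.g. by data classification


# Hypothetical role-to-agent permissions for this sketch.
ROLE_AGENTS = {
    "sales": {"outreach-writer"},
    "finance": {"budget-analyst"},
    "engineering": {"code-helper", "outreach-writer"},
}


def route(request: Request) -> str:
    """Return 'local' or 'cloud' for a permitted request.

    Raises PermissionError when the role is not allowed to use the agent.
    """
    allowed = ROLE_AGENTS.get(request.role, set())
    if request.agent not in allowed:
        raise PermissionError(
            f"role {request.role!r} may not use agent {request.agent!r}"
        )
    # Sensitive workloads stay on local models; everything else may go
    # to a cloud provider for inference.
    return "local" if request.sensitive else "cloud"


if __name__ == "__main__":
    print(route(Request("dana", "finance", "budget-analyst", sensitive=True)))
    print(route(Request("sam", "sales", "outreach-writer", sensitive=False)))
```

Denying by default (an unknown role gets an empty permission set) is a deliberate choice in this sketch: a missing entry should block a request rather than let it through.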

How do you build an AI data governance framework?

Whether you use cloud AI or self-hosted, you need a governance framework. Here’s a minimal starting point:

1. Classify your data. Which data types does your company handle? Which are regulated (PHI, PII, confidential business information)? Which can be freely shared with external services?

2. Approve tools by data type. Create a simple matrix showing which AI tools can be used with which data types. Example:

Public information: any approved tool
Internal information: approved enterprise AI tools only
Customer PII: approved tools with a data processing agreement (DPA) in place, or self-hosted only
Regulated health/financial data: self-hosted, or approved tools with the appropriate agreements (such as a BAA for PHI) only
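A matrix like this is small enough to encode directly, so that it can be checked by software instead of remembered by people. The following is a minimal sketch under stated assumptions: the `Tool` fields and the `is_allowed` function are hypothetical names, and treating a self-hosted tool as acceptable for internal information is an interpretation of the example matrix, not a rule from it.

```python
from dataclasses import dataclass
from enum import Enum


class DataClass(Enum):
    PUBLIC = "public"
    INTERNAL = "internal"
    CUSTOMER_PII = "customer_pii"
    REGULATED = "regulated"


@dataclass(frozen=True)
class Tool:
    name: str
    approved: bool = False         # on the company's approved list
    enterprise_tier: bool = False  # enterprise contract, not a personal account
    has_agreement: bool = False    # required agreement for the data type is signed
    self_hosted: bool = False      # runs on infrastructure you control


def is_allowed(tool: Tool, data_class: DataClass) -> bool:
    """Apply the example matrix; unapproved tools are never allowed."""
    if not tool.approved:
        return False
    if data_class is DataClass.PUBLIC:
        return True
    if data_class is DataClass.INTERNAL:
        # Assumption: self-hosted tools also qualify for internal data.
        return tool.enterprise_tier or tool.self_hosted
    # Customer PII and regulated data need an agreement or self-hosting.
    return tool.has_agreement or tool.self_hosted


if __name__ == "__main__":
    chat = Tool("team-chat-assistant", approved=True, enterprise_tier=True)
    print(is_allowed(chat, DataClass.INTERNAL))
    print(is_allowed(chat, DataClass.REGULATED))
```

Keeping the rules in one function makes the policy reviewable in a single place, which helps when the framework is revisited.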

3. Train your team. This is the most important step. Your employees need to know the rules before they violate them, not after.

4. Establish logging and monitoring. You can’t audit what you can’t see. A self-hosted gateway gives you logs. Enterprise AI tools usually offer some audit capability. Know what you have and review it.
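To make "review it" concrete, here is a minimal sketch of what one audit record and a simple review query might look like. The field names and both function names are assumptions for illustration, and they do not describe any particular product’s log format. The sketch stores a SHA-256 hash and the length of the prompt instead of its text, so the log does not become a second copy of the sensitive data. Whether you can do that depends on your retention obligations, which may require keeping the full content.

```python
import hashlib
import json
from collections import Counter
from datetime import datetime, timezone


def audit_record(user, tool, data_class, prompt, now=None):
    """Build one JSON log line describing an AI request.

    Only a hash and a character count of the prompt are stored.
    """
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    record = {
        "timestamp": timestamp,
        "user": user,
        "tool": tool,
        "data_class": data_class,
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "prompt_chars": len(prompt),
    }
    return json.dumps(record, sort_keys=True)


def requests_per_data_class(log_lines):
    """Count logged requests by data class, as a starting point for review."""
    return Counter(json.loads(line)["data_class"] for line in log_lines)


if __name__ == "__main__":
    lines = [
        audit_record("alice", "gateway", "internal", "Summarize this memo"),
        audit_record("bob", "gateway", "customer_pii", "Draft a reply to ..."),
    ]
    print(requests_per_data_class(lines))
```

A count per data class is only one possible review; the same records could also be grouped by user or tool to spot unexpected usage.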

5. Review regularly. The AI landscape moves fast. Review your framework quarterly. What was reasonable six months ago may need updating.

Auxot and data privacy

Auxot is a self-hosted AI gateway. All the governance infrastructure — access control, logging, agent management, context files — runs on your servers. When agents call AI providers, they call them directly from your server; no third-party intermediary handles the data.

For teams that need to keep inference on-premise, Auxot supports local GPU workers running open-source models.

Install Auxot to get your own AI gateway running.

Read the security overview in the docs for technical details on the architecture.
