AI Data Privacy for Business: What Your Team Actually Needs to Know
Most businesses are already sending sensitive data to AI services without a privacy framework in place. Here's what's at risk and what to do about it.
There’s a quiet data problem happening in most companies right now.
Your employees are using AI tools — ChatGPT, Claude, Gemini, Copilot — to help with real work. Real work involves real data: customer lists, financial projections, legal documents, internal strategies. That data is going into AI systems that your legal and compliance teams probably haven’t reviewed.
This isn’t speculation. Multiple workplace surveys have found that a large share of employees who use AI tools at work have pasted work documents into them. Most didn’t ask whether that was allowed; many didn’t think about it at all.
This guide covers what’s actually at risk, what the regulations say, and what you can do to use AI properly without banning it.
What data is actually being sent to AI services
Let’s be specific about what your employees are sending to AI tools:
Sales and marketing teams are pasting customer data, deal information, prospect lists, and competitive strategy into AI tools to get help with outreach, analysis, and planning.
Finance teams are pasting financial models, budget projections, and vendor contracts to get analysis and summaries.
Legal and HR teams are pasting employment agreements, performance reviews, and legal documents to get drafts and analysis.
Engineering teams are pasting source code — sometimes including API keys, credentials, and proprietary algorithms — to get help with debugging.
Every one of those document types has privacy, regulatory, or competitive sensitivity. And in most cases, the employees doing the pasting haven’t thought carefully about where that data goes.
Where your data goes when you use consumer AI tools
The terms of service vary by product and tier, but the general pattern for consumer AI tools:
Your input may be used for model training. Many consumer AI products use your conversations to improve their models unless you opt out, and that setting is typically on by default. If your employees are using personal ChatGPT accounts (not the enterprise tier), their inputs may become training data.
Your data is stored on the provider’s servers. Conversations are retained for some period — this varies by provider and product tier. That retention creates a data store that, if breached, exposes your information.
The provider’s security posture, not yours, determines the risk. You’re trusting someone else’s security controls, patching cadence, and incident response procedures with your data.
Enterprise tiers are better, but not a complete solution. ChatGPT Enterprise, Claude for Enterprise, and similar products offer stronger protections, such as no training on your data and stronger contractual commitments. But your data still travels to, and is processed on, the provider’s infrastructure.
What the regulations actually require
HIPAA
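HIPAA regulates Protected Health Information (PHI): individually identifiable information about a patient’s medical condition, treatment, or payment for treatment. If your organization is a healthcare provider, health plan, or healthcare clearinghouse (a “covered entity”), or a vendor that handles PHI on behalf of one (a “business associate”), HIPAA applies.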
To use a third-party AI service with PHI legally, you need a Business Associate Agreement (BAA) with the AI provider. Some enterprise AI tiers offer BAAs; most consumer products do not.
If your clinical staff are pasting patient notes into ChatGPT without a BAA in place, that’s a HIPAA violation. The fines are real: $100 to $50,000 per violation, with an annual cap of about $1.9 million per violation category (these amounts are adjusted for inflation each year).
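To see how quickly those numbers add up, here is a back-of-the-envelope calculation using the figures above. It is a sketch under simplifying assumptions: each pasted note counts as one violation, every violation is assessed at the same flat amount, and all of them fall in a single violation category. Real penalties depend on the culpability tier and are set case by case.

```python
# Annual cap per violation category, as quoted above (adjusted for inflation each year).
ANNUAL_CAP_PER_CATEGORY = 1_900_000


def annual_exposure(violations: int, per_violation: int) -> int:
    """Total penalty for one violation category in one year, limited by the annual cap.

    Assumption: every violation is assessed at the same flat per_violation amount.
    """
    if violations < 0 or per_violation < 0:
        raise ValueError("inputs must be non-negative")
    return min(violations * per_violation, ANNUAL_CAP_PER_CATEGORY)


# 200 pasted patient notes at the $100 minimum: 200 * 100
print(annual_exposure(200, 100))     # 20000
# 200 notes at the $50,000 maximum would be $10 million, so the cap applies
print(annual_exposure(200, 50_000))  # 1900000
```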
GDPR
GDPR covers the personal data of people in the EU. If you have customers, employees, or partners in the EU, GDPR generally applies to their data regardless of where your company is located.
Key requirements that affect AI use:
- Data processing basis: You need a lawful basis for processing personal data. Using personal data in an AI tool requires consent, legitimate interests, or another valid basis.
- Data transfers: GDPR restricts transferring personal data to countries outside the EU/EEA unless certain conditions are met. Sending EU personal data to US-based AI services requires compliance with transfer mechanisms like Standard Contractual Clauses.
- Data subject rights: Individuals have the right to access their data and to have it deleted. If that data has ended up in an AI provider’s training set, fulfilling deletion requests becomes complicated.
SOC 2 and enterprise security requirements
If your customers are enterprises, many of them will ask in their vendor security questionnaires whether you have controls around AI usage. “Our employees use whatever AI tools they want” is not a good answer.
SOC 2 doesn’t explicitly cover AI, but its trust service criteria (especially confidentiality) extend naturally to how you handle data processed by third-party AI services.
Financial services regulations
FINRA, the SEC, and their international equivalents have begun explicitly addressing AI usage. The general principle: if you’re subject to communication retention requirements, conversations with AI tools about client matters may need to be retained.
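Any system that stores AI conversations then has to respect those retention rules before deleting anything. The sketch below uses a hypothetical `RETENTION_YEARS` placeholder (not a figure taken from any regulation) and models only one rule: conversations tagged as client matters cannot be deleted until the retention period has passed.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical placeholder: set this from your own regulatory obligations.
RETENTION_YEARS = 6


@dataclass
class AIConversation:
    conversation_id: str
    created: date
    about_client_matter: bool


def retention_expiry(created: date, years: int = RETENTION_YEARS) -> date:
    """Same calendar day `years` later; a Feb 29 start date rolls back to Feb 28."""
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        return created.replace(year=created.year + years, day=28)


def may_delete(conv: AIConversation, today: date) -> bool:
    """Only the client-matter retention rule is modeled; other policies may still apply."""
    if not conv.about_client_matter:
        return True
    return today >= retention_expiry(conv.created)
```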
What to do about it
There are three realistic options:
Option 1: Ban AI tools
Some organizations ban consumer AI tools outright. This is defensible from a compliance standpoint but increasingly difficult to enforce and costly in productivity. Employees who can’t use AI tools at work often use them anyway on personal devices, which is worse because that use happens entirely outside your controls.
Blanket bans also put you at a competitive disadvantage as the tools become table stakes.
Option 2: Approve specific cloud AI enterprise tiers
Enterprise tiers of major AI products (ChatGPT Enterprise, Claude for Enterprise) offer stronger data handling commitments — no training on your data, retention controls, BAA availability for healthcare.
This is a reasonable middle ground for many teams. The gaps:
- You’re still trusting the provider’s infrastructure
- You don’t have granular control over what each team member can do
- You still need a governance framework for which data types can be used with AI
Option 3: Deploy a self-hosted AI gateway
A self-hosted AI gateway runs on your infrastructure and gives you control over the governance layer — access control, logging, agent configuration — while still calling cloud AI providers for model inference.
The data handling improvement: the routing, logging, and agent orchestration happen on your servers. The only data that leaves your perimeter is the prompt that goes to the AI provider for inference, going directly from your server to the provider.
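Because the prompt is the only thing that leaves, the gateway is also a natural checkpoint for filtering it. The sketch below masks a few obvious identifiers before a prompt is forwarded. The patterns are illustrative assumptions (email addresses, AWS-style access key IDs, US Social Security numbers), not a complete detector, and this is one control a gateway could apply rather than a description of any particular product.

```python
import re

# Illustrative patterns only; a real deployment needs a vetted detector and tuning.
REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "AWS_ACCESS_KEY": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact(prompt: str) -> tuple[str, dict[str, int]]:
    """Replace each match with a [LABEL] placeholder.

    Returns the redacted prompt and a count of replacements per label,
    which the gateway could record in its audit log.
    """
    counts: dict[str, int] = {}
    for label, pattern in REDACTION_PATTERNS.items():
        prompt, n = pattern.subn(f"[{label}]", prompt)
        if n:
            counts[label] = n
    return prompt, counts


print(redact("Email jane.doe@example.com, key AKIAABCDEFGHIJKLMNOP"))
```

Regex masking only catches well-formed identifiers; names, free-text medical details, and unusual key formats pass straight through, so it works as a backstop to a data-handling policy, not a substitute for one.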
This approach:
- Keeps the management layer under your control
- Gives you audit logs on your infrastructure
- Lets you control which employees can access which AI capabilities
- Lets you build agents that have the right context to answer questions without requiring employees to paste sensitive documents manually
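The access-control point above can start as a mapping from roles to permitted capabilities. In this sketch the role names and capability names (`code_assistant`, `finance_agent`, and so on) are hypothetical, and anything not explicitly granted is denied.

```python
from dataclasses import dataclass, field

# Hypothetical roles and capabilities, for illustration only.
ROLE_CAPABILITIES: dict[str, set[str]] = {
    "engineering": {"code_assistant", "general_chat"},
    "finance": {"finance_agent", "general_chat"},
    "contractor": {"general_chat"},
}


@dataclass
class User:
    name: str
    roles: set[str] = field(default_factory=set)


def can_use(user: User, capability: str) -> bool:
    """Allowed if any of the user's roles grants the capability; unknown roles grant nothing."""
    return any(capability in ROLE_CAPABILITIES.get(role, set()) for role in user.roles)


alice = User("alice", {"engineering"})
print(can_use(alice, "code_assistant"))  # True
print(can_use(alice, "finance_agent"))   # False
```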
For teams that also want to keep inference on-premises, self-hosted gateways that support local models (such as Llama or Mistral running on your own GPUs) eliminate the external call entirely for sensitive workloads.
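The routing decision itself is small. This sketch assumes the caller already knows whether a request is sensitive, and uses two stand-in functions in place of a local inference server and a cloud provider API; both backends are hypothetical placeholders.

```python
from typing import Callable

ModelFn = Callable[[str], str]


def local_model(prompt: str) -> str:
    # Placeholder for a call to an on-premises inference server.
    return f"local:{prompt}"


def cloud_model(prompt: str) -> str:
    # Placeholder for a call to a cloud AI provider.
    return f"cloud:{prompt}"


def route(prompt: str, sensitive: bool,
          local: ModelFn = local_model, cloud: ModelFn = cloud_model) -> str:
    """Sensitive workloads go to the local model, so they never trigger an external call."""
    return local(prompt) if sensitive else cloud(prompt)
```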
Building an AI governance framework
Whether you use cloud AI or self-hosted, you need a governance framework. Here’s a minimal starting point:
1. Classify your data. Which data types does your company handle? Which are regulated (PHI, PII) or confidential (internal strategy, financials)? Which can be freely shared with external services?
2. Approve tools by data type. Create a simple matrix of which AI tools can be used with which data types. For example:
- Public information: any approved tool
- Internal information: approved enterprise AI tools only
- Customer PII: approved enterprise tools with a data processing agreement, or self-hosted only
- Regulated health/financial data: self-hosted or approved tools with the appropriate agreements (such as a BAA for PHI) only
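A matrix like this can be encoded directly. This sketch assumes the data classes form a strict ladder of sensitivity, so each tool only needs the most sensitive class it is approved for; the tool names are hypothetical.

```python
from enum import IntEnum


class DataClass(IntEnum):
    """Data classes from the matrix, ordered from least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CUSTOMER_PII = 2
    REGULATED = 3


# Hypothetical tool names; each maps to the most sensitive class it is approved for.
MAX_APPROVED_CLASS = {
    "consumer_chatbot": DataClass.PUBLIC,
    "enterprise_ai": DataClass.INTERNAL,
    "enterprise_ai_with_agreements": DataClass.CUSTOMER_PII,
    "self_hosted_gateway": DataClass.REGULATED,
}


def is_approved(tool: str, data: DataClass) -> bool:
    """Unknown tools are denied; known tools must be cleared for this class or higher."""
    ceiling = MAX_APPROVED_CLASS.get(tool)
    return ceiling is not None and data <= ceiling
```

Ordering the classes with `IntEnum` turns each approval check into a single comparison. The trade-off is that it cannot express a tool approved for one sensitive class but not a less sensitive one; if your matrix has cases like that, store an explicit set of approved classes per tool instead.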
3. Train your team. This is the most important step: your employees need to know the rules before they violate them, not after.
4. Establish logging and monitoring. You can’t audit what you can’t see. A self-hosted gateway gives you logs, and enterprise AI tools usually offer some audit capability. Know what you have and review it.
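One common shape for gateway audit logs is one JSON record per request appended to a file (JSON Lines). The sketch below assumes that format and hypothetical field names. It stores a SHA-256 hash of the prompt rather than the prompt itself, so the log does not become a second copy of sensitive data; if retention rules require keeping conversation content, you would store the content instead.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def log_ai_request(log_path: Path, user: str, tool: str, data_class: str, prompt: str) -> dict:
    """Append one audit record for an AI request as a single line of JSON."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "tool": tool,
        "data_class": data_class,
        # A hash lets you match a request later without storing its text.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def requests_by_user(log_path: Path, user: str) -> list[dict]:
    """Read the log back and filter it, e.g. during a periodic review."""
    with log_path.open(encoding="utf-8") as f:
        return [r for r in map(json.loads, f) if r["user"] == user]
```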
5. Review regularly. The AI landscape moves fast, so review your framework quarterly. What was reasonable six months ago may need updating.
Auxot and data privacy
Auxot is a self-hosted AI gateway. All the governance infrastructure — access control, logging, agent management, context files — runs on your servers. When agents call AI providers, they call them directly from your server; no third-party intermediary handles the data.
For teams that need to keep inference on-premise, Auxot supports local GPU workers running open-source models.
Install Auxot to get your own AI gateway running.
Read the security overview in the docs for technical details on the architecture.