feat: add GDPR-compliant engineering practices skill documentation (#1230)

* feat: add GDPR-compliant engineering practices skill documentation

* Add GDPR compliance references for Security and Data Rights

- Introduced a comprehensive Security.md file detailing encryption, password hashing, secrets management, anonymization, cloud practices, CI/CD controls, and incident response protocols.
- Created a Data Rights.md file outlining user rights implementation, Record of Processing Activities (RoPA), consent management, sub-processor management, and DPIA triggers.

* Refine GDPR compliance documentation by removing unnecessary symbols and ensuring clarity in security and data rights references

* refactor: streamline description formatting in GDPR compliance skill documentation

---------

Co-authored-by: Aaron Powell <me@aaron-powell.com>
Author: Mikael
Date: 2026-03-31 01:53:25 +02:00
Committed by: GitHub
Parent: 235f5740d4
Commit: 7454bbdb7c
4 changed files with 727 additions and 0 deletions

# GDPR Reference — Security, Operations & Architecture
Load this file when you need implementation detail on:
encryption, password hashing, secrets management, anonymization/pseudonymization,
cloud/DevOps practices, CI/CD controls, incident response, architecture patterns.

---
## Encryption
### At-Rest Encryption
| Data sensitivity | Minimum standard |
|---|---|
| Standard personal data (name, address, email) | AES-256 disk/volume encryption (cloud provider default) |
| Sensitive personal data (health, biometric, financial, national ID) | AES-256 **column-level** encryption + envelope encryption via KMS |
| Encryption keys | HSM-backed KMS (Azure Key Vault Premium / AWS KMS CMK / GCP Cloud KMS) |
**Envelope encryption pattern:**
1. Encrypt data with a **Data Encryption Key (DEK)** (AES-256, generated per record or per table).
2. Encrypt the DEK with a **Key Encryption Key (KEK)** stored in the KMS.
3. Store the encrypted DEK alongside the encrypted data.
4. Deleting the KEK = effective crypto-shredding of all data encrypted with it.
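The four steps above can be sketched as follows. This is a toy illustration only: the HMAC-based XOR keystream cipher and the in-memory `kms` dict are stand-ins — production code MUST use an authenticated cipher (AES-256-GCM) from a vetted library and a real HSM-backed KMS.

```python
import hashlib
import hmac
import os

def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher keyed via HMAC-SHA256 in counter mode.
    Illustration only -- use AES-256-GCM in production."""
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        out.extend(block)
        counter += 1
    return bytes(a ^ b for a, b in zip(data, out))

# A plain dict stands in for the KMS; the KEK would live in an HSM-backed vault.
kms = {"kek-prod-1": os.urandom(32)}

def envelope_encrypt(kek_id: str, plaintext: bytes) -> dict:
    dek = os.urandom(32)                                   # 1. per-record DEK
    nonce = os.urandom(16)
    ciphertext = keystream_xor(dek, nonce, plaintext)
    wrapped_dek = keystream_xor(kms[kek_id], nonce, dek)   # 2. wrap DEK with the KEK
    return {"kek_id": kek_id, "nonce": nonce,              # 3. encrypted DEK stored
            "wrapped_dek": wrapped_dek, "ciphertext": ciphertext}  # alongside the data

def envelope_decrypt(record: dict) -> bytes:
    kek = kms[record["kek_id"]]        # raises KeyError after crypto-shredding
    dek = keystream_xor(kek, record["nonce"], record["wrapped_dek"])
    return keystream_xor(dek, record["nonce"], record["ciphertext"])

record = envelope_encrypt("kek-prod-1", b"jane.doe@example.com")
assert envelope_decrypt(record) == b"jane.doe@example.com"
del kms["kek-prod-1"]                  # 4. shred the KEK -> data unrecoverable
```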
### In-Transit Encryption
- **MUST** enforce TLS 1.2 minimum; prefer TLS 1.3.
- **MUST** set `Strict-Transport-Security: max-age=31536000; includeSubDomains; preload`.
- **MUST NOT** allow TLS 1.0, TLS 1.1, null cipher suites, or export-grade ciphers.
- **MUST NOT** use self-signed certificates in production.
### Key Management
- Rotate DEKs annually minimum; rotate immediately upon suspected compromise.
- Use separate key namespaces per environment (dev / staging / prod).
- Log all KMS key access events — alert on anomalous access patterns.
- MUST NOT hardcode encryption keys in source code or configuration files.
---
## Password Hashing
| Algorithm | Parameters | Notes |
|---|---|---|
| **Argon2id** recommended | memory ≥ 64 MB, iterations ≥ 3, parallelism ≥ 4 | OWASP and NIST recommended |
| **bcrypt** acceptable | cost factor ≥ 12 | Widely supported; use if Argon2id unavailable |
| **scrypt** acceptable | N=32768, r=8, p=1 | Good alternative |
| MD5 | — | Never — trivially broken |
| SHA-1 / SHA-256 | — | Never for passwords — fast general-purpose hashes, trivially brute-forced on GPUs |
**MUST**
- Use a unique salt per password (built into all three algorithms above).
- Store only the hash — never the plaintext, never a reversible encoding.
- Re-hash on login if the stored hash uses an outdated algorithm — upgrade transparently.
**SHOULD**
- Add a **pepper** (server-side secret added before hashing) stored in the KMS, not in the DB.
- Check passwords against known breach lists at registration (`haveibeenpwned` API, k-anonymity mode).
- Enforce minimum password length of 12 characters.
**MUST NOT**
- Log passwords in any form — not during registration, not during failed login.
- Transmit passwords in URLs or query strings.
- Store password reset tokens in plaintext — hash them before storage.
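A minimal sketch of these rules using Python's stdlib `hashlib.scrypt` with the parameters from the table above (N=32768, r=8, p=1): unique random salt per password, only the salted hash stored, constant-time comparison on verification.

```python
import hashlib
import hmac
import os

# scrypt parameters from the table above: N=32768, r=8, p=1.
# maxmem must cover 128 * N * r bytes (~32 MiB) plus overhead.
PARAMS = {"n": 32768, "r": 8, "p": 1, "maxmem": 64 * 1024 * 1024, "dklen": 32}

def hash_password(password: str) -> bytes:
    salt = os.urandom(16)                                   # unique salt per password
    digest = hashlib.scrypt(password.encode(), salt=salt, **PARAMS)
    return salt + digest                                    # store salt alongside hash

def verify_password(password: str, stored: bytes) -> bool:
    salt, digest = stored[:16], stored[16:]
    candidate = hashlib.scrypt(password.encode(), salt=salt, **PARAMS)
    return hmac.compare_digest(candidate, digest)           # constant-time comparison

stored = hash_password("correct horse battery staple")
assert verify_password("correct horse battery staple", stored)
assert not verify_password("Tr0ub4dor&3", stored)
```

The same structure applies to Argon2id or bcrypt via their respective libraries; only the parameter names change.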
---
## Secrets Management
**MUST**
- Store all secrets in a dedicated secret manager: Azure Key Vault, AWS Secrets Manager,
GCP Secret Manager, or HashiCorp Vault.
- Use pre-commit hooks to prevent secret commits: `gitleaks`, `detect-secrets`, GitHub native secret scanning.
- Rotate secrets immediately upon: developer offboarding, suspected compromise, annual schedule.
- Maintain a **secrets inventory document** — every secret listed with its purpose and rotation date.
**SHOULD**
- Use **short-lived credentials** via OIDC federation (GitHub Actions → Azure/AWS/GCP) instead of long-lived API keys.
- Audit all KMS secret access — alert on access outside business hours or from unexpected sources.
- Use separate secret namespaces per environment.
**`.gitignore` MUST include:**
```
.env
.env.*
*.pem
*.key
*.pfx
*.p12
secrets/
# appsettings files, if they may contain connection strings
appsettings.*.json
```
**MUST NOT**
- Commit secrets to source code repositories.
- Pass secrets as plain-text CLI arguments (they appear in process lists and shell history).
- Store secrets as unencrypted environment variable defaults in code.
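The CLI-argument rule can be sketched in Python: pass the secret through the child process environment rather than `argv`, so it never appears in process lists or shell history. The key value and names here are illustrative placeholders — fetch real secrets from your secret manager.

```python
import os
import subprocess
import sys

# MUST NOT: secret visible in `ps` output and shell history, e.g.
#   subprocess.run(["curl", "-H", f"Authorization: Bearer {token}", url])

def run_with_secret(cmd: list, secret_name: str, secret_value: str):
    """Pass a secret to a child process via its environment, not argv."""
    env = {**os.environ, secret_name: secret_value}
    return subprocess.run(cmd, env=env, capture_output=True, text=True)

result = run_with_secret(
    [sys.executable, "-c", "import os; print(os.environ['API_KEY'][:4] + '...')"],
    "API_KEY",
    "sk-live-not-a-real-key",   # placeholder -- fetch from your secret manager
)
print(result.stdout.strip())    # the child sees the key; argv stays clean
```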
---
## Anonymization & Pseudonymization
### Definitions
| Term | Reversible? | GDPR scope? | Use case |
|---|---|---|---|
| **Anonymization** | No | Outside GDPR scope | Retained records after erasure, analytics datasets |
| **Pseudonymization** | Yes (with key) | Still personal data | Analytics pipelines, audit logs, reduced-risk processing |
### Anonymization Techniques
| Technique | How | When |
|---|---|---|
| Suppression | Remove the field entirely | Fields with no analytical value |
| Masking | Replace with fixed placeholder (`"ANONYMIZED_USER"`) | Audit log identifiers after erasure |
| Generalization | Replace exact value with a range (age 34 → "30–40") | Analytics |
| Noise addition | Add statistical noise to numerical values | Aggregate analytics |
| Aggregation | Report group statistics, never individual values | Reporting |
| K-anonymity | Ensure each record is indistinguishable from k-1 others | Analytics datasets |
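The k-anonymity property from the last row can be checked mechanically: group records by their quasi-identifier columns and take the smallest group size. A minimal sketch (field names are illustrative):

```python
from collections import Counter

def k_anonymity(records: list, quasi_identifiers: list) -> int:
    """Smallest equivalence-class size over the quasi-identifier columns.
    The dataset is k-anonymous iff this value is >= k."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Ages already generalized into ranges, postcodes truncated
records = [
    {"age": "30-40", "postcode": "75*", "diagnosis": "A"},
    {"age": "30-40", "postcode": "75*", "diagnosis": "B"},
    {"age": "40-50", "postcode": "13*", "diagnosis": "A"},
    {"age": "40-50", "postcode": "13*", "diagnosis": "C"},
]
assert k_anonymity(records, ["age", "postcode"]) == 2   # each record hides among 2
```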
### Pseudonymization Techniques
| Technique | How |
|---|---|
| HMAC-SHA256 with secret key | Consistent, one-way, keyed. Use for user IDs in analytics. Key in KMS. |
| Tokenization | Replace value with opaque token; mapping in separate secure vault. |
| Encryption with separate key | Decrypt only with explicit KMS authorization. |
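The HMAC-SHA256 technique from the first row, sketched with Python's stdlib `hmac`. The key bytes shown are a placeholder — the real key MUST be fetched from the KMS and never stored next to the pseudonymized data.

```python
import hashlib
import hmac

# Placeholder for illustration -- fetch the real key from the KMS at startup.
PSEUDO_KEY = b"fetched-from-kms-at-startup"

def pseudonymize(user_id: str) -> str:
    """Consistent, one-way, keyed pseudonym for analytics pipelines."""
    return hmac.new(PSEUDO_KEY, user_id.encode(), hashlib.sha256).hexdigest()

a = pseudonymize("user-42")
assert a == pseudonymize("user-42")     # consistent: analytics joins still work
assert a != pseudonymize("user-43")     # distinct subjects stay distinct
assert "user-42" not in a               # one-way without the key
```

Because the output is keyed, rotating or destroying `PSEUDO_KEY` in the KMS severs the link between pseudonyms and real identifiers.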
**MUST**
- When erasing a user, **anonymize** records that must be retained (financial, audit logs) — replace identifying fields with `"ANONYMIZED"` or a hashed placeholder.
- Store the pseudonymization key in the KMS — never in the same database as the pseudonymized data.
- Test anonymization routines with assertions: the original value MUST NOT be recoverable from the output.
**Crypto-shredding pattern (event sourcing):**
Encrypt personal data in events with a per-user DEK. Store the DEK in the KMS.
On erasure: delete the DEK from the KMS → all events for that user are effectively anonymized.
**MUST NOT**
- Call data "anonymized" if re-identification is possible through linkage with other datasets.
- Apply pseudonymization and store the mapping key in the same table as the pseudonymized data.
---
## Cloud & DevOps Practices
**MUST**
- Enable encryption at rest for all cloud storage: blobs, managed databases, queues, caches.
- Use **private endpoints** — databases MUST NOT be publicly accessible.
- Apply network security groups / firewall rules: restrict DB access to application layers only.
- Enable cloud-native audit logging: Azure Monitor / AWS CloudTrail / GCP Cloud Audit Logs.
- Store personal data only in **approved geographic regions** (EEA, or adequacy decision / SCCs).
- Tag all cloud resources processing personal data with a `DataClassification` tag.
**SHOULD**
- Enable Microsoft Defender for Cloud / AWS Security Hub / GCP SCC — review recommendations weekly.
- Use **managed identities** (Azure) or **IAM roles** (AWS/GCP) instead of long-lived access keys.
- Enable soft delete and versioning on object storage.
- Apply DLP policies on cloud storage to detect PII written to unprotected buckets.
- Enable database-level audit logging for SELECT on sensitive tables.
**MUST NOT**
- Store personal data in public storage buckets without access controls.
- Deploy databases with public IPs in production.
- Use the same cloud account/subscription for production and non-production if data could bleed across.
---
## CI/CD Controls
**MUST**
- Run **secret scanning** on every commit: `gitleaks`, `detect-secrets`, GitHub secret scanning.
- Run **dependency vulnerability scanning** on every build: `npm audit`, `dotnet list package --vulnerable`, `trivy`, `snyk`.
- MUST NOT use real personal data in CI test jobs.
- MUST NOT log environment variables in CI pipelines — mask all secrets.
**SHOULD**
- Run **SAST**: SonarQube, Semgrep, or CodeQL on every PR.
- Run **container image scanning**: `trivy`, Snyk Container, or AWS ECR scanning.
- Add a **GDPR compliance gate** to the pipeline:
- New migrations without a documented retention period → fail.
- Log statements containing known PII field names → warn.
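The PII log-statement check in the compliance gate can be sketched as a simple source scan. The field-name list is illustrative — populate it from your RoPA.

```python
import re

# Illustrative list -- derive the real one from the RoPA's data categories.
PII_FIELDS = ["email", "ssn", "phone", "date_of_birth", "ip_address"]
LOG_CALL = re.compile(
    r'\blog(?:ger)?\.\w+\([^)]*\b(' + "|".join(PII_FIELDS) + r')\b',
    re.IGNORECASE,
)

def scan_source(text: str) -> list:
    """Return (line_number, field) for log statements naming a known PII field."""
    return [(i + 1, m.group(1)) for i, line in enumerate(text.splitlines())
            if (m := LOG_CALL.search(line))]

sample = 'logger.info("user created", email=user.email)\nlogger.debug("cache miss")'
assert scan_source(sample) == [(1, "email")]
```

In the pipeline, a non-empty result would emit a warning annotation on the PR.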
**Pipeline secret rules:**
```yaml
# MUST: mask secrets before use
- name: Mask secret
run: echo "::add-mask::${{ secrets.MY_SECRET }}"
# MUST NOT: echo secrets to console
- run: echo "Key=$API_KEY" # Never
# SHOULD: use OIDC federation (no long-lived keys)
- uses: azure/login@v1
with:
client-id: ${{ vars.AZURE_CLIENT_ID }}
tenant-id: ${{ vars.AZURE_TENANT_ID }}
subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }}
```
---
## Incident & Breach Handling
### Regulatory Timeline
| Window | Obligation |
|---|---|
| **72 hours** from awareness | Notify the supervisory authority (CNIL, APD, ICO…) — unless breach is unlikely to risk individuals |
| **Without undue delay** | Notify affected data subjects if breach is likely to result in **high risk** to their rights |
Log **all** personal data breaches internally — even those that do not require DPA notification.
### Breach Response Runbook (template)
1. **Detection** — Define criteria: what triggers an incident (credential leak, DB dump exposed, ransomware, accidental public bucket).
2. **Severity classification** — Low / Medium / High / Critical based on data sensitivity and volume.
3. **Containment** — Revoke compromised credentials; isolate affected systems; preserve evidence (do NOT delete logs).
4. **Assessment** — What data was exposed? How many subjects? What is the risk level?
5. **DPA notification** — Use the supervisory authority's online portal; include: nature of breach, categories and approximate number of data subjects, categories and approximate number of records, contact point, likely consequences, measures taken.
6. **Data subject notification** — If high risk: clear language, nature of breach, likely consequences, measures taken, DPO contact.
7. **Post-incident review** — Root cause analysis; corrective measures; update runbook.
### Automated Breach Detection Alerts
Configure alerts for:
- Unusual volume of data exports (threshold per hour)
- Access to sensitive tables outside business hours
- Bulk deletion events
- Failed authentication spikes
- New credentials appearing in public breach databases (HaveIBeenPwned monitoring)
Store breach records internally for at least **5 years**.
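The export-volume alert from the list above can be sketched as a sliding-window counter; threshold and window values are illustrative and should be tuned per system.

```python
import time
from collections import deque

class ExportVolumeAlert:
    """Fire when data exports within the window exceed a threshold."""
    def __init__(self, threshold: int, window_seconds: int = 3600):
        self.threshold = threshold
        self.window = window_seconds
        self.events = deque()

    def record_export(self, now: float) -> bool:
        self.events.append(now)
        # Drop events that have aged out of the sliding window
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()
        return len(self.events) > self.threshold   # True -> page the on-call

alert = ExportVolumeAlert(threshold=3)
t0 = time.time()
assert not any(alert.record_export(t0 + i) for i in range(3))
assert alert.record_export(t0 + 4)                 # 4th export within the hour
```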

---
## Architecture Patterns
### Data Store Separation
Separate operational data (transactional DB) from analytical data (data warehouse).
Apply different retention periods and access controls to each.
The analytics store MUST NOT read directly from production operational tables.
### Dedicated Consent Store
Track consent as an immutable event log in a separate store, not a boolean column on the user table.
This enables: auditable consent history, version tracking, easy withdrawal without data loss.
### Audit Log Segregation
Store audit logs in a separate, append-only store.
The application service account MUST NOT be able to delete audit log entries.
Use a separate DB user with INSERT-only rights on the audit table.
### DSR Queue Pattern
Implement Data Subject Requests as an asynchronous workflow:
`POST /api/v1/me/erasure-request` → enqueue a job → worker scrubs all stores → notify user on completion.
This handles the complexity of multi-store scrubbing reliably and provides a retry mechanism.
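A minimal in-process sketch of the queue pattern, with a thread standing in for the worker and lambdas standing in for the per-store scrub logic (all names here are hypothetical; a real system would use a durable queue with retries):

```python
import queue
import threading

# Hypothetical store interface: each store knows how to scrub one user.
stores = {
    "postgres": lambda uid: f"anonymized rows for {uid}",
    "redis":    lambda uid: f"purged cache keys for {uid}",
    "search":   lambda uid: f"removed documents for {uid}",
}
erasure_queue: "queue.Queue[str]" = queue.Queue()
completed = {}

def worker():
    while True:
        user_id = erasure_queue.get()
        # Each store scrub runs independently and could be retried on failure
        completed[user_id] = [scrub(user_id) for scrub in stores.values()]
        erasure_queue.task_done()        # at this point: notify the user

threading.Thread(target=worker, daemon=True).start()
erasure_queue.put("user-42")             # POST /api/v1/me/erasure-request enqueues
erasure_queue.join()                     # wait for the scrub to complete
assert len(completed["user-42"]) == len(stores)
```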
### Pseudonymization Gateway
For analytics pipelines, implement a pseudonymization service at the boundary between
operational and analytical systems.
The mapping key (HMAC secret or tokenization vault) never leaves the operational zone.
The analytics zone receives only pseudonymized identifiers.
### Crypto-Shredding (Event Sourcing)
Encrypt personal data in events with a per-user DEK stored in the KMS.
On user erasure: delete the DEK → all historical events for that user are effectively anonymized
without modifying the event log.

# GDPR Reference — Data Rights, Accountability & Governance
Load this file when you need implementation detail on:
user rights endpoints, Data Subject Request (DSR) workflow,
Record of Processing Activities (RoPA), consent management.

---
## User Rights Implementation (Articles 15–22)
Every right MUST have a tested API endpoint or documented back-office process
before the system goes live. Respond to verified requests within **30 calendar days**.
| Right | Article | Engineering implementation |
|---|---|---|
| Right of access | 15 | `GET /api/v1/me/data-export` — all personal data, JSON or CSV |
| Right to rectification | 16 | `PUT /api/v1/me/profile` — propagate to all downstream stores |
| Right to erasure | 17 | `DELETE /api/v1/me` — scrub all stores per erasure checklist |
| Right to restriction | 18 | `ProcessingRestricted` flag on user record; gate non-essential processing |
| Right to portability | 20 | Same as access endpoint; structured, machine-readable (JSON) |
| Right to object | 21 | Opt-out endpoint for legitimate-interest processing; honor immediately |
| Automated decision-making | 22 | Expose a human review path + explanation of the logic |
### Erasure Checklist — MUST cover all stores
When `DELETE /api/v1/me` is called, the erasure pipeline MUST scrub:
- Primary relational database (anonymize or delete rows)
- Read replicas
- Search index (Elasticsearch, Azure Cognitive Search, etc.)
- In-memory cache (Redis, IMemoryCache)
- Object storage (S3, Azure Blob — profile pictures, documents)
- Email service logs (Brevo, SendGrid — delivery logs)
- Analytics platform (Mixpanel, Amplitude, GA4 — user deletion API)
- Audit logs (anonymize identifying fields — do not delete the event)
- Backups (document the backup TTL; accept that backups expire naturally)
- CDN edge cache (purge if personal data may be cached)
- Third-party sub-processors (trigger their deletion API or document the manual step)
### Data Export Format (`GET /api/v1/me/data-export`)
```json
{
"exportedAt": "2025-03-30T10:00:00Z",
"subject": {
"id": "uuid",
"email": "user@example.com",
"createdAt": "2024-01-15T08:30:00Z"
},
"profile": { ... },
"orders": [ ... ],
"consents": [ ... ],
"auditEvents": [ ... ]
}
```
- MUST be machine-readable (JSON preferred, CSV acceptable).
- MUST NOT be a PDF screenshot or HTML page.
- MUST include all stores listed in the RoPA for this user.
### DSR Tracker (back-office)
Implement a **Data Subject Request tracker** with:
- Incoming request date
- Request type (access / rectification / erasure / portability / restriction / objection)
- Verification status (identity confirmed y/n)
- Deadline (received date + 30 days)
- Assigned handler
- Completion date and outcome
- Notes
Automate the primary store scrubbing; document manual steps for third-party stores.
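The deadline field above is pure arithmetic and worth automating so no tracker entry is ever computed by hand:

```python
from datetime import date, timedelta

def dsr_deadline(received: date) -> date:
    """Response deadline per the tracker: received date + 30 calendar days."""
    return received + timedelta(days=30)

def is_overdue(received: date, today: date) -> bool:
    return today > dsr_deadline(received)

assert dsr_deadline(date(2025, 3, 1)) == date(2025, 3, 31)
assert not is_overdue(date(2025, 3, 1), date(2025, 3, 31))
assert is_overdue(date(2025, 3, 1), date(2025, 4, 1))
```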

---
## Record of Processing Activities (RoPA)
Maintain as a living document (Markdown, YAML, or JSON) version-controlled in the repo.
Update with **every** new feature that introduces a processing activity.
### Minimum fields per processing activity
```yaml
- name: "User account management"
purpose: "Create and manage user accounts for service access"
legalBasis: "Contract (Art. 6(1)(b))"
dataSubjects: ["Registered users"]
personalDataCategories: ["Name", "Email", "Password hash", "IP address"]
recipients: ["Internal engineering team", "Brevo (email delivery)"]
retentionPeriod: "Account lifetime + 12 months"
transfers:
outside_eea: true
safeguard: "Brevo — Standard Contractual Clauses (SCCs)"
securityMeasures: ["TLS 1.3", "AES-256 at rest", "bcrypt password hashing"]
dpia_required: false
```
### Legal basis options (Art. 6)
| Basis | When to use |
|---|---|
| `Contract (6(1)(b))` | Processing necessary to fulfill the service contract |
| `Legitimate interest (6(1)(f))` | Fraud prevention, security, analytics (requires balancing test) |
| `Consent (6(1)(a))` | Marketing, non-essential cookies, optional profiling |
| `Legal obligation (6(1)(c))` | Tax records, anti-money-laundering |
| `Vital interest (6(1)(d))` | Emergency situations only |
| `Public task (6(1)(e))` | Public authorities |
---
## Consent Management
### MUST
- Store consent as an **immutable event log**, not a mutable boolean flag.
- Record: what was consented to, when, which version of the privacy policy, the mechanism.
- Load analytics / marketing SDKs **conditionally** — only after consent is granted.
- Provide a consent withdrawal mechanism as easy to use as the consent grant.
### Consent store schema (minimum)
```sql
CREATE TABLE ConsentRecords (
Id UUID PRIMARY KEY,
UserId UUID NOT NULL,
Purpose VARCHAR(100) NOT NULL, -- e.g. "marketing_emails", "analytics"
Granted BOOLEAN NOT NULL,
PolicyVersion VARCHAR(20) NOT NULL,
ConsentedAt TIMESTAMPTZ NOT NULL,
IpAddressHash VARCHAR(64), -- HMAC-SHA256 of anonymized IP
UserAgent VARCHAR(500)
);
```
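Resolving current consent state from the immutable event log is a fold over the events for a (user, purpose) pair — the latest event wins, and history is never mutated. A minimal Python sketch (a list stands in for the `ConsentRecords` table):

```python
from datetime import datetime

# Append-only event list standing in for the ConsentRecords table above.
consent_events = [
    {"user": "u1", "purpose": "marketing_emails", "granted": True,
     "policy_version": "1.0", "at": datetime(2025, 1, 10)},
    {"user": "u1", "purpose": "marketing_emails", "granted": False,  # withdrawal
     "policy_version": "1.0", "at": datetime(2025, 2, 1)},
    {"user": "u1", "purpose": "analytics", "granted": True,
     "policy_version": "1.0", "at": datetime(2025, 1, 10)},
]

def has_consent(user: str, purpose: str) -> bool:
    """Current state = latest event for (user, purpose); history stays intact."""
    relevant = [e for e in consent_events
                if e["user"] == user and e["purpose"] == purpose]
    return max(relevant, key=lambda e: e["at"])["granted"] if relevant else False

assert not has_consent("u1", "marketing_emails")   # withdrawn on 2025-02-01
assert has_consent("u1", "analytics")
assert not has_consent("u1", "profiling")          # never asked -> no consent
```

Note the default: absence of any event means no consent, never an implicit grant.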
### MUST NOT
- MUST NOT pre-tick consent checkboxes.
- MUST NOT bundle consent for marketing with consent for service delivery.
- MUST NOT make service access conditional on marketing consent.
- MUST NOT use dark patterns (e.g., "Accept all" prominent, "Reject" buried).
---
## Sub-processor Management
Maintain a **sub-processor list** updated with every new SaaS tool or cloud service
that touches personal data.
Minimum fields per sub-processor:
| Field | Example |
|---|---|
| Name | Brevo |
| Service | Transactional email |
| Data categories transferred | Email address, name, email content |
| Processing location | EU (Paris) |
| DPA signed | 2024-01-10 |
| DPA URL / reference | [link] |
| SCCs applicable | N/A (EU-based) |
**MUST** review the sub-processor list annually and upon any change.

**MUST NOT** allow data to flow to a new sub-processor before a DPA is signed.

---
## DPIA Triggers (Article 35)
A DPIA is **mandatory** before processing that is likely to result in a high risk. Triggers include:
- Systematic and extensive profiling with significant effects on individuals
- Large-scale processing of special category data (health, biometric, racial origin, sexual orientation, religion)
- Systematic monitoring of publicly accessible areas (CCTV, location tracking)
- Processing of children's data at scale
- Innovative technology with unknown privacy implications
- Matching or combining datasets from multiple sources
When in doubt: conduct the DPIA anyway. Document the outcome.
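The trigger list can be encoded as a simple pre-launch check on a feature proposal; the flag names are illustrative shorthand for the triggers above.

```python
# Illustrative flags, one per Article 35 trigger listed above.
DPIA_TRIGGERS = {
    "systematic_profiling", "special_category_at_scale",
    "public_area_monitoring", "children_data_at_scale",
    "innovative_technology", "dataset_matching",
}

def dpia_required(feature_flags: set) -> bool:
    """Mandatory DPIA if any Article 35 trigger applies.
    When in doubt, conduct the DPIA anyway and document the outcome."""
    return bool(feature_flags & DPIA_TRIGGERS)

assert dpia_required({"dataset_matching", "uses_postgres"})
assert not dpia_required({"uses_postgres"})
```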