feat: add GDPR-compliant engineering practices skill documentation (#1230)

* feat: add GDPR-compliant engineering practices skill documentation

* Add GDPR compliance references for Security and Data Rights

- Introduced a comprehensive Security.md file detailing encryption, password hashing, secrets management, anonymization, cloud practices, CI/CD controls, and incident response protocols.
- Created a Data Rights.md file outlining user rights implementation, Record of Processing Activities (RoPA), consent management, sub-processor management, and DPIA triggers.

* Refine GDPR compliance documentation by removing unnecessary symbols and ensuring clarity in security and data rights references

* refactor: streamline description formatting in GDPR compliance skill documentation

---------

Co-authored-by: Aaron Powell <me@aaron-powell.com>
Author: Mikael
Date: 2026-03-31 01:53:25 +02:00
Committed by: GitHub
Parent: 235f5740d4
Commit: 7454bbdb7c
4 changed files with 727 additions and 0 deletions

# GDPR Reference — Security, Operations & Architecture
Load this file when you need implementation detail on:
encryption, password hashing, secrets management, anonymization/pseudonymization,
cloud/DevOps practices, CI/CD controls, incident response, architecture patterns.

---
## Encryption
### At-Rest Encryption
| Data sensitivity | Minimum standard |
|---|---|
| Standard personal data (name, address, email) | AES-256 disk/volume encryption (cloud provider default) |
| Sensitive personal data (health, biometric, financial, national ID) | AES-256 **column-level** encryption + envelope encryption via KMS |
| Encryption keys | HSM-backed KMS (Azure Key Vault Premium / AWS KMS CMK / GCP Cloud KMS) |
**Envelope encryption pattern:**
1. Encrypt data with a **Data Encryption Key (DEK)** (AES-256, generated per record or per table).
2. Encrypt the DEK with a **Key Encryption Key (KEK)** stored in the KMS.
3. Store the encrypted DEK alongside the encrypted data.
4. Deleting the KEK = effective crypto-shredding of all data encrypted with it.
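The four steps above can be sketched as follows. This is a toy illustration only: the HMAC-based XOR keystream cipher and the in-memory `kms` dict are stand-ins — production code MUST use an authenticated cipher (AES-256-GCM) from a vetted library and a real HSM-backed KMS.

```python
import hashlib
import hmac
import os

def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher keyed via HMAC-SHA256 in counter mode.
    Illustration only -- use AES-256-GCM in production."""
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        out.extend(block)
        counter += 1
    return bytes(a ^ b for a, b in zip(data, out))

# A plain dict stands in for the KMS; the KEK would live in an HSM-backed vault.
kms = {"kek-prod-1": os.urandom(32)}

def envelope_encrypt(kek_id: str, plaintext: bytes) -> dict:
    dek = os.urandom(32)                                   # 1. per-record DEK
    nonce = os.urandom(16)
    ciphertext = keystream_xor(dek, nonce, plaintext)
    wrapped_dek = keystream_xor(kms[kek_id], nonce, dek)   # 2. wrap DEK with the KEK
    return {"kek_id": kek_id, "nonce": nonce,              # 3. encrypted DEK stored
            "wrapped_dek": wrapped_dek, "ciphertext": ciphertext}  # alongside the data

def envelope_decrypt(record: dict) -> bytes:
    kek = kms[record["kek_id"]]        # raises KeyError after crypto-shredding
    dek = keystream_xor(kek, record["nonce"], record["wrapped_dek"])
    return keystream_xor(dek, record["nonce"], record["ciphertext"])

record = envelope_encrypt("kek-prod-1", b"jane.doe@example.com")
assert envelope_decrypt(record) == b"jane.doe@example.com"
del kms["kek-prod-1"]                  # 4. shred the KEK -> data unrecoverable
```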
### In-Transit Encryption
- **MUST** enforce TLS 1.2 minimum; prefer TLS 1.3.
- **MUST** set `Strict-Transport-Security: max-age=31536000; includeSubDomains; preload`.
- **MUST NOT** allow TLS 1.0, TLS 1.1, null cipher suites, or export-grade ciphers.
- **MUST NOT** use self-signed certificates in production.
### Key Management
- Rotate DEKs annually minimum; rotate immediately upon suspected compromise.
- Use separate key namespaces per environment (dev / staging / prod).
- Log all KMS key access events — alert on anomalous access patterns.
- MUST NOT hardcode encryption keys in source code or configuration files.
---
## Password Hashing
| Algorithm | Parameters | Notes |
|---|---|---|
| **Argon2id** recommended | memory ≥ 64 MB, iterations ≥ 3, parallelism ≥ 4 | OWASP and NIST recommended |
| **bcrypt** acceptable | cost factor ≥ 12 | Widely supported; use if Argon2id unavailable |
| **scrypt** acceptable | N=32768, r=8, p=1 | Good alternative |
| MD5 | — | Never — trivially broken |
| SHA-1 / SHA-256 | — | Never for passwords — fast general-purpose hashes, trivially brute-forced on GPUs |
**MUST**
- Use a unique salt per password (built into all three algorithms above).
- Store only the hash — never the plaintext, never a reversible encoding.
- Re-hash on login if the stored hash uses an outdated algorithm — upgrade transparently.
**SHOULD**
- Add a **pepper** (server-side secret added before hashing) stored in the KMS, not in the DB.
- Check passwords against known breach lists at registration (`haveibeenpwned` API, k-anonymity mode).
- Enforce minimum password length of 12 characters.
**MUST NOT**
- Log passwords in any form — not during registration, not during failed login.
- Transmit passwords in URLs or query strings.
- Store password reset tokens in plaintext — hash them before storage.
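A minimal sketch of these rules using Python's stdlib `hashlib.scrypt` with the parameters from the table above (N=32768, r=8, p=1): unique random salt per password, only the salted hash stored, constant-time comparison on verification.

```python
import hashlib
import hmac
import os

# scrypt parameters from the table above: N=32768, r=8, p=1.
# maxmem must cover 128 * N * r bytes (~32 MiB) plus overhead.
PARAMS = {"n": 32768, "r": 8, "p": 1, "maxmem": 64 * 1024 * 1024, "dklen": 32}

def hash_password(password: str) -> bytes:
    salt = os.urandom(16)                                   # unique salt per password
    digest = hashlib.scrypt(password.encode(), salt=salt, **PARAMS)
    return salt + digest                                    # store salt alongside hash

def verify_password(password: str, stored: bytes) -> bool:
    salt, digest = stored[:16], stored[16:]
    candidate = hashlib.scrypt(password.encode(), salt=salt, **PARAMS)
    return hmac.compare_digest(candidate, digest)           # constant-time comparison

stored = hash_password("correct horse battery staple")
assert verify_password("correct horse battery staple", stored)
assert not verify_password("Tr0ub4dor&3", stored)
```

The same structure applies to Argon2id or bcrypt via their respective libraries; only the parameter names change.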
---
## Secrets Management
**MUST**
- Store all secrets in a dedicated secret manager: Azure Key Vault, AWS Secrets Manager,
GCP Secret Manager, or HashiCorp Vault.
- Use pre-commit hooks to prevent secret commits: `gitleaks`, `detect-secrets`, GitHub native secret scanning.
- Rotate secrets immediately upon: developer offboarding, suspected compromise, annual schedule.
- Maintain a **secrets inventory document** — every secret listed with its purpose and rotation date.
**SHOULD**
- Use **short-lived credentials** via OIDC federation (GitHub Actions → Azure/AWS/GCP) instead of long-lived API keys.
- Audit all KMS secret access — alert on access outside business hours or from unexpected sources.
- Use separate secret namespaces per environment.
**`.gitignore` MUST include:**
```
.env
.env.*
*.pem
*.key
*.pfx
*.p12
secrets/
# appsettings files, if they may contain connection strings
appsettings.*.json
```
**MUST NOT**
- Commit secrets to source code repositories.
- Pass secrets as plain-text CLI arguments (they appear in process lists and shell history).
- Store secrets as unencrypted environment variable defaults in code.
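The CLI-argument rule can be sketched in Python: pass the secret through the child process environment rather than `argv`, so it never appears in process lists or shell history. The key value and names here are illustrative placeholders — fetch real secrets from your secret manager.

```python
import os
import subprocess
import sys

# MUST NOT: secret visible in `ps` output and shell history, e.g.
#   subprocess.run(["curl", "-H", f"Authorization: Bearer {token}", url])

def run_with_secret(cmd: list, secret_name: str, secret_value: str):
    """Pass a secret to a child process via its environment, not argv."""
    env = {**os.environ, secret_name: secret_value}
    return subprocess.run(cmd, env=env, capture_output=True, text=True)

result = run_with_secret(
    [sys.executable, "-c", "import os; print(os.environ['API_KEY'][:4] + '...')"],
    "API_KEY",
    "sk-live-not-a-real-key",   # placeholder -- fetch from your secret manager
)
print(result.stdout.strip())    # the child sees the key; argv stays clean
```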
---
## Anonymization & Pseudonymization
### Definitions
| Term | Reversible? | GDPR scope? | Use case |
|---|---|---|---|
| **Anonymization** | No | Outside GDPR scope | Retained records after erasure, analytics datasets |
| **Pseudonymization** | Yes (with key) | Still personal data | Analytics pipelines, audit logs, reduced-risk processing |
### Anonymization Techniques
| Technique | How | When |
|---|---|---|
| Suppression | Remove the field entirely | Fields with no analytical value |
| Masking | Replace with fixed placeholder (`"ANONYMIZED_USER"`) | Audit log identifiers after erasure |
| Generalization | Replace exact value with a range (age 34 → "30–40") | Analytics |
| Noise addition | Add statistical noise to numerical values | Aggregate analytics |
| Aggregation | Report group statistics, never individual values | Reporting |
| K-anonymity | Ensure each record is indistinguishable from k-1 others | Analytics datasets |
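The k-anonymity property from the last row can be checked mechanically: group records by their quasi-identifier columns and take the smallest group size. A minimal sketch (field names are illustrative):

```python
from collections import Counter

def k_anonymity(records: list, quasi_identifiers: list) -> int:
    """Smallest equivalence-class size over the quasi-identifier columns.
    The dataset is k-anonymous iff this value is >= k."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(groups.values())

# Ages already generalized into ranges, postcodes truncated
records = [
    {"age": "30-40", "postcode": "75*", "diagnosis": "A"},
    {"age": "30-40", "postcode": "75*", "diagnosis": "B"},
    {"age": "40-50", "postcode": "13*", "diagnosis": "A"},
    {"age": "40-50", "postcode": "13*", "diagnosis": "C"},
]
assert k_anonymity(records, ["age", "postcode"]) == 2   # each record hides among 2
```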
### Pseudonymization Techniques
| Technique | How |
|---|---|
| HMAC-SHA256 with secret key | Consistent, one-way, keyed. Use for user IDs in analytics. Key in KMS. |
| Tokenization | Replace value with opaque token; mapping in separate secure vault. |
| Encryption with separate key | Decrypt only with explicit KMS authorization. |
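The HMAC-SHA256 technique from the first row, sketched with Python's stdlib `hmac`. The key bytes shown are a placeholder — the real key MUST be fetched from the KMS and never stored next to the pseudonymized data.

```python
import hashlib
import hmac

# Placeholder for illustration -- fetch the real key from the KMS at startup.
PSEUDO_KEY = b"fetched-from-kms-at-startup"

def pseudonymize(user_id: str) -> str:
    """Consistent, one-way, keyed pseudonym for analytics pipelines."""
    return hmac.new(PSEUDO_KEY, user_id.encode(), hashlib.sha256).hexdigest()

a = pseudonymize("user-42")
assert a == pseudonymize("user-42")     # consistent: analytics joins still work
assert a != pseudonymize("user-43")     # distinct subjects stay distinct
assert "user-42" not in a               # one-way without the key
```

Because the output is keyed, rotating or destroying `PSEUDO_KEY` in the KMS severs the link between pseudonyms and real identifiers.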
**MUST**
- When erasing a user, **anonymize** records that must be retained (financial, audit logs) — replace identifying fields with `"ANONYMIZED"` or a hashed placeholder.
- Store the pseudonymization key in the KMS — never in the same database as the pseudonymized data.
- Test anonymization routines with assertions: the original value MUST NOT be recoverable from the output.
**Crypto-shredding pattern (event sourcing):**
Encrypt personal data in events with a per-user DEK. Store the DEK in the KMS.
On erasure: delete the DEK from the KMS → all events for that user are effectively anonymized.
**MUST NOT**
- Call data "anonymized" if re-identification is possible through linkage with other datasets.
- Apply pseudonymization and store the mapping key in the same table as the pseudonymized data.
---
## Cloud & DevOps Practices
**MUST**
- Enable encryption at rest for all cloud storage: blobs, managed databases, queues, caches.
- Use **private endpoints** — databases MUST NOT be publicly accessible.
- Apply network security groups / firewall rules: restrict DB access to application layers only.
- Enable cloud-native audit logging: Azure Monitor / AWS CloudTrail / GCP Cloud Audit Logs.
- Store personal data only in **approved geographic regions** (EEA, or adequacy decision / SCCs).
- Tag all cloud resources processing personal data with a `DataClassification` tag.
**SHOULD**
- Enable Microsoft Defender for Cloud / AWS Security Hub / GCP SCC — review recommendations weekly.
- Use **managed identities** (Azure) or **IAM roles** (AWS/GCP) instead of long-lived access keys.
- Enable soft delete and versioning on object storage.
- Apply DLP policies on cloud storage to detect PII written to unprotected buckets.
- Enable database-level audit logging for SELECT on sensitive tables.
**MUST NOT**
- Store personal data in public storage buckets without access controls.
- Deploy databases with public IPs in production.
- Use the same cloud account/subscription for production and non-production if data could bleed across.
---
## CI/CD Controls
**MUST**
- Run **secret scanning** on every commit: `gitleaks`, `detect-secrets`, GitHub secret scanning.
- Run **dependency vulnerability scanning** on every build: `npm audit`, `dotnet list package --vulnerable`, `trivy`, `snyk`.
- MUST NOT use real personal data in CI test jobs.
- MUST NOT log environment variables in CI pipelines — mask all secrets.
**SHOULD**
- Run **SAST**: SonarQube, Semgrep, or CodeQL on every PR.
- Run **container image scanning**: `trivy`, Snyk Container, or AWS ECR scanning.
- Add a **GDPR compliance gate** to the pipeline:
- New migrations without a documented retention period → fail.
- Log statements containing known PII field names → warn.
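The PII log-statement check in the compliance gate can be sketched as a simple source scan. The field-name list is illustrative — populate it from your RoPA.

```python
import re

# Illustrative list -- derive the real one from the RoPA's data categories.
PII_FIELDS = ["email", "ssn", "phone", "date_of_birth", "ip_address"]
LOG_CALL = re.compile(
    r'\blog(?:ger)?\.\w+\([^)]*\b(' + "|".join(PII_FIELDS) + r')\b',
    re.IGNORECASE,
)

def scan_source(text: str) -> list:
    """Return (line_number, field) for log statements naming a known PII field."""
    return [(i + 1, m.group(1)) for i, line in enumerate(text.splitlines())
            if (m := LOG_CALL.search(line))]

sample = 'logger.info("user created", email=user.email)\nlogger.debug("cache miss")'
assert scan_source(sample) == [(1, "email")]
```

In the pipeline, a non-empty result would emit a warning annotation on the PR.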
**Pipeline secret rules:**
```yaml
# MUST: mask secrets before use
- name: Mask secret
run: echo "::add-mask::${{ secrets.MY_SECRET }}"
# MUST NOT: echo secrets to console
- run: echo "Key=$API_KEY" # Never
# SHOULD: use OIDC federation (no long-lived keys)
- uses: azure/login@v1
with:
client-id: ${{ vars.AZURE_CLIENT_ID }}
tenant-id: ${{ vars.AZURE_TENANT_ID }}
subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }}
```
---
## Incident & Breach Handling
### Regulatory Timeline
| Window | Obligation |
|---|---|
| **72 hours** from awareness | Notify the supervisory authority (CNIL, APD, ICO…) — unless breach is unlikely to risk individuals |
| **Without undue delay** | Notify affected data subjects if breach is likely to result in **high risk** to their rights |
Log **all** personal data breaches internally — even those that do not require DPA notification.
### Breach Response Runbook (template)
1. **Detection** — Define criteria: what triggers an incident (credential leak, DB dump exposed, ransomware, accidental public bucket).
2. **Severity classification** — Low / Medium / High / Critical based on data sensitivity and volume.
3. **Containment** — Revoke compromised credentials; isolate affected systems; preserve evidence (do NOT delete logs).
4. **Assessment** — What data was exposed? How many subjects? What is the risk level?
5. **DPA notification** — Use the supervisory authority's online portal; include: nature of breach, categories and approximate number of data subjects, categories and approximate number of records, contact point, likely consequences, measures taken.
6. **Data subject notification** — If high risk: clear language, nature of breach, likely consequences, measures taken, DPO contact.
7. **Post-incident review** — Root cause analysis; corrective measures; update runbook.
### Automated Breach Detection Alerts
Configure alerts for:
- Unusual volume of data exports (threshold per hour)
- Access to sensitive tables outside business hours
- Bulk deletion events
- Failed authentication spikes
- New credentials appearing in public breach databases (HaveIBeenPwned monitoring)
Store breach records internally for at least **5 years**.
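The export-volume alert from the list above can be sketched as a sliding-window counter; threshold and window values are illustrative and should be tuned per system.

```python
import time
from collections import deque

class ExportVolumeAlert:
    """Fire when data exports within the window exceed a threshold."""
    def __init__(self, threshold: int, window_seconds: int = 3600):
        self.threshold = threshold
        self.window = window_seconds
        self.events = deque()

    def record_export(self, now: float) -> bool:
        self.events.append(now)
        # Drop events that have aged out of the sliding window
        while self.events and self.events[0] <= now - self.window:
            self.events.popleft()
        return len(self.events) > self.threshold   # True -> page the on-call

alert = ExportVolumeAlert(threshold=3)
t0 = time.time()
assert not any(alert.record_export(t0 + i) for i in range(3))
assert alert.record_export(t0 + 4)                 # 4th export within the hour
```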

---
## Architecture Patterns
### Data Store Separation
Separate operational data (transactional DB) from analytical data (data warehouse).
Apply different retention periods and access controls to each.
The analytics store MUST NOT read directly from production operational tables.
### Dedicated Consent Store
Track consent as an immutable event log in a separate store, not a boolean column on the user table.
This enables: auditable consent history, version tracking, easy withdrawal without data loss.
### Audit Log Segregation
Store audit logs in a separate, append-only store.
The application service account MUST NOT be able to delete audit log entries.
Use a separate DB user with INSERT-only rights on the audit table.
### DSR Queue Pattern
Implement Data Subject Requests as an asynchronous workflow:
`POST /api/v1/me/erasure-request` → enqueue a job → worker scrubs all stores → notify user on completion.
This handles the complexity of multi-store scrubbing reliably and provides a retry mechanism.
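A minimal in-process sketch of the queue pattern, with a thread standing in for the worker and lambdas standing in for the per-store scrub logic (all names here are hypothetical; a real system would use a durable queue with retries):

```python
import queue
import threading

# Hypothetical store interface: each store knows how to scrub one user.
stores = {
    "postgres": lambda uid: f"anonymized rows for {uid}",
    "redis":    lambda uid: f"purged cache keys for {uid}",
    "search":   lambda uid: f"removed documents for {uid}",
}
erasure_queue: "queue.Queue[str]" = queue.Queue()
completed = {}

def worker():
    while True:
        user_id = erasure_queue.get()
        # Each store scrub runs independently and could be retried on failure
        completed[user_id] = [scrub(user_id) for scrub in stores.values()]
        erasure_queue.task_done()        # at this point: notify the user

threading.Thread(target=worker, daemon=True).start()
erasure_queue.put("user-42")             # POST /api/v1/me/erasure-request enqueues
erasure_queue.join()                     # wait for the scrub to complete
assert len(completed["user-42"]) == len(stores)
```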
### Pseudonymization Gateway
For analytics pipelines, implement a pseudonymization service at the boundary between
operational and analytical systems.
The mapping key (HMAC secret or tokenization vault) never leaves the operational zone.
The analytics zone receives only pseudonymized identifiers.
### Crypto-Shredding (Event Sourcing)
Encrypt personal data in events with a per-user DEK stored in the KMS.
On user erasure: delete the DEK → all historical events for that user are effectively anonymized
without modifying the event log.

# GDPR Reference — Data Rights, Accountability & Governance
Load this file when you need implementation detail on:
user rights endpoints, Data Subject Request (DSR) workflow,
Record of Processing Activities (RoPA), consent management.

---
## User Rights Implementation (Articles 15–22)
Every right MUST have a tested API endpoint or documented back-office process
before the system goes live. Respond to verified requests within **30 calendar days**.
| Right | Article | Engineering implementation |
|---|---|---|
| Right of access | 15 | `GET /api/v1/me/data-export` — all personal data, JSON or CSV |
| Right to rectification | 16 | `PUT /api/v1/me/profile` — propagate to all downstream stores |
| Right to erasure | 17 | `DELETE /api/v1/me` — scrub all stores per erasure checklist |
| Right to restriction | 18 | `ProcessingRestricted` flag on user record; gate non-essential processing |
| Right to portability | 20 | Same as access endpoint; structured, machine-readable (JSON) |
| Right to object | 21 | Opt-out endpoint for legitimate-interest processing; honor immediately |
| Automated decision-making | 22 | Expose a human review path + explanation of the logic |
### Erasure Checklist — MUST cover all stores
When `DELETE /api/v1/me` is called, the erasure pipeline MUST scrub:
- Primary relational database (anonymize or delete rows)
- Read replicas
- Search index (Elasticsearch, Azure Cognitive Search, etc.)
- In-memory cache (Redis, IMemoryCache)
- Object storage (S3, Azure Blob — profile pictures, documents)
- Email service logs (Brevo, SendGrid — delivery logs)
- Analytics platform (Mixpanel, Amplitude, GA4 — user deletion API)
- Audit logs (anonymize identifying fields — do not delete the event)
- Backups (document the backup TTL; accept that backups expire naturally)
- CDN edge cache (purge if personal data may be cached)
- Third-party sub-processors (trigger their deletion API or document the manual step)
### Data Export Format (`GET /api/v1/me/data-export`)
```json
{
"exportedAt": "2025-03-30T10:00:00Z",
"subject": {
"id": "uuid",
"email": "user@example.com",
"createdAt": "2024-01-15T08:30:00Z"
},
"profile": { ... },
"orders": [ ... ],
"consents": [ ... ],
"auditEvents": [ ... ]
}
```
- MUST be machine-readable (JSON preferred, CSV acceptable).
- MUST NOT be a PDF screenshot or HTML page.
- MUST include all stores listed in the RoPA for this user.
### DSR Tracker (back-office)
Implement a **Data Subject Request tracker** with:
- Incoming request date
- Request type (access / rectification / erasure / portability / restriction / objection)
- Verification status (identity confirmed y/n)
- Deadline (received date + 30 days)
- Assigned handler
- Completion date and outcome
- Notes
Automate the primary store scrubbing; document manual steps for third-party stores.
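The deadline field above is pure arithmetic and worth automating so no tracker entry is ever computed by hand:

```python
from datetime import date, timedelta

def dsr_deadline(received: date) -> date:
    """Response deadline per the tracker: received date + 30 calendar days."""
    return received + timedelta(days=30)

def is_overdue(received: date, today: date) -> bool:
    return today > dsr_deadline(received)

assert dsr_deadline(date(2025, 3, 1)) == date(2025, 3, 31)
assert not is_overdue(date(2025, 3, 1), date(2025, 3, 31))
assert is_overdue(date(2025, 3, 1), date(2025, 4, 1))
```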

---
## Record of Processing Activities (RoPA)
Maintain as a living document (Markdown, YAML, or JSON) version-controlled in the repo.
Update with **every** new feature that introduces a processing activity.
### Minimum fields per processing activity
```yaml
- name: "User account management"
purpose: "Create and manage user accounts for service access"
legalBasis: "Contract (Art. 6(1)(b))"
dataSubjects: ["Registered users"]
personalDataCategories: ["Name", "Email", "Password hash", "IP address"]
recipients: ["Internal engineering team", "Brevo (email delivery)"]
retentionPeriod: "Account lifetime + 12 months"
transfers:
outside_eea: true
safeguard: "Brevo — Standard Contractual Clauses (SCCs)"
securityMeasures: ["TLS 1.3", "AES-256 at rest", "bcrypt password hashing"]
dpia_required: false
```
### Legal basis options (Art. 6)
| Basis | When to use |
|---|---|
| `Contract (6(1)(b))` | Processing necessary to fulfill the service contract |
| `Legitimate interest (6(1)(f))` | Fraud prevention, security, analytics (requires balancing test) |
| `Consent (6(1)(a))` | Marketing, non-essential cookies, optional profiling |
| `Legal obligation (6(1)(c))` | Tax records, anti-money-laundering |
| `Vital interest (6(1)(d))` | Emergency situations only |
| `Public task (6(1)(e))` | Public authorities |
---
## Consent Management
### MUST
- Store consent as an **immutable event log**, not a mutable boolean flag.
- Record: what was consented to, when, which version of the privacy policy, the mechanism.
- Load analytics / marketing SDKs **conditionally** — only after consent is granted.
- Provide a consent withdrawal mechanism as easy to use as the consent grant.
### Consent store schema (minimum)
```sql
CREATE TABLE ConsentRecords (
Id UUID PRIMARY KEY,
UserId UUID NOT NULL,
Purpose VARCHAR(100) NOT NULL, -- e.g. "marketing_emails", "analytics"
Granted BOOLEAN NOT NULL,
PolicyVersion VARCHAR(20) NOT NULL,
ConsentedAt TIMESTAMPTZ NOT NULL,
IpAddressHash VARCHAR(64), -- HMAC-SHA256 of anonymized IP
UserAgent VARCHAR(500)
);
```
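Resolving current consent state from the immutable event log is a fold over the events for a (user, purpose) pair — the latest event wins, and history is never mutated. A minimal Python sketch (a list stands in for the `ConsentRecords` table):

```python
from datetime import datetime

# Append-only event list standing in for the ConsentRecords table above.
consent_events = [
    {"user": "u1", "purpose": "marketing_emails", "granted": True,
     "policy_version": "1.0", "at": datetime(2025, 1, 10)},
    {"user": "u1", "purpose": "marketing_emails", "granted": False,  # withdrawal
     "policy_version": "1.0", "at": datetime(2025, 2, 1)},
    {"user": "u1", "purpose": "analytics", "granted": True,
     "policy_version": "1.0", "at": datetime(2025, 1, 10)},
]

def has_consent(user: str, purpose: str) -> bool:
    """Current state = latest event for (user, purpose); history stays intact."""
    relevant = [e for e in consent_events
                if e["user"] == user and e["purpose"] == purpose]
    return max(relevant, key=lambda e: e["at"])["granted"] if relevant else False

assert not has_consent("u1", "marketing_emails")   # withdrawn on 2025-02-01
assert has_consent("u1", "analytics")
assert not has_consent("u1", "profiling")          # never asked -> no consent
```

Note the default: absence of any event means no consent, never an implicit grant.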
### MUST NOT
- MUST NOT pre-tick consent checkboxes.
- MUST NOT bundle consent for marketing with consent for service delivery.
- MUST NOT make service access conditional on marketing consent.
- MUST NOT use dark patterns (e.g., "Accept all" prominent, "Reject" buried).
---
## Sub-processor Management
Maintain a **sub-processor list** updated with every new SaaS tool or cloud service
that touches personal data.
Minimum fields per sub-processor:
| Field | Example |
|---|---|
| Name | Brevo |
| Service | Transactional email |
| Data categories transferred | Email address, name, email content |
| Processing location | EU (Paris) |
| DPA signed | 2024-01-10 |
| DPA URL / reference | [link] |
| SCCs applicable | N/A (EU-based) |
**MUST** review the sub-processor list annually and upon any change.

**MUST NOT** allow data to flow to a new sub-processor before a DPA is signed.

---
## DPIA Triggers (Article 35)
A DPIA is **mandatory** before processing that is likely to result in a high risk. Triggers include:
- Systematic and extensive profiling with significant effects on individuals
- Large-scale processing of special category data (health, biometric, racial origin, sexual orientation, religion)
- Systematic monitoring of publicly accessible areas (CCTV, location tracking)
- Processing of children's data at scale
- Innovative technology with unknown privacy implications
- Matching or combining datasets from multiple sources
When in doubt: conduct the DPIA anyway. Document the outcome.
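The trigger list can be encoded as a simple pre-launch check on a feature proposal; the flag names are illustrative shorthand for the triggers above.

```python
# Illustrative flags, one per Article 35 trigger listed above.
DPIA_TRIGGERS = {
    "systematic_profiling", "special_category_at_scale",
    "public_area_monitoring", "children_data_at_scale",
    "innovative_technology", "dataset_matching",
}

def dpia_required(feature_flags: set) -> bool:
    """Mandatory DPIA if any Article 35 trigger applies.
    When in doubt, conduct the DPIA anyway and document the outcome."""
    return bool(feature_flags & DPIA_TRIGGERS)

assert dpia_required({"dataset_matching", "uses_postgres"})
assert not dpia_required({"uses_postgres"})
```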