Adds a new Agent Skill - Acquire-Codebase-Knowledge (#1373)

* feat(skill): add acquire-codebase-knowledge skill documentation

* feat(templates): add architecture, concerns, conventions, integrations, stack, structure, and testing documentation templates

* feat(references): add inquiry checkpoints and stack detection documentation

* feat(scan): add script to collect project discovery information for acquire-codebase-knowledge skill

* feat(skills): add acquire-codebase-knowledge skill for codebase mapping and documentation

* feat(scan): enhance scan script with absolute path handling and improved output variable validation

* feat(scan): replace bash script with Python script for project discovery information collection

* feat(skills): update acquire-codebase-knowledge skill to replace scan.sh with scan.py
Satya K
2026-04-14 05:59:57 +05:30
committed by GitHub
parent e163a40937
commit b8f3822748
12 changed files with 1450 additions and 0 deletions

@@ -26,6 +26,7 @@ See [CONTRIBUTING.md](../CONTRIBUTING.md#adding-skills) for guidelines on how to
| Name | Description | Bundled Assets |
| ---- | ----------- | -------------- |
| [acquire-codebase-knowledge](../skills/acquire-codebase-knowledge/SKILL.md) | Use this skill when the user explicitly asks to map, document, or onboard into an existing codebase. Trigger for prompts like "map this codebase", "document this architecture", "onboard me to this repo", or "create codebase docs". Do not trigger for routine feature implementation, bug fixes, or narrow code edits unless the user asks for repository-level discovery. | `assets/templates`<br />`references/inquiry-checkpoints.md`<br />`references/stack-detection.md`<br />`scripts/scan.py` |
| [add-educational-comments](../skills/add-educational-comments/SKILL.md) | Add educational comments to the file specified, or prompt asking for file to comment if one is not provided. | None |
| [agent-governance](../skills/agent-governance/SKILL.md) | Patterns and techniques for adding governance, safety, and trust controls to AI agent systems. Use this skill when:<br />- Building AI agents that call external tools (APIs, databases, file systems)<br />- Implementing policy-based access controls for agent tool usage<br />- Adding semantic intent classification to detect dangerous prompts<br />- Creating trust scoring systems for multi-agent workflows<br />- Building audit trails for agent actions and decisions<br />- Enforcing rate limits, content filters, or tool restrictions on agents<br />- Working with any agent framework (PydanticAI, CrewAI, OpenAI Agents, LangChain, AutoGen) | None |
| [agent-owasp-compliance](../skills/agent-owasp-compliance/SKILL.md) | Check any AI agent codebase against the OWASP Agentic Security Initiative (ASI) Top 10 risks.<br />Use this skill when:<br />- Evaluating an agent system's security posture before production deployment<br />- Running a compliance check against OWASP ASI 2026 standards<br />- Mapping existing security controls to the 10 agentic risks<br />- Generating a compliance report for security review or audit<br />- Comparing agent framework security features against the standard<br />- Any request like "is my agent OWASP compliant?", "check ASI compliance", or "agentic security audit" | None |

@@ -0,0 +1,174 @@
---
name: acquire-codebase-knowledge
description: 'Use this skill when the user explicitly asks to map, document, or onboard into an existing codebase. Trigger for prompts like "map this codebase", "document this architecture", "onboard me to this repo", or "create codebase docs". Do not trigger for routine feature implementation, bug fixes, or narrow code edits unless the user asks for repository-level discovery.'
license: MIT
compatibility: 'Cross-platform. Requires Python 3.8+ and git. Run scripts/scan.py from the target project root.'
metadata:
  version: "1.3"
  enhancements:
    - Multi-language manifest detection (25+ languages supported)
    - CI/CD pipeline detection (10+ platforms)
    - Container & orchestration detection
    - Code metrics by language
    - Security & compliance config detection
    - Performance testing markers
argument-hint: 'Optional: specific area to focus on, e.g. "architecture only", "testing and concerns"'
---
# Acquire Codebase Knowledge
Produces seven populated documents in `docs/codebase/` covering everything needed to work effectively on the project. Only document what is verifiable from files or terminal output — never infer or assume.
## Output Contract (Required)
Before finishing, all of the following must be true:
1. Exactly these files exist in `docs/codebase/`: `STACK.md`, `STRUCTURE.md`, `ARCHITECTURE.md`, `CONVENTIONS.md`, `INTEGRATIONS.md`, `TESTING.md`, `CONCERNS.md`.
2. Every claim is traceable to source files, config, or terminal output.
3. Unknowns are marked as `[TODO]`; intent-dependent decisions are marked `[ASK USER]`.
4. Every document includes a short "evidence" list with concrete file paths.
5. Final response includes numbered `[ASK USER]` questions and intent-vs-reality divergences.
## Workflow
Copy and track this checklist:
```
- [ ] Phase 1: Run scan, read intent documents
- [ ] Phase 2: Investigate each documentation area
- [ ] Phase 3: Populate all seven docs in docs/codebase/
- [ ] Phase 4: Validate docs, present findings, resolve all [ASK USER] items
```
## Focus Area Mode
If the user supplies a focus area (for example: "architecture only" or "testing and concerns"):
1. Always run Phase 1 in full.
2. Fully complete focus-area documents first.
3. For non-focus documents not yet analyzed, keep required sections present and mark unknowns as `[TODO]`.
4. Still run the Phase 4 validation loop on all seven documents before final output.
### Phase 1: Scan and Read Intent
1. Run the scan script from the target project root:
```bash
python3 "$SKILL_ROOT/scripts/scan.py" --output docs/codebase/.codebase-scan.txt
```
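Where `$SKILL_ROOT` is the absolute path to the skill folder. The script runs on Windows, macOS, and Linux; on Windows, invoke it with `python` or `py` if `python3` is not on the `PATH`, and use your shell's variable syntax in place of `$SKILL_ROOT`.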
**Quick start:** If you already know the absolute path, pass it directly:
```bash
python3 /absolute/path/to/skills/acquire-codebase-knowledge/scripts/scan.py --output docs/codebase/.codebase-scan.txt
```
2. Search for `PRD`, `TRD`, `README`, `ROADMAP`, `SPEC`, `DESIGN` files and read them.
3. Summarise the stated project intent before reading any source code.
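Step 2 can be scripted. The sketch below is a hypothetical helper, not part of `scan.py`. It assumes intent documents are text files (`.md`, `.txt`, `.rst`, `.adoc`) whose name starts with one of the listed keywords, case-insensitively, and it skips common dependency and build folders.

```python
import os
from pathlib import PurePath
from typing import Iterable, List

# Keywords from Phase 1, step 2. Prefix matching on the filename is an assumption.
INTENT_KEYWORDS = ("PRD", "TRD", "README", "ROADMAP", "SPEC", "DESIGN")
DOC_SUFFIXES = {".md", ".txt", ".rst", ".adoc"}
SKIP_DIRS = {".git", "node_modules", "dist", "build", ".venv", "venv", "vendor"}


def find_intent_docs(paths: Iterable[str]) -> List[str]:
    """Return the paths that look like intent documents, sorted."""
    hits = []
    for path in paths:
        name = PurePath(path)
        if name.suffix.lower() in DOC_SUFFIXES and name.stem.upper().startswith(INTENT_KEYWORDS):
            hits.append(path)
    return sorted(hits)


def list_repo_files(root: str = ".") -> List[str]:
    """Walk root and return relative file paths, pruning SKIP_DIRS in place."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        found.extend(os.path.relpath(os.path.join(dirpath, f), root) for f in filenames)
    return found
```

`find_intent_docs(list_repo_files())` returns candidates to read. The prefix rule also catches names such as `SPECIFICATION.md`, which is usually intended.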
### Phase 2: Investigate
Use the scan output to answer questions for each of the seven templates. Load [`references/inquiry-checkpoints.md`](references/inquiry-checkpoints.md) for the full per-template question list.
If the stack is ambiguous (multiple manifest files, unfamiliar file types, no `package.json`), load [`references/stack-detection.md`](references/stack-detection.md).
### Phase 3: Populate Templates
Copy each template from `assets/templates/` into `docs/codebase/`, then fill them in this order:
1. [STACK.md](assets/templates/STACK.md) — language, runtime, frameworks, all dependencies
2. [STRUCTURE.md](assets/templates/STRUCTURE.md) — directory layout, entry points, key files
3. [ARCHITECTURE.md](assets/templates/ARCHITECTURE.md) — layers, patterns, data flow
4. [CONVENTIONS.md](assets/templates/CONVENTIONS.md) — naming, formatting, error handling, imports
5. [INTEGRATIONS.md](assets/templates/INTEGRATIONS.md) — external APIs, databases, auth, monitoring
6. [TESTING.md](assets/templates/TESTING.md) — frameworks, file organization, mocking strategy
7. [CONCERNS.md](assets/templates/CONCERNS.md) — tech debt, bugs, security risks, perf bottlenecks
Use `[TODO]` for anything that cannot be determined from code. Use `[ASK USER]` where the right answer requires team intent.
### Phase 4: Validate, Repair, Verify
Run this mandatory validation loop before finalizing:
1. Validate each doc against `references/inquiry-checkpoints.md`.
2. For each non-trivial claim, confirm at least one evidence reference exists.
3. If any required section is missing or unsupported:
   - Fix the document.
   - Re-run validation.
4. Repeat until all seven docs pass.
Then present a summary of all seven documents, list every `[ASK USER]` item as a numbered question, and highlight any Intent vs. Reality divergences from Phase 1.
Validation pass criteria:
- No unsupported claims.
- No empty required sections.
- Unknowns use `[TODO]` rather than assumptions.
- Team-intent gaps are explicitly marked `[ASK USER]`.
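Only the structural criteria can be checked mechanically; whether each claim is actually supported still needs a careful read. A minimal sketch of that mechanical part, assuming the layout of the bundled templates (numbered `### N) Title` headings and an `Evidence` section). `validate_docs` is a hypothetical helper, not a bundled script; non-Markdown files such as `.codebase-scan.txt` are ignored.

```python
import re
from typing import Dict, List

REQUIRED_DOCS = (
    "STACK.md", "STRUCTURE.md", "ARCHITECTURE.md", "CONVENTIONS.md",
    "INTEGRATIONS.md", "TESTING.md", "CONCERNS.md",
)
HEADING = re.compile(r"^#{2,6}\s+(.*)$")
NUMBERED = re.compile(r"^\d+\)")  # template convention: "### 1) Runtime Summary"


def split_sections(text: str) -> Dict[str, str]:
    """Map each heading title to the text between it and the next heading."""
    sections: Dict[str, str] = {}
    current = None
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            current = match.group(1).strip()
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections


def validate_docs(docs: Dict[str, str]) -> List[str]:
    """Return structural problems; an empty list means these checks pass.

    `docs` maps each filename in docs/codebase/ to its text.
    """
    problems = [f"missing {name}" for name in REQUIRED_DOCS if name not in docs]
    problems += [f"unexpected {name}" for name in docs
                 if name.endswith(".md") and name not in REQUIRED_DOCS]
    for name in REQUIRED_DOCS:
        if name not in docs:
            continue
        sections = split_sections(docs[name])
        for title, body in sections.items():
            if NUMBERED.match(title) and not body.strip():
                problems.append(f"{name}: empty section '{title}'")
        evidence = next((body for title, body in sections.items()
                         if "evidence" in title.lower()), "")
        if not re.search(r"^\s*-\s+\S", evidence, re.MULTILINE):
            problems.append(f"{name}: no evidence list")
    return problems
```

Re-run it after each repair until it returns an empty list, then do the manual claim-by-claim check.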
---
## Gotchas
**Monorepos:** The root `package.json` may only orchestrate tooling, with no source code of its own. Check for `workspaces`, `packages/`, or `apps/` directories. Each workspace may have independent dependencies and conventions, so map each sub-package separately.
**Outdated README:** READMEs often describe the intended architecture rather than the current one. Cross-reference with the actual file structure before treating any README claim as fact.
**TypeScript path aliases:** `tsconfig.json` `paths` config means imports like `@/foo` don't map directly to the filesystem. Map aliases to real paths before documenting structure.
**Generated/compiled output:** Never document patterns from `dist/`, `build/`, `generated/`, `.next/`, `out/`, or `__pycache__/`. These are artefacts — document source conventions only.
**`.env.example` reveals required config:** Real secrets should never be committed, so `.env.example`, `.env.template`, or `.env.sample` is usually the most reliable place to discover required environment variables.
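A minimal sketch for pulling variable names out of such a file, assuming simple `KEY=value` lines with an optional `export` prefix. Loaders differ on quoting and multi-line values, so treat the result as a starting list for `STACK.md`, not a full parser. `required_env_vars` is a hypothetical helper.

```python
import re
from typing import List

# KEY=value, optionally prefixed with "export "; the name must be a valid identifier.
ENV_LINE = re.compile(r"^\s*(?:export\s+)?([A-Za-z_][A-Za-z0-9_]*)\s*=")


def required_env_vars(text: str) -> List[str]:
    """Return variable names declared in an .env.example-style file, in order, without duplicates."""
    names: List[str] = []
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # commented-out example lines are not declarations
        match = ENV_LINE.match(line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names
```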
**`devDependencies` ≠ production stack:** Only packages in `dependencies` (or the equivalent, e.g. `[tool.poetry.dependencies]`) run in production. Document linters, formatters, and test frameworks separately as dev tooling.
**Test TODOs ≠ production debt:** TODOs inside `test/`, `tests/`, `__tests__/`, or `spec/` are coverage gaps, not production technical debt. Separate them in `CONCERNS.md`.
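A sketch of that split, assuming TODO locations arrive as `path` or `path:line` strings (for example from the scan output or a `grep -n` run). It only checks directory names, so co-located files such as `foo.test.ts` are classified as production; extend `TEST_DIRS` or add a filename rule if the repo uses that layout. `split_todos` is hypothetical.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List

TEST_DIRS = {"test", "tests", "__tests__", "spec"}


def split_todos(todo_paths: Iterable[str]) -> Dict[str, List[str]]:
    """Bucket TODO locations into test-coverage gaps vs production debt."""
    buckets: Dict[str, List[str]] = {"test_gaps": [], "production_debt": []}
    for path in todo_paths:
        # Normalise Windows separators; drop the last part (file name plus any ":line").
        parts = PurePosixPath(path.replace("\\", "/")).parts[:-1]
        key = "test_gaps" if TEST_DIRS.intersection(parts) else "production_debt"
        buckets[key].append(path)
    return buckets
```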
**High-churn files = fragile areas:** Files that appear most often in recent git history change most frequently and often hide complexity. Always note them in `CONCERNS.md`.
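The scan output already lists high-churn files. To check or extend that list by hand, one option is to count how often each path appears in `git log --name-only` over a time window. A sketch, where `churn_from_log` and `git_churn` are hypothetical helpers and the 90-day window is an assumption:

```python
import subprocess
from collections import Counter
from typing import List, Tuple


def churn_from_log(log_output: str, limit: int = 20) -> List[Tuple[str, int]]:
    """Count path occurrences in `git log --name-only` output, most-changed first."""
    counts = Counter(line.strip() for line in log_output.splitlines() if line.strip())
    return counts.most_common(limit)  # ties keep first-seen order


def git_churn(days: int = 90, limit: int = 20) -> List[Tuple[str, int]]:
    """Run git in the current directory and return the most frequently changed files."""
    log = subprocess.run(
        # An empty --pretty format leaves only the file names for each commit.
        ["git", "log", f"--since={days} days ago", "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout
    return churn_from_log(log, limit)
```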
---
## Anti-Patterns
| ❌ Don't | ✅ Do instead |
|---------|--------------|
| "Uses Clean Architecture with Domain/Data layers." (when no such directories exist) | State only what directory structure actually shows. |
| "This is a Next.js project." (without checking `package.json`) | Check `dependencies` first. State what's actually there. |
| Guess the database from a variable name like `dbUrl` | Check manifest for `pg`, `mysql2`, `mongoose`, `prisma`, etc. |
| Document `dist/` or `build/` naming patterns as conventions | Source files only. |
---
## Enhanced Scan Output Sections
The `scan.py` script now produces the following sections in addition to the original output:
- **CODE METRICS** — Total files, lines of code by language, largest files (complexity signals)
- **CI/CD PIPELINES** — Detected GitHub Actions, GitLab CI, Jenkins, CircleCI, etc.
- **CONTAINERS & ORCHESTRATION** — Docker, Docker Compose, Kubernetes, Vagrant configs
- **SECURITY & COMPLIANCE** — Snyk, Dependabot, SECURITY.md, SBOM, security policies
- **PERFORMANCE & TESTING** — Benchmark configs, profiling markers, load testing tools
Use these sections during Phase 2 to inform investigation questions and identify tool-specific patterns.
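For intuition about what the CODE METRICS section measures, here is a minimal sketch of a lines-per-language count. It is not `scan.py`'s implementation: the extension map is a small assumed subset, and every line (blanks and comments included) is counted.

```python
import os
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# Small illustrative extension map; scan.py's own language table may differ.
LANGUAGES = {".py": "Python", ".ts": "TypeScript", ".js": "JavaScript",
             ".go": "Go", ".rs": "Rust", ".java": "Java"}
SKIP_DIRS = {".git", "node_modules", "dist", "build", "__pycache__", ".venv", "vendor"}


def lines_by_language(files: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Sum raw line counts per language from (path, text) pairs."""
    totals: Counter = Counter()
    for path, text in files:
        language = LANGUAGES.get(os.path.splitext(path)[1].lower())
        if language:
            totals[language] += len(text.splitlines())
    return dict(totals.most_common())


def read_source_files(root: str = ".") -> Iterator[Tuple[str, str]]:
    """Yield (path, text) for files with a known extension, skipping SKIP_DIRS."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            if os.path.splitext(name)[1].lower() not in LANGUAGES:
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8") as handle:
                    yield path, handle.read()
            except (OSError, UnicodeDecodeError):
                continue  # unreadable or not UTF-8; skip rather than fail
```

Usage: `lines_by_language(read_source_files("."))`.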
---
## Bundled Assets
| Asset | When to load |
|-------|-------------|
| [`scripts/scan.py`](scripts/scan.py) | Phase 1 — run first, before reading any code (Python 3.8+ required) |
| [`references/inquiry-checkpoints.md`](references/inquiry-checkpoints.md) | Phase 2 — load for per-template investigation questions |
| [`references/stack-detection.md`](references/stack-detection.md) | Phase 2 — only if stack is ambiguous |
| [`assets/templates/STACK.md`](assets/templates/STACK.md) | Phase 3 step 1 |
| [`assets/templates/STRUCTURE.md`](assets/templates/STRUCTURE.md) | Phase 3 step 2 |
| [`assets/templates/ARCHITECTURE.md`](assets/templates/ARCHITECTURE.md) | Phase 3 step 3 |
| [`assets/templates/CONVENTIONS.md`](assets/templates/CONVENTIONS.md) | Phase 3 step 4 |
| [`assets/templates/INTEGRATIONS.md`](assets/templates/INTEGRATIONS.md) | Phase 3 step 5 |
| [`assets/templates/TESTING.md`](assets/templates/TESTING.md) | Phase 3 step 6 |
| [`assets/templates/CONCERNS.md`](assets/templates/CONCERNS.md) | Phase 3 step 7 |
Template usage mode:
- Default mode: complete only the "Core Sections (Required)" in each template.
- Extended mode: add optional sections only when the repo complexity justifies them.

@@ -0,0 +1,49 @@
# Architecture
## Core Sections (Required)
### 1) Architectural Style
- Primary style: [layered/feature/event-driven/other]
- Why this classification: [short evidence-backed rationale]
- Primary constraints: [2-3 constraints that shape design]
### 2) System Flow
```text
[entry] -> [processing] -> [domain logic] -> [data/integration] -> [response/output]
```
Describe the flow in 4-6 steps using file-backed evidence.
### 3) Layer/Module Responsibilities
| Layer or module | Owns | Must not own | Evidence |
|-----------------|------|--------------|----------|
| [name] | [responsibility] | [non-responsibility] | [file] |
### 4) Reused Patterns
| Pattern | Where found | Why it exists |
|---------|-------------|---------------|
| [singleton/repository/adapter/etc] | [path] | [reason] |
### 5) Known Architectural Risks
- [Risk 1 + impact]
- [Risk 2 + impact]
### 6) Evidence
- [path/to/entrypoint]
- [path/to/main-layer-files]
- [path/to/data-or-integration-layer]
## Extended Sections (Optional)
Add only when needed:
- Startup or initialization order details
- Async/event topology diagrams
- Anti-pattern catalog with refactoring paths
- Failure-mode analysis and resilience posture

@@ -0,0 +1,56 @@
# Codebase Concerns
## Core Sections (Required)
### 1) Top Risks (Prioritized)
| Severity | Concern | Evidence | Impact | Suggested action |
|----------|---------|----------|--------|------------------|
| [high/med/low] | [issue] | [file or scan output] | [impact] | [next action] |
### 2) Technical Debt
List the most important debt items only.
| Debt item | Why it exists | Where | Risk if ignored | Suggested fix |
|-----------|---------------|-------|-----------------|---------------|
| [item] | [reason] | [path] | [risk] | [fix] |
### 3) Security Concerns
| Risk | OWASP category (if applicable) | Evidence | Current mitigation | Gap |
|------|--------------------------------|----------|--------------------|-----|
| [risk] | [A01/A03/etc or N/A] | [path] | [what exists] | [what is missing] |
### 4) Performance and Scaling Concerns
| Concern | Evidence | Current symptom | Scaling risk | Suggested improvement |
|---------|----------|-----------------|-------------|-----------------------|
| [issue] | [path/metric] | [symptom] | [risk] | [action] |
### 5) Fragile/High-Churn Areas
| Area | Why fragile | Churn signal | Safe change strategy |
|------|-------------|-------------|----------------------|
| [path] | [reason] | [recent churn evidence] | [approach] |
### 6) `[ASK USER]` Questions
Add unresolved intent-dependent questions as a numbered list.
1. [ASK USER] [question]
### 7) Evidence
- [scan output section reference]
- [path/to/code-file]
- [path/to/config-or-history-evidence]
## Extended Sections (Optional)
Add only when needed:
- Full bug inventory
- Component-level remediation roadmap
- Cost/effort estimates by concern
- Dependency-risk and ownership mapping

@@ -0,0 +1,52 @@
# Coding Conventions
## Core Sections (Required)
### 1) Naming Rules
| Item | Rule | Example | Evidence |
|------|------|---------|----------|
| Files | [RULE] | [EXAMPLE] | [FILE] |
| Functions/methods | [RULE] | [EXAMPLE] | [FILE] |
| Types/interfaces | [RULE] | [EXAMPLE] | [FILE] |
| Constants/env vars | [RULE] | [EXAMPLE] | [FILE] |
### 2) Formatting and Linting
- Formatter: [TOOL + CONFIG FILE]
- Linter: [TOOL + CONFIG FILE]
- Most relevant enforced rules: [RULE_1], [RULE_2], [RULE_3]
- Run commands: [COMMANDS]
### 3) Import and Module Conventions
- Import grouping/order: [RULE]
- Alias vs relative import policy: [RULE]
- Public exports/barrel policy: [RULE]
### 4) Error and Logging Conventions
- Error strategy by layer: [SHORT SUMMARY]
- Logging style and required context fields: [SUMMARY]
- Sensitive-data redaction rules: [SUMMARY]
### 5) Testing Conventions
- Test file naming/location rule: [RULE]
- Mocking strategy norm: [RULE]
- Coverage expectation: [RULE or TODO]
### 6) Evidence
- [path/to/lint-config]
- [path/to/format-config]
- [path/to/representative-source-file]
## Extended Sections (Optional)
Add only for large or inconsistent codebases:
- Layer-specific error handling matrix
- Language-specific strictness options
- Repo-specific commit/branching conventions
- Known convention violations to clean up

@@ -0,0 +1,48 @@
# External Integrations
## Core Sections (Required)
### 1) Integration Inventory
| System | Type (API/DB/Queue/etc) | Purpose | Auth model | Criticality | Evidence |
|--------|---------------------------|---------|------------|-------------|----------|
| [name] | [type] | [purpose] | [auth] | [high/med/low] | [file] |
### 2) Data Stores
| Store | Role | Access layer | Key risk | Evidence |
|-------|------|--------------|----------|----------|
| [db/cache/etc] | [role] | [module] | [risk] | [file] |
### 3) Secrets and Credentials Handling
- Credential sources: [env/secrets manager/config]
- Hardcoding checks: [result]
- Rotation or lifecycle notes: [known/unknown]
### 4) Reliability and Failure Behavior
- Retry/backoff behavior: [implemented/none/partial]
- Timeout policy: [where configured]
- Circuit-breaker or fallback behavior: [if any]
### 5) Observability for Integrations
- Logging around external calls: [yes/no + where]
- Metrics/tracing coverage: [yes/no + where]
- Missing visibility gaps: [list]
### 6) Evidence
- [path/to/integration-wrapper]
- [path/to/config-or-env-template]
- [path/to/monitoring-or-logging-config]
## Extended Sections (Optional)
Add only when needed:
- Endpoint-by-endpoint catalog
- Auth flow sequence diagrams
- SLA/SLO per integration
- Region/failover topology notes

@@ -0,0 +1,56 @@
# Technology Stack
## Core Sections (Required)
### 1) Runtime Summary
| Area | Value | Evidence |
|------|-------|----------|
| Primary language | [VALUE] | [FILE_PATH] |
| Runtime + version | [VALUE] | [FILE_PATH] |
| Package manager | [VALUE] | [FILE_PATH] |
| Module/build system | [VALUE] | [FILE_PATH] |
### 2) Production Frameworks and Dependencies
List only high-impact production dependencies (frameworks, data, transport, auth).
| Dependency | Version | Role in system | Evidence |
|------------|---------|----------------|----------|
| [NAME] | [VERSION] | [ROLE] | [FILE_PATH] |
### 3) Development Toolchain
| Tool | Purpose | Evidence |
|------|---------|----------|
| [TOOL] | [LINT/FORMAT/TEST/BUILD] | [FILE_PATH] |
### 4) Key Commands
```bash
[install command]
[build command]
[test command]
[lint command]
```
### 5) Environment and Config
- Config sources: [LIST FILES]
- Required env vars: [VAR_1], [VAR_2], [TODO]
- Deployment/runtime constraints: [SHORT NOTE]
### 6) Evidence
- [path/to/manifest]
- [path/to/runtime-config]
- [path/to/build-or-ci-config]
## Extended Sections (Optional)
Add only when needed for complex repos:
- Full dependency taxonomy by category
- Detailed compiler/runtime flags
- Environment matrix (dev/stage/prod)
- Process manager and container runtime details

@@ -0,0 +1,44 @@
# Codebase Structure
## Core Sections (Required)
### 1) Top-Level Map
List only meaningful top-level directories and files.
| Path | Purpose | Evidence |
|------|---------|----------|
| [path/] | [purpose] | [source] |
### 2) Entry Points
- Main runtime entry: [FILE]
- Secondary entry points (worker/cli/jobs): [FILES or NONE]
- How entry is selected (script/config): [NOTE]
### 3) Module Boundaries
| Boundary | What belongs here | What must not be here |
|----------|-------------------|------------------------|
| [module/layer] | [responsibility] | [forbidden logic] |
### 4) Naming and Organization Rules
- File naming pattern: [kebab/camel/Pascal + examples]
- Directory organization pattern: [feature/layer/domain]
- Import aliasing or path conventions: [RULE]
### 5) Evidence
- [path/to/root-tree-source]
- [path/to/entry-config]
- [path/to/key-module]
## Extended Sections (Optional)
Add only when repository complexity requires it:
- Subdirectory deep maps by feature/layer
- Middleware/boot order details
- Generated-vs-source layout boundaries
- Monorepo workspace-level structure maps

@@ -0,0 +1,57 @@
# Testing Patterns
## Core Sections (Required)
### 1) Test Stack and Commands
- Primary test framework: [NAME + VERSION]
- Assertion/mocking tools: [TOOLS]
- Commands:
```bash
[run all tests]
[run unit tests]
[run integration/e2e tests]
[run coverage]
```
### 2) Test Layout
- Test file placement pattern: [co-located/tests folder/etc]
- Naming convention: [pattern]
- Setup files and where they run: [paths]
### 3) Test Scope Matrix
| Scope | Covered? | Typical target | Notes |
|-------|----------|----------------|-------|
| Unit | [yes/no] | [modules/services] | [notes] |
| Integration | [yes/no] | [API/data boundaries] | [notes] |
| E2E | [yes/no] | [user flows] | [notes] |
### 4) Mocking and Isolation Strategy
- Main mocking approach: [module/class/network]
- Isolation guarantees: [what is reset and when]
- Common failure mode in tests: [short note]
### 5) Coverage and Quality Signals
- Coverage tool + threshold: [value or TODO]
- Current reported coverage: [value or TODO]
- Known gaps/flaky areas: [list]
### 6) Evidence
- [path/to/test-config]
- [path/to/representative-test-file]
- [path/to/ci-or-coverage-config]
## Extended Sections (Optional)
Add only when needed:
- Framework-specific suite patterns
- Detailed mock recipes per dependency type
- Historical flaky test catalog
- Test performance bottlenecks and optimization ideas

@@ -0,0 +1,70 @@
# Inquiry Checkpoints
Per-template investigation questions for Phase 2 of the acquire-codebase-knowledge workflow. For each template area, look for answers in the scan output first, then read source files to fill gaps.
---
## 1. STACK.md — Tech Stack
- What is the primary language and exact version? (check `.nvmrc`, `go.mod`, `pyproject.toml`, Docker `FROM` line)
- What package manager is used? (`npm`, `yarn`, `pnpm`, `go mod`, `pip`, `uv`)
- What are the core runtime frameworks? (web server, ORM, DI container)
- What do `dependencies` (production) vs `devDependencies` (dev tooling) contain?
- Is there a Docker image and what base image does it use?
- What are the key scripts in `package.json` / `Makefile` / `pyproject.toml`?
## 2. STRUCTURE.md — Directory Layout
- Where does source code live? (usually `src/`, `lib/`, or project root for Go)
- What are the entry points? (check `main` in `package.json`, `scripts.start`, `cmd/main.go`, `app.py`)
- What is the stated purpose of each top-level directory?
- Are there non-obvious directories (e.g., `eng/`, `platform/`, `infra/`)?
- Are there hidden config directories (`.github/`, `.vscode/`, `.husky/`)?
- What naming conventions do directories follow? (camelCase, kebab-case, domain-based vs layer-based)
## 3. ARCHITECTURE.md — Patterns
- Is the code organized by layer (controllers → services → repos) or by feature?
- What is the primary data flow? Trace one request or command from entry to data store.
- Are there singletons, dependency injection patterns, or explicit initialization order requirements?
- Are there background workers, queues, or event-driven components?
- What design patterns appear repeatedly? (Factory, Repository, Decorator, Strategy)
## 4. CONVENTIONS.md — Coding Standards
- What is the file naming convention? (check 10+ files — camelCase, kebab-case, PascalCase)
- What is the function and variable naming convention?
- Are private methods/fields prefixed (e.g., `_methodName`, `#field`)?
- What linter and formatter are configured? (check `.eslintrc*` or `eslint.config.js`, `.prettierrc`, `.golangci.yml`)
- What are the TypeScript strictness settings? (`strict`, `noImplicitAny`, etc.)
- How are errors handled at each layer? (throw vs. return structured error)
- What logging library is used and what is the log message format?
- How are imports organized? (barrel exports, path aliases, grouping rules)
## 5. INTEGRATIONS.md — External Services
- What external APIs are called? (search for `axios.`, `fetch(`, `http.Get(`, base URLs in constants)
- How are credentials stored and accessed? (`.env`, secrets manager, env vars)
- What databases are connected? (check manifest for `pg`, `mongoose`, `prisma`, `typeorm`, `sqlalchemy`)
- Is there an API gateway, service mesh, or proxy between the app and external services?
- What monitoring or observability tools are used? (APM, Prometheus, logging pipeline)
- Are there message queues or event buses? (Kafka, RabbitMQ, SQS, Pub/Sub)
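The search hints in the first question can be turned into a rough scanner. This is a sketch, assuming the three call patterns listed plus hardcoded `http(s)://` URLs; real code may use other clients (`requests`, `httpx`, gRPC stubs), so extend `CALL_PATTERNS` for the stack at hand. Expect false positives, such as URLs in comments. `find_external_calls` is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Patterns from the checklist question above, plus literal base URLs.
CALL_PATTERNS = {
    "axios": re.compile(r"\baxios\."),
    "fetch": re.compile(r"\bfetch\("),
    "go net/http": re.compile(r"\bhttp\.Get\("),
    "base URL": re.compile(r"https?://[^\s'\"]+"),
}


def find_external_calls(files: Iterable[Tuple[str, str]]) -> List[Tuple[str, int, str]]:
    """Return (path, line number, pattern label) for each suspected external call."""
    hits = []
    for path, text in files:
        for lineno, line in enumerate(text.splitlines(), start=1):
            for label, pattern in CALL_PATTERNS.items():
                if pattern.search(line):
                    hits.append((path, lineno, label))
    return hits
```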
## 6. TESTING.md — Test Setup
- What test runner is configured? (check `scripts.test` in `package.json`, `pytest.ini`, `go test`)
- Where are test files located? (alongside source, in `tests/`, in `__tests__/`)
- What assertion library is used? (Jest expect, Chai, pytest assert)
- How are external dependencies mocked? (jest.mock, dependency injection, fixtures)
- Are there integration tests that hit real services vs. unit tests with mocks?
- Is there a coverage threshold enforced? (check `jest.config.js`, `.nycrc`, `pyproject.toml`)
## 7. CONCERNS.md — Known Issues
- How many TODOs/FIXMEs/HACKs are in production code? (see scan output)
- Which files have the highest git churn in the last 90 days? (see scan output)
- Are there any files over 500 lines that mix multiple responsibilities?
- Do any services make sequential calls that could be parallelized?
- Are there hardcoded values (URLs, IDs, magic numbers) that should be config?
- What security risks exist? (missing input validation, raw error messages exposed to clients, missing auth checks)
- Are there performance patterns that don't scale? (N+1 queries, in-memory caches in multi-instance setups)

@@ -0,0 +1,131 @@
# Stack Detection Reference
Load this file when the tech stack is ambiguous — e.g., multiple manifest files present, unfamiliar file extensions, or no obvious `package.json` / `go.mod`.
---
## Manifest File → Ecosystem
| File | Ecosystem | Key fields to read |
|------|-----------|--------------------|
| `package.json` | Node.js / JavaScript / TypeScript | `dependencies`, `devDependencies`, `scripts`, `main`, `type`, `engines` |
| `go.mod` | Go | Module path, Go version, `require` block |
| `requirements.txt` | Python (pip) | Package list with pinned versions |
| `Pipfile` | Python (pipenv) | `[packages]`, `[dev-packages]`, `[requires]` python version |
| `pyproject.toml` | Python (poetry / uv / hatch) | `[tool.poetry.dependencies]`, `[project]`, `[build-system]` |
| `setup.py` / `setup.cfg` | Python (setuptools, legacy) | `install_requires`, `python_requires` |
| `Cargo.toml` | Rust | `[dependencies]`, `[[bin]]`, `[lib]` |
| `pom.xml` | Java / Kotlin (Maven) | `<dependencies>`, `<artifactId>`, `<groupId>`, `<java.version>` |
| `build.gradle` / `build.gradle.kts` | Java / Kotlin (Gradle) | `dependencies {}`, `sourceCompatibility` |
| `composer.json` | PHP | `require`, `require-dev` |
| `Gemfile` | Ruby | `gem` declarations, `ruby` version constraint |
| `mix.exs` | Elixir | `deps/0`, `elixir: "~> X.Y"` |
| `pubspec.yaml` | Dart / Flutter | `dependencies`, `dev_dependencies`, `environment.sdk` |
| `*.csproj` | .NET / C# | `<PackageReference>`, `<TargetFramework>` |
| `*.sln` | .NET solution | References multiple `.csproj` projects |
| `deno.json` / `deno.jsonc` | Deno (TypeScript runtime) | `imports`, `tasks` |
| `bun.lockb` | Bun (JavaScript runtime) | Binary lockfile — check `package.json` for deps |
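When several manifests are present, listing which ecosystems each one signals makes the ambiguity explicit. A sketch over bare filenames, using a subset of the table above; glob rows such as `*.csproj` are matched with `fnmatch.fnmatchcase`. `detect_ecosystems` is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List

# Subset of the manifest table; glob keys cover per-project .NET files.
ECOSYSTEMS = {
    "package.json": "Node.js", "go.mod": "Go", "requirements.txt": "Python (pip)",
    "Pipfile": "Python (pipenv)", "pyproject.toml": "Python (pyproject)",
    "Cargo.toml": "Rust", "pom.xml": "Java/Kotlin (Maven)",
    "build.gradle": "Java/Kotlin (Gradle)", "build.gradle.kts": "Java/Kotlin (Gradle)",
    "composer.json": "PHP", "Gemfile": "Ruby", "mix.exs": "Elixir",
    "pubspec.yaml": "Dart/Flutter", "*.csproj": ".NET", "*.sln": ".NET solution",
    "deno.json": "Deno", "deno.jsonc": "Deno",
}


def detect_ecosystems(filenames: Iterable[str]) -> Dict[str, List[str]]:
    """Map each detected ecosystem to the manifest files that signalled it."""
    found: Dict[str, List[str]] = {}
    for name in filenames:
        for pattern, ecosystem in ECOSYSTEMS.items():
            if fnmatchcase(name, pattern):
                found.setdefault(ecosystem, []).append(name)
    return found
```

More than one key in the result is the cue to read each manifest's key fields from the table before deciding what the primary stack is.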
---
## Language Runtime Version Detection
| Language | Where to find the version |
|----------|--------------------------|
| Node.js | `.nvmrc`, `.node-version`, `engines.node` in `package.json`, Docker `FROM node:X` |
| Python | `.python-version`, `requires-python` in the `[project]` table of `pyproject.toml`, Docker `FROM python:X` |
| Go | The `go` directive in `go.mod` (e.g., `go 1.21`); a `toolchain` directive, if present, names a suggested toolchain |
| Java | `<java.version>` or `<maven.compiler.release>` in `pom.xml`, `sourceCompatibility` in `build.gradle`, Docker `FROM eclipse-temurin:X` |
| Ruby | `.ruby-version`, `Gemfile` `ruby 'X.Y.Z'` |
| Rust | `rust-toolchain.toml`, `rust-toolchain` file, `rust-version` in `Cargo.toml` (minimum supported version) |
| .NET | `<TargetFramework>` in `.csproj` (e.g., `net8.0`) |
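A few of these sources can be read with small parsers. The sketch below assumes plain-text contents and uses regular expressions rather than full TOML or XML parsers (`tomllib` only ships with Python 3.11+, while the skill targets 3.8+). All function names are hypothetical.

```python
import re
from typing import Optional


def go_version(go_mod_text: str) -> Optional[str]:
    """Return the version in go.mod's `go` directive, wherever it appears."""
    match = re.search(r"^go[ \t]+(\d+(?:\.\d+){1,2})[ \t]*$", go_mod_text, re.MULTILINE)
    return match.group(1) if match else None


def dotfile_version(text: str) -> Optional[str]:
    """Return the first non-empty, non-comment line of .nvmrc, .python-version, etc."""
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            return line
    return None


def requires_python(pyproject_text: str) -> Optional[str]:
    """Return the first `requires-python` value (normally under [project]); not a full TOML parse."""
    match = re.search(r"""^requires-python[ \t]*=[ \t]*["']([^"']+)["']""", pyproject_text, re.MULTILINE)
    return match.group(1) if match else None


def dotnet_target(csproj_text: str) -> Optional[str]:
    """Return <TargetFramework>, or the first entry of <TargetFrameworks>."""
    match = re.search(r"<TargetFrameworks?>\s*([^<;]+)", csproj_text)
    return match.group(1).strip() if match else None
```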
---
## Framework Detection (Node.js / TypeScript)
| Dependency in `package.json` | Framework |
|-----------------------------|-----------|
| `express` | Express.js (minimal HTTP server) |
| `fastify` | Fastify (high-performance HTTP server) |
| `next` | Next.js (SSR/SSG React — check for `pages/` or `app/` directory) |
| `nuxt` | Nuxt.js (SSR/SSG Vue) |
| `@nestjs/core` | NestJS (opinionated Node.js framework with DI) |
| `koa` | Koa (middleware-focused, no built-in router) |
| `@hapi/hapi` | Hapi |
| `@trpc/server` | tRPC (type-safe API without REST/GraphQL schemas) |
| `routing-controllers` | routing-controllers (decorator-based Express wrapper) |
| `typeorm` | TypeORM (SQL ORM with decorators) |
| `prisma` | Prisma (type-safe ORM, check `prisma/schema.prisma`) |
| `mongoose` | Mongoose (MongoDB ODM) |
| `sequelize` | Sequelize (SQL ORM) |
| `drizzle-orm` | Drizzle (lightweight SQL ORM) |
| `react` without `next` | Vanilla React SPA (check for `react-router-dom`) |
| `vue` without `nuxt` | Vanilla Vue SPA |
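A sketch of applying this table to a `package.json`. It merges `dependencies` and `devDependencies` because some packages, such as the `prisma` CLI, are commonly installed as dev dependencies while `@prisma/client` is the runtime package; when writing `STACK.md`, still record which section each one came from. The mapping is a subset of the table, and `detect_node_frameworks` is hypothetical.

```python
import json
from typing import List

# Ordered so meta-frameworks are reported before the libraries they build on.
FRAMEWORKS = [
    ("next", "Next.js"), ("nuxt", "Nuxt.js"), ("@nestjs/core", "NestJS"),
    ("express", "Express.js"), ("fastify", "Fastify"), ("koa", "Koa"),
    ("@hapi/hapi", "Hapi"), ("@trpc/server", "tRPC"), ("typeorm", "TypeORM"),
    ("prisma", "Prisma"), ("@prisma/client", "Prisma"), ("mongoose", "Mongoose"),
    ("sequelize", "Sequelize"), ("drizzle-orm", "Drizzle"),
]


def detect_node_frameworks(package_json_text: str) -> List[str]:
    """Return framework labels implied by a package.json, without duplicates."""
    manifest = json.loads(package_json_text)
    deps = {**manifest.get("devDependencies", {}), **manifest.get("dependencies", {})}
    found: List[str] = []
    for package, label in FRAMEWORKS:
        if package in deps and label not in found:
            found.append(label)
    # Plain SPAs: the UI library is present without its meta-framework.
    if "react" in deps and "next" not in deps:
        found.append("React SPA")
    if "vue" in deps and "nuxt" not in deps:
        found.append("Vue SPA")
    return found
```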
---
## Framework Detection (Python)
| Package | Framework |
|---------|-----------|
| `fastapi` | FastAPI (async REST, auto OpenAPI docs) |
| `flask` | Flask (minimal WSGI web framework) |
| `django` | Django (batteries-included, check `settings.py`) |
| `starlette` | Starlette (ASGI, often used as FastAPI base) |
| `aiohttp` | aiohttp (async HTTP client and server) |
| `sqlalchemy` | SQLAlchemy (SQL ORM; check for `alembic` migrations) |
| `alembic` | Alembic (SQLAlchemy migration tool) |
| `pydantic` | Pydantic (data validation; core to FastAPI) |
| `celery` | Celery (distributed task queue) |
---
## Monorepo Detection
Check these signals in order:
1. `pnpm-workspace.yaml` — pnpm workspaces
2. `lerna.json` — Lerna monorepo
3. `nx.json` — Nx monorepo (also check `workspace.json`)
4. `turbo.json` — Turborepo
5. `rush.json` — Rush (Microsoft monorepo manager)
6. `.moon/workspace.yml`: Moon (individual projects may also have their own `moon.yml`)
7. `package.json` with `"workspaces": [...]` — npm/yarn workspaces
8. Presence of `packages/`, `apps/`, `libs/`, or `services/` directories with their own `package.json`
If monorepo is detected: each workspace may have **independent** dependencies and conventions. Map each sub-package separately in `STACK.md` and note the monorepo structure in `STRUCTURE.md`.
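The checks above can be run over a file listing such as `git ls-files` output. A sketch, assuming repo-relative POSIX paths and the root `package.json` text as inputs. For Moon, it checks `.moon/workspace.yml`, the workspace-level config; a `moon.yml` file configures an individual project. `monorepo_signals` is hypothetical.

```python
import json
from typing import Iterable, List, Optional

MARKER_FILES = [
    ("pnpm-workspace.yaml", "pnpm workspaces"), ("lerna.json", "Lerna"),
    ("nx.json", "Nx"), ("turbo.json", "Turborepo"), ("rush.json", "Rush"),
    (".moon/workspace.yml", "Moon"),
]
WORKSPACE_DIRS = {"packages", "apps", "libs", "services"}


def monorepo_signals(paths: Iterable[str], root_package_json: Optional[str] = None) -> List[str]:
    """Return detected monorepo signals, in the order of the list above.

    `paths` are repo-relative POSIX paths, e.g. the output of `git ls-files`.
    """
    path_set = set(paths)
    signals = [label for name, label in MARKER_FILES if name in path_set]
    if root_package_json and "workspaces" in json.loads(root_package_json):
        signals.append("npm/yarn workspaces")
    # Match <dir>/<package>/package.json exactly one level below a workspace dir.
    nested = sorted({
        p.split("/")[0] for p in path_set
        if p.count("/") == 2 and p.split("/")[0] in WORKSPACE_DIRS and p.endswith("/package.json")
    })
    signals += [f"{folder}/ sub-packages" for folder in nested]
    return signals
```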
---
## TypeScript Path Alias Detection
If `tsconfig.json` has a `paths` key, imports with non-relative prefixes are aliases. Map them before documenting structure.
```jsonc
// tsconfig.json (excerpt): `paths` lives under `compilerOptions`
{
  "compilerOptions": {
    "paths": {
      "@/*": ["./src/*"],
      "@components/*": ["./src/components/*"],
      "@utils/*": ["./src/utils/*"]
    }
  }
}
```
Imports like `import { foo } from '@/utils/bar'` resolve to `src/utils/bar`. Document as `src/utils/bar`, not `@/utils/bar`.
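A sketch of that mapping, assuming the `paths` object has already been read from `tsconfig.json` and that its targets are relative to the directory containing it. It approximates TypeScript's matching rules: an exact (non-wildcard) key wins, otherwise the wildcard pattern with the longest matching prefix. Only the first target of each pattern is used, although TypeScript tries later targets as fallbacks. `resolve_alias` is hypothetical.

```python
from typing import Dict, List, Optional


def _strip_dot_slash(target: str) -> str:
    return target[2:] if target.startswith("./") else target


def resolve_alias(import_path: str, paths: Dict[str, List[str]]) -> Optional[str]:
    """Map an aliased import to a repo path using tsconfig `paths` patterns."""
    if import_path in paths:  # exact, non-wildcard keys take precedence
        return _strip_dot_slash(paths[import_path][0])
    best_prefix: Optional[str] = None
    best_target: Optional[str] = None
    for pattern, targets in paths.items():
        if "*" not in pattern:
            continue
        prefix, suffix = pattern.split("*", 1)
        fits = (import_path.startswith(prefix) and import_path.endswith(suffix)
                and len(import_path) >= len(prefix) + len(suffix))
        if fits and (best_prefix is None or len(prefix) > len(best_prefix)):
            captured = import_path[len(prefix):len(import_path) - len(suffix)]
            best_prefix, best_target = prefix, targets[0].replace("*", captured, 1)
    return _strip_dot_slash(best_target) if best_target is not None else None
```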
---
## Docker Base Image → Runtime
If no manifest file is present but a `Dockerfile` exists, the `FROM` line reveals the runtime:
| FROM line pattern | Runtime |
|------------------|---------|
| `FROM node:X` | Node.js X |
| `FROM python:X` | Python X |
| `FROM golang:X` | Go X |
| `FROM eclipse-temurin:X` | Java X (Eclipse Temurin JDK) |
| `FROM mcr.microsoft.com/dotnet/aspnet:X` | .NET X |
| `FROM ruby:X` | Ruby X |
| `FROM rust:X` | Rust X |
| `FROM alpine` (alone) | Check what's installed via `RUN apk add` |
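In a multi-stage build, the last `FROM` usually names the image that actually runs, while earlier stages are build environments. A sketch that picks the final stage, follows `FROM <earlier-stage>` references, and splits off the tag. It assumes literal image names (an `ARG`-substituted `FROM ${BASE}` is returned unresolved) and is a hypothetical helper, not part of `scan.py`.

```python
import re
from typing import Dict, Optional, Tuple

FROM_LINE = re.compile(
    r"^[ \t]*FROM[ \t]+(?:--platform=\S+[ \t]+)?(\S+)(?:[ \t]+AS[ \t]+(\S+))?",
    re.IGNORECASE | re.MULTILINE,
)


def split_tag(image: str) -> Tuple[str, Optional[str]]:
    """Split 'name:tag' into (name, tag); tag is None when absent."""
    image = image.split("@", 1)[0]  # drop any @sha256 digest
    name, sep, tag = image.rpartition(":")
    if not sep or "/" in tag:  # no tag, or the colon belonged to a registry port
        return image, None
    return name, tag


def runtime_image(dockerfile_text: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (image, tag) for the final stage, resolving references to earlier stages."""
    stages: Dict[str, str] = {}
    image: Optional[str] = None
    for ref, alias in FROM_LINE.findall(dockerfile_text):
        image = stages.get(ref.lower(), ref)  # "FROM base" reuses that stage's image
        if alias:
            stages[alias.lower()] = image
    return split_tag(image) if image is not None else None
```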

@@ -0,0 +1,712 @@
#!/usr/bin/env python3
"""
scan.py — Collect project discovery information for the acquire-codebase-knowledge skill.
Run from the project root directory.
Usage: python3 scan.py [OPTIONS]
Options:
--output FILE Write output to FILE instead of stdout
--help Show this message and exit
Exit codes:
0 Success
1 Usage error
"""
import os
import sys
import argparse
import subprocess
import json
from pathlib import Path
from typing import List, Set
import re
TREE_LIMIT = 200
TREE_MAX_DEPTH = 3
TODO_LIMIT = 60
MANIFEST_PREVIEW_LINES = 80
RECENT_COMMITS_LIMIT = 20
CHURN_LIMIT = 20
EXCLUDE_DIRS = {
    "node_modules", ".git", "dist", "build", "out", ".next", ".nuxt",
    "__pycache__", ".venv", "venv", ".tox", "target", "vendor",
    "coverage", ".nyc_output", "generated", ".cache", ".turbo",
    ".yarn", ".pnp", "bin", "obj"
}
MANIFESTS = [
    # JavaScript/Node.js
    "package.json", "package-lock.json", "yarn.lock", "pnpm-lock.yaml", "bun.lockb",
    "deno.json", "deno.jsonc",
    # Python
    "requirements.txt", "Pipfile", "Pipfile.lock", "pyproject.toml", "setup.py", "setup.cfg",
    "poetry.lock", "pdm.lock", "uv.lock",
    # Go
    "go.mod", "go.sum",
    # Rust
    "Cargo.toml", "Cargo.lock",
    # Java/Kotlin
    "pom.xml", "build.gradle", "build.gradle.kts", "settings.gradle", "settings.gradle.kts",
    "gradle.properties",
    # PHP/Composer
    "composer.json", "composer.lock",
    # Ruby
    "Gemfile", "Gemfile.lock", "*.gemspec",
    # Elixir
    "mix.exs", "mix.lock",
    # Dart/Flutter
    "pubspec.yaml", "pubspec.lock",
    # .NET/C#
    "*.csproj", "*.sln", "*.slnx", "global.json", "packages.config",
    # Swift
    "Package.swift", "Package.resolved",
    # Scala
    "build.sbt", "scala-cli.yml",
    # Haskell
    "*.cabal", "stack.yaml", "cabal.project", "cabal.project.local",
    # OCaml
    "dune-project", "opam", "opam.lock",
    # Nim
    "*.nimble", "nim.cfg",
    # Crystal
    "shard.yml", "shard.lock",
    # R
    "DESCRIPTION", "renv.lock",
    # Julia
    "Project.toml", "Manifest.toml",
    # Build systems
    "CMakeLists.txt", "Makefile", "GNUmakefile",
    "SConstruct", "build.xml",
    "BUILD", "BUILD.bazel", "WORKSPACE", "bazel.lock",
    "justfile", ".justfile", "Taskfile.yml",
    "tox.ini", "Vagrantfile"
]
ENTRY_CANDIDATES = [
    # JavaScript/Node.js/TypeScript
    "src/index.ts", "src/index.js", "src/index.mjs",
    "src/main.ts", "src/main.js",
    "src/app.ts", "src/app.js",
    "src/server.ts", "src/server.js",
    "index.ts", "index.js", "app.ts", "app.js",
    "lib/index.ts", "lib/index.js",
    # Go
    "main.go", "cmd/main.go", "cmd/*/main.go",
    # Python
    "main.py", "app.py", "server.py", "run.py", "cli.py",
    "src/main.py", "src/__main__.py",
    # .NET/C#
    "Program.cs", "src/Program.cs", "Main.cs",
    # Java
    "Main.java", "Application.java", "App.java",
    "src/main/java/Main.java",
    # Kotlin
    "Main.kt", "Application.kt", "App.kt",
    # Rust
    "src/main.rs", "src/lib.rs",
    # Swift
    "main.swift", "Package.swift", "Sources/main.swift",
    # Ruby
    "app.rb", "main.rb", "lib/app.rb",
    # PHP
    "index.php", "app.php", "public/index.php",
    # Scala
    "src/main/scala/Main.scala",
    # Haskell
    "Main.hs", "app/Main.hs",
    # Clojure
    "src/core.clj",
    # Elixir
    "lib/application.ex", "mix.exs",
]
LINT_FILES = [
    ".eslintrc", ".eslintrc.json", ".eslintrc.js", ".eslintrc.cjs", ".eslintrc.yml", ".eslintrc.yaml",
    "eslint.config.js", "eslint.config.mjs", "eslint.config.cjs",
    ".prettierrc", ".prettierrc.json", ".prettierrc.js", ".prettierrc.yml",
    "prettier.config.js", "prettier.config.mjs",
    ".editorconfig",
    "tsconfig.json", "tsconfig.base.json", "tsconfig.build.json",
    ".golangci.yml", ".golangci.yaml",
    "setup.cfg", ".flake8", ".pylintrc", "mypy.ini",
    ".rubocop.yml", "phpcs.xml", "phpstan.neon",
    "biome.json", "biome.jsonc"
]
ENV_TEMPLATES = [".env.example", ".env.template", ".env.sample", ".env.defaults", ".env.local.example"]
SOURCE_EXTS = [
    "ts", "tsx", "js", "jsx", "mjs", "cjs",
    "py", "go", "java", "kt", "rb", "php",
    "rs", "cs", "cpp", "c", "h", "ex", "exs",
    "swift", "scala", "clj", "cljs", "lua",
    "vim", "hs", "ml", "nim", "cr",
    "r", "jl", "groovy", "gradle", "xml", "json"
]
MONOREPO_FILES = ["pnpm-workspace.yaml", "lerna.json", "nx.json", "rush.json", "turbo.json", "moon.yml"]
MONOREPO_DIRS = ["packages", "apps", "libs", "services", "modules"]
CI_CD_CONFIGS = {
    ".github/workflows": "GitHub Actions",
    ".gitlab-ci.yml": "GitLab CI",
    "Jenkinsfile": "Jenkins",
    ".circleci/config.yml": "CircleCI",
    ".travis.yml": "Travis CI",
    "azure-pipelines.yml": "Azure Pipelines",
    "appveyor.yml": "AppVeyor",
    ".drone.yml": "Drone CI",
    ".woodpecker.yml": "Woodpecker CI",
    "bitbucket-pipelines.yml": "Bitbucket Pipelines"
}
CONTAINER_FILES = [
    "Dockerfile", "docker-compose.yml", "docker-compose.yaml",
    "compose.yml", "compose.yaml",
    ".dockerignore", "Dockerfile.*",
    "k8s", "kubernetes", "kustomization.yaml", "Chart.yaml",
    "Vagrantfile", "podman-compose.yml"
]
SECURITY_CONFIGS = [
    ".snyk", "security.txt", "SECURITY.md",
    ".github/dependabot.yml", ".whitesource",
    "sbom.json", "sbom.spdx", ".bandit.yaml"
]
PERFORMANCE_MARKERS = [
    "benchmark", "bench", "perf.data", ".prof",
    "k6.js", "locustfile.py", "jmeter.jmx"
]
def parse_args():
    """Parse command-line arguments."""
    parser = argparse.ArgumentParser(
        description="Scan the current directory (project root) and output discovery information "
                    "for the acquire-codebase-knowledge skill."
    )
    parser.add_argument(
        "--output",
        metavar="FILE",
        help="Write output to FILE instead of stdout"
    )
    return parser.parse_args()
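def should_exclude(path: Path) -> bool:
    """Check if a path should be excluded from scanning.

    Only components below the project root are checked, so a project that lives
    under a directory such as ~/build/myapp is not excluded wholesale.
    """
    try:
        parts = path.relative_to(Path.cwd()).parts
    except ValueError:
        parts = path.parts  # already relative, or outside the project root
    return any(part in EXCLUDE_DIRS for part in parts)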
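def get_directory_tree(max_depth: int = TREE_MAX_DEPTH) -> List[str]:
    """List files and directories up to max_depth levels below the project root."""
    root = Path.cwd()
    entries: List[str] = []

    def walk(path: Path, depth: int) -> None:
        # Children of `path` sit at level depth + 1. Stop once max_depth levels are listed,
        # or once one entry past TREE_LIMIT is collected (enough to detect truncation).
        if depth >= max_depth or len(entries) > TREE_LIMIT:
            return
        try:
            children = sorted(path.iterdir())
        except OSError:  # includes PermissionError
            return
        for item in children:
            if len(entries) > TREE_LIMIT:
                return
            if should_exclude(item):
                continue
            entries.append(str(item.relative_to(root)))
            if item.is_dir():
                walk(item, depth + 1)

    walk(root, 0)
    if len(entries) > TREE_LIMIT:
        return entries[:TREE_LIMIT] + [f"[TRUNCATED] Showing first {TREE_LIMIT} entries."]
    return entries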
def find_manifest_files() -> List[str]:
    """Find manifest files matching patterns."""
    found = []
    for pattern in MANIFESTS:
        if "*" in pattern:
            # Handle glob patterns
            for path in Path.cwd().glob(pattern):
                if path.is_file() and not should_exclude(path):
                    found.append(path.name)
        else:
            path = Path.cwd() / pattern
            if path.is_file():
                found.append(pattern)
    return sorted(set(found))
def read_file_preview(filepath: Path, max_lines: int = MANIFEST_PREVIEW_LINES) -> str:
    """Return the first max_lines lines of a file, noting when the output is truncated."""
    try:
        with open(filepath, 'r', encoding='utf-8', errors='replace') as f:
            lines = f.readlines()
        if not lines:
            return "[Empty file.]"
        preview = ''.join(lines[:max_lines])
        if len(lines) > max_lines:
            preview += f"\n[TRUNCATED] Showing first {max_lines} of {len(lines)} lines."
        return preview
    except Exception as e:
        return f"[Error reading file: {e}]"
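def find_entry_points() -> List[str]:
    """Find entry point candidates; candidates containing '*' are expanded as globs."""
    root = Path.cwd()
    found = []
    for candidate in ENTRY_CANDIDATES:
        if "*" in candidate:
            matches = [str(p.relative_to(root)) for p in sorted(root.glob(candidate)) if p.is_file()]
        else:
            matches = [candidate] if Path(candidate).is_file() else []
        for match in matches:
            if match not in found:  # candidate patterns may overlap
                found.append(match)
    return found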
def find_lint_config() -> List[str]:
    """Find linting and formatting config files."""
    found = []
    for filename in LINT_FILES:
        if Path(filename).exists():
            found.append(filename)
    return found
def find_env_templates() -> List[tuple]:
    """Find environment variable templates."""
    found = []
    for filename in ENV_TEMPLATES:
        path = Path(filename)
        if path.exists():
            found.append((filename, path))
    return found
def search_todos() -> List[str]:
    """Search production source files for TODO/FIXME/HACK markers."""
    todos = []
    marker = re.compile(r"\b(TODO|FIXME|HACK)\b")
    skip_dirs = EXCLUDE_DIRS | {"test", "tests", "__tests__", "spec", "__mocks__", "fixtures"}
    root_dir = Path.cwd()
    for root, dirs, files in os.walk(root_dir):
        # Prune excluded and test directories in place so os.walk does not descend into them;
        # sort so that the first TODO_LIMIT results are deterministic.
        dirs[:] = sorted(d for d in dirs if d not in skip_dirs)
        for file in sorted(files):
            if Path(file).suffix.lstrip('.') not in SOURCE_EXTS:
                continue
            filepath = Path(root) / file
            try:
                with open(filepath, 'r', encoding='utf-8', errors='replace') as f:
                    for line_num, line in enumerate(f, 1):
                        # One entry per line, even if several markers appear on it
                        if marker.search(line):
                            rel_path = filepath.relative_to(root_dir)
                            todos.append(f"{rel_path}:{line_num}: {line.strip()}")
                            if len(todos) >= TODO_LIMIT:
                                return todos
            except OSError:
                continue
    return todos
def get_git_commits() -> List[str]:
    """Get recent git commits."""
    try:
        result = subprocess.run(
            ["git", "log", "--oneline", "-n", str(RECENT_COMMITS_LIMIT)],
            capture_output=True,
            text=True,
            cwd=Path.cwd()
        )
        if result.returncode == 0:
            return result.stdout.strip().split('\n') if result.stdout.strip() else []
        return []
    except Exception:
        return []
def get_git_churn() -> List[str]:
    """Get high-churn files from last 90 days."""
    try:
        result = subprocess.run(
            ["git", "log", "--since=90 days ago", "--name-only", "--pretty=format:"],
            capture_output=True,
            text=True,
            cwd=Path.cwd()
        )
        if result.returncode == 0:
            files = [f.strip() for f in result.stdout.split('\n') if f.strip()]
            # Count occurrences
            from collections import Counter
            counts = Counter(files)
            churn = sorted(counts.items(), key=lambda x: x[1], reverse=True)
            return [f"{count:4d} {filename}" for filename, count in churn[:CHURN_LIMIT]]
        return []
    except Exception:
        return []
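def is_git_repo() -> bool:
    """Check if the current directory is inside a git repository."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--git-dir"],
            capture_output=True,
            cwd=Path.cwd(),
            timeout=2
        )
        # git exits non-zero outside a repository, so the exit code must be checked
        return result.returncode == 0
    except (OSError, subprocess.TimeoutExpired):  # git not installed, or hung
        return False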
def detect_monorepo() -> List[str]:
    """Detect monorepo signals."""
    signals = []
    for filename in MONOREPO_FILES:
        if Path(filename).exists():
            signals.append(f"Monorepo tool detected: {filename}")
    for dirname in MONOREPO_DIRS:
        if Path(dirname).is_dir():
            signals.append(f"Sub-package directory found: {dirname}/")
    # Check package.json workspaces by parsing the file, so that a script or
    # dependency that merely mentions "workspaces" is not reported as a monorepo.
    if Path("package.json").is_file():
        try:
            with open("package.json", 'r', encoding='utf-8') as f:
                data = json.load(f)
            if isinstance(data, dict) and "workspaces" in data:
                signals.append("package.json has 'workspaces' field (npm/yarn workspaces monorepo)")
        except (OSError, ValueError):  # unreadable file or invalid JSON
            pass
    return signals
def detect_ci_cd_pipelines() -> List[str]:
    """Detect CI/CD pipeline configurations."""
    pipelines = []
    for config_path, pipeline_name in CI_CD_CONFIGS.items():
        path = Path(config_path)
        if path.is_file():
            pipelines.append(f"CI/CD: {pipeline_name}")
        elif path.is_dir():
            # Check for workflow files in directory
            try:
                if list(path.glob("*.yml")) or list(path.glob("*.yaml")):
                    pipelines.append(f"CI/CD: {pipeline_name}")
            except Exception:
                pass
    return pipelines
def detect_containers() -> List[str]:
    """Detect containerization and orchestration configs."""
    root = Path.cwd()
    containers = []
    for config in CONTAINER_FILES:
        # Expand glob patterns such as "Dockerfile.*"; plain names are checked directly
        paths = sorted(root.glob(config)) if "*" in config else [root / config]
        for path in paths:
            name = path.name
            if path.is_file():
                if name.startswith("Dockerfile"):
                    containers.append(f"Container: Docker ({name})")
                elif "compose" in name:
                    containers.append(f"Orchestration: Compose ({name})")
                elif name.endswith((".yaml", ".yml")):
                    containers.append(f"Container/Orchestration: {name}")
            elif path.is_dir() and name in ("k8s", "kubernetes"):
                # Count a Kubernetes directory only if it holds YAML manifests
                if any(path.glob("*.yml")) or any(path.glob("*.yaml")):
                    containers.append(f"Orchestration: Kubernetes manifests in {name}/")
    return containers
def detect_security_configs() -> List[str]:
    """Detect security and compliance configurations."""
    security = []
    for config in SECURITY_CONFIGS:
        if Path(config).exists():
            # Report only the file name, without a leading dot or YAML extension
            config_name = Path(config).name.replace(".yml", "").replace(".yaml", "").lstrip(".")
            security.append(f"Security: {config_name}")
    return security
def detect_performance_markers() -> List[str]:
    """Detect performance testing and profiling markers."""
    performance = []
    for marker in PERFORMANCE_MARKERS:
        path = Path(marker)
        # Test for a directory first: exists() is also true for directories,
        # so checking it first would make the directory message unreachable.
        if path.is_dir():
            performance.append(f"Performance: {marker}/ directory found")
        elif path.exists():
            performance.append(f"Performance: {marker} found")
    return performance
def collect_code_metrics() -> dict:
    """Collect code metrics: file counts by extension and language, total lines, largest files."""
    metrics = {
        "total_files": 0,
        "by_extension": {},
        "by_language": {},
        "total_lines": 0,
        "largest_files": []
    }
    # Language mapping
    lang_map = {
        "ts": "TypeScript", "tsx": "TypeScript/React", "js": "JavaScript",
        "jsx": "JavaScript/React", "py": "Python", "go": "Go",
        "java": "Java", "kt": "Kotlin", "rs": "Rust",
        "cs": "C#", "rb": "Ruby", "php": "PHP",
        "swift": "Swift", "scala": "Scala", "ex": "Elixir",
        "cpp": "C++", "c": "C", "h": "C Header",
        "clj": "Clojure", "lua": "Lua", "hs": "Haskell"
    }
    file_sizes = []
    try:
        for root, dirs, files in os.walk(Path.cwd()):
            dirs[:] = [d for d in dirs if d not in EXCLUDE_DIRS]
            for file in files:
                filepath = Path(root) / file
                ext = filepath.suffix.lstrip('.')
                if not ext or ext in {"pyc", "o", "a", "so"}:
                    continue
                try:
                    size = filepath.stat().st_size
                    file_sizes.append((filepath.relative_to(Path.cwd()), size))
                    metrics["total_files"] += 1
                    metrics["by_extension"][ext] = metrics["by_extension"].get(ext, 0) + 1
                    lang = lang_map.get(ext, "Other")
                    metrics["by_language"][lang] = metrics["by_language"].get(lang, 0) + 1
                    # Count lines for text files
                    if ext in SOURCE_EXTS and size < 1_000_000:  # Skip huge files
                        try:
                            # Stream the file instead of loading every line into memory
                            with open(filepath, 'r', encoding='utf-8', errors='ignore') as f:
                                metrics["total_lines"] += sum(1 for _ in f)
                        except Exception:
                            pass
                except Exception:
                    pass
        # Top 10 largest files
        file_sizes.sort(key=lambda x: x[1], reverse=True)
        metrics["largest_files"] = [
            f"{f}: {s / 1024:.1f}KB" for f, s in file_sizes[:10]
        ]
    except Exception:
        pass
    return metrics
def print_section(title: str, content: List[str], output_file=None) -> None:
    """Print a section with title and content (a list of lines or a single string)."""
    lines = [f"\n=== {title} ==="]
    if isinstance(content, list):
        lines.extend(content if content else ["None found."])
    elif isinstance(content, str):
        lines.append(content)
    text = '\n'.join(lines) + '\n'
    if output_file:
        output_file.write(text)
    else:
        print(text, end='')
def main():
    """Main entry point."""
    args = parse_args()
    output_file = None
    try:
        # Open the output file inside the try block so that failures such as permission
        # errors go through the same error path and exit code as scan errors.
        if args.output:
            output_path = Path(args.output)
            output_path.parent.mkdir(parents=True, exist_ok=True)
            output_file = open(output_path, 'w', encoding='utf-8')
            print(f"Writing output to: {args.output}", file=sys.stderr)
        # Directory tree
        print_section(
            f"DIRECTORY TREE (max depth {TREE_MAX_DEPTH}, excluded directories skipped)",
            get_directory_tree(),
            output_file
        )
        # Stack detection
        manifests = find_manifest_files()
        if manifests:
            manifest_content = [""]
            for manifest in manifests:
                manifest_path = Path(manifest)
                manifest_content.append(f"--- {manifest} ---")
                if manifest == "bun.lockb":
                    manifest_content.append("[Binary lockfile: see package.json for dependency details.]")
                else:
                    manifest_content.append(read_file_preview(manifest_path))
            print_section("STACK DETECTION (manifest files)", manifest_content, output_file)
        else:
            print_section("STACK DETECTION (manifest files)", ["No recognized manifest files found in project root."], output_file)
        # Entry points
        entries = find_entry_points()
        if entries:
            entry_content = [f"Found: {e}" for e in entries]
            print_section("ENTRY POINTS", entry_content, output_file)
        else:
            print_section("ENTRY POINTS", ["No common entry points found. Check 'main' or 'scripts.start' in manifest files above."], output_file)
        # Linting config
        lint = find_lint_config()
        if lint:
            lint_content = [f"Found: {name}" for name in lint]
            print_section("LINTING AND FORMATTING CONFIG", lint_content, output_file)
        else:
            print_section("LINTING AND FORMATTING CONFIG", ["No linting or formatting config files found in project root."], output_file)
        # Environment templates
        envs = find_env_templates()
        if envs:
            env_content = []
            for filename, filepath in envs:
                env_content.append(f"--- {filename} ---")
                env_content.append(read_file_preview(filepath))
            print_section("ENVIRONMENT VARIABLE TEMPLATES", env_content, output_file)
        else:
            print_section("ENVIRONMENT VARIABLE TEMPLATES", ["No .env.example or .env.template found. Identify required environment variables by searching the code and config for environment variable reads."], output_file)
        # TODOs
        todos = search_todos()
        if todos:
            print_section("TODO / FIXME / HACK (production code only, test dirs excluded)", todos, output_file)
        else:
            print_section("TODO / FIXME / HACK (production code only, test dirs excluded)", ["None found."], output_file)
        # Git info
        if is_git_repo():
            commits = get_git_commits()
            if commits:
                print_section("GIT RECENT COMMITS (last 20)", commits, output_file)
            else:
                print_section("GIT RECENT COMMITS (last 20)", ["No commits found."], output_file)
            churn = get_git_churn()
            if churn:
                print_section("HIGH-CHURN FILES (last 90 days, top 20)", churn, output_file)
            else:
                print_section("HIGH-CHURN FILES (last 90 days, top 20)", ["None found."], output_file)
        else:
            print_section("GIT RECENT COMMITS (last 20)", ["Not a git repository (or git is not installed)."], output_file)
            print_section("HIGH-CHURN FILES (last 90 days, top 20)", ["Not a git repository (or git is not installed)."], output_file)
        # Monorepo detection
        monorepo = detect_monorepo()
        if monorepo:
            print_section("MONOREPO SIGNALS", monorepo, output_file)
        else:
            print_section("MONOREPO SIGNALS", ["No monorepo signals detected."], output_file)
        # Code metrics
        metrics = collect_code_metrics()
        metrics_output = [
            f"Total files scanned: {metrics['total_files']}",
            # SOURCE_EXTS also covers config formats such as json and xml
            f"Total lines in source/config files: {metrics['total_lines']}",
            ""
        ]
        if metrics["by_language"]:
            metrics_output.append("Files by language:")
            for lang, count in sorted(metrics["by_language"].items(), key=lambda x: x[1], reverse=True):
                metrics_output.append(f"  {lang}: {count}")
        if metrics["largest_files"]:
            metrics_output.append("")
            metrics_output.append("Top 10 largest files:")
            metrics_output.extend(metrics["largest_files"])
        print_section("CODE METRICS", metrics_output, output_file)
        # CI/CD Detection
        ci_cd = detect_ci_cd_pipelines()
        if ci_cd:
            print_section("CI/CD PIPELINES", ci_cd, output_file)
        else:
            print_section("CI/CD PIPELINES", ["No CI/CD pipelines detected."], output_file)
        # Container Detection
        containers = detect_containers()
        if containers:
            print_section("CONTAINERS & ORCHESTRATION", containers, output_file)
        else:
            print_section("CONTAINERS & ORCHESTRATION", ["No containerization configs detected."], output_file)
        # Security Configs
        security = detect_security_configs()
        if security:
            print_section("SECURITY & COMPLIANCE", security, output_file)
        else:
            print_section("SECURITY & COMPLIANCE", ["No security configs detected."], output_file)
        # Performance Markers
        performance = detect_performance_markers()
        if performance:
            print_section("PERFORMANCE & TESTING", performance, output_file)
        else:
            print_section("PERFORMANCE & TESTING", ["No performance testing configs detected."], output_file)
        # Final message
        final_msg = "\n=== SCAN COMPLETE ===\n"
        if output_file:
            output_file.write(final_msg)
        else:
            print(final_msg, end='')
        return 0
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        return 1
    finally:
        if output_file:
            output_file.close()


if __name__ == "__main__":
    sys.exit(main())