* Add visual-pr plugin: screenshot capture, annotation, PR embedding, and screen recording Four skills that teach Copilot to capture UI screenshots (Playwright + PIL), annotate them with algorithmic label placement, embed before/after images in PR descriptions, and record animated GIF demos. Includes demo images showing the annotation engine on GitHub Issues. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Update generated README tables and marketplace.json Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Embed annotate.py module in image-annotations skill The full working module (annotate_image, grid_image, diff_images) is now included as a code block so users can save it as annotate.py and import directly. Scrubbed project-specific labels from examples. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> * Address review feedback: mss.mss() context manager, fix RECT struct, consistent placeholder - Use mss.mss() context manager instead of mss.MSS() (ui-screenshots, screen-recording) - Fix broken RECT struct in window+GIF combining example (screen-recording) - Consistent projectId placeholder in AzDO upload example (pr-screenshots) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
7.3 KiB
name, description
| name | description |
|---|---|
| ui-screenshots | Capture screenshots of web apps during development using Playwright and PIL. Supports full-page captures, interactive states, and an iterate-on-crop workflow that avoids slow re-screenshots. |
UI Screenshots
Capture screenshots of web apps and graphical UIs during development to document visual changes.
When to Use This Skill
Use this skill when you need to:
- Capture the current state of a running web app
- Document a UI before and after a code change
- Screenshot interactive states (tooltips, hovers, selected elements)
- Capture specific sections of a page without re-screenshotting
Prerequisites
pip install playwright Pillow -q
playwright install chromium
Core Workflow
1. Take a raw full-page screenshot
from playwright.async_api import async_playwright
async def capture(url="http://localhost:3000", out="screenshot-raw.png", width=1400, height=5000):
async with async_playwright() as p:
browser = await p.chromium.launch()
page = await browser.new_page(viewport={"width": width, "height": height})
await page.goto(url, wait_until="networkidle")
await page.wait_for_timeout(4000) # let charts/animations render
await page.screenshot(path=out, full_page=True)
await browser.close()
- Use a tall viewport (height=5000) so the page renders everything without scrolling
wait_until="networkidle"+wait_for_timeout(4000)ensures async charts loadfull_page=Truecaptures the entire scrollable content
2. View the raw image, then crop with PIL
Do NOT try to get perfect crops via Playwright's clip parameter. It's unreliable with full-page captures.
from PIL import Image
img = Image.open("screenshot-raw.png")
cropped = img.crop((left, top, right, bottom)) # adjust based on what you see
cropped.save("screenshot-final.png")
- Take the raw screenshot
- View it to see actual pixel positions
- Crop with PIL based on what you see
- View the result — if not right, re-crop (instant, no re-screenshot needed)
3. Iterate on crop, not on capture
- Re-screenshotting is slow (browser launch + page load + render wait)
- Re-cropping is instant (just PIL)
- Get one good raw capture, then slice it as many ways as needed
4. Interactive states
element = page.locator("selector").first
await element.hover()
await page.wait_for_timeout(1000) # let tooltip appear
await page.screenshot(path="screenshot-hover.png", full_page=True)
For "selected" state without hover effect, move the mouse away after clicking:
await element.click()
await page.mouse.move(300, 300) # move away so hover doesn't show
await page.wait_for_timeout(500)
await page.screenshot(path="screenshot-selected.png", full_page=True)
5. Section-specific captures
Crop different sections from a single full-page screenshot:
img.crop((0, 200, 920, 900)).save("screenshot-header.png")
img.crop((0, 900, 920, 1600)).save("screenshot-main.png")
Guidelines
- Always capture before state BEFORE making any changes — if you forget, you have to revert code to get a before shot
- Before/after pairs must use the same viewport width and crop — otherwise the comparison is useless
- To get a "before" after you already changed code: use
git checkout HEAD~1 -- <files>to revert, screenshot, thengit checkout HEAD -- <files>to restore - For interactive states: capture before AND after for each state — don't assume the "normal" before covers all cases
- Use
device_scale_factor=1in Playwright to force 1x pixels so screenshots match what users see at 100% zoom - Charts need extra wait time — Plotly, D3, etc. render asynchronously; 4s minimum after networkidle
- Narrow viewport reveals rendering bugs — some border/alignment issues only appear at specific widths
Non-Web App Screenshots
For desktop apps (VS, WPF, WinForms, console apps, terminals) where Playwright can't reach.
mss + ctypes (recommended for desktop windows)
Find a window by title via Win32 API, capture its region with mss. Tested at ~33ms per capture.
import ctypes
from ctypes import c_int, Structure, byref, windll
import mss
from PIL import Image
user32 = windll.user32
def find_window(title_contains):
"""Find visible windows matching a title substring."""
results = []
WNDENUMPROC = ctypes.WINFUNCTYPE(ctypes.c_bool, ctypes.c_void_p, ctypes.c_void_p)
def cb(hwnd, _):
if user32.IsWindowVisible(hwnd):
buf = ctypes.create_unicode_buffer(256)
user32.GetWindowTextW(hwnd, buf, 256)
if title_contains.lower() in buf.value.lower():
results.append((hwnd, buf.value))
return True
user32.EnumWindows(WNDENUMPROC(cb), 0)
return results
def capture_window(title_contains, output_path):
"""Capture a window by title substring."""
windows = find_window(title_contains)
if not windows:
raise ValueError(f"No window matching '{title_contains}'")
hwnd = windows[0][0]
class RECT(Structure):
_fields_ = [('left', c_int), ('top', c_int), ('right', c_int), ('bottom', c_int)]
rect = RECT()
user32.GetWindowRect(hwnd, byref(rect))
w, h = rect.right - rect.left, rect.bottom - rect.top
with mss.mss() as sct:
shot = sct.grab({'left': rect.left, 'top': rect.top, 'width': w, 'height': h})
img = Image.frombytes('RGB', shot.size, shot.rgb)
img.save(output_path)
return img
# Usage:
capture_window('Visual Studio Code', 'vscode-capture.png')
Prerequisites: pip install mss pillow
Limitation: Window must be visible (not behind other windows or minimized).
Electron apps (VS Code, etc.)
Node.js Playwright only — Python Playwright has no electron API. Captures via CDP (Chrome DevTools Protocol), not from the screen — works even while minimized.
const { _electron: electron } = require('playwright');
const app = await electron.launch({
executablePath: 'C:\\Program Files\\Microsoft VS Code\\Code.exe',
args: ['--new-window', '--disable-extensions', '--user-data-dir=' + tmpDir]
});
const window = await app.firstWindow();
await window.waitForLoadState('domcontentloaded');
// Minimize immediately — captures still work via CDP
await app.evaluate(({ BrowserWindow }) => {
BrowserWindow.getAllWindows()[0].minimize();
});
await window.screenshot({ path: 'capture.png' }); // works while minimized!
await app.close();
Critical: --user-data-dir=<temp> is required or VS Code hands off to the existing instance and the launched process exits immediately.
Decision tree
| Scenario | Tool | Notes |
|---|---|---|
| Web app (localhost) | Playwright | Proven, full DOM access |
| Electron app (VS Code) | Playwright Electron (Node.js) | Works minimized via CDP |
| Desktop app, visible window | mss + ctypes (find by title) | ~33ms per capture |
| Desktop app, behind windows | Windows Graphics Capture API | Complex setup, Win10 1903+ |
| Quick full-screen | mss | ~68ms |
Limitations
- Web capture requires a locally running app or accessible URL
- Desktop capture (mss) requires the window to be visible and unobstructed
- Electron capture requires Node.js Playwright (not Python)
- Some SPAs with heavy client-side rendering may need custom wait logic beyond networkidle