Add visual-pr plugin — screenshot capture, annotation, and PR embedding (#1804)

* Add visual-pr plugin: screenshot capture, annotation, PR embedding, and screen recording

Four skills that teach Copilot to capture UI screenshots (Playwright + PIL),
annotate them with algorithmic label placement, embed before/after images
in PR descriptions, and record animated GIF demos.

Includes demo images showing the annotation engine on GitHub Issues.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Update generated README tables and marketplace.json

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Embed annotate.py module in image-annotations skill

The full working module (annotate_image, grid_image, diff_images) is now
included as a code block so users can save it as annotate.py and import
directly. Scrubbed project-specific labels from examples.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Address review feedback: mss.mss() context manager, fix RECT struct, consistent placeholder

- Use mss.mss() context manager instead of mss.MSS() (ui-screenshots, screen-recording)
- Fix broken RECT struct in window+GIF combining example (screen-recording)
- Consistent projectId placeholder in AzDO upload example (pr-screenshots)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Jakub Jareš
2026-05-25 03:22:39 +02:00
committed by GitHub
parent 7f7599a716
commit 5cab59b03e
13 changed files with 1264 additions and 0 deletions
+600
@@ -0,0 +1,600 @@
---
name: image-annotations
description: 'Annotate screenshots, diagrams, and images with callout rectangles, arrows, labels, and color-coded highlights using PIL. Includes rules for animated GIF annotations with timing and pacing.'
---
# Image Annotations
Add visual callouts to any image — screenshots, diagrams, architecture docs, demo frames — using PIL/Pillow. Highlights what changed or what to look at, so reviewers don't have to guess.
## When to Use This Skill
Use this skill when you need to:
- Highlight a specific area in a screenshot for a PR description
- Annotate before/after images to show what changed
- Add labels and callouts to diagrams or architecture images
- Create annotated frames for animated GIF demos
## Prerequisites
```bash
pip install Pillow numpy -q
pip install scipy -q   # optional, only needed for diff_images
```
## Color Rules
- **Red (`#E63946`)** — only for "bad" / "removed" things (e.g., circling a bug being fixed)
- **Yellowish-orange (`#FF9F1C`)** — for neutral highlights ("look here", "new feature", etc.)
- Never use red just because it's eye-catching — red = bad/removed
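The color rules above can be encoded in a small helper so a script never reaches for red by accident. This is a minimal sketch: the name `annotation_color` and the intent strings are assumptions made for illustration and are not part of `annotate.py`.

```python
# Hypothetical helper: map the intent of a callout to the two colors named above.
RED = '#E63946'      # bad / removed
ORANGE = '#FF9F1C'   # neutral highlight

def annotation_color(intent):
    """Return red only for 'bad' or 'removed' callouts, orange for everything else."""
    return RED if intent in ('bad', 'removed') else ORANGE
```

For example, `annotation_color('removed')` gives the red value, while `annotation_color('new feature')` gives the orange one.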
## Font
- Use **Ink Free** (`C:/Windows/Fonts/Inkfree.ttf`) for a handwritten look on Windows
- On Linux/macOS, fall back to `ImageFont.load_default()`
- Size **36** for annotations on ~1400px-wide images
- `stroke_width=1` with `stroke_fill=<same color as fill>` — gives body without being too thick
- Do NOT use white stroke — looks like a bad glow effect
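One way to handle the platform difference is to probe a list of font paths and use the first one that exists. The sketch below uses only the standard library; the function name `pick_font_path` and the Linux and macOS paths are assumptions, so adjust them to the fonts actually installed.

```python
import os

# Candidate paths are assumptions; only the Ink Free path comes from this skill.
FONT_CANDIDATES = [
    'C:/Windows/Fonts/Inkfree.ttf',                     # Ink Free (Windows)
    '/usr/share/fonts/truetype/dejavu/DejaVuSans.ttf',  # often present on Debian/Ubuntu
    '/System/Library/Fonts/Supplemental/Arial.ttf',     # often present on macOS
]

def pick_font_path(candidates=FONT_CANDIDATES):
    """Return the first candidate path that exists as a file, or None."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

When the result is not `None`, pass it to `ImageFont.truetype(path, 36)`; otherwise call `ImageFont.load_default()`.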
## Shapes
- Prefer **rounded rectangles** over circles/ellipses — less pixelation at edges
- `draw.rounded_rectangle([x1, y1, x2, y2], radius=14, outline=color, width=5)`
- **Padding 18px** around the target content
## Reference Snippet
```python
from PIL import Image, ImageDraw, ImageFont

# Setup
try:
    font = ImageFont.truetype('C:/Windows/Fonts/Inkfree.ttf', 36)
except OSError:
    font = ImageFont.load_default()  # Ink Free is Windows-only
color = '#FF9F1C'  # orange for highlights
stroke = 5
pad = 18
img = Image.open('screenshot.png')
draw = ImageDraw.Draw(img)

# Target bounding box (example values; replace with your element's pixel coordinates)
x1, y1, x2, y2 = 560, 275, 635, 390
cy = (y1 + y2) // 2  # vertical center of the target, where the leader line starts

# Rounded rect with padding
draw.rounded_rectangle(
    [x1 - pad, y1 - pad, x2 + pad, y2 + pad],
    radius=14, outline=color, width=stroke
)

# Leader line (same thickness as rect)
draw.line([x2 + pad, cy, x2 + pad + 40, cy - 30], fill=color, width=stroke)

# Label: same-color stroke for body, NO white stroke
draw.text(
    (x2 + pad + 45, cy - 60), 'label text',
    fill=color, font=font, stroke_width=1, stroke_fill=color
)
img.save('annotated.png')
```
## Algorithmic Annotation — `annotate.py`
For images with multiple elements to annotate, use the `annotate.py` module below. Save it next to your script and import from it. It places labels automatically and keeps them from overlapping each other or the annotated elements.
### Quick start
```python
from annotate import annotate_image

result = annotate_image(
    'screenshot.png',
    [
        {'elem': (560, 275, 635, 390), 'label': 'button', 'draw_box': True},
        {'elem': (105, 453, 236, 470), 'label': 'status text'},
    ],
    debug=True,
)
result.save('annotated.png')
```
- `elem`: `(x1, y1, x2, y2)` tight bounding box — must be exact pixel coordinates
- `label`: text label (supports `\n` for multi-line)
- `draw_box`: if `True`, draws a rounded rectangle around the element. If `False` (default), draws a V-arrowhead pointing at the element
- `debug`: shows targeting rectangles and candidate heatmap for placement validation
### Coordinate grid helper
**Always use `grid_image()` before annotating an unfamiliar image.** Image previews are usually scaled down, so coordinates read off a preview are smaller than the true pixel coordinates, and the error grows the farther you move from (0,0).
```python
from annotate import grid_image
grid = grid_image('screenshot.png', step=100)
grid.save('grid.png')
```
Then verify with small crops:
```python
from PIL import Image
img = Image.open('screenshot.png')
crop = img.crop((x1 - 20, y1 - 20, x2 + 20, y2 + 20))
crop.save('verify.png')
```
### Algorithm overview
1. **Ring search**: candidates between MIN_ARROW (25px) and MAX_ARROW (120px) from element edge
2. **Contrast scoring**: prefers placements where label text is readable — `abs(avg_brightness - 147) - std * 0.3 - dist * 0.02`
3. **Joint resolution**: candidates computed independently, placed greedily (best score first)
4. **Hard blocks**: labels cannot overlap any other annotation's element or breathing box
5. **Proximity penalty**: labels within 40px of other placed boxes get a score penalty
6. **Arrow crossing penalty**: -50 for arrows crossing already-placed arrows
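The scoring in steps 2 and 5 can be restated in plain Python to make the arithmetic easier to follow. This is a sketch, not the module's code: the function names are made up here, and it uses the standard library's `statistics` in place of numpy (population standard deviation, which matches numpy's default).

```python
from statistics import fmean, pstdev

def contrast_score(brightness_values, dist_px):
    """Score one candidate label position; higher is better.

    brightness_values: flat list of 0-255 channel values under the label box
    dist_px: distance from the label center to the element center
    """
    avg = fmean(brightness_values)
    spread = pstdev(brightness_values)  # busy backgrounds lower the score
    return abs(avg - 147) - spread * 0.3 - dist_px * 0.02

def proximity_penalty(gap_px, margin=40, penalty=50):
    """Linear penalty that fades to zero once the gap reaches the margin."""
    if gap_px >= margin:
        return 0.0
    return penalty * (1 - gap_px / margin)
```

A flat white region far from mid-brightness scores high, a checkerboard scores low because of its spread, and a label touching another box pays the full 50-point penalty.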
### Debug mode colors
| Color | Meaning |
|-------|---------|
| Cyan | Target element box (elem + padding) |
| Gray | Exclusion zone (MIN_ARROW buffer) |
| Red→Green | Candidate heatmap (red=bad, green=good) |
| Magenta | Chosen label position |
| Orange | Final rendered annotation |
### Arrow styles
- **`draw_box=True`**: rounded rectangle + straight line to label, no arrowhead
- **`draw_box=False`**: V-shaped arrowhead with rounded line caps
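The V-shaped arrowhead is two short strokes fanning back from the arrow tip. The sketch below isolates that geometry using the same length (18px) and spread (0.45 rad) that the module uses; the function name `arrowhead_wings` is an assumption for illustration.

```python
import math

def arrowhead_wings(start, tip, length=18, spread=0.45):
    """Return the two wing endpoints of a V-arrowhead whose point is at `tip`.

    start, tip: (x, y) endpoints of the arrow shaft; the arrow points at `tip`.
    """
    sx, sy = start
    ex, ey = tip
    angle = math.atan2(ey - sy, ex - sx)  # direction of the shaft
    wing_a = (ex - length * math.cos(angle - spread),
              ey - length * math.sin(angle - spread))
    wing_b = (ex - length * math.cos(angle + spread),
              ey - length * math.sin(angle + spread))
    return wing_a, wing_b
```

Drawing a line from each wing to `tip` gives the arrowhead; both wings sit `length` pixels from the tip, mirrored about the shaft.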
### `annotate.py` — full module
Save this as `annotate.py` and import from it:
```python
"""
Algorithmic screenshot annotation with automatic label placement.
pip install Pillow numpy
Optional for diff_images: pip install scipy
"""
import math
import numpy as np
from PIL import Image, ImageDraw, ImageFont
# --- Defaults ---
DEFAULT_FONT = 'C:/Windows/Fonts/Inkfree.ttf'
DEFAULT_FONT_SIZE = 32
DEFAULT_COLOR = '#FF9F1C'
DEFAULT_STROKE = 5
MIN_ARROW = 25
MAX_ARROW = 120
TEXT_PAD = 6
BREATH = 18
CROSSING_PENALTY = 50
PROXIMITY_MARGIN = 40
PROXIMITY_PENALTY = 50
def _rect_intersects(a, b):
    # Axis-aligned overlap test for (x1, y1, x2, y2) rectangles
    return a[0] < b[2] and a[2] > b[0] and a[1] < b[3] and a[3] > b[1]


def _segments_intersect(p1, p2, p3, p4):
    # True when segment p1-p2 properly crosses segment p3-p4
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    d1, d2 = cross(p3, p4, p1), cross(p3, p4, p2)
    d3, d4 = cross(p1, p2, p3), cross(p1, p2, p4)
    return ((d1 > 0 and d2 < 0) or (d1 < 0 and d2 > 0)) and \
           ((d3 > 0 and d4 < 0) or (d3 < 0 and d4 > 0))


def _line_rect_exit(cx, cy, tx, ty, rect):
    # Point where the ray from (cx, cy) toward (tx, ty) leaves rect
    x1, y1, x2, y2 = rect
    dx, dy = tx - cx, ty - cy
    tmin, tmax = 0.0, 1.0
    for lo, hi, p, d in [(x1, x2, cx, dx), (y1, y2, cy, dy)]:
        if abs(d) < 1e-9:
            continue
        t0, t1 = (lo - p) / d, (hi - p) / d
        if t0 > t1:
            t0, t1 = t1, t0
        tmin, tmax = max(tmin, t0), min(tmax, t1)
    return (cx + dx * tmax, cy + dy * tmax)


def _rect_gap(a, b):
    # Shortest distance between two rectangles (0 if they touch or overlap)
    dx = max(a[0] - b[2], b[0] - a[2], 0)
    dy = max(a[1] - b[3], b[1] - a[3], 0)
    if dx == 0 and dy == 0:
        return 0
    return math.sqrt(dx**2 + dy**2)
def _find_candidates(pixels, W, H, cyan, pw, ph, font):
    cx, cy = (cyan[0] + cyan[2]) / 2, (cyan[1] + cyan[3]) / 2
    excl_zone = (cyan[0] - MIN_ARROW, cyan[1] - MIN_ARROW,
                 cyan[2] + MIN_ARROW, cyan[3] + MIN_ARROW)
    # Search window: at most MAX_ARROW away from the element, clipped to the image
    sx1 = max(0, cyan[0] - MAX_ARROW - pw)
    sy1 = max(0, cyan[1] - MAX_ARROW - ph)
    sx2 = min(W - pw, cyan[2] + MAX_ARROW)
    sy2 = min(H - ph, cyan[3] + MAX_ARROW)
    step_x = max(8, min(pw // 2, MAX_ARROW // 3))
    step_y = max(8, min(ph // 2, MAX_ARROW // 3))
    cands = []
    for px in range(sx1, sx2, step_x):
        for py in range(sy1, sy2, step_y):
            pink = (px, py, px + pw, py + ph)
            if _rect_intersects(pink, excl_zone):
                continue
            # Edge-to-edge distance between the label box and the element
            gl, gr = cyan[0] - pink[2], pink[0] - cyan[2]
            gt, gb = cyan[1] - pink[3], pink[1] - cyan[3]
            hd, vd = max(gl, gr, 0), max(gt, gb, 0)
            ed = math.sqrt(hd**2 + vd**2) if (hd > 0 and vd > 0) else max(hd, vd)
            if ed > MAX_ARROW:
                continue
            # Contrast score: far from mid-brightness, low variance, close to element
            region = pixels[py:py + ph, px:px + pw, :3].astype(float)
            score = abs(np.mean(region) - 147) - np.std(region) * 0.3
            dist = math.sqrt((px + pw/2 - cx)**2 + (py + ph/2 - cy)**2)
            score -= dist * 0.02
            cands.append(((px, py), score))
    return cands
def _resolve_placements(annots, font):
    placed = []
    all_elem_zones = []
    for ann in annots:
        all_elem_zones.append(ann['cyan'])
        if ann.get('draw_box', False):
            c = ann['cyan']
            all_elem_zones.append((c[0]-BREATH, c[1]-BREATH, c[2]+BREATH, c[3]+BREATH))

    # Greedy placement: annotations with the best available candidate go first
    for ann in sorted(annots, key=lambda a: -a['best_score']):
        pw, ph = ann['pw'], ann['ph']
        cyan = ann['cyan']
        cx, cy = ann['cyan_center']
        draw_box = ann.get('draw_box', False)
        best_pos, best_score = None, -999
        valid = []
        for (px, py), score in ann['candidates']:
            pink = (px, py, px + pw, py + ph)
            ok = True
            # Hard block: other annotations' elements and breathing boxes
            for ez in all_elem_zones:
                if ez == cyan:
                    continue
                if ann.get('draw_box', False):
                    own_viz = (cyan[0]-BREATH, cyan[1]-BREATH, cyan[2]+BREATH, cyan[3]+BREATH)
                    if ez == own_viz:
                        continue
                if _rect_intersects(pink, ez):
                    ok = False
                    break
            if not ok:
                continue
            # Hard block: labels, exclusion zones, and boxes already placed
            for p_pink, p_excl, p_viz, _ in placed:
                if _rect_intersects(pink, p_pink) or _rect_intersects(pink, p_excl):
                    ok = False
                    break
                if p_viz and _rect_intersects(pink, p_viz):
                    ok = False
                    break
            if not ok:
                continue
            # Soft penalty: too close to already-placed boxes
            for p_pink, p_excl, p_viz, _ in placed:
                for rect in [p_pink, p_excl, p_viz]:
                    if rect is None:
                        continue
                    gap = _rect_gap(pink, rect)
                    if gap < PROXIMITY_MARGIN:
                        score -= PROXIMITY_PENALTY * (1 - gap / PROXIMITY_MARGIN)
            # Soft penalty: too close to other elements
            for ez in all_elem_zones:
                if ez == cyan:
                    continue
                gap = _rect_gap(pink, ez)
                if gap < PROXIMITY_MARGIN:
                    score -= PROXIMITY_PENALTY * (1 - gap / PROXIMITY_MARGIN)
            # Soft penalty: arrow would cross an already-placed arrow
            tcx, tcy = px + pw/2, py + ph/2
            cand_start = _line_rect_exit(tcx, tcy, cx, cy, pink)
            if draw_box:
                viz = (cyan[0]-BREATH, cyan[1]-BREATH, cyan[2]+BREATH, cyan[3]+BREATH)
                cand_end = _line_rect_exit(cx, cy, tcx, tcy, viz)
            else:
                cand_end = _line_rect_exit(cx, cy, tcx, tcy, cyan)
            for _, _, _, pa in placed:
                if pa and _segments_intersect(cand_start, cand_end, pa[0], pa[1]):
                    score -= CROSSING_PENALTY
                    break
            valid.append(((px, py), score))
            if score > best_score:
                best_score, best_pos = score, (px, py)
        ann['valid_candidates'] = valid
        if best_pos is None:
            ann['pink'] = ann['tpos'] = ann['astart'] = ann['aend'] = ann['viz'] = None
            continue
        px, py = best_pos
        pink = (px, py, px + pw, py + ph)
        ann['pink'] = pink
        ann['tpos'] = (px + TEXT_PAD, py + TEXT_PAD)
        tcx, tcy = px + pw/2, py + ph/2
        ann['astart'] = _line_rect_exit(tcx, tcy, cx, cy, pink)
        if draw_box:
            viz = (cyan[0]-BREATH, cyan[1]-BREATH, cyan[2]+BREATH, cyan[3]+BREATH)
            ann['viz'] = viz
            ann['aend'] = _line_rect_exit(cx, cy, tcx, tcy, viz)
        else:
            ann['viz'] = None
            ann['aend'] = _line_rect_exit(cx, cy, tcx, tcy, cyan)
        placed.append((pink, ann['excl_zone'], ann['viz'], (ann['astart'], ann['aend'])))
def _draw_debug(img, annots, color):
    overlay = Image.new('RGBA', img.size, (0, 0, 0, 0))
    od = ImageDraw.Draw(overlay)
    # Candidate heatmap: red = low score, green = high score
    for ann in annots:
        cands = ann.get('valid_candidates', ann['candidates'])
        if not cands:
            continue
        pw, ph = ann['pw'], ann['ph']
        scores = [s for _, s in cands]
        smin, smax = min(scores), max(scores)
        rng = smax - smin if smax > smin else 1
        for (px, py), score in cands:
            t = (score - smin) / rng
            if t < 0.5:
                r_c, g_c, b_c = 220, int(180 * (t * 2)), 0
            else:
                r_c, g_c, b_c = int(220 * (1 - (t-0.5)*2)), 200, 0
            alpha_fill = int(40 + 70 * t)
            alpha_out = int(80 + 120 * t)
            od.rectangle((px, py, px + pw, py + ph),
                         fill=(r_c, g_c, b_c, alpha_fill),
                         outline=(r_c, g_c, b_c, alpha_out), width=1)
    # Exclusion zone (gray), target box (cyan), chosen label box (magenta)
    for ann in annots:
        ez = ann['excl_zone']
        od.rectangle(ez, fill=(120, 120, 120, 50), outline=(160, 160, 160, 160), width=1)
        od.rectangle(ann['cyan'], fill=(0, 255, 255, 30), outline=(0, 255, 255, 180), width=2)
        if ann.get('pink'):
            od.rectangle(ann['pink'], fill=(255, 0, 255, 50),
                         outline=(255, 0, 255, 180), width=2)
    return Image.alpha_composite(img, overlay)
def _draw_annotations(img, annots, font, color, stroke_width):
    draw = ImageDraw.Draw(img)
    for ann in annots:
        if ann.get('viz'):
            draw.rounded_rectangle(ann['viz'], radius=12, outline=color, width=stroke_width)
        tpos = ann.get('tpos')
        astart, aend = ann.get('astart'), ann.get('aend')
        if not (tpos and astart and aend):
            continue
        sx, sy = int(astart[0]), int(astart[1])
        ex, ey = int(aend[0]), int(aend[1])
        draw.line([(sx, sy), (ex, ey)], fill=color, width=4, joint='curve')
        # Small dots round off the line caps
        r = 2
        draw.ellipse([(sx-r, sy-r), (sx+r, sy+r)], fill=color)
        draw.ellipse([(ex-r, ey-r), (ex+r, ey+r)], fill=color)
        if not ann.get('draw_box', False):
            # V-shaped arrowhead pointing at the element
            angle = math.atan2(ey - sy, ex - sx)
            al, spread = 18, 0.45
            ax = ex - al * math.cos(angle - spread)
            ay = ey - al * math.sin(angle - spread)
            bx = ex - al * math.cos(angle + spread)
            by = ey - al * math.sin(angle + spread)
            draw.line([(int(ax), int(ay)), (ex, ey)], fill=color, width=4)
            draw.line([(int(bx), int(by)), (ex, ey)], fill=color, width=4)
            for px_, py_ in [(int(ax), int(ay)), (int(bx), int(by))]:
                draw.ellipse([(px_-r, py_-r), (px_+r, py_+r)], fill=color)
        draw.text(tpos, ann['label'], fill=color, font=font,
                  stroke_width=1, stroke_fill=color)
    return img
def annotate_image(image_path, annotations, *,
                   debug=False,
                   font_path=DEFAULT_FONT,
                   font_size=DEFAULT_FONT_SIZE,
                   color=DEFAULT_COLOR,
                   stroke_width=DEFAULT_STROKE):
    """
    Annotate a screenshot with automatic label placement.

    Args:
        image_path: path to the input image
        annotations: list of dicts with keys:
            - elem: (x1, y1, x2, y2) tight bounding box of element
            - label: text label string
            - draw_box: (optional, default False) draw rounded rect around element
        debug: if True, draw developer rectangles (cyan/pink/gray/heatmap)
        font_path: path to TTF font file
        font_size: font size in pixels
        color: hex color for annotations (default orange #FF9F1C)
        stroke_width: width of orange highlight box outline

    Returns:
        PIL.Image with annotations drawn
    """
    try:
        font = ImageFont.truetype(font_path, font_size)
    except OSError:
        # Ink Free is Windows-only; elsewhere fall back to Pillow's built-in font
        # (it is small, so pass font_path to a local TTF for readable labels)
        font = ImageFont.load_default()
    img = Image.open(image_path).convert('RGBA')
    pixels = np.array(img)
    W, H = img.size
    annots = []
    for i, spec in enumerate(annotations):
        eb = spec['elem']
        em_pad = min(20, max(10, (eb[2] - eb[0]) // 10))
        cyan = (eb[0] - em_pad, eb[1] - em_pad, eb[2] + em_pad, eb[3] + em_pad)
        # Measure the label box (multi-line labels are split on '\n')
        lines = spec['label'].split('\n')
        tw = max(font.getbbox(line)[2] - font.getbbox(line)[0] for line in lines)
        line_h = font.getbbox('Ay')[3] - font.getbbox('Ay')[0]
        th = line_h * len(lines) + 4 * (len(lines) - 1)
        pw, ph = tw + 2 * TEXT_PAD, th + 2 * TEXT_PAD
        cands = _find_candidates(pixels, W, H, cyan, pw, ph, font)
        annots.append({
            'id': i,
            'label': spec['label'],
            'draw_box': spec.get('draw_box', False),
            'cyan': cyan,
            'cyan_center': ((cyan[0]+cyan[2])/2, (cyan[1]+cyan[3])/2),
            'excl_zone': (cyan[0]-MIN_ARROW, cyan[1]-MIN_ARROW,
                          cyan[2]+MIN_ARROW, cyan[3]+MIN_ARROW),
            'pw': pw, 'ph': ph,
            'candidates': cands,
            'best_score': max((s for _, s in cands), default=-999),
        })
    _resolve_placements(annots, font)
    annots.sort(key=lambda a: a['id'])
    if debug:
        img = _draw_debug(img, annots, color)
    img = _draw_annotations(img, annots, font, color, stroke_width)
    return img
def diff_images(before_path, after_path, *, threshold=30, min_pixels=300,
                dilate=5, debug=False):
    """Find changed regions between two screenshots and return cluster boxes.

    Returns (clusters, debug_img_or_None):
        clusters: list of (x1, y1, x2, y2, pixel_count) sorted largest-first
        debug_img: if debug=True, PIL Image with heatmap overlay and cluster boxes
    """
    from scipy import ndimage
    img_a = Image.open(before_path).convert('RGB')
    img_b = Image.open(after_path).convert('RGB')
    if img_a.size != img_b.size:
        raise ValueError(f"Image sizes differ: {img_a.size} vs {img_b.size}")
    arr_a = np.array(img_a, dtype=np.float32)
    arr_b = np.array(img_b, dtype=np.float32)
    W, H = img_a.size
    # Per-pixel difference: largest change across the three channels
    diff = np.abs(arr_b - arr_a).max(axis=2)
    mask = diff > threshold
    # Dilate so nearby changed pixels merge into one cluster
    dilated = ndimage.binary_dilation(mask, iterations=dilate)
    labeled, n_clusters = ndimage.label(dilated)
    clusters = []
    for i in range(1, n_clusters + 1):
        ys, xs = np.where(labeled == i)
        if len(ys) < min_pixels:
            continue
        clusters.append((int(xs.min()), int(ys.min()),
                         int(xs.max()), int(ys.max()), len(ys)))
    clusters.sort(key=lambda c: -c[4])
    debug_img = None
    if debug:
        overlay = img_b.copy().convert('RGBA')
        norm = np.clip(diff / 255.0, 0, 1)
        show_mask = diff > 10
        r = np.clip((norm * 2) * 255, 0, 255).astype(np.uint8)
        g = np.clip((1 - np.abs(norm - 0.5) * 2) * 200, 0, 200).astype(np.uint8)
        b = np.clip((1 - norm) * 255, 0, 255).astype(np.uint8)
        a = np.where(show_mask, np.clip(norm * 200 + 40, 40, 220).astype(np.uint8), 0)
        heat = Image.fromarray(np.stack([r, g, b, a], axis=2), 'RGBA')
        overlay = Image.alpha_composite(overlay, heat)
        draw = ImageDraw.Draw(overlay)
        try:
            font = ImageFont.truetype('C:/Windows/Fonts/consola.ttf', 18)
        except OSError:
            font = ImageFont.load_default()
        for idx, (x1, y1, x2, y2, px_count) in enumerate(clusters):
            draw.rectangle([x1, y1, x2, y2], outline=(0, 255, 255, 200), width=3)
            label = f"#{idx+1} {px_count:,}px"
            bbox = font.getbbox(label)
            tw, th = bbox[2] - bbox[0], bbox[3] - bbox[1]
            lx, ly = x1, max(0, y1 - th - 8)
            draw.rectangle([lx, ly, lx + tw + 8, ly + th + 4], fill=(0, 0, 0, 180))
            draw.text((lx + 4, ly + 2), label, fill=(0, 255, 255, 255), font=font)
        debug_img = overlay
    return clusters, debug_img
def grid_image(image_path, step=100):
    """Draw a coordinate grid on an image for precise element location."""
    img = Image.open(image_path).convert('RGBA')
    draw = ImageDraw.Draw(img)
    W, H = img.size
    try:
        font = ImageFont.truetype('C:/Windows/Fonts/consola.ttf', 14)
    except OSError:
        font = ImageFont.load_default()
    # Vertical lines labeled with their x coordinate
    for x in range(0, W, step):
        draw.line([(x, 0), (x, H)], fill=(255, 0, 0, 120), width=1)
        draw.text((x + 2, 2), str(x), fill=(255, 0, 0, 200), font=font)
    # Horizontal lines labeled with their y coordinate
    for y in range(0, H, step):
        draw.line([(0, y), (W, y)], fill=(255, 0, 0, 120), width=1)
        draw.text((2, y + 2), str(y), fill=(255, 0, 0, 200), font=font)
    return img
```
## Image Diffing
Find what changed between two screenshots programmatically. Use as a safety net for subtle changes — when the difference is obvious, annotate directly instead.
```python
from annotate import diff_images
clusters, debug_img = diff_images(
    'before.png', 'after.png',
    threshold=30,    # pixel difference floor (0-255)
    min_pixels=300,  # ignore tiny noise clusters
    dilate=5,        # merge nearby changed pixels
    debug=True,      # render heatmap overlay
)
# clusters = [(x1, y1, x2, y2, pixel_count), ...] sorted largest-first
if debug_img:
    debug_img.save('diff-debug.png')

# Feed clusters into annotate_image:
annotations = [
    {'elem': (x1, y1, x2, y2), 'label': f'Change #{i+1}', 'draw_box': True}
    for i, (x1, y1, x2, y2, _) in enumerate(clusters[:3])
]
```
**Debug heatmap colors:** Blue = small difference, Yellow = medium, Red = large, Cyan boxes = cluster bounding boxes.
**When to use:** subtle opacity changes, dashed lines, minor color shifts, anti-aliasing differences.
**When NOT to use:** any change you can see by eye — annotate directly for better labels.
## Animated GIF Annotations
Different from static images — animations have timing, transitions, and competing visual motion.
### Element highlighting
1. **Rects for big areas, arrows for small elements** — 500x300px area = rect, 200x25px element = arrow
2. **Labels go RIGHT NEXT to what they describe** — short arrow (30-80px), label adjacent. Viewer's eye shouldn't travel more than ~100px
3. **Arrow must not cross its own label** — pick the edge closest to the target
4. **No bottom bar / subtitle approach** — eyes jump between content and bar. Contextual placement only
5. **Hero message gets a bigger font** — main takeaway 64pt+, detail annotations 38pt
### Timing and pacing
6. **Fade: 2-frame pop-in at 10fps** — 50% → 100% opacity (0.2s total). Easing curves look bad at low FPS
7. **Type → pause → annotate** — during fast action, show NO annotation. Pause, then add it
8. **Variable frame duration**: fast during action (100ms), slow during pauses (600-800ms), and a hold of at least 500ms on the hero frame
9. **Higher FPS for smooth motion** — 10fps minimum for typing/interaction
### Pop-in fade implementation
```python
# 2-frame pop-in at 10fps
# Placeholders: total_frames, annotation_just_changed, and local_idx come from
# your own frame loop; local_idx counts frames since the annotation last changed.
FADE_ALPHAS = [0.50, 1.00]
for frame_idx in range(total_frames):
    if annotation_just_changed and local_idx < len(FADE_ALPHAS):
        alpha = FADE_ALPHAS[local_idx]
    else:
        alpha = 1.0
    # Apply alpha to annotation elements:
    # - pill background: fill=(r, g, b, int(base_alpha * alpha))
    # - text: fill=(*color, int(255 * alpha))
    # - rect outline: outline=(*color, int(255 * alpha))
```
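The pop-in logic can also be written as two small pure functions that are easy to test on their own. This is a sketch under assumptions: the names `alpha_for_frame` and `with_alpha` are made up here, and the caller is expected to track how many frames have passed since the annotation changed.

```python
FADE_ALPHAS = (0.50, 1.00)  # 2-frame pop-in at 10fps

def alpha_for_frame(frames_since_change, fade_alphas=FADE_ALPHAS):
    """Opacity multiplier for an annotation that changed N frames ago."""
    if 0 <= frames_since_change < len(fade_alphas):
        return fade_alphas[frames_since_change]
    return 1.0  # fade finished: fully opaque

def with_alpha(rgb, alpha, base_alpha=255):
    """Turn an (r, g, b) color into an RGBA fill scaled by the fade alpha."""
    r, g, b = rgb
    return (r, g, b, int(base_alpha * alpha))
```

For the orange highlight `(255, 159, 28)`, the first fade frame uses `with_alpha((255, 159, 28), alpha_for_frame(0))`, which yields a half-transparent fill suitable for drawing on an RGBA overlay.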
## Guidelines
1. **All elements same thickness** — rect `width`, line `width`, and visual text weight should feel consistent (~5px)
2. Place labels **close to the rect** — short leader line (25-35px)
3. Labels can overlap content — the stroke gives enough contrast
4. **Show locally first** — verify before uploading to a PR
5. **Take screenshots at native 1x, control display size in HTML** — use `<img width="300">` in markdown, never resize with PIL (creates artifacts)
6. **Always check `Image.open(path).size` first** — HiDPI screenshots are larger than they appear (150% scaling = 1.5x CSS pixel dimensions)
7. **Short labels work better** — wide labels have fewer valid placements. Use 1-3 words when possible
8. **Verify with debug=True** — always check the first annotation of a new image with debug mode
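Guideline 6 matters when element coordinates come from the browser (CSS pixels) but the screenshot was captured on a scaled display. The sketch below converts a CSS-pixel box to image pixels; the function name `css_box_to_pixels` is an assumption, and the scale factor is whatever ratio you measure between the image width and the viewport width.

```python
def css_box_to_pixels(box, scale):
    """Convert an (x1, y1, x2, y2) box from CSS pixels to image pixels.

    scale: image width divided by viewport width (1.5 for 150% display scaling).
    """
    return tuple(round(v * scale) for v in box)

# A box measured in the browser at 150% scaling
elem = css_box_to_pixels((100, 200, 300, 400), 1.5)
```

The converted tuple can then be passed as `elem` to `annotate_image`.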
## Limitations
- Ink Free font is Windows-only; other platforms need a fallback font
- PIL text rendering is basic — no rich text, no markdown
- Animated GIF annotations require frame-by-frame processing which can be slow for long recordings
- Algorithmic placement works best with 2-6 annotations; more than that may produce crowded results
+130
@@ -0,0 +1,130 @@
---
name: pr-screenshots
description: 'Embed before/after screenshots and annotated images in pull request descriptions. Covers PR description patterns, image upload for Azure DevOps and GitHub, and sizing best practices.'
---
# PR Screenshots
Embed before/after screenshots in pull request descriptions so reviewers can see the visual change without checking out the branch.
## When to Use This Skill
Use this skill when a PR changes something visible:
- Layout, styling, CSS
- Charts, dashboards, data visualizations
- UI components, forms, modals
- Error messages, CLI output, log formatting
## PR Description Pattern
Place screenshots directly in the PR description body. Avoid wrapping them in a collapsed `<details>` block: reviewers are more likely to look at images they can see without clicking.
```markdown
**Before** — brief description of the problem:
![before](url-to-before-image)
**After** — brief description of the fix:
![after](url-to-after-image)
```
Keep the text brief: a sentence or two per image describing what the reader should notice. Let the image carry most of the communication.
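When a script assembles the PR description, the pattern above can be generated from the two image URLs. This is a minimal sketch: the function name `before_after_markdown` is hypothetical, and it separates the bold heading from the note with a colon, which is a small variation on the pattern shown.

```python
def before_after_markdown(before_url, after_url, before_note, after_note):
    """Build a before/after Markdown block for a PR description."""
    return (
        f"**Before:** {before_note}\n"
        f"![before]({before_url})\n"
        "\n"
        f"**After:** {after_note}\n"
        f"![after]({after_url})\n"
    )
```

Calling it once per visual change and joining the results under separate headings gives the multi-change layout.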
### Multiple changes
For PRs with several visual changes, use separate before/after pairs with headings:
```markdown
## Filter bar alignment
**Before** — 1px border clash between adjacent buttons:
![before-filters](url)
**After** — borders overlap cleanly, hover tint added:
![after-filters](url)
## Chart tooltip
**Before** — tooltip clipped at container edge:
![before-tooltip](url)
**After** — tooltip repositions to stay visible:
![after-tooltip](url)
```
## Image Sizing
- **Take screenshots at native 1x resolution** — don't resize with PIL (creates artifacts)
- **Control display size in HTML** when images are too large:
```html
<img src="url" width="600" alt="description">
```
- **Before/after pairs must use the same viewport width and crop** — otherwise the comparison is meaningless
## Uploading Images
### Azure DevOps
Upload images as PR attachments via the REST API:
```powershell
$token = az account get-access-token `
--resource "499b84ac-1321-427f-aa17-267ca6975798" `
--query accessToken -o tsv
$base = "https://{org}.visualstudio.com/{projectId}/_apis/git/repositories/{repoId}"
$url = "$base/pullRequests/{prId}/attachments/screenshot.png?api-version=7.1-preview.1"
# Use HttpClient — Invoke-RestMethod can corrupt binary data
$client = New-Object System.Net.Http.HttpClient
$client.DefaultRequestHeaders.Authorization = `
New-Object System.Net.Http.Headers.AuthenticationHeaderValue("Bearer", $token)
$content = New-Object System.Net.Http.ByteArrayContent(
, [System.IO.File]::ReadAllBytes("screenshot.png")
)
$content.Headers.ContentType = `
[System.Net.Http.Headers.MediaTypeHeaderValue]::new("application/octet-stream")
$resp = $client.PostAsync($url, $content).Result
```
Reference in the PR description:
```markdown
![description](https://{org}.visualstudio.com/{projectId}/_apis/git/repositories/{repoId}/pullRequests/{prId}/attachments/screenshot.png)
```
**Azure DevOps gotchas:**
- **Use `{org}.visualstudio.com` NOT `dev.azure.com/{org}`** — AzDO's markdown renderer uses `.visualstudio.com`. The `dev.azure.com` format loads noticeably slower
- Use `POST` not `PUT` (PUT returns 405)
- API version must be `7.1-preview.1`
- Can't re-upload with the same filename — use a new name (e.g. `screenshot-v2.png`)
- Use `HttpClient` not `Invoke-RestMethod` — IRM can corrupt binary data
- Repo-relative paths don't work in PR descriptions — must use full URLs
- Don't commit images to the branch just for PR screenshots
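Two of the gotchas above (the `.visualstudio.com` host and the no-reuse rule for filenames) are easy to get wrong by hand. The sketch below builds the attachment URL and picks an unused filename; both function names are assumptions, and the set of already-used names has to be tracked by the caller.

```python
import os

def attachment_url(org, project_id, repo_id, pr_id, filename, api_version=None):
    """Build the attachment URL; pass api_version only for the upload request."""
    url = (f"https://{org}.visualstudio.com/{project_id}/_apis/git/repositories/"
           f"{repo_id}/pullRequests/{pr_id}/attachments/{filename}")
    if api_version:
        url += f"?api-version={api_version}"
    return url

def next_free_name(filename, used):
    """Return filename, or filename-v2, -v3, ... until the name is not in `used`."""
    if filename not in used:
        return filename
    stem, ext = os.path.splitext(filename)
    n = 2
    while f"{stem}-v{n}{ext}" in used:
        n += 1
    return f"{stem}-v{n}{ext}"
```

Use the URL with `api_version="7.1-preview.1"` for the upload, and the same URL without the query string in the PR description.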
### GitHub
> **⚠️ Work in progress.** GitHub's drag-and-drop image upload uses internal endpoints that require browser cookies. There's no clean public API for uploading images to PR descriptions yet.
**Current workaround:** Commit images to a `pr-assets` orphan branch and reference via blob URLs (`github.com/{owner}/{repo}/blob/pr-assets/{file}?raw=true`). It works but is clunky — contributions for a better approach are welcome.
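For the orphan-branch workaround, the blob URL can be built with a small helper that also escapes spaces in filenames. This is a sketch; the function name `pr_asset_url` is an assumption, and the branch name defaults to the `pr-assets` branch mentioned above.

```python
from urllib.parse import quote

def pr_asset_url(owner, repo, filename, branch="pr-assets"):
    """Build a raw blob URL for an image committed to the assets branch."""
    return f"https://github.com/{owner}/{repo}/blob/{branch}/{quote(filename)}?raw=true"
```

The returned URL goes straight into the `![before](...)` or `![after](...)` image reference.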
## Guidelines
1. **Capture before state BEFORE making changes** — it's easy to forget, and reconstructing the original state later is slow and error-prone
2. **Keep descriptions brief** — a sentence or two per image pointing out what changed is enough
3. **Prefer visible images over collapsed sections** — screenshots behind `<details>` tags are easy to skip
4. **Annotate when the change is subtle** — use the `image-annotations` skill to add callouts when the difference isn't immediately obvious
5. **Match viewport and crop** between before/after pairs so the comparison is meaningful
## Limitations
- GitHub image upload requires workarounds (no public API for PR description images)
- Azure DevOps attachment filenames can't be reused — plan naming ahead
- Very large images (>10MB) may not render inline on some platforms
+238
@@ -0,0 +1,238 @@
---
name: screen-recording
description: 'Create annotated animated GIF demos and screen recordings for pull requests and documentation. Covers frame capture, timing, imageio-based GIF creation, and per-frame annotation workflows.'
---
# Screen Recording
Create animated GIF demos that show a feature or workflow in action — with annotations, variable timing, and proper pacing. Useful for PR descriptions, documentation, and release notes.
## When to Use This Skill
Use this skill when you need to:
- Record a multi-step UI interaction as an animated GIF
- Create a demo showing before/after behavior
- Build annotated walkthroughs for documentation or release notes
- Show a bug reproduction or fix in action
## Prerequisites
```bash
pip install playwright Pillow imageio numpy scipy mss -q
playwright install chromium
```
## Core Workflow
### 1. Capture frames
Use Playwright to step through the interaction and capture each frame:
```python
from playwright.async_api import async_playwright

async def record_frames(url, steps, width=1400, height=900):
    """
    steps: list of dicts with 'action' (async callable taking page),
    'name' (frame filename), and optional 'wait' (ms to settle, default 500)
    """
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page(viewport={"width": width, "height": height})
        await page.goto(url, wait_until="networkidle")
        for step in steps:
            if step.get("action"):
                await step["action"](page)
            # Let the UI settle, then capture the frame
            await page.wait_for_timeout(step.get("wait", 500))
            await page.screenshot(path=step["name"])
        await browser.close()
```
### 2. Assemble GIF with imageio
**Use imageio, not PIL, for GIF writing** — PIL's GIF encoder merges visually similar frames, which kills animations.
```python
import imageio.v3 as iio
from PIL import Image
import numpy as np

# frame_list: [(frame_path, duration_ms), ...] built from the capture step
frames = []
durations = []
for frame_path, duration_ms in frame_list:
    img = Image.open(frame_path).convert("RGB")  # same mode and shape for every frame
    frames.append(np.array(img))
    durations.append(duration_ms)
iio.imwrite("demo.gif", frames, duration=durations, loop=0)
```
### 3. Variable frame timing
Uniform timing makes everything feel either too fast or too slow. Use variable durations:
| Phase | Duration | Why |
|-------|----------|-----|
| Fast action (typing, clicking) | 100ms | Feels natural, keeps energy |
| Pause after action | 600-800ms | Let the viewer process what happened |
| Hero/final message | 500ms+ | Main takeaway needs time to land |
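The table translates directly into a per-frame duration list that can be passed as `duration=` when writing the GIF. The sketch below is one way to build it; the name `build_durations` is hypothetical, and 700ms is an assumed midpoint of the 600-800ms pause range.

```python
# Durations in milliseconds, taken from the table above.
# 700 is an assumed midpoint of the 600-800ms pause range.
PHASE_MS = {"action": 100, "pause": 700, "hero": 500}

def build_durations(phases):
    """phases: list of (phase_name, frame_count) -> per-frame durations in ms."""
    durations = []
    for name, count in phases:
        durations.extend([PHASE_MS[name]] * count)
    return durations

# Three typing frames, one pause, then two hero frames
durations = build_durations([("action", 3), ("pause", 1), ("hero", 2)])
```

Summing the list gives the total running time, which is a quick sanity check before encoding.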
### 4. Annotate frames
Apply annotations to specific frames using the `image-annotations` skill:
```python
from PIL import Image, ImageDraw, ImageFont
def annotate_frame(frame_path, annotations, out_path):
    img = Image.open(frame_path)
    draw = ImageDraw.Draw(img)
    for ann in annotations:
        # Apply annotation (rect, arrow, label, etc.)
        pass
    img.save(out_path)
```
### 5. Fade-in annotations
For smooth annotation appearance:
```python
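from PIL import Image

def apply_fade(base_frame, annotated_frame, alpha):
    """Cross-fade from the bare frame to the annotated frame (alpha 0.0 to 1.0).

    Image.blend interpolates between two full images of the same size, so
    annotated_frame must be the base frame with annotations already drawn on it,
    not a transparent overlay.
    """
    blended = Image.blend(
        base_frame.convert("RGBA"),
        annotated_frame.convert("RGBA"),
        alpha
    )
    return blended.convert("RGB")

# 2-frame pop-in at 10fps: 50% then 100%
# base / annotated: the same frame without and with annotations
faded_frames = [
    apply_fade(base, annotated, 0.5),  # frame 1: half opacity
    apply_fade(base, annotated, 1.0),  # frame 2: full opacity
]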
```
At 10fps, use 2 fade frames (0.2s total). At 30fps, use 3-4 frames. Easing curves look bad at low FPS — simple pop-in is snappier and more readable.
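The alpha steps for a fade of any length can be generated as an evenly spaced ramp that ends at full opacity. This is a sketch with hypothetical names (`fade_alphas`, `fade_seconds`); the linear ramp is an assumption that reproduces the 50% then 100% pop-in for two frames.

```python
def fade_alphas(n_frames):
    """Evenly spaced alphas ending at 1.0, e.g. 2 frames -> [0.5, 1.0]."""
    if n_frames < 1:
        raise ValueError("n_frames must be at least 1")
    return [(i + 1) / n_frames for i in range(n_frames)]

def fade_seconds(n_frames, fps):
    """Total time the fade occupies at the given frame rate."""
    return n_frames / fps
```

Each alpha in the list is used for one frame of the fade, so `fade_alphas(4)` at 30fps gives a four-step ramp lasting a little over a tenth of a second.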
## Build as a Script
The annotation logic gets complex for anything beyond trivial demos. Write a dedicated script (e.g., `annotate_gif.py`) with functions instead of inline code. You'll iterate on timing and placement.
## Testing Animations
**Always test in isolation first** — don't rebuild the full demo to test a fade tweak:
```python
# Small test GIF: 10 bare frames → fade frames → 15 hold frames
# Add a frame counter overlay for debugging:
draw.text((10, height - 30), f"F{i}/{total} a={alpha:.0%} FADE",
fill="white", font=small_font)
```
## Desktop Screen Recording (mss)
For recording desktop apps, terminals, or anything outside a browser. Uses `mss` for fast screen capture.
```python
import mss
from PIL import Image
import time
def record_gif(output_path, region=None, duration=5, fps=8):
    """Record screen region to GIF. region = {left, top, width, height} or None for full screen."""
    with mss.mss() as sct:
        if region is None:
            region = sct.monitors[1]  # primary monitor
        frames = []
        t_end = time.time() + duration
        while time.time() < t_end:
            t0 = time.time()
            shot = sct.grab(region)
            frames.append(Image.frombytes('RGB', shot.size, shot.rgb))
            # Sleep for the rest of the frame interval
            time.sleep(max(0, 1 / fps - (time.time() - t0)))
    frames[0].save(output_path, save_all=True, append_images=frames[1:],
                   duration=int(1000 / fps), loop=0, optimize=True)
    return len(frames)

record_gif('demo.gif', region={'left': 0, 'top': 0, 'width': 800, 'height': 500}, duration=3)
```
Tested: 3s at 8fps → 24 frames, ~31KB. Keep fps ≤ 10 for reasonable file sizes.
**Note:** Pillow's `Image.save(..., save_all=True)` works for simple recordings but merges visually similar frames. For annotated GIFs with fade effects, use `imageio.v3.imwrite` instead.
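
The frame count and per-frame delay quoted above follow from simple arithmetic. The GIF format stores frame delays in hundredths of a second, so a requested 125 ms delay cannot be represented exactly. The sketch below uses a hypothetical `gif_timing` helper and assumes the encoder truncates to whole centiseconds; real encoders may round differently.

```python
def gif_timing(duration_s, fps):
    """Estimate the ideal frame count and the delay a GIF can actually store.

    Hypothetical helper for illustration only.
    """
    frames = round(duration_s * fps)
    requested_ms = int(1000 / fps)         # what record_gif passes as duration=
    stored_ms = (requested_ms // 10) * 10  # assumption: truncated to centiseconds
    return frames, requested_ms, stored_ms


print(gif_timing(3, 8))   # (24, 125, 120)
print(gif_timing(3, 10))  # (30, 100, 100)
```

At 10fps the requested delay is already a whole number of centiseconds, so playback speed should match the recording rate.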
### Combining with window capture
```python
# Find window rect, then record it as a GIF (Windows only)
# Reuse find_window() from the ui-screenshots skill
import ctypes
from ctypes import c_int, Structure, byref, windll


class RECT(Structure):
    _fields_ = [('left', c_int), ('top', c_int), ('right', c_int), ('bottom', c_int)]


hwnd = find_window('My App')[0][0]
rect = RECT()
windll.user32.GetWindowRect(hwnd, byref(rect))
region = {'left': rect.left, 'top': rect.top,
          'width': rect.right - rect.left, 'height': rect.bottom - rect.top}
record_gif('app-demo.gif', region=region, duration=5, fps=8)
```
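
`GetWindowRect` reports window edges, while the `region` dict needs an origin plus a size. A small sketch of that conversion with a sanity check, assuming a hypothetical `rect_to_region` helper that is not part of the skill:

```python
def rect_to_region(left, top, right, bottom):
    """Convert window edges to a {left, top, width, height} region dict.

    Hypothetical helper; raises if the rect has no area to capture.
    """
    width, height = right - left, bottom - top
    if width <= 0 or height <= 0:
        raise ValueError(f"Empty window rect: {(left, top, right, bottom)}")
    return {'left': left, 'top': top, 'width': width, 'height': height}


print(rect_to_region(100, 50, 900, 550))
# {'left': 100, 'top': 50, 'width': 800, 'height': 500}
```

Failing early on an empty rect gives a clearer error than a capture call failing later.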
## Diff-Based Cluster Detection
Programmatically find changed regions between frames to decide what to annotate:
```python
import numpy as np
from scipy import ndimage


def find_changed_clusters(frame_a, frame_b, threshold=30, min_pixels=300, dilate=5):
    """Find bounding boxes of changed regions between two frames.

    Frames are H x W x 3 NumPy arrays of the same shape.
    Returns (x_min, y_min, x_max, y_max, pixel_count) tuples, largest first.
    """
    # Largest per-channel difference at each pixel
    diff = np.abs(frame_b.astype(float) - frame_a.astype(float)).max(axis=2)
    mask = diff > threshold
    # Dilate so nearby changed pixels join into one cluster
    dilated = ndimage.binary_dilation(mask, iterations=dilate)
    labeled, n = ndimage.label(dilated)
    clusters = []
    for i in range(1, n + 1):
        ys, xs = np.where(labeled == i)
        if len(ys) < min_pixels:
            continue  # skip small specks
        clusters.append((xs.min(), ys.min(), xs.max(), ys.max(), len(ys)))
    return sorted(clusters, key=lambda c: -c[4])  # largest first
```
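
The cluster tuples are tight bounding boxes, so a callout drawn directly on them would sit on top of the changed pixels. A minimal sketch of one way to turn them into annotation rectangles, assuming a hypothetical `cluster_to_callout` helper and an arbitrary 8-pixel padding:

```python
def cluster_to_callout(cluster, image_size, pad=8):
    """Pad a (x0, y0, x1, y1, n_pixels) cluster and clamp it to the image bounds.

    Hypothetical helper; pad=8 is an arbitrary choice, not a tested value.
    """
    x0, y0, x1, y1, _ = cluster
    width, height = image_size
    return (max(0, int(x0) - pad), max(0, int(y0) - pad),
            min(width - 1, int(x1) + pad), min(height - 1, int(y1) + pad))


# Made-up clusters in the tuple format find_changed_clusters returns
clusters = [(120, 40, 300, 90, 9000), (2, 395, 60, 399, 400)]
boxes = [cluster_to_callout(c, image_size=(640, 400)) for c in clusters]
print(boxes)  # [(112, 32, 308, 98), (0, 387, 68, 399)]
```

The `int()` calls convert NumPy integers to plain Python ints, which keeps the boxes easy to print and serialize.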
## Format Compatibility
| Format | VS Code Preview | GitHub | Browser |
|--------|----------------|--------|---------|
| GIF | ✅ Animates | ✅ | ✅ |
| WebP | ⚠️ Static only | ✅ | ✅ |
| MP4 | ❌ Broken | ⚠️ | ✅ |
**GIF is the only universally supported animated format** across VS Code preview, GitHub markdown, and browsers.
## Guidelines
1. **Type → pause → annotate** — during fast action, show NO annotation. Pause first, then annotate
2. **Hero message gets the biggest font** — 64pt+ for the main takeaway, 38pt for details
3. **GIF palette does NOT kill gradients** — 20 distinct alpha steps survive 256-color palette
4. **10fps minimum** for typing/interaction — lower looks stuttery
5. **Build iteratively** — get the frame sequence right first, add annotations second, tune timing last
## Limitations
- GIF is limited to 256 colors per frame — fine for UI screenshots, may show banding on photographic content
- Large GIFs (50+ frames at high resolution) can be several MB — consider cropping to the relevant area
- No audio support in GIF — use MP4 for narrated demos (but lose VS Code preview support)
---
name: ui-screenshots
description: 'Capture screenshots of web apps during development using Playwright and PIL. Supports full-page captures, interactive states, and an iterate-on-crop workflow that avoids slow re-screenshots.'
---
# UI Screenshots
Capture screenshots of web apps and graphical UIs during development to document visual changes.
## When to Use This Skill
Use this skill when you need to:
- Capture the current state of a running web app
- Document a UI before and after a code change
- Screenshot interactive states (tooltips, hovers, selected elements)
- Capture specific sections of a page without re-screenshotting
## Prerequisites
```bash
pip install playwright Pillow -q
playwright install chromium
```
## Core Workflow
### 1. Take a raw full-page screenshot
```python
import asyncio

from playwright.async_api import async_playwright


async def capture(url="http://localhost:3000", out="screenshot-raw.png", width=1400, height=5000):
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page(viewport={"width": width, "height": height})
        await page.goto(url, wait_until="networkidle")
        await page.wait_for_timeout(4000)  # let charts/animations render
        await page.screenshot(path=out, full_page=True)
        await browser.close()


asyncio.run(capture())
```
- Use a **tall viewport** (height=5000) so the page renders everything without scrolling
- `wait_until="networkidle"` + `wait_for_timeout(4000)` ensures async charts load
- `full_page=True` captures the entire scrollable content
### 2. View the raw image, then crop with PIL
**Do NOT try to get perfect crops via Playwright's `clip` parameter.** It's unreliable with full-page captures.
```python
from PIL import Image
img = Image.open("screenshot-raw.png")
cropped = img.crop((left, top, right, bottom)) # adjust based on what you see
cropped.save("screenshot-final.png")
```
1. Take the raw screenshot
2. View it to see actual pixel positions
3. Crop with PIL based on what you see
4. View the result — if not right, re-crop (instant, no re-screenshot needed)
### 3. Iterate on crop, not on capture
- Re-screenshotting is slow (browser launch + page load + render wait)
- Re-cropping is instant (just PIL)
- Get one good raw capture, then slice it as many ways as needed
### 4. Interactive states
```python
element = page.locator("selector").first
await element.hover()
await page.wait_for_timeout(1000) # let tooltip appear
await page.screenshot(path="screenshot-hover.png", full_page=True)
```
For "selected" state without hover effect, move the mouse away after clicking:
```python
await element.click()
await page.mouse.move(300, 300) # move away so hover doesn't show
await page.wait_for_timeout(500)
await page.screenshot(path="screenshot-selected.png", full_page=True)
```
### 5. Section-specific captures
Crop different sections from a single full-page screenshot:
```python
img.crop((0, 200, 920, 900)).save("screenshot-header.png")
img.crop((0, 900, 920, 1600)).save("screenshot-main.png")
```
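
When a page splits into stacked sections, the crop boxes can be derived from a list of y-boundaries instead of being typed out by hand. A small sketch, assuming a hypothetical `section_boxes` helper and the same pixel values as the example above:

```python
def section_boxes(width, boundaries):
    """Turn sorted y-boundaries into (left, top, right, bottom) crop boxes.

    Hypothetical helper; each pair of neighbouring boundaries becomes one box.
    """
    return [(0, top, width, bottom)
            for top, bottom in zip(boundaries, boundaries[1:])]


boxes = section_boxes(920, [200, 900, 1600])
print(boxes)  # [(0, 200, 920, 900), (0, 900, 920, 1600)]
```

Each tuple can be passed straight to `img.crop(box)`.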
## Guidelines
1. **Always capture the before state BEFORE making any changes**: if you forget, you have to revert code to get a before shot
2. **Before/after pairs must use the same viewport width and crop** — otherwise the comparison is useless
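3. **To get a "before" after you already changed code**: if the change is already committed, use `git checkout HEAD~1 -- <files>` to revert, screenshot, then `git checkout HEAD -- <files>` to restore. Stash or commit uncommitted edits first, because the checkout overwrites them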
4. **For interactive states**: capture before AND after for each state — don't assume the "normal" before covers all cases
5. **Use `device_scale_factor=1`** in Playwright to force 1x pixels so screenshots match what users see at 100% zoom
6. **Charts need extra wait time** — Plotly, D3, etc. render asynchronously; 4s minimum after networkidle
7. **Narrow viewport reveals rendering bugs** — some border/alignment issues only appear at specific widths
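
Guidelines 2, 5, and 6 can be combined into a single capture function that is reused for both the before and after shots. This is a sketch under assumptions: `capture_state` and `VIEWPORT` are hypothetical names, and the URL and file names in the usage note are placeholders.

```python
VIEWPORT = {"width": 1400, "height": 5000}  # shared by before and after shots


async def capture_state(url, out):
    """Capture one full-page screenshot with a fixed viewport and 1x pixels."""
    # Imported here so the module still loads where Playwright is not installed
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page(viewport=VIEWPORT, device_scale_factor=1)
        await page.goto(url, wait_until="networkidle")
        await page.wait_for_timeout(4000)  # extra time for async charts
        await page.screenshot(path=out, full_page=True)
        await browser.close()
```

Run it with `asyncio.run(capture_state("http://localhost:3000", "screenshot-before.png"))` before the change and again with a different file name after it, so that only the code under test differs between the two images.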
## Non-Web App Screenshots
For desktop apps (VS, WPF, WinForms, console apps, terminals) where Playwright can't reach.
### mss + ctypes (recommended for desktop windows)
Find a window by title via Win32 API, capture its region with `mss`. Tested at ~33ms per capture.
```python
import ctypes
from ctypes import c_int, Structure, byref, windll

import mss
from PIL import Image

user32 = windll.user32


def find_window(title_contains):
    """Find visible windows matching a title substring."""
    results = []
    WNDENUMPROC = ctypes.WINFUNCTYPE(ctypes.c_bool, ctypes.c_void_p, ctypes.c_void_p)

    def cb(hwnd, _):
        if user32.IsWindowVisible(hwnd):
            buf = ctypes.create_unicode_buffer(256)
            user32.GetWindowTextW(hwnd, buf, 256)
            if title_contains.lower() in buf.value.lower():
                results.append((hwnd, buf.value))
        return True  # keep enumerating

    user32.EnumWindows(WNDENUMPROC(cb), 0)
    return results


def capture_window(title_contains, output_path):
    """Capture a window by title substring."""
    windows = find_window(title_contains)
    if not windows:
        raise ValueError(f"No window matching '{title_contains}'")
    hwnd = windows[0][0]

    class RECT(Structure):
        _fields_ = [('left', c_int), ('top', c_int), ('right', c_int), ('bottom', c_int)]

    rect = RECT()
    user32.GetWindowRect(hwnd, byref(rect))
    w, h = rect.right - rect.left, rect.bottom - rect.top
    with mss.mss() as sct:
        shot = sct.grab({'left': rect.left, 'top': rect.top, 'width': w, 'height': h})
    img = Image.frombytes('RGB', shot.size, shot.rgb)
    img.save(output_path)
    return img


# Usage:
capture_window('Visual Studio Code', 'vscode-capture.png')
```
**Prerequisites:** `pip install mss pillow`
**Limitation:** Window must be visible (not behind other windows or minimized).
### Electron apps (VS Code, etc.)
**Node.js Playwright only** — Python Playwright has no `electron` API. Captures via CDP (Chrome DevTools Protocol), not from the screen — works even while minimized.
```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { _electron: electron } = require('playwright');

(async () => {
  // Fresh profile directory so VS Code does not hand off to a running instance
  const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'vscode-capture-'));
  const app = await electron.launch({
    executablePath: 'C:\\Program Files\\Microsoft VS Code\\Code.exe',
    args: ['--new-window', '--disable-extensions', '--user-data-dir=' + tmpDir]
  });
  const window = await app.firstWindow();
  await window.waitForLoadState('domcontentloaded');
  // Minimize immediately; captures still work via CDP
  await app.evaluate(({ BrowserWindow }) => {
    BrowserWindow.getAllWindows()[0].minimize();
  });
  await window.screenshot({ path: 'capture.png' }); // works while minimized!
  await app.close();
})();
```
**Critical**: `--user-data-dir=<temp>` is required or VS Code hands off to the existing instance and the launched process exits immediately.
### Decision tree
| Scenario | Tool | Notes |
|---|---|---|
| Web app (localhost) | Playwright | Proven, full DOM access |
| Electron app (VS Code) | Playwright Electron (Node.js) | Works minimized via CDP |
| Desktop app, visible window | mss + ctypes (find by title) | ~33ms per capture |
| Desktop app, behind windows | Windows Graphics Capture API | Complex setup, Win10 1903+ |
| Quick full-screen | mss | ~68ms |
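
The table can also be written as a small lookup for scripts that pick a capture path automatically. This is only a sketch: `pick_capture_tool` and its scenario names (`web`, `electron`, `desktop`, `fullscreen`) are hypothetical and simply mirror the rows above.

```python
def pick_capture_tool(kind, visible=True):
    """Map a capture scenario to the tool suggested in the decision table.

    Hypothetical helper; scenario names are made up for this sketch.
    """
    if kind == "web":
        return "Playwright"
    if kind == "electron":
        return "Playwright Electron (Node.js)"
    if kind == "desktop":
        # Hidden or covered windows need the heavier Windows capture API
        return "mss + ctypes" if visible else "Windows Graphics Capture API"
    if kind == "fullscreen":
        return "mss"
    raise ValueError(f"Unknown scenario: {kind!r}")


print(pick_capture_tool("desktop", visible=False))  # Windows Graphics Capture API
```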
## Limitations
- Web capture requires a locally running app or accessible URL
- Desktop capture (mss) requires the window to be visible and unobstructed
- Electron capture requires Node.js Playwright (not Python)
- Some SPAs with heavy client-side rendering may need custom wait logic beyond networkidle