awesome-copilot/skills/web-coder/references/data-formats-encoding.md
John Haugabook 8fedf95507 new skill web-coder (#881)
2026-03-05 21:43:04 +11:00

Data Formats & Encoding Reference

Data formats, character encodings, and serialization for web development.

JSON (JavaScript Object Notation)

Lightweight data interchange format.

Syntax

{
  "string": "value",
  "number": 42,
  "boolean": true,
  "null": null,
  "array": [1, 2, 3],
  "object": {
    "nested": "value"
  }
}

Permitted Types: string, number, boolean, null, array, object
Not Permitted: undefined, functions, Date objects, RegExp objects, NaN, Infinity

JavaScript Methods

// Parse JSON string
const data = JSON.parse('{"name":"John","age":30}');

// Stringify object
const json = JSON.stringify({ name: 'John', age: 30 });

// Pretty print (2-space indentation)
const pretty = JSON.stringify(data, null, 2);

// Custom serialization with a replacer function
const user = { name: 'John', password: 'secret' };
const filtered = JSON.stringify(user, (key, value) => {
  if (key === 'password') return undefined; // Exclude
  return value;
}); // '{"name":"John"}'

// toJSON method: JSON.stringify serializes its return value
const obj = {
  name: 'John',
  date: new Date(),
  toJSON() {
    return {
      name: this.name,
      date: this.date.toISOString()
    };
  }
};
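
JSON.parse throws a SyntaxError on malformed input, so untrusted strings are usually parsed inside try/catch. A minimal sketch of that pattern follows; `safeParse` and its `fallback` parameter are hypothetical names used only for illustration.

```javascript
// Hypothetical helper: returns the parsed value, or `fallback` if the text is not valid JSON.
function safeParse(text, fallback = null) {
  try {
    return JSON.parse(text);
  } catch (err) {
    if (err instanceof SyntaxError) return fallback;
    throw err; // Unexpected error type: rethrow
  }
}

const ok = safeParse('{"name":"John","age":30}');
const bad = safeParse("{'name': 'John'}"); // Single quotes are not valid JSON
```

Here `ok` is the parsed object and `bad` is the fallback value, since JSON requires double-quoted strings.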

JSON Type Representation

How JavaScript types map to JSON:

  • String → string
  • Number → number
  • Boolean → boolean
  • null → null
  • Array → array
  • Object → object
  • undefined → omitted (null inside arrays)
  • Function → omitted (null inside arrays)
  • Symbol → omitted (null inside arrays)
  • Date → ISO 8601 string
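
The mapping can be checked directly with JSON.stringify. The sketch below uses made-up property names; note that unsupported values are dropped from objects but become null inside arrays, and that NaN also serializes as null.

```javascript
const fromObject = JSON.stringify({
  text: 'hi',
  missing: undefined, // omitted
  fn() {},            // omitted
  sym: Symbol('s'),   // omitted
  notANumber: NaN,    // becomes null
  when: new Date(0)   // ISO 8601 string via Date.prototype.toJSON
});
// '{"text":"hi","notANumber":null,"when":"1970-01-01T00:00:00.000Z"}'

const fromArray = JSON.stringify([undefined, () => {}, Symbol('s')]);
// '[null,null,null]'
```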

XML (Extensible Markup Language)

Markup language for encoding documents.

<?xml version="1.0" encoding="UTF-8"?>
<users>
  <user id="1">
    <name>John Doe</name>
    <email>john@example.com</email>
  </user>
  <user id="2">
    <name>Jane Smith</name>
    <email>jane@example.com</email>
  </user>
</users>

Use Cases:

  • Configuration files
  • Data exchange
  • RSS/Atom feeds
  • SOAP web services

Parsing XML in JavaScript

// Parse XML string (xmlString holds the XML source text)
const parser = new DOMParser();
const xmlDoc = parser.parseFromString(xmlString, 'text/xml');

// Query elements
const users = xmlDoc.querySelectorAll('user');
users.forEach(user => {
  const name = user.querySelector('name').textContent;
  console.log(name);
});

// Serialize the document back to a string
const serializer = new XMLSerializer();
const serialized = serializer.serializeToString(xmlDoc);

Character Encoding

UTF-8

Universal character encoding (recommended for web).

Characteristics:

  • Variable-width (1-4 bytes per character)
  • Backward compatible with ASCII
  • Supports all Unicode characters

<meta charset="UTF-8">
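
The variable width is easy to observe with TextEncoder, which always produces UTF-8. A small sketch, with sample characters chosen for illustration and written as escapes so the code points are unambiguous:

```javascript
const encoder = new TextEncoder(); // Always encodes to UTF-8

// Sample characters: U+0041 (A), U+00E9 (é), U+4E16 (世), U+1F600 (😀)
const samples = ['\u0041', '\u00E9', '\u4E16', '\u{1F600}'];

// Byte length of each character when encoded as UTF-8
const utf8Lengths = samples.map((ch) => encoder.encode(ch).length);
// [1, 2, 3, 4]
```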

UTF-16

Variable-width: 2 or 4 bytes per character (one or two 16-bit code units).

Use: JavaScript strings are sequences of UTF-16 code units

'A'.charCodeAt(0); // 65
String.fromCharCode(65); // 'A'

// Emoji (requires surrogate pair in UTF-16)
'😀'.length; // 2 (in JavaScript)
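
Characters above U+FFFF are stored as two code units, a high and a low surrogate. The arithmetic behind that split can be sketched as follows; `toSurrogatePair` is a hypothetical helper name, and the function assumes its input is above U+FFFF.

```javascript
// Split a supplementary code point (above U+FFFF) into a UTF-16 surrogate pair.
function toSurrogatePair(codePoint) {
  const offset = codePoint - 0x10000;
  const high = 0xd800 + (offset >> 10);  // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1f600); // [0xD83D, 0xDE00]
const emoji = String.fromCharCode(high, low); // "😀"
```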

ASCII

7-bit encoding (128 characters).

Range: 0-127
Includes: English letters, digits, common symbols

Code Point vs Code Unit

  • Code Point: the number assigned to a Unicode character (U+0041 = 'A')
  • Code Unit: the fixed-size unit an encoding uses; 16 bits in UTF-16, 8 bits in UTF-8

// Code points
'A'.codePointAt(0); // 65
String.fromCodePoint(0x1F600); // '😀'

// Iterate code points
for (const char of 'Hello 😀') {
  console.log(char);
}
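
The difference matters when measuring or indexing strings. A short sketch comparing the two counts for the same text:

```javascript
const text = 'Hello \u{1F600}'; // "Hello 😀"

const codeUnits = text.length;       // 8: the emoji takes two code units
const codePoints = [...text].length; // 7: spreading a string iterates code points

// codePointAt reads the full character when given the index of a high surrogate
const lastCodePoint = text.codePointAt(6); // 0x1F600
```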

Base64

Binary-to-text encoding scheme.

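// Encode
const encoded = btoa('Hello World'); // "SGVsbG8gV29ybGQ="

// Decode
const decoded = atob('SGVsbG8gV29ybGQ='); // "Hello World"

// Handle Unicode: btoa() only accepts characters in the Latin-1 range
// Legacy approach (escape/unescape are deprecated)
const legacyEncoded = btoa(unescape(encodeURIComponent('Hello 世界')));
const legacyDecoded = decodeURIComponent(escape(atob(legacyEncoded)));

// Modern approach: convert to UTF-8 bytes first
const encoder = new TextEncoder();
const decoder = new TextDecoder();

const bytes = encoder.encode('Hello 世界');
const unicodeEncoded = btoa(String.fromCharCode(...bytes));
const unicodeDecoded = decoder.decode(
  Uint8Array.from(atob(unicodeEncoded), (c) => c.charCodeAt(0))
); // "Hello 世界"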

Use Cases:

  • Embed binary data in JSON/XML
  • Data URLs (data:image/png;base64,...)
  • Basic authentication headers
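
As an illustration of the last use case, an HTTP Basic Authorization value is the Base64 encoding of `username:password`. The sketch below assumes an environment with a global `btoa` (browsers, recent Node.js); `basicAuthHeader` is a hypothetical helper name.

```javascript
// Hypothetical helper: builds the value for an HTTP Basic Authorization header.
function basicAuthHeader(username, password) {
  const bytes = new TextEncoder().encode(`${username}:${password}`);
  // Convert UTF-8 bytes to a binary string that btoa() accepts
  const binary = String.fromCharCode(...bytes);
  return `Basic ${btoa(binary)}`;
}

const header = basicAuthHeader('user', 'pass'); // "Basic dXNlcjpwYXNz"
```

Base64 is an encoding, not encryption: anyone who sees the header can decode it, so Basic authentication should only be sent over HTTPS.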

URL Encoding (Percent Encoding)

Encode special characters in URLs.

// encodeURIComponent (encodes everything except: A-Z a-z 0-9 - _ . ! ~ * ' ( ))
const encoded = encodeURIComponent('Hello World!'); // "Hello%20World!"
const decoded = decodeURIComponent(encoded); // "Hello World!"

// encodeURI (encodes less; preserves URL delimiters such as : / ? & =)
const encodedUrl = encodeURI('http://example.com/search?q=hello world');
// "http://example.com/search?q=hello%20world"

// Modern URL API
const url = new URL('http://example.com/search');
url.searchParams.set('q', 'hello world');
console.log(url.toString()); // "http://example.com/search?q=hello+world"
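
Query strings can also be built and parsed on their own with URLSearchParams, which uses form encoding (spaces become `+`). A brief sketch with made-up parameter names:

```javascript
// Build a query string from key/value pairs
const params = new URLSearchParams({ q: 'hello world', lang: 'en' });
const query = params.toString(); // "q=hello+world&lang=en"

// Parse it back: "+" and percent escapes are decoded automatically
const parsed = new URLSearchParams('q=hello+world&tag=a%26b');
const q = parsed.get('q');     // "hello world"
const tag = parsed.get('tag'); // "a&b"

// decodeURIComponent does not treat "+" as a space
const raw = decodeURIComponent('hello+world'); // "hello+world"
```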

MIME Types

Media type identification.

Common MIME Types

| Type | MIME Type |
|------|-----------|
| HTML | text/html |
| CSS | text/css |
| JavaScript | text/javascript (application/javascript is obsolete) |
| JSON | application/json |
| XML | application/xml, text/xml |
| Plain Text | text/plain |
| JPEG | image/jpeg |
| PNG | image/png |
| GIF | image/gif |
| SVG | image/svg+xml |
| PDF | application/pdf |
| ZIP | application/zip |
| MP4 Video | video/mp4 |
| MP3 Audio | audio/mpeg |
| Form Data | application/x-www-form-urlencoded |
| Multipart | multipart/form-data |

In HTML:

<link rel="stylesheet" href="styles.css" type="text/css">
<script src="app.js" type="text/javascript"></script>

In HTTP headers:

Content-Type: application/json; charset=utf-8
Content-Type: text/html; charset=utf-8
Content-Type: multipart/form-data; boundary=----WebKitFormBoundary
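
A Content-Type value consists of a media type followed by optional `name=value` parameters separated by semicolons. The following is a simplified sketch of splitting one apart; `parseContentType` is a hypothetical helper, and it does not handle quoted parameter values that contain semicolons.

```javascript
// Hypothetical helper: parses a Content-Type value into its media type and parameters.
function parseContentType(header) {
  const [type, ...rest] = header.split(';').map((part) => part.trim());
  const params = {};
  for (const part of rest) {
    const eq = part.indexOf('=');
    if (eq === -1) continue; // Skip malformed parameters
    const name = part.slice(0, eq).trim().toLowerCase();
    params[name] = part.slice(eq + 1).trim();
  }
  return { type: type.toLowerCase(), params };
}

const parsedType = parseContentType('application/json; charset=utf-8');
// { type: 'application/json', params: { charset: 'utf-8' } }
```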

Serialization & Deserialization

Converting data structures to and from a format that can be stored or transmitted.

JSON Serialization

// Serialize
const obj = { name: 'John', date: new Date() };
const json = JSON.stringify(obj);

// Deserialize
const parsed = JSON.parse(json);
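
A JSON round trip does not restore every type: a Date is written as a string and comes back as a string. A reviver function passed to JSON.parse can convert it back. This sketch assumes the `date` key always holds an ISO 8601 string.

```javascript
const original = { name: 'John', date: new Date(0) };
const json = JSON.stringify(original);

// Without a reviver, the date comes back as a string
const plain = JSON.parse(json);

// With a reviver, selected keys can be converted back.
// Assumption: the "date" key always holds an ISO 8601 string.
const revived = JSON.parse(json, (key, value) =>
  key === 'date' ? new Date(value) : value
);
```

After this runs, `plain.date` is a string while `revived.date` is a Date object equal to the original.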

Serializable Objects

Values that the structured clone algorithm can serialize:

  • Primitive values (except symbols)
  • Arrays and plain objects
  • Date, RegExp
  • Map, Set
  • ArrayBuffer, TypedArrays

Not Serializable:

  • Functions
  • DOM nodes
  • Symbols (as values)
  • Prototype chains and class methods (class instances are cloned as plain objects)
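
The global `structuredClone()` function exposes this algorithm directly. The sketch below assumes an environment that provides it (modern browsers, Node.js 17 or later); the object shape is made up for illustration.

```javascript
const source = {
  created: new Date(0),
  tags: new Set(['a', 'b']),
  lookup: new Map([['key', { nested: true }]])
};

// Deep copy: Date, Set, and Map survive, unlike a JSON round trip
const copy = structuredClone(source);

// Functions cannot be cloned; structuredClone throws instead
let cloneError = null;
try {
  structuredClone({ handler: () => {} });
} catch (err) {
  cloneError = err;
}
```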

Character References

HTML entities for special characters.

&lt;    <!-- < -->
&gt;    <!-- > -->
&amp;   <!-- & -->
&quot;  <!-- " -->
&apos;  <!-- ' -->
&nbsp;  <!-- non-breaking space -->
&copy;  <!-- © -->
&#8364; <!-- € -->
&#x20AC; <!-- € (hex) -->
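
When inserting untrusted text into HTML, the characters with special meaning need to be replaced by their character references. A minimal sketch; `escapeHtml` and `HTML_ESCAPES` are hypothetical names, and the numeric reference `&#39;` is used for the apostrophe.

```javascript
// Hypothetical helper: escapes the five characters that are significant in HTML.
const HTML_ESCAPES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;'
};

function escapeHtml(text) {
  return text.replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

const safe = escapeHtml('<b>Tom & "Jerry"</b>');
// "&lt;b&gt;Tom &amp; &quot;Jerry&quot;&lt;/b&gt;"
```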

Data URLs

Embed data directly in URLs.

<!-- Inline image -->
<img src="data:image/png;base64,iVBORw0KGgoAAAANS..." alt="Icon">

<!-- Inline SVG -->
<img src="data:image/svg+xml,%3Csvg xmlns='...'%3E...%3C/svg%3E" alt="Logo">

<!-- Inline CSS -->
<link rel="stylesheet" href="data:text/css,body%7Bmargin:0%7D">

// Create data URL from canvas
const canvas = document.querySelector('canvas');
const dataURL = canvas.toDataURL('image/png');

// Create data URL from blob
const blob = new Blob(['Hello'], { type: 'text/plain' });
const reader = new FileReader();
reader.onload = () => {
  const dataURL = reader.result;
};
reader.readAsDataURL(blob);

Escape Sequences

// String escapes
'It\'s a string'; // Single quote
"He said \"Hello\""; // Double quote
'Line 1\nLine 2'; // Newline
'Column1\tColumn2'; // Tab
'Path\\to\\file'; // Backslash
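
JavaScript strings also support hexadecimal and Unicode escapes, and `String.raw` can be used when backslashes should be kept literally. A short sketch with illustrative variable names:

```javascript
// Unicode and hex escapes
const letterA = '\u0041';  // "A" (4-digit form, up to U+FFFF)
const emoji = '\u{1F600}'; // "😀" (code point form, ES2015+)
const hexA = '\x41';       // "A" (2-digit hex form)

// String.raw leaves backslashes untouched
const rawPath = String.raw`Path\to\file`;
const escapedPath = 'Path\\to\\file'; // Same string, written with escapes
```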

Data Structures

Arrays

Ordered collections:

const arr = [1, 2, 3];
arr.push(4); // Add to end
arr.pop(); // Remove from end

Objects

Key-value pairs:

const obj = { key: 'value' };
obj.newKey = 'new value';
delete obj.key;

Map

Keyed collections (any type as key):

const map = new Map();
const objKey = { id: 1 };
map.set('key', 'value');
map.set(objKey, 'value'); // Objects can be keys
map.get('key'); // 'value'
map.has('key'); // true
map.delete('key');

Set

Unique values:

const set = new Set([1, 2, 2, 3]); // {1, 2, 3}
set.add(4);
set.has(2); // true
set.delete(1);
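
Map and Set are not part of JSON, so JSON.stringify turns both into empty objects. One common workaround, sketched here under the assumption that the Map uses string keys, is to convert them to plain objects and arrays first:

```javascript
const map = new Map([['a', 1], ['b', 2]]);
const set = new Set([1, 2, 2, 3]);

// Map and Set have no enumerable own properties, so both stringify to "{}"
const naive = JSON.stringify({ map, set }); // '{"map":{},"set":{}}'

// Convert to plain structures first
const json = JSON.stringify({
  map: Object.fromEntries(map), // Assumes string keys
  set: [...set]
});
// '{"map":{"a":1,"b":2},"set":[1,2,3]}'

// Restore after parsing
const parsed = JSON.parse(json);
const restoredMap = new Map(Object.entries(parsed.map));
const restoredSet = new Set(parsed.set);
```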

Glossary Terms

Key Terms Covered:

  • ASCII
  • Base64
  • Character
  • Character encoding
  • Character reference
  • Character set
  • Code point
  • Code unit
  • Data structure
  • Deserialization
  • Enumerated
  • Escape character
  • JSON
  • JSON type representation
  • MIME
  • MIME type
  • Percent-encoding
  • Serialization
  • Serializable object
  • Unicode
  • URI
  • URL
  • URN
  • UTF-8
  • UTF-16

Additional Resources