What is JSON?
JSON stands for JavaScript Object Notation. It is a lightweight, text-based, language-independent data interchange format, designed to be easy for humans to read and write and easy for machines to parse and generate.
Despite having "JavaScript" in its name, JSON is completely language-agnostic. Mature parsers and serializers exist for Python, Go, Rust, Java, Ruby, PHP, Swift, Kotlin, and dozens of other languages, either in the standard library or as widely used packages. It has become the dominant format for data exchange on the web, largely replacing XML in modern web APIs.
RFC 8259 states that JSON's design goals were for it to be minimal, portable, textual, and a subset of JavaScript. ECMA-404 adds that JSON does not impose ECMAScript's internal data representations on other languages; it shares a small subset of ECMAScript's syntax with all other programming languages, which is why it works as a common format for structured data.
History & Official Standards
JSON was created by Douglas Crockford at State Software in 2001. It was derived from the object literal syntax of JavaScript (ECMAScript), but was designed to be usable from any language. The first public JSON website, json.org, went online in 2002.
The 6 JSON Data Types
JSON supports exactly six value types: four primitives (string, number, boolean, null) and two structured types (object, array). This small set is what makes JSON both simple and interoperable. There are no functions, no dates, no binary data, and no comments.
JSON has no date type (use ISO 8601 strings like "2025-03-21T10:00:00Z"), no comments (despite popular demand), no undefined, no binary, no NaN or Infinity, and no trailing commas. These are the most common gotchas for developers coming from JavaScript.
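Because JSON has no date type, the serializer has to be told how to turn a date into a string. The following is a minimal sketch using Python's standard `json` module; the helper name `encode_extra` and the sample `event` record are made up for illustration.

```python
import json
from datetime import datetime, timezone

def encode_extra(value):
    """Fallback for json.dumps: convert datetimes to ISO 8601 strings."""
    if isinstance(value, datetime):
        # Convert to UTC and use the "Z" suffix.
        # Note: a naive datetime is interpreted as local time by astimezone().
        return value.astimezone(timezone.utc).isoformat().replace("+00:00", "Z")
    raise TypeError(f"Cannot serialize {type(value).__name__}")

event = {"name": "deploy", "at": datetime(2025, 3, 21, 10, 0, 0, tzinfo=timezone.utc)}
encoded = json.dumps(event, default=encode_extra)
print(encoded)

# Reading it back yields a plain string; turning it into a date is the caller's job.
decoded = json.loads(encoded)
restored = datetime.fromisoformat(decoded["at"].replace("Z", "+00:00"))
print(restored == event["at"])
```

The `default=` hook is only called for values `json.dumps` cannot handle itself, so ordinary strings and numbers are unaffected.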
Syntax Rules & Anatomy
JSON's grammar is intentionally minimal. Whitespace (spaces, tabs, newlines) between tokens is ignored and used purely for readability. The entire format is defined by just a few structural characters: { } [ ] : ,
```text
{                                ← object opens
  "model": "gpt-4o",             ← string value
  "context_window": 128000,      ← number value
  "multimodal": true,            ← boolean value
  "fine_tune_id": null,          ← null value
  "capabilities": [              ← array value
    "text", "vision", "code"     ← no trailing comma on last item
  ],
  "pricing": {                   ← nested object
    "input_per_1m": 5.00,
    "output_per_1m": 15.00
  }                              ← no comma after last member
}                                ← object closes
```
Objects
A JSON object is an unordered collection of name/value pairs wrapped in curly braces {}. Each name must be a string (in double quotes), followed by a colon :, then the value. Pairs are separated by commas. The order of members is not significant per the spec — parsers may return them in any order.
RFC 8259 recommends against duplicate keys within a single object. The behavior of implementations that encounter duplicate names is "unpredictable" per the spec. For interoperability, always use unique keys in your JSON objects.
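To see why duplicate keys are risky, here is a small sketch of how Python's standard `json` module treats them, plus one way to reject them. The hook name `reject_duplicates` is hypothetical; `object_pairs_hook` is a real parameter of `json.loads`.

```python
import json

def reject_duplicates(pairs):
    """object_pairs_hook that raises on repeated names within one object."""
    result = {}
    for name, value in pairs:
        if name in result:
            raise ValueError(f"duplicate key: {name!r}")
        result[name] = value
    return result

doc = '{"id": 1, "id": 2}'

# Default behaviour of Python's json module: the last value silently wins.
lenient = json.loads(doc)
print(lenient)

# Strict behaviour: fail loudly instead of guessing.
try:
    json.loads(doc, object_pairs_hook=reject_duplicates)
    duplicate_error = None
except ValueError as exc:
    duplicate_error = str(exc)
print(duplicate_error)
```

Other parsers may keep the first value or raise an error, which is exactly the unpredictability the RFC warns about.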
Arrays
A JSON array is an ordered sequence of values wrapped in square brackets []. Values are separated by commas. Arrays are zero-indexed and can contain values of mixed types — including other objects and arrays (enabling arbitrarily deep nesting).
```text
// Homogeneous array (all strings)
["user", "system", "assistant"]

// Heterogeneous array (mixed types are valid)
[1, "hello", true, null, {"key": "val"}]

// Array of objects (the most common pattern in APIs)
[
  {"role": "user", "content": "Hello"},
  {"role": "assistant", "content": "Hi! How can I help?"}
]
```
Strings & Escape Sequences
JSON strings must use double quotes (single quotes are invalid). Almost any Unicode character can appear literally in a string; the double quote, the backslash, and control characters (U+0000 through U+001F) must be escaped with a backslash \.
| Escape Sequence | Character | Notes |
|---|---|---|
| `\"` | Double quote | Required: an unescaped " ends the string |
| `\\` | Backslash | Required: a single \ starts an escape |
| `\/` | Forward slash | Optional; useful in HTML contexts |
| `\n` | Newline (LF) | Most common whitespace escape |
| `\r` | Carriage return | Used with \n for CRLF |
| `\t` | Tab | Horizontal tab character |
| `\b` | Backspace | Rarely used |
| `\f` | Form feed | Rarely used |
| `\uXXXX` | Unicode code point | e.g. \u00e9 = é; characters outside the Basic Multilingual Plane are written as a surrogate pair of two \uXXXX escapes |
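A quick way to check the escape rules is to round-trip a string through a serializer. This sketch uses Python's standard `json` module; the sample text is made up for illustration.

```python
import json

text = 'She said "hi"\tthen left.\nPath: C:\\temp'
encoded = json.dumps(text)
print(encoded)

# Decoding restores the original string exactly.
round_tripped = json.loads(encoded)

# A \uXXXX escape decodes to the same character as the literal form.
e_acute = json.loads('"\\u00e9"')

# Characters outside the Basic Multilingual Plane use two \u escapes.
emoji = json.loads('"\\ud83d\\ude00"')

# Python escapes non-ASCII by default; ensure_ascii=False keeps it literal.
escaped = json.dumps("é")
literal = json.dumps("é", ensure_ascii=False)
print(escaped, literal)
```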
Numbers
JSON makes no distinction between integers and floating-point numbers — there is only "number." Numbers may be positive or negative, integer or decimal, with optional scientific notation. JSON does not allow NaN, Infinity, or -Infinity. Leading zeros (like 007) are prohibited except for 0 itself.
```text
{
  "integer": 42,             ✓ valid
  "negative": -17,           ✓ valid
  "float": 3.14159,          ✓ valid
  "scientific": 1.6e-19,     ✓ valid (roughly the elementary charge in coulombs)
  "zero": 0                  ✓ valid
  // "leading_zero": 007     ✗ INVALID: leading zeros are not allowed
  // "nan": NaN              ✗ INVALID: NaN is not in the JSON grammar
  // "inf": Infinity         ✗ INVALID: Infinity is not in the JSON grammar
}
```
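Parsers do not always enforce these number rules by default. As a sketch: Python's standard `json` module rejects leading zeros but accepts `NaN` and `Infinity` unless told otherwise. The wrapper name `parse_strict` below is hypothetical; `parse_constant` and `allow_nan` are real parameters of the module.

```python
import json

def parse_strict(text):
    """Parse JSON, rejecting the NaN/Infinity extensions Python accepts by default."""
    def reject_constant(name):
        raise ValueError(f"{name} is not valid JSON")
    return json.loads(text, parse_constant=reject_constant)

# Integers and floats come back as different Python types.
print(type(json.loads("42")).__name__, type(json.loads("1.6e-19")).__name__)

results = {}
for candidate in ["007", "NaN", "Infinity"]:
    try:
        parse_strict(candidate)
        results[candidate] = "accepted"
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        results[candidate] = "rejected"
print(results)

# When serializing, allow_nan=False makes json.dumps refuse non-finite floats.
try:
    json.dumps(float("nan"), allow_nan=False)
    nan_dump = "accepted"
except ValueError:
    nan_dump = "rejected"
print(nan_dump)
```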
JSON in AI & LLMs
JSON is the plumbing behind most AI systems. It is the usual format for API requests and responses, model configurations, evaluation datasets, agent tool calls, and structured output modes. Understanding JSON well makes it much easier to understand how AI systems are built and how they communicate.
The OpenAI Chat Completions API
The HTTP APIs from OpenAI, Anthropic, and Google all send and receive JSON. Here is an abbreviated Chat Completions request, followed by the response:
```json
{
  "model": "gpt-4o",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is JSON?" }
  ],
  "temperature": 0.7,
  "max_tokens": 256
}
```
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "choices": [{
    "message": { "role": "assistant", "content": "JSON is a..." },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 28, "completion_tokens": 96 }
}
```
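In client code, the request is usually built as ordinary data and serialized, and the response is parsed back into data. The sketch below uses only Python's standard `json` module and a hard-coded response string shaped like the example; it makes no network call, and the variable names are illustrative.

```python
import json

# Build the request body as ordinary Python data, then serialize it.
request_body = {
    "model": "gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is JSON?"},
    ],
    "temperature": 0.7,
    "max_tokens": 256,
}
payload = json.dumps(request_body)  # this string is what would go over HTTP

# A response body shaped like the example (hard-coded, not fetched).
response_text = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "choices": [{
    "message": {"role": "assistant", "content": "JSON is a..."},
    "finish_reason": "stop"
  }],
  "usage": {"prompt_tokens": 28, "completion_tokens": 96}
}
"""
response = json.loads(response_text)

first_choice = response["choices"][0]
answer = first_choice["message"]["content"]
usage = response["usage"]
tokens_used = usage["prompt_tokens"] + usage["completion_tokens"]
print(answer, first_choice["finish_reason"], tokens_used)
```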
JSON Mode & Structured Output
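Many LLM APIs offer a JSON mode: a setting that constrains the model to emit syntactically valid JSON. This matters for AI agents that need to interface with other systems, databases, or APIs. JSON mode on its own does not guarantee that the output matches a particular schema, and a response cut off by a token limit can still be incomplete, so validate what you receive.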
```text
// Request JSON-mode output from GPT-4o
{
  "model": "gpt-4o",
  "response_format": { "type": "json_object" },
  "messages": [{
    "role": "user",
    "content": "Extract name, age, and email as a JSON object from: 'Hi I am Alice, 32, [email protected]'"
  }]
}

// Typical response content (syntactically valid JSON):
// { "name": "Alice", "age": 32, "email": "[email protected]" }
```
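Whatever mode the model runs in, it is safer to treat its output as untrusted text until it has been parsed and checked. Here is a minimal standard-library sketch of that step. The function name `parse_model_output`, the `REQUIRED_FIELDS` table, and the sample strings (including the placeholder address) are all hypothetical.

```python
import json

# Hypothetical expectation for the extraction example: field name -> type.
REQUIRED_FIELDS = {"name": str, "age": int, "email": str}

def parse_model_output(raw):
    """Return (data, error); exactly one of the two is None."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"not valid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "expected a JSON object"
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in data:
            return None, f"missing field: {field}"
        if not isinstance(data[field], expected_type):
            return None, f"wrong type for field: {field}"
    return data, None

good = '{"name": "Alice", "age": 32, "email": "alice@example.com"}'
truncated = '{"name": "Alice", "age": 3'        # e.g. cut off by a token limit
wrong_shape = '{"name": "Alice", "age": "32"}'  # age arrived as a string

for sample in (good, truncated, wrong_shape):
    print(parse_model_output(sample))
```

A caller can retry the request or fall back to a default when the error slot is not `None`.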
HuggingFace config.json
Transformers models on the Hugging Face Hub typically ship a config.json file: a JSON document that defines the model's architecture, vocabulary size, layer counts, and attention parameters.
```json
{
  "architectures": ["LlamaForCausalLM"],
  "model_type": "llama",
  "hidden_size": 4096,
  "intermediate_size": 11008,
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "vocab_size": 32000,
  "max_position_embeddings": 4096,
  "torch_dtype": "float16"
}
```
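Because the config is plain JSON, it can be inspected without loading any model weights. The sketch below parses the example values with Python's standard `json` module and derives two quantities from them; the variable names are illustrative, and the arithmetic is ordinary transformer bookkeeping rather than anything the file itself states.

```python
import json

# The example config, embedded as a string so the snippet is self-contained.
config_text = """
{
  "architectures": ["LlamaForCausalLM"],
  "model_type": "llama",
  "hidden_size": 4096,
  "intermediate_size": 11008,
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "vocab_size": 32000,
  "max_position_embeddings": 4096,
  "torch_dtype": "float16"
}
"""
config = json.loads(config_text)

# Width of each attention head: the hidden size split evenly across heads.
head_dim = config["hidden_size"] // config["num_attention_heads"]

# Size of the token embedding table: one hidden-size vector per vocabulary entry.
embedding_params = config["vocab_size"] * config["hidden_size"]

print(config["architectures"][0], head_dim, embedding_params)
```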
Python: Parse, Build & Validate
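Python's built-in json module handles all standard JSON operations. For large-scale or performance-sensitive workloads, the third-party orjson library is usually noticeably faster; its project benchmarks report the largest gains for serialization, so measure on your own data.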
Parsing JSON (Reading)
```python
import json

# ── From a string ─────────────────────────────────────────────
json_str = '{"model": "gpt-4o", "temperature": 0.7, "active": true}'
data = json.loads(json_str)            # loads = load from String

print(data["model"])                   # → gpt-4o
print(type(data["temperature"]))       # → <class 'float'>
print(type(data["active"]))            # → <class 'bool'>

# ── From a file ───────────────────────────────────────────────
with open("config.json", "r", encoding="utf-8") as f:
    config = json.load(f)              # load = load from File

# ── Type mapping: JSON → Python ───────────────────────────────
# JSON object  → dict
# JSON array   → list
# JSON string  → str
# JSON number  → int (no fraction or exponent) or float (otherwise)
# JSON true    → True
# JSON false   → False
# JSON null    → None
```
Serializing JSON (Writing)
```python
import json

data = {
    "model": "gpt-4o",
    "temperature": 0.7,
    "messages": [{"role": "user", "content": "Hello"}],
    "active": True,
    "notes": None
}

# ── To a string ───────────────────────────────────────────────
compact = json.dumps(data)
# → '{"model": "gpt-4o", "temperature": 0.7, ...}'

pretty = json.dumps(data, indent=2, ensure_ascii=False)
# → nicely indented, Unicode preserved

sorted_keys = json.dumps(data, sort_keys=True)
# → keys in alphabetical order (good for diffs)

# ── To a file ─────────────────────────────────────────────────
with open("output.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2, ensure_ascii=False)

# ── Python → JSON type mapping ────────────────────────────────
# dict        → JSON object
# list/tuple  → JSON array
# str         → JSON string
# int/float   → JSON number
# True        → true
# False       → false
# None        → null
```
JSON Schema Validation with Pydantic
```python
from pydantic import BaseModel, ValidationError
from typing import Optional
import json

# Define expected schema as a Pydantic model
class LLMConfig(BaseModel):
    model: str
    temperature: float = 0.7
    max_tokens: int
    stream: bool = False
    system_prompt: Optional[str] = None

# Validate JSON from an API or file
raw_json = '{"model": "gpt-4o", "max_tokens": 1024}'

try:
    config = LLMConfig(**json.loads(raw_json))
    print(config.model)        # → gpt-4o
    print(config.temperature)  # → 0.7 (default)
except ValidationError as e:
    print(f"Schema validation failed: {e}")

# This pattern is the foundation of structured LLM output:
# force the model to output JSON, then validate with Pydantic.
```
JSON vs XML vs JSONL
| Property | JSON | XML | JSONL |
|---|---|---|---|
| Human readable | ✓ Very readable | ⚠ Verbose | ✓ Line-by-line |
| Verbosity | Minimal | High (opening + closing tags) | Minimal |
| Comments | ✗ Not supported | ✓ Supported | ✗ Not supported |
| Streaming | ⚠ Needs an incremental parser (e.g. ijson) | ✓ SAX parser | ✓ Line-by-line |
| Schema standard | JSON Schema (draft) | XSD (W3C standard) | Per-line JSON Schema |
| Namespaces | ✗ Not supported | ✓ Full namespace support | ✗ Not supported |
| AI / LLM usage | ✓ APIs, configs, output | ✗ Legacy, rarely used | ✓ Training datasets |
| File size | Small | Larger (tag overhead; the ratio depends on the data) | Small |
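The line-by-line streaming that the table credits to JSONL takes only a few lines to implement. This sketch uses Python's standard library; the helper names `write_jsonl` and `read_jsonl` and the sample records are made up, and an in-memory buffer stands in for a real file.

```python
import io
import json

records = [
    {"prompt": "What is JSON?", "completion": "A data interchange format."},
    {"prompt": "What is JSONL?", "completion": "One JSON value\nper line."},
]

def write_jsonl(items, fp):
    for item in items:
        # json.dumps escapes newlines inside strings, so each record stays on one line.
        fp.write(json.dumps(item, ensure_ascii=False) + "\n")

def read_jsonl(fp):
    """Yield one parsed record at a time; blank lines are skipped."""
    for line in fp:
        line = line.strip()
        if line:
            yield json.loads(line)

buffer = io.StringIO()  # stands in for a file opened with open(...)
write_jsonl(records, buffer)
buffer.seek(0)
loaded = list(read_jsonl(buffer))
print(len(loaded), loaded[0]["prompt"])
```

Because `read_jsonl` is a generator, only one record needs to be in memory at a time when it is pointed at a real file.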
Valid vs Invalid JSON Examples
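Valid: a flat object with string, number, boolean, and null values.

```json
{
  "name": "Alice",
  "age": 30,
  "active": true,
  "score": null
}
```

Invalid: single quotes around keys and strings.

```text
{
  'name': 'Alice',
  'age': 30
}
```

Valid: nested objects and arrays.

```json
{
  "user": {
    "id": 1,
    "tags": ["admin", "user"]
  }
}
```

Invalid: trailing comma after the last member.

```text
{
  "name": "Alice",
  "age": 30,
}
```

Valid: any JSON value can stand alone as a complete document. Each line below is a separate JSON text.

```text
"just a string"
42
true
null
```

Invalid: comments.

```text
{
  // This is a comment
  "name": "Alice",
  /* block comment */
  "age": 30
}
```

Common Mistakes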
- Single quotes instead of double quotes — JSON requires double quotes for all strings and all object keys. Single quotes are JavaScript syntax, not JSON syntax.
- Trailing commas — A comma after the last element of an object or array is valid JavaScript but invalid JSON. This trips up many developers copying JS code into a JSON file.
- Comments — JSON has no comment syntax. Using `// comment` or `/* comment */` makes a file invalid JSON. Use a README or external documentation instead.
- Unquoted keys — JavaScript allows `{ name: "Alice" }` but JSON requires `{ "name": "Alice" }`. All keys must be quoted strings.
- Using `NaN` or `Infinity` — These JavaScript number values have no representation in JSON. Use `null` as a sentinel value or handle them before serialization.
- Dates as raw Date objects — JSON has no date type. Always serialize dates as ISO 8601 strings: `"2025-03-21T10:00:00Z"`.
- Forgetting `ensure_ascii=False` — Python's `json.dumps()` escapes non-ASCII characters by default. Add `ensure_ascii=False` to preserve Unicode characters as-is.
- Assuming key order is preserved — The JSON spec says object member ordering is not significant. Most modern parsers do preserve insertion order, but you should never rely on it.
- Using JSON for binary data — JSON is a text format. Binary data (images, audio) must be base64-encoded before embedding in JSON, which increases size by ~33%. Consider a separate binary channel instead.
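The size cost of embedding binary data can be checked directly. This sketch uses Python's standard `base64` and `json` modules; the byte string and the `filename` field are made-up stand-ins for a real image.

```python
import base64
import json

raw = bytes(range(256)) * 12  # 3072 arbitrary bytes standing in for an image
encoded = base64.b64encode(raw).decode("ascii")
doc = json.dumps({"filename": "pixel.bin", "data": encoded})

# Base64 emits 4 output characters for every 3 input bytes.
overhead = len(encoded) / len(raw)
print(len(raw), len(encoded), round(overhead, 3))

# The receiver reverses the two steps: parse the JSON, then decode the base64.
restored = base64.b64decode(json.loads(doc)["data"])
print(restored == raw)
```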
Frequently Asked Questions
Does JSON support comments?
No. Douglas Crockford deliberately removed comments from JSON. He later explained that he had seen people using comments to hold parsing directives, a practice that would have destroyed interoperability. If you need comments in config files, use JSONC (JSON with Comments, used by VS Code) or JSON5. For everything else, keep documentation in a separate file.
Is JSON the same as a JavaScript object literal?
No — JSON is a strict subset of JavaScript object syntax. Key differences: JSON requires double quotes on all keys (JS allows unquoted), JSON forbids trailing commas (JS allows them), JSON forbids comments (JS allows them), and JSON forbids undefined as a value. You can always safely embed valid JSON in a JavaScript file, but not vice versa.
What's the difference between RFC 8259 and ECMA-404?
Both are authoritative standards for JSON, and they describe the same grammar. The key difference is scope: ECMA-404 is a pure grammar specification — it defines only what is syntactically valid JSON. RFC 8259 adds interoperability guidance on top: it mandates UTF-8 for networked JSON, recommends against duplicate keys, and addresses security concerns. For building real systems, follow RFC 8259.
How do I handle large JSON files in Python without running out of memory?
For large JSON files, use a streaming parser like ijson — it lets you parse incrementally without loading the whole file into memory. Alternatively, consider whether your data should be in JSONL format instead, which is natively streamable line-by-line.
Why is JSON preferred over XML in modern APIs?
JSON is lighter (no opening/closing tags), maps directly to data structures in most programming languages, is usually faster to parse, and is more readable at a glance. XML remains useful for documents with mixed content (text and tags), namespace requirements, or rich schema validation, but for data-only APIs JSON has become the default choice.