JSON vs YAML: Which Format to Choose for Configs, APIs, and Data

2 March, 2026 Backend

JSON and YAML are the two dominant serialisation formats in modern software development. You see JSON in REST APIs, configuration files, and data pipelines. You see YAML in Kubernetes manifests, CI/CD configs, and Ansible playbooks. Both serialise structured data, but they make radically different trade-offs. This article breaks down those trade-offs so you can choose the right tool for each job.


What Is JSON

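JSON - JavaScript Object Notation - is defined by RFC 8259. It is derived from JavaScript object literal syntax but is much stricter: keys must be double-quoted strings, trailing commas are not allowed, and there are no comments. It is specified independently of JavaScript and is used from virtually every programming language. (Until ES2019, JSON was not quite a subset of JavaScript, because JSON strings may contain the unescaped characters U+2028 and U+2029, which JavaScript string literals previously rejected.)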

JSON was designed by Douglas Crockford in the early 2000s with three explicit goals:

  • Simplicity - the entire grammar fits on a single page. There are exactly six value types: string, number, object, array, boolean (true/false), and null.
  • Interoperability - a small, fixed grammar means conforming parsers in different languages agree on the structure of any valid document.
  • Strict parsing - there is no optional syntax, no comments, no trailing commas. Either the input is valid JSON or it is not.

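These constraints make JSON predictable. Parsers agree on what is valid and on the structure of the result; the remaining differences are confined to edge cases such as duplicate keys and very large or high-precision numbers. That predictability is its primary strength.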
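
The strict-parsing goal is easy to observe with Python's standard json module. The sketch below is illustrative only: the sample strings and the helper name is_valid_json are made up for the demonstration.

```python
import json

# Hypothetical inputs: one valid document and three near-misses.
samples = {
    "valid": '{"name": "MyService", "port": 8080}',
    "trailing comma": '{"name": "MyService", "port": 8080,}',
    "comment": '{"port": 8080 // default\n}',
    "unquoted key": '{name: "MyService"}',
}


def is_valid_json(text: str) -> bool:
    """Return True if the text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


results = {label: is_valid_json(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'accepted' if ok else 'rejected'}")
```

Only the first sample is accepted. There is no "lenient mode" to argue about, which is the point of the design.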


What Is YAML

YAML stands for "YAML Ain't Markup Language" - a recursive acronym chosen to emphasise that YAML is about data, not document markup. YAML 1.2 (published in 2009) was designed as a superset of JSON: every valid JSON document is also valid YAML 1.2.

YAML was designed with different goals:

  • Human readability - minimal punctuation, indentation-based structure, no mandatory quoting for simple strings.
  • Multi-document support - a single file can contain multiple documents separated by ---.
  • Comments - # starts a comment at the beginning of a line or after whitespace (but not inside a quoted string).
  • Anchors and aliases - reuse values without repetition using &anchor and *alias.
  • Rich type system - dates, binary data, sets, ordered maps, and more are expressible natively.

The price for this richness is complexity. YAML's specification is far longer than JSON's, it exists in several versions, and parsers differ in which version and which implicit typing rules they implement.


Syntax Differences

Here is the same application configuration expressed in both formats.

YAML:

# Application configuration
app:
  name: MyService
  version: "2.1.0"
  debug: false

server:
  host: 0.0.0.0
  port: 8080
  timeout: 30

database:
  host: localhost
  port: 5432
  name: mydb
  pool:
    min: 2
    max: 20

features:
  - name: payments
    enabled: true
  - name: analytics
    enabled: false

logging:
  level: info
  format: json
  output: stdout

JSON:

{
  "app": {
    "name": "MyService",
    "version": "2.1.0",
    "debug": false
  },
  "server": {
    "host": "0.0.0.0",
    "port": 8080,
    "timeout": 30
  },
  "database": {
    "host": "localhost",
    "port": 5432,
    "name": "mydb",
    "pool": {
      "min": 2,
      "max": 20
    }
  },
  "features": [
    { "name": "payments", "enabled": true },
    { "name": "analytics", "enabled": false }
  ],
  "logging": {
    "level": "info",
    "format": "json",
    "output": "stdout"
  }
}

The YAML version is noticeably shorter. It has no curly braces, no commas, fewer quotation marks, and crucially it has a comment. The JSON version is more explicit about every delimiter, which is precisely what makes it machine-friendly.


Readability

YAML wins on readability for humans, but that advantage comes with nuances.

Comments

JSON has no comment syntax. The original specification excluded comments intentionally - Crockford argued that comments would be used for parser directives, breaking interoperability. In practice this causes real pain: you cannot annotate why a configuration value is set the way it is. Workarounds like "_comment" keys are a hack.
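
If a project does adopt the "_comment" workaround, the consuming code usually has to filter those keys out again. A minimal sketch, assuming the key is literally named "_comment"; the function name strip_comments is hypothetical:

```python
import json


def strip_comments(node, comment_key="_comment"):
    """Recursively drop keys named comment_key from parsed JSON data."""
    if isinstance(node, dict):
        return {
            key: strip_comments(value, comment_key)
            for key, value in node.items()
            if key != comment_key
        }
    if isinstance(node, list):
        return [strip_comments(item, comment_key) for item in node]
    return node


raw = '{"pool": {"_comment": "Tuned for c5.2xlarge", "max": 20}}'
config = strip_comments(json.loads(raw))
print(config)
```

The annotation survives in the file but never reaches the application, at the cost of an extra pass over the data.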

YAML supports # comments on any line:

database:
  pool:
    max: 20  # Tuned for c5.2xlarge; revisit if instance type changes

Multiline Strings

YAML provides two multiline string syntaxes:

# Literal block scalar - preserves newlines
message: |
  Dear user,
  Your account has been activated.
  Welcome aboard.

# Folded block scalar - folds newlines into spaces
description: >
  This is a long description that will be
  folded into a single line when parsed.

JSON requires explicit \n escaping:

{
  "message": "Dear user,\nYour account has been activated.\nWelcome aboard."
}
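
In practice nobody types those escapes by hand; the serialiser produces them. A small sketch with Python's standard json module, reusing the message from the example above:

```python
import json

message = """Dear user,
Your account has been activated.
Welcome aboard."""

# Serialising turns each real newline into the two-character escape \n.
encoded = json.dumps({"message": message})
print(encoded)

# Parsing restores the original multi-line string.
decoded = json.loads(encoded)["message"]
```

The round trip is lossless, but the JSON text itself stays on a single line, which is what makes long strings hard to read and review in JSON files.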

Anchors and Aliases

YAML allows defining a value once and referencing it multiple times:

defaults: &defaults
  timeout: 30
  retries: 3
  log_level: info

production:
  <<: *defaults
  log_level: warn  # override just this key

staging:
  <<: *defaults

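This eliminates repetition in complex configurations. Anchors and aliases are part of core YAML; the << merge key is an optional YAML 1.1 extension that many, but not all, parsers support. JSON has no equivalent mechanism.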
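
With JSON, the same reuse has to happen in application code after parsing. A minimal sketch, assuming a hypothetical layout with a top-level "defaults" object; the function name resolve_environment is illustrative, not a standard API:

```python
import json

raw = """
{
  "defaults": {"timeout": 30, "retries": 3, "log_level": "info"},
  "production": {"log_level": "warn"},
  "staging": {}
}
"""


def resolve_environment(config: dict, name: str) -> dict:
    """Overlay one environment's settings on top of the shared defaults."""
    return {**config["defaults"], **config[name]}


config = json.loads(raw)
production = resolve_environment(config, "production")
staging = resolve_environment(config, "staging")
print(production)
print(staging)
```

This is a shallow merge, which matches what the YAML merge key does: nested mappings are replaced as a whole rather than combined.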


Strictness and Parsing

JSON: One Canonical Behaviour

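RFC 8259 is short, and its grammar is unambiguous: compliant parsers agree on which inputs are valid and on their structure. This is not an accident - it was a design goal. The RFC does flag two areas where implementations vary. Duplicate keys: names within an object "SHOULD be unique", and behaviour when they are not is unpredictable (many parsers simply keep the last value). Numbers: range and precision are left to the implementation, so very large integers or high-precision decimals may not survive a round trip.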
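
Duplicate keys are easy to check for when it matters. In CPython, json.loads keeps the last value by default, and its documented object_pairs_hook parameter lets you reject duplicates instead. A sketch; the hook name reject_duplicates is hypothetical:

```python
import json

raw = '{"port": 8080, "host": "localhost", "port": 9090}'

# Default behaviour in CPython: the last value for a repeated name wins.
lenient = json.loads(raw)


def reject_duplicates(pairs):
    """object_pairs_hook that raises if any name repeats within an object."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result


try:
    json.loads(raw, object_pairs_hook=reject_duplicates)
    strict_error = None
except ValueError as exc:
    strict_error = str(exc)

print(lenient)
print(strict_error)
```

The hook is called once per JSON object, so duplicates are caught at every nesting level.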

YAML: Specification Complexity

YAML's specification history is more complex. YAML 1.1 (2004) and YAML 1.2 (2009) differ meaningfully. YAML 1.2 aligned YAML with JSON and removed several implicit type coercions, but as of 2024, the majority of widely used YAML parsers still implement YAML 1.1 behaviour by default:

  • PyYAML (Python) - YAML 1.1
  • go-yaml v2 (Go) - YAML 1.1
  • Ruby's Psych - built on libyaml, which implements YAML 1.1

In practice this means two widely used YAML parsers can produce different results for the same input, depending on which specification version and which implicit typing rules each one implements.


Performance

JSON parsing is typically several times faster than YAML parsing across major runtimes (roughly 4-8x in the rough figures below). The reason is grammar complexity.

JSON's grammar is simple enough that a parser can process it in a single linear pass with minimal lookahead. V8 (Node.js, Chrome) has a hand-optimised JSON parser that processes hundreds of megabytes per second. CPython's json module is backed by a C extension for the same reason.

YAML's grammar requires tracking indentation levels, handling multiple string syntaxes, performing implicit type detection, and supporting anchors and aliases (which require building a node graph before resolving references). All of this adds overhead.

Rough benchmarks (parsing a 1 MB document):

| Runtime | JSON | YAML |
| --- | --- | --- |
| Node.js (V8) | ~5 ms | ~25-40 ms |
| Python (CPython) | ~20 ms | ~80-120 ms |
| PHP 8 | ~15 ms | ~60-90 ms |

These numbers are indicative only and vary with document complexity, hardware, and library version, but JSON stays several times faster in each case. For hot paths - API responses parsed thousands of times per second - JSON's performance advantage is significant.
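
Figures like these are best reproduced on your own data and hardware. The following is a rough measurement sketch, not a rigorous benchmark: the synthetic document, its size, and the helper names are all assumptions, and the YAML half only runs if PyYAML happens to be installed.

```python
import json
import timeit


def build_document(n_items: int) -> dict:
    """Build a synthetic, config-like document with n_items list entries."""
    return {
        "features": [
            {"name": f"feature-{i}", "enabled": i % 2 == 0, "weight": i * 0.5}
            for i in range(n_items)
        ]
    }


def time_parser(parse, text: str, repeat: int = 3) -> float:
    """Return the best wall-clock time in seconds for one parse of text."""
    return min(timeit.repeat(lambda: parse(text), number=1, repeat=repeat))


document = build_document(500)
json_text = json.dumps(document)
json_seconds = time_parser(json.loads, json_text)
print(f"json.loads: {json_seconds * 1000:.3f} ms for {len(json_text)} characters")

# PyYAML is optional here; skip the comparison if it is not installed.
try:
    import yaml
except ImportError:
    yaml = None

if yaml is not None:
    yaml_text = yaml.safe_dump(document)
    yaml_seconds = time_parser(yaml.safe_load, yaml_text)
    print(f"yaml.safe_load: {yaml_seconds * 1000:.3f} ms for {len(yaml_text)} characters")
```

Taking the minimum of several runs reduces noise from other processes. The absolute values will differ from the table above; what matters is the ratio on your own workload.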


Use Cases

Configuration Files - YAML Wins

When humans write and read config files, YAML's readability advantages are decisive. Comments, multiline strings, anchors, and less punctuation noise make YAML significantly better for files maintained by developers. This is why Kubernetes, Docker Compose, GitHub Actions, GitLab CI, Ansible, and most modern DevOps tooling chose YAML.

REST APIs - JSON Wins

REST APIs are machine-to-machine. Performance matters, parsing is deterministic, and no comments are needed. JSON is the universal choice. Every HTTP client library, every browser, every mobile SDK handles JSON natively.

Data Exchange Between Systems - JSON Wins

When systems exchange structured data, interoperability is paramount. JSON's strict specification and universal support make it the right choice. EDI replacement, webhooks, message queues - JSON dominates.

Kubernetes and Docker Compose - YAML

Kubernetes objects are almost universally written in YAML (JSON is technically valid but almost never used). Docker Compose files are likewise written in YAML by convention. The repetitive, deeply nested nature of these configs benefits from YAML's anchors and aliases.

OpenAPI and Swagger - Both

The OpenAPI specification supports both JSON and YAML. YAML is preferred for hand-authored specs (comments, readability). JSON is preferred when the spec is generated by tooling or consumed programmatically.


YAML Pitfalls

YAML's expressiveness creates a category of bugs that JSON simply cannot have.

The Norway Problem

This is the most famous YAML pitfall. In YAML 1.1, all of the following unquoted values (along with their lowercase and capitalised spellings) are parsed as booleans:

# Unquoted scalar on the left -> boolean a YAML 1.1 parser produces
YES: true
NO: false   # Norway's ISO country code
ON: true
OFF: false
TRUE: true
FALSE: false
Y: true
N: false

The Norway problem: a developer writes a list of country codes as YAML keys or values. The code NO (Norway's ISO 3166-1 alpha-2 code) is silently parsed as boolean false. The same issue affects YES (not a country code, but a common config value).

Real-world impact: configuration files that used country codes as YAML keys had silent data corruption. The fix is to quote the value:

countries:
  - "NO"
  - "YES"
  - "ON"
  - SE
  - DE

YAML 1.2 (2009) removed this behaviour: in its core schema only true and false (also accepted as True/TRUE and False/FALSE) are booleans, so NO stays a string. But most parsers still default to YAML 1.1.
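
One defensive option is to check values before they are written into a YAML file. The sketch below uses only the standard library and is not a YAML parser: it tests a string against the boolean spellings listed in the YAML 1.1 bool type definition. The function name and sample lists are hypothetical, and individual parsers may implement only part of this list.

```python
import re

# Boolean spellings from the YAML 1.1 bool type definition.
YAML11_BOOL = re.compile(
    r"y|Y|yes|Yes|YES|n|N|no|No|NO"
    r"|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)


def is_yaml11_bool(value: str) -> bool:
    """Return True if a YAML 1.1 parser may read this unquoted scalar as a boolean."""
    return YAML11_BOOL.fullmatch(value) is not None


country_codes = ["DE", "FI", "NO", "SE"]
settings = ["yes", "off", "info"]

codes_to_quote = [code for code in country_codes if is_yaml11_bool(code)]
settings_to_quote = [value for value in settings if is_yaml11_bool(value)]
print(codes_to_quote)
print(settings_to_quote)
```

Anything the check flags should be emitted in quotes if it is meant to stay a string.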

Octal Interpretation

In YAML 1.1, an unquoted number starting with 0 followed by digits is interpreted as octal:

file_permissions: 0755  # Parsed as 493 (decimal), not 755
port: 0600              # Parsed as 384, not 600

Again, quoting fixes this, but the silent coercion is a trap:

file_permissions: "0755"  # String "0755" - correct
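
The arithmetic behind those surprising values is plain base-8 conversion. A short sketch that reproduces it with the standard library; the helper name yaml11_octal is hypothetical and it mimics only this one rule, not a full YAML parser:

```python
def yaml11_octal(text: str) -> int:
    """Interpret a leading-zero digit string the way YAML 1.1 reads an octal integer."""
    is_octal = (
        len(text) > 1
        and text.startswith("0")
        and all(char in "01234567" for char in text)
    )
    if not is_octal:
        raise ValueError(f"not a YAML 1.1 octal literal: {text!r}")
    return int(text, 8)


# 0755 = 7*64 + 5*8 + 5 and 0600 = 6*64
permissions = yaml11_octal("0755")
port = yaml11_octal("0600")
print(permissions, port)
```

Going the other way, Python's oct(493) returns '0o755', which is also the octal spelling that YAML 1.2 uses.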

Tabs vs Spaces

YAML explicitly forbids tabs for indentation. A file that looks perfectly aligned in an editor using tab characters will fail to parse. This is a deliberate spec decision, but editors that mix tabs and spaces silently produce invalid YAML.
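
A pre-commit style check can catch this before a parser does. A minimal sketch using only the standard library; the function name find_tab_indentation and the sample text are hypothetical:

```python
def find_tab_indentation(text: str) -> list[int]:
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    offenders = []
    for number, line in enumerate(text.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            offenders.append(number)
    return offenders


# Line 3 is indented with a tab character instead of spaces.
sample = "server:\n  host: localhost\n\tport: 8080\n"
tab_lines = find_tab_indentation(sample)
print(tab_lines)
```

Only leading whitespace is inspected, so tabs inside values are not reported.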

Duplicate Keys - Silent Overwrite

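The YAML specification requires mapping keys to be unique, but several widely used parsers (PyYAML among them) do not enforce this and silently keep the last value, just as most JSON parsers do: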

server:
  port: 8080
  host: localhost
  port: 9090  # Silently overwrites 8080; no error

CVE-2013-0156 - YAML Arbitrary Code Execution

Ruby's YAML loaders at the time (Syck and Psych) deserialised arbitrary Ruby objects from YAML input. Attackers could craft YAML payloads that executed arbitrary code when parsed. Ruby on Rails was hit hard in January 2013: its XML parameter parser accepted embedded YAML, so a crafted request could reach the YAML loader. The issue was assigned CVE-2013-0156.

The root cause: YAML's type system supports "tags" that instruct the parser to construct specific language objects. Language-specific tags such as !!python/object (PyYAML) and !ruby/object (Ruby's Psych) are honoured when a parser runs in its unrestricted loading mode. When parsing untrusted input, this is a serious vulnerability. JSON has no equivalent mechanism - it only produces primitive values, arrays, and objects.

Never parse untrusted YAML with unrestricted object construction enabled.


Code Examples

PHP

<?php

declare(strict_types=1);

// Imports go at the top of the file (composer require symfony/yaml)
use Symfony\Component\Yaml\Yaml;

$data = [
    'name' => 'MyService',
    'version' => '2.1.0',
    'debug' => false,
    'port' => 8080,
];

// JSON encoding and decoding (built in)
$json = json_encode($data, JSON_PRETTY_PRINT | JSON_THROW_ON_ERROR);
$fromJson = json_decode($json, associative: true, flags: JSON_THROW_ON_ERROR);

// YAML, option 1: symfony/yaml
$yaml = Yaml::dump($data, indent: 2);
$fromYaml = Yaml::parse($yaml);

// YAML, option 2: ext-yaml (PECL)
$yaml = yaml_emit($data);
$fromYaml = yaml_parse($yaml);

Python

import json
import yaml  # pip install pyyaml

data = {
    'name': 'MyService',
    'version': '2.1.0',
    'debug': False,
    'port': 8080,
}

# JSON
json_str = json.dumps(data, indent=2)
decoded_json = json.loads(json_str)

# YAML - note: PyYAML uses YAML 1.1 by default
yaml_str = yaml.dump(data, default_flow_style=False)
decoded_yaml = yaml.safe_load(yaml_str)  # Always use safe_load for untrusted input

# For YAML 1.2 compliance, use ruamel.yaml:
# from ruamel.yaml import YAML
# yml = YAML()
# yml.version = (1, 2)

JavaScript

// JSON - native browser and Node.js support
const data = {
  name: 'MyService',
  version: '2.1.0',
  debug: false,
  port: 8080,
};

const jsonStr = JSON.stringify(data, null, 2);
const decodedJson = JSON.parse(jsonStr);

// YAML - requires js-yaml (npm install js-yaml)
import yaml from 'js-yaml';

const yamlStr = yaml.dump(data);
const decodedYaml = yaml.load(yamlStr);

// For production use, prefer the schema option to avoid implicit type coercions:
const safeDecoded = yaml.load(yamlStr, { schema: yaml.JSON_SCHEMA });


When to Use Each

| Criterion | JSON | YAML |
| --- | --- | --- |
| Human-authored config files | Poor - no comments, verbose | Excellent |
| Machine-generated / consumed data | Excellent | Poor - slower, riskier |
| REST API responses | Excellent - universal support | Not used |
| Configuration with comments | Not possible | Native support |
| Kubernetes / Docker Compose | Technically valid, not idiomatic | Standard |
| OpenAPI specs (hand-authored) | Acceptable | Preferred |
| Webhooks and event payloads | Excellent | Rare |
| High-throughput parsing | Excellent | Avoid |
| Untrusted input | Safe | Use safe mode only |
| Multi-document files | Not supported | Native |
| Reusable config blocks (DRY) | Not supported | Anchors and aliases |
| Interoperability across languages | Excellent | Good (with caveats) |

Decision Criteria

Choose JSON when:

  • The data is consumed by machines more than humans
  • Performance is important (high-frequency parsing)
  • You need maximum interoperability across languages and tools
  • The data is transmitted over HTTP (APIs, webhooks)
  • You need strict, predictable parsing behaviour

Choose YAML when:

  • Humans write and maintain the file
  • You need comments to document intent
  • The format is a well-established YAML context (Kubernetes, Docker Compose, GitHub Actions)
  • You have complex repeated structures that benefit from anchors
  • Multiline strings would improve readability

Avoid YAML when:

  • Parsing untrusted input without careful safe-mode configuration
  • Performance is a constraint
  • You need guaranteed identical behaviour across different parsers

The Practical Rule

JSON and YAML solve the same core problem - serialising structured data - but optimise for different consumers. JSON optimises for machines: strict, fast, universally supported. YAML optimises for humans: readable, flexible, expressive.

The practical rule is simple: use JSON for data that flows between systems, use YAML for files that developers maintain. When you need to work with both or convert between them, knowing the pitfalls (the Norway problem, octal interpretation, implicit type coercions) prevents subtle bugs.
