YAML is more than JSON without brackets (part 1)

Without a doubt, YAML makes my blood run faster. No other data format causes so many mixed feelings in my mind. I came to both love it and hate it. The powerful side of this language pleases me, yet, as people say, with great power comes great responsibility.

Working in the Stoplight team as part of the 11Sigma crew, I contributed to Stoplight Studio & Stoplight Spectral, both of which rely heavily on YAML. That allowed me to dive very deep into its specs.

Why should you read this, and who is it for?

The point of this post is to shed more light on YAML and show what YAML has to offer. I assume that people reading this post already know what JSON & YAML are, and know their basics – but worry not, I don’t expect that you are a YAML expert.

This article will be something between a "YAML tutorial" and "what YAML has, but JSON does not". Since version 1.2, YAML is a superset of JSON, so JSON has hardly any feature that YAML lacks.

Where possible, I’ll try to make a side-by-side comparison, although I have to state right away that at times it will not be achievable, as YAML is simply far more powerful than JSON.

I intend to dig into certain areas that are rather lesser-known and explore the capabilities YAML has. I firmly believe that although YAML is undoubtedly wonderful, you should familiarize yourself with it before using it.

Hopefully, my article will clarify certain aspects of this language you might have always been confused about.

To fulfill the intentions of this article, I might occasionally sneak in Stoplight-specific implementation details I can publicly disclose, along with some shameless plugs, as most of the code I’ll be referring to was authored by me. In consequence, a blend of JS code is to be expected.

Although I’m remarkably torn when it comes to YAML, I’ll do my best to put my opinions towards YAML aside. Therefore there’ll be no ranting here – only facts.

The Stoplight Footprint

Before I kicked off my first work at Stoplight, my attitude towards YAML was entirely neutral, as my expertise about YAML was arguably mediocre.

However, the time I spent handling YAML-specific requests changed that. As I discovered more and more of its features, I realized how robust this language is and how far superior it is when you compare it with JSON.

In parallel, I observed that the majority of the people I had contact with, whether users or just my peers, had equally average knowledge of YAML and treated YAML as JSON with different formatting.

This sort of reasoning tends to pose a challenge for a variety of reasons, with the most crucial ones being:

  • YAML is ubiquitous in our space, it’s presumably more popular than JSON, and it’s commonly used by non-engineers,
  • we provide tooling that needs decent YAML support.

If you’re eager to learn what makes YAML different than JSON, please read on!

Selected differences

I decided to split the differences into smaller subsections to ensure the integrity of the article. Apart from the "vocabulary" section, each one illustrates features that have a bigger or lesser weight on the final output.

We’ll start from very general aspects and then go down the rabbit hole to unleash more powerful capabilities YAML/JSON has to offer.

Here’s a set of initial assumptions applicable throughout the rest of the reading:

  • RFC8259 is the JSON spec we rely on in JSON examples,
  • YAML 1.2 is the default YAML spec we use in examples unless stated otherwise. The Core Schema is mainly in use,
  • for the sake of simplification, "newline" is going to be interchangeably used with the line feed character (\x0A), so – to put it differently – a new line in the following article equals LF.



Vocabulary

Perhaps the most trivial distinction of them all is the naming. Although this may not particularly matter in daily life, I found it essential to pinpoint how things stand from the spec’s point of view, because I’m going to rely on these terms in other parts of the post.

Plus, a variety of tools leverage the terminology introduced by the spec, so it’s certainly not a bad idea to grasp such basics.


JSON

  • structured
    • array
    • object
  • primitive
    • string
    • number
    • boolean
    • null


YAML

  • sequence
  • mapping
  • scalar – the data type representable as a series of Unicode characters. In practice, a string, a number, a boolean, and similar are scalar values.

The shortlist of node types in YAML might be surprising at first glance, but there’s a lot to surface, so don’t be misled by that. This section is primarily oriented on vocabulary, and any assumptions here don’t rule out anything.

In reality, YAML has more data types than JSON – they will have more exposure soon.

Differences go beyond the type names, as YAML also somewhat imposes the naming the data processor should adhere to.

Rather than parse and stringify, YAML uses the following terms:

  • load
  • dump

For the curious ones, the process of translating a character stream to data structures (the actual data you operate with) is called load.

Parsing is one step of that process, yet the YAML processor has more tasks to perform than simply parsing a source.

The reverse operation of serializing data back to text is called dump.

Hopefully, this clarifies why js-yaml decided to use these names instead of the familiar parse and stringify that are natively available in any ES-compliant environment.

Document(s) in a single character stream

This one is likely to be one of the least known YAML possibilities.

YAML allows you to specify multiple documents in a single stream (e.g. file), while JSON does not offer such an option.

---
title: 'I am a document!'
---
title: 'I am a different document!'
...

A single stream, but 2 documents. More documents can be inlined if needed, as there’s no upper limit enforced.

Now, let’s consider this character stream.

# I have no documents!

Although it may make little to no sense, it’s still a valid YAML character stream. It has simply no documents in it.

Since we’re introducing some potentially new syntax, it feels more than appropriate to clarify it.

Despite what its usage may suggest, --- is not a "document start" marker. This particular notation stands for "directives end".

In short, a directive is an instruction you can supply to the YAML processor. One can pick from two directives:

  • YAML
  • TAG

The YAML directive is explained in another section, while the TAG one will be fully covered in a separate article.

The second new bit of syntax we used is ..., and this one is indeed the "document end" marker.

Most of the tools that consume both JSON & YAML text formats will probably assume that a single character stream is equal to a single document.

One has to bear in mind the above fact does not imply that the underlying YAML processors are not prepared to process such a stream.

To back my words with a real-life case, I can say the most popular YAML processor written in JS, namely js-yaml, is capable of recognizing (via the loadAll function) multiple documents in a single character stream. However, as stated earlier, most of the consumers of the aforementioned are unlikely to.

JSON, on the other hand, is limited to a single document, as it doesn’t expose any way to define a document explicitly.
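
To make the role of the two markers more tangible, here is a minimal sketch of splitting a character stream into raw document texts. The splitDocuments helper is hypothetical and deliberately naive: it assumes that --- and ... always sit alone on their own line, it skips directives, and it does not parse the documents themselves. A real processor (js-yaml via loadAll, for instance) does far more than this.

```javascript
// Naive sketch: split a YAML character stream into raw document texts.
// Assumption: "---" and "..." appear alone on a line; directives are skipped.
function splitDocuments(stream) {
  const documents = [];
  let current = null; // lines of the document being collected, null between documents

  for (const line of stream.split('\n')) {
    if (line === '---') {
      // "directives end": finishes a pending document and opens a new one
      if (current !== null) documents.push(current.join('\n'));
      current = [];
    } else if (line === '...') {
      // "document end"
      if (current !== null) documents.push(current.join('\n'));
      current = null;
    } else if (current !== null) {
      current.push(line);
    } else {
      const trimmed = line.trim();
      const isContent = trimmed !== '' && !trimmed.startsWith('#') && !line.startsWith('%');
      // content without a preceding "---" starts a bare document
      if (isContent) current = [line];
    }
  }

  if (current !== null) documents.push(current.join('\n'));
  return documents;
}

const stream = [
  '---',
  "title: 'I am a document!'",
  '---',
  "title: 'I am a different document!'",
  '...',
  '',
].join('\n');

console.log(splitDocuments(stream).length); // 2
console.log(splitDocuments('# I have no documents!\n').length); // 0
```

A stream that holds only a comment yields an empty list, which matches the "no documents" case described earlier.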

Character Encoding

In JSON, according to RFC8259, the character encoding is set to UTF-8. In addition, a byte order mark must not be added at the beginning of the stream. JSON processors may ignore it rather than report an error, but a character stream should not contain it.

YAML supports UTF-8, UTF-16, as well as UTF-32. In addition, the YAML processor must support character streams starting with the byte order mark.

Interestingly, the UTF-32 encodings were added for the sake of JSON compatibility, even though JSON, as we described, now accepts only UTF-8.

It has to be stated that prior to RFC8259 (published in December 2017), JSON had no such requirement. The last revision of the YAML 1.2 spec took place in October 2009, so back then, it could matter.
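
As an illustration of what supporting the byte order mark involves, the sketch below guesses the encoding from the first bytes of a stream. detectEncodingFromBom is a hypothetical helper written for this article. It only looks at the BOM and returns null when there is none, whereas a real YAML processor has further rules for streams that start without a BOM.

```javascript
// Sketch: detect the encoding of a byte stream from its byte order mark (BOM).
// Returns null when no BOM is present (a real processor would inspect further).
function detectEncodingFromBom(bytes) {
  const [b0, b1, b2, b3] = bytes;

  // The UTF-32 checks come first, because the UTF-32LE BOM starts with the UTF-16LE one.
  if (b0 === 0x00 && b1 === 0x00 && b2 === 0xfe && b3 === 0xff) return 'UTF-32BE';
  if (b0 === 0xff && b1 === 0xfe && b2 === 0x00 && b3 === 0x00) return 'UTF-32LE';
  if (b0 === 0xfe && b1 === 0xff) return 'UTF-16BE';
  if (b0 === 0xff && b1 === 0xfe) return 'UTF-16LE';
  if (b0 === 0xef && b1 === 0xbb && b2 === 0xbf) return 'UTF-8';
  return null;
}

console.log(detectEncodingFromBom(new Uint8Array([0xef, 0xbb, 0xbf, 0x61]))); // UTF-8
console.log(detectEncodingFromBom(new Uint8Array([0xff, 0xfe, 0x61, 0x00]))); // UTF-16LE
console.log(detectEncodingFromBom(new Uint8Array([0x61]))); // null
```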

Spec Versioning

Another difference, which may also appear trivial, is that YAML lets you specify the version of the YAML spec you want to use in your document by using a directive called YAML.

Intriguingly, if a YAML processor wants to be spec-compliant, it’s supposed to support older versions.

%YAML 1.1
---
This is a valid document that should be processed according to YAML 1.1.

On the other hand, according to the specification, processors should raise warnings for higher minor versions, i.e. 1.3, 1.4, etc., and should bail upon major versions, i.e. 2.x, 3.x, etc.

%YAML 1.3
---
This one should raise a warning
...
%YAML 2.0
---
while this one should result in an error

JSON does not offer any of the above, but at the same time, one has to note that the JSON spec hasn’t changed significantly, so there has been no such need.
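
The version rules above can be sketched as a small check. checkYamlDirective is a hypothetical helper, not part of any library. It assumes the processor implements YAML 1.2 and that the directive is passed in as a single line. Treating every remaining version as acceptable is a simplification of this sketch.

```javascript
// Sketch: decide how a YAML 1.2 processor should react to a %YAML directive.
// Returns 'ok', 'warning', 'error', or null when the line is not a YAML directive.
function checkYamlDirective(line, supported = { major: 1, minor: 2 }) {
  const match = /^%YAML[ \t]+(\d+)\.(\d+)[ \t]*$/.exec(line);
  if (match === null) return null;

  const major = Number(match[1]);
  const minor = Number(match[2]);

  if (major > supported.major) return 'error'; // e.g. 2.0: bail
  if (major === supported.major && minor > supported.minor) return 'warning'; // e.g. 1.3
  return 'ok'; // e.g. 1.1 or 1.2
}

console.log(checkYamlDirective('%YAML 1.1')); // ok
console.log(checkYamlDirective('%YAML 1.3')); // warning
console.log(checkYamlDirective('%YAML 2.0')); // error
```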

YAML vs JSON usability

These differences will mostly cover the usability aspects of each spec.


Comments

YAML has them, JSON does not. Boom. As simple as that.

It’s worth mentioning that there are JSON variants that support comments as well, such as JSONC (JSON with JS style comments).

Comments in YAML start with #. There are no multiline comments – you are expected to write a few comments instead, each separated by a new line.

Anchors and Aliases

In my opinion, this is the best feature YAML has.

austrian-cities: &austrian-cities # this is an anchor
  - Vienna
  - Graz
  - Linz
  - Salzburg

austria: *austrian-cities # this is an alias

In short, aliases begin with *, while anchors begin with &. On top of that, anchor names cannot contain the flow indicators [, ], {, } and the comma.

An anchor is used to indicate that a given node is supposed to be reusable in the future. In other words, this is a way to tell that you might want to reference that node using an alias node.

Any node can be anchored, but there’s no requirement of them being referenced by alias nodes, thus it’s not an error if you denote a given node with an anchor, but you don’t actually refer to it later on using an alias.

This feature has several benefits:

  • Ability to reuse certain nodes. This helps us reduce repetition in configuration files and other files that contain plenty of repeated patterns. In JSON, you’d need to include everything n times.
    Not only may this be quite tedious, but the resulting document is also larger than it would have been if anchors and aliases were available.
  • A rescue boat when circular data structures need to be represented. Thanks to aliases and anchors, dumping such data back to text is a piece of cake. JSON is on the opposite side here, as it doesn’t come with any standardized approach to this problem, and you need to resort to other solutions such as JSON Schema $ref (but such a document is simply less portable).

As a person who constantly works with JSON Schema $refs (side note – albeit they cannot be circular, in reality, you have to support that scenario), I still see YAML anchors and aliases as a true life-saver. Tons of users expect to receive a document that’s accepted by their tool, and using a custom approach doesn’t guarantee that.

The ability to represent circular data can also be a downside when it comes to loading. After all, the data you will work with may have circular references you now need to account for.

If you’re an implementer, the easiest way is to break these circular refs after a document is loaded. This is what we did in a utility called dereferenceAnchor.
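
To show why circular references hurt once the data is loaded, here is a small sketch. It builds the kind of self-referencing structure an alias can produce, shows that JSON.stringify refuses it, and then cuts the cycle. breakCycles is a hypothetical helper written for this article and is not the actual dereferenceAnchor implementation. Replacing the circular reference with null is just one possible policy.

```javascript
// A structure that contains itself, as an alias pointing at its own anchor would produce.
const users = [];
users.push(users);

let jsonError = null;
try {
  JSON.stringify(users);
} catch (error) {
  jsonError = error;
}
console.log(jsonError instanceof TypeError); // true

// Sketch: copy a value, replacing references to any of its own ancestors with null.
// Nodes that are merely shared (referenced twice, but not circular) are kept.
function breakCycles(value, ancestors = new Set()) {
  if (value === null || typeof value !== 'object') return value;
  if (ancestors.has(value)) return null; // circular reference: cut it here

  ancestors.add(value);
  const copy = Array.isArray(value)
    ? value.map((item) => breakCycles(item, ancestors))
    : Object.fromEntries(
        Object.entries(value).map(([key, item]) => [key, breakCycles(item, ancestors)]),
      );
  ancestors.delete(value);
  return copy;
}

console.log(JSON.stringify(breakCycles(users))); // [null]
```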

Despite not being fully compliant with the spec due to that, you often have no other choice because plenty of tooling just gives up upon circular references. How such an output looks like in practice can be checked

Flow and Block Scalars / Multiline Strings

In the vocabulary section, we already briefly explained what the scalar in YAML is, but let’s dig more.

To recap, a scalar is a set of Unicode characters, and therefore it’s the most ubiquitous node type you’ll see in YAML.

Although it empowers you to define a variety of data types, in this section our focus will be primarily oriented on strings, as this is where the difference is most notable.

Don’t worry, though, as we’ll certainly explain the topic even further later on.

Flow Scalars

Overall, there are 3 types of flow scalars:

  • plain,
  • single-quoted,
  • double-quoted.

plain: I am plain style!
single-quoted: 'I am surrounded by single quotes!'
double-quoted: "I am surrounded by double quotes!"

Personally, when in doubt, I try to refrain from using the plain style (funnily though, the keys of mapping’s pairs I used in the document above are plain flow scalars), as it may lead to unpredictable results at times.

I will touch on this topic in a bit, but consider a plain scalar such as .inf.

If we considered only the recommended schemas, this value could be resolved either to Infinity (a float) or to a plain string.

Unluckily for us, not every library provides information on the schema it uses, so it’s best to be explicit and always use quotes, assuming we indeed want a string.

One has to bear in mind that such cases are rather rare, so using the plain style isn’t any kind of anti-pattern.

It’s just occasionally risky if you have a value that might happen to be resolved differently.

Plain Style

Generally speaking, this is the most limited style.

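Since it’s not quoted, you cannot use any sign that might lead to ambiguity.

This means that combinations such as ": " (a colon followed by a space) or " #" (a hash preceded by a space) must not appear inside a plain scalar, and it must not start with "- ", as their roles are different.

A plain scalar may still span multiple lines, though. As always, indentation is the key here.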

some-key: I am a plain scalar
  span across
  multiple lines

I’ll resort to JSON once again to visualize it.

{
  "some-key": "I am a plain scalar span across multiple lines"
}

The caveat is that all leading and trailing whitespaces get trimmed.

some-key:    I am a plain scalar
  span across
  multiple lines

The JSON output remained unchanged.

{
  "some-key": "I am a plain scalar span across multiple lines"
}

Single-Quoted Style

Single-quoted style is more robust than plain style, in that it allows more control over whitespace.

some-key: 'I am a single-quoted scalar
  span across
  multiple lines


  '

Now, the JSON output finally has the newlines!

{
  "some-key": "I am a single-quoted scalar span across multiple lines\n\n"
}

There’s one caveat we need to keep in mind.
Escaping does not work, hence the following document is not valid.

'It\'s a great day'

To fix it, you’d need to add another quote.

'It''s a great day'
"It's a great day"

Because of that, line feeds (\n) won’t yield expected results (assuming you expect a new line, obviously).

'It\n''s a great day'
"It\\n's a great day"

As you can see, \n got escaped.
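
The quoting rule can be sketched in a few lines. readSingleQuoted is a hypothetical helper that handles single-line scalars only (no line folding). It treats '' as the one and only escape and rejects a lone quote inside the body.

```javascript
// Sketch: resolve the content of a single-line, single-quoted YAML scalar.
// The only escape is '' (two single quotes), which stands for one quote.
function readSingleQuoted(source) {
  if (source.length < 2 || !source.startsWith("'") || !source.endsWith("'")) {
    throw new SyntaxError('not a single-quoted scalar');
  }

  const body = source.slice(1, -1);
  // After removing the '' pairs, any quote left over would end the scalar early.
  if (body.replace(/''/g, '').includes("'")) {
    throw new SyntaxError('unescaped single quote inside the scalar');
  }
  return body.replace(/''/g, "'");
}

console.log(readSingleQuoted("'It''s a great day'")); // It's a great day

// The backslash has no special meaning, so \n stays as two literal characters.
console.log(readSingleQuoted(String.raw`'It\n''s a great day'`)); // It\n's a great day
```

Passing the invalid 'It\'s a great day' form to this helper results in a SyntaxError, since the backslash does not escape the quote.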

Double-Quoted Style

Close to single-quoted, but with fully working escaping.
Working escaping is particularly useful for line feeds.

"It\n a great day"


"It\n a great day"

It has to be noted that spanning single-quoted and double-quoted scalars across multiple lines is sometimes prohibited.

For instance, an implicit mapping key has to fit on a single line.

Block Scalars

There are three factors that have an influence on the final shape of your outcome:

  • Style
  • Chomping
  • Indentation

Each of them affects the document differently, with style being the most impactful, and indentation least.

The block header is a "combination" of chomping and indentation, and you have to place that header after the style, but right before the content itself.

Here’s what it looks like using simplified BNF (Backus-Naur form) notation.

<style> ::= "|" | ">"
<chomping> ::= ["-" | "+"]
<indentation> ::= ["1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"]

<block-header> ::= <chomping> <indentation> | <indentation> <chomping>
<SYNTAX> ::= <style> <block-header>

Style always has to be placed at the very beginning, while the order of chomping and indentation does not matter.

You can use all of them or style alone.

If you prefer to see the code, you’re in luck, as some time ago I wrote a small function that extracts all the information listed in this section.

You can find it here.

Moreover, a decent amount of examples is available as well.
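
Independently of the function mentioned above, the simplified grammar can also be sketched with a single regular expression. parseBlockHeader below is a hypothetical helper written for this article. It only inspects the header itself (the style sign plus the optional indicators) and returns null for anything the grammar does not allow.

```javascript
// Sketch: parse a block scalar header such as "|", ">-", "|2+" or ">+9".
// Style comes first; chomping and indentation may follow in either order.
const BLOCK_HEADER = /^([|>])(?:([+-])([1-9])?|([1-9])([+-])?)?$/;

function parseBlockHeader(header) {
  const match = BLOCK_HEADER.exec(header);
  if (match === null) return null;

  const [, styleSign, chompFirst, indentSecond, indentFirst, chompSecond] = match;
  const chompSign = chompFirst ?? chompSecond;
  const indent = indentSecond ?? indentFirst;

  return {
    style: styleSign === '|' ? 'literal' : 'folded',
    chomping: chompSign === '-' ? 'strip' : chompSign === '+' ? 'keep' : 'clip',
    indentation: indent === undefined ? null : Number(indent), // null: auto-detected
  };
}

console.log(parseBlockHeader('>-')); // { style: 'folded', chomping: 'strip', indentation: null }
console.log(parseBlockHeader('|2+')); // { style: 'literal', chomping: 'keep', indentation: 2 }
console.log(parseBlockHeader('|10')); // null
```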

Sadly, certain converters appear to disregard chomping and indentation, thus one needs to be aware of that when using them to convert your YAML document to JSON.

Sections below should explain why that extra verification is crucial.


Style

  • | – literal
  • > – folded

The style determines whether line folding applies to the scalar or not. To put it differently, one could say that newlines are treated differently.

When the literal style is used, newlines are preserved, while the folded style makes a scalar subject to line folding.

To illustrate, given the following literal style block scalar

|
  Winnie-the-Pooh
  Винни-Пух

you would get such a JSON document.

"Winnie-the-Pooh\nВинни-Пух\n"

Now, if the folded style was used, the output would be considerably different.

"Winnie-the-Pooh Винни-Пух\n"

As you can see, the newline, except the trailing one, was replaced with a space. That does not mean newlines are always stripped, though. You can still get a newline by inserting an empty line.



This is certainly one of the most misunderstood features YAML offers. People often appear to be confused about these newline differences.


Chomping

Unfortunately, the "newlines nightmare" does not end with style. There is one additional factor, namely chomping.

This indicator can be used to instruct the processor what should happen to newlines located at the end of the scalar.

  • - – strip
  • + – keep
  • Ø (no indicator) – clip

Luckily, its influence is rather negligible, as the default chomping is clip (explained lower down the page), so changing it is a conscious decision.

When strip chomping is used, all trailing newlines are stripped, as well as the last line break.



Keep chomping, as the name implies, preserves all trailing newlines together with the line break.



Now, clip chomping causes all trailing newlines to be removed, but the final line break does get preserved.
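
The three chomping modes can be sketched as a function over the raw content of a block scalar. applyChomping is a hypothetical helper. It assumes the indentation has already been removed and that line breaks are plain LF characters.

```javascript
// Sketch: apply a chomping mode to the raw content of a block scalar.
// "text" is the content with indentation already removed, using LF line breaks.
function applyChomping(text, chomping = 'clip') {
  const content = text.replace(/\n+$/, ''); // content without any trailing line breaks

  switch (chomping) {
    case 'strip': // "-": drop the final line break and all trailing empty lines
      return content;
    case 'keep': // "+": preserve everything
      return text;
    case 'clip': // no indicator: keep a single final line break, if there was one
      return content === '' || content.length === text.length ? content : `${content}\n`;
    default:
      throw new RangeError(`Unknown chomping mode: ${chomping}`);
  }
}

const raw = 'text\n\n\n';
console.log(JSON.stringify(applyChomping(raw, 'strip'))); // "text"
console.log(JSON.stringify(applyChomping(raw, 'keep'))); // "text\n\n\n"
console.log(JSON.stringify(applyChomping(raw, 'clip'))); // "text\n"
```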




Indentation

The indentation we refer to here is the numeric value you can specify in each header.

If no explicit indentation is specified, it is automatically deduced from the source, from the first non-empty line of the given scalar, to be more precise.

The valid range is between 1 and 9.

Most of the time, you won’t need to provide any indentation, but there are cases where you may need to.

One of them is when your content begins with leading spaces.


     I want some whitespaces before myself!
  Me not.

This is valid YAML.

If we converted it to JSON, we’d get

"   I want some whitespaces before myself!\nMe not.\n"

Now, if we removed the indentation…

     I want some whitespaces before myself!
  Me not.

we’d get an error because the inferred indentation would equal 6, and the "Me not" part of the document has only 2 leading spaces.

The fixed version would need to look as below

     I want some whitespaces before myself!
     Me not.

yet likewise, the JSON output would be quite different.

Moreover, it’s also invalid if the given value is too high. The syntax explicitly imposes that indentation is a digit, placed in the range of 1-9.

          this is invalid

Furthermore, it’s incorrect when the indentation indicator does not match the actual indentation used.

this is invalid

Line Folding

Certain styles such as flow and folded block undergo the process called line folding.

In short, it’s an operation where whitespaces get mangled in a particular way.

The intended outcome of this is to improve the readability by allowing lengthy lines to be broken into multiple lines. Overall, the process is relatively simple.

Per spec,

If a line break is followed by an empty line, it is trimmed; the first line break is discarded and the rest are retained as content.

'I am
a multiline
string


Oh cool.'

"I am a multiline string\n\nOh cool."

If we placed any whitespaces in-between "string" and "oh cool" words, they’d get discarded.
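
The folding rule for flow scalars can be sketched as follows. foldFlowLines is a hypothetical helper that takes the text between the quotes of a multi-line flow scalar. It ignores escape sequences and assumes LF line breaks, so it covers only the whitespace handling described here.

```javascript
// Sketch: line folding for the content of a multi-line flow scalar.
// A single line break becomes a space; each empty line becomes one "\n".
function foldFlowLines(raw) {
  const lines = raw.split('\n').map((line, index, all) => {
    let text = line;
    if (index > 0) text = text.replace(/^[ \t]+/, ''); // indentation of continuation lines
    if (index < all.length - 1) text = text.replace(/[ \t]+$/, ''); // whitespace before a break
    return text;
  });

  let folded = lines[0];
  let emptyLines = 0;
  for (let index = 1; index < lines.length; index += 1) {
    const isLast = index === lines.length - 1;
    if (lines[index] === '' && !isLast) {
      emptyLines += 1;
      continue;
    }
    // The first line break is discarded; empty lines are retained as line feeds.
    folded += emptyLines > 0 ? '\n'.repeat(emptyLines) : ' ';
    folded += lines[index];
    emptyLines = 0;
  }
  return folded;
}

const multiline = 'I am\na multiline\nstring\n\n\nOh cool.';
console.log(JSON.stringify(foldFlowLines(multiline))); // "I am a multiline string\n\nOh cool."
```

Whitespace placed on the otherwise empty lines is trimmed before the lines are counted, which is why it never reaches the output.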

Despite being prone to the same action, yet again the actual final behavior varies depending on the indicators you used (if it’s a folded block) as well as the style of flow scalar.

Yes, you guessed it. Once more it all boils down to whitespaces.

As we learned moments ago, block scalars have a thing called chomping that determines the way trailing line breaks are handled.

Another difference is that in the case of block scalars, leading and trailing whitespaces placed on each line are preserved.

>
  look!
    I have 2 whitespace before!

"look!\n  I have 2 whitespace before!\n"

'look!
    I have 2 whitespace before in the document as well, but since I"m a flow scalar I will end up having a single one!
    The new line will also disappear (note the space at the end in the JSON output).
'

"look! I have 2 whitespace before in the document as well, but since I\"m a flow scalar I will end up having a single one! The new line will also disappear (note the space at the end in the JSON output). "

"look!

          I have no whitespaces now, but new line is still here, because there's an empty line before me!"

"look!\nI have no whitespaces now, but new line is still here, because there's an empty line before me!"

Is there an equivalent of block scalars in JSON?

Needless to say, JSON does not offer anything similar. This is a true drawback for certain users, as reading lengthy sentences is somewhat troublesome if you do not have any line-wrapping in your viewer.

JSON is much more opinionated, as it demands that a new line be explicitly placed within a string.


Data / Node Types

This partially falls under usability, so in the spirit of improving readability, it landed in this part of the document.

Collections

I couldn’t quite decide whether I should come up with a dedicated section for collections or not. Technically they fall under the tags and schemas section, but having them in a different spot is perhaps a bit better.

Sequences / Arrays

In other programming languages, such a data type is described as an array, list, vector, or sequence. In YAML it’s called a sequence and in JSON it’s an array.

YAML sequences and JSON arrays have something in common, namely, they’re both ordered, and they can hold any nodes. So, if you care about ordering, this is the right data structure.

This is how things stand from a syntax perspective. In JSON, you use brackets [ and ] to denote an array.

["Thailand", "Laos", "Myanmar"]

YAML offers two ways to write a sequence, with the most popular one being block sequence:

- 'Thailand'
- 'Laos'
- 'Myanmar'

and the other commonly used one is flow sequence, which is very similar or identical to JSON:

['Thailand', 'Laos', 'Myanmar']

In block sequence, - is used to denote a single entry. In flow sequence, , indicates the end of an entry.

A sequence may point at itself, thus the following document is valid.

&users
- *users

Mappings / Objects

Commonly named as an object, dictionary, hash table, struct, record, keyed list, or associative array.

Same as sequences/arrays, YAML mappings and JSON objects also share some similarities, and yet again the most significant one is the ordering, and to be more precise, the lack of it. They are both unordered collections.

From a syntax point of view, the situation is similar. In JSON, curly braces { and } are meant to express the object.

{
  "Golden Triangle": ["Thailand", "Laos", "Myanmar"]
}

In YAML, like with sequences, you are free to choose between 2 styles, block:


'Golden Triangle':
  - 'Thailand'
  - 'Laos'
  - 'Myanmar'

and flow, which is yet again usually very close or equal to JSON:

{ 'Golden Triangle': ['Thailand', 'Laos', 'Myanmar'] }

In some aspects, mappings are notably different from JSON objects.

In YAML, any node type can be used as a mapping key. This implies you can very well use another mapping or sequence, or a numeric scalar, null scalar, etc.

? wow: much complex
: wow: much fun

As you might see I made use of ? to indicate that a given mapping key will be complex.

Usually ? does not need to be used, but since it simply denotes a mapping key, you can use it with simple keys as well.

? wow
: such mapping

which, unsurprisingly, in JSON would equal

{
  "wow": "such mapping"
}

You can also use flow styles, if you would like to, for example:

{ wow: much complex }: { wow: much fun }

If you are a JS developer and use a library like js-yaml you won’t be able to process such data.

JS objects take only strings and, as of ECMAScript 2015, Symbols as property keys. The trick here would be to consume the AST js-yaml produces yourself, and use Maps instead, which are perfect for such data types.
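
The difference between plain objects and Maps is easy to demonstrate without any YAML library. The snippet below is only an illustration of the JS side of the problem; the variable names are made up for this example.

```javascript
// A complex key, i.e. what "? wow: much complex" would have to become in JS.
const complexKey = { wow: 'much complex' };

// A plain object converts the key to a string, so its structure is lost.
const asObject = {};
asObject[complexKey] = { wow: 'much fun' };
console.log(Object.keys(asObject)); // [ '[object Object]' ]

// A Map keeps the key as it is.
const asMap = new Map([[complexKey, { wow: 'much fun' }]]);
console.log(asMap.get(complexKey)); // { wow: 'much fun' }

// Caveat: Map compares object keys by identity, not by structure.
console.log(asMap.get({ wow: 'much complex' })); // undefined
```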

Furthermore, certain tooling may just naively assume that mapping is used as the value.

This may potentially lead to invalid results.

For instance, at Stoplight, we have a function getJsonPathForPosition, and it bails out upon such input. It’s a conscious decision, and we didn’t see a need to support these particular cases.

The aforementioned utility tries to generate a JSON path leading to a value at the current position.

import YAML from '@stoplight/yaml';
import chai from 'chai';

const { expect } = chai;

const document = `hello: world
address:
  street: 123
? address: street
: 123
`;

expect(
  YAML.getJsonPathForPosition(YAML.parseWithPointers(document), {
    // we follow LSP (Language Server Protocol), hence all values are 0-based
    character: 10,
    line: 2,
  }),
).to.deep.equal(['address', 'street']);

// However, if you use a complex key...
expect(
  YAML.getJsonPathForPosition(YAML.parseWithPointers(document), {
    character: 2,
    line: 3,
  }),
).to.deep.equal(['address']); // this is not quite valid, we have no way to represent such a path, as `type Segment = number | string; type JSONPath = Segment[]`

Fortunately, I haven’t observed complex mapping keys being used often, especially when the consumer can also be given a different format such as JSON.

The real troubles begin when you realize users start inserting keys that get resolved to numbers. This is a ubiquitous pattern.

People avoid quoting keys as it’s both more convenient and more readable. The problem here is that such a scalar value is usually resolved to a numeric scalar, and hence the semantics is different.

# this is a portion of some OAS document
  # rest

Such a document cannot be expressed in JSON.

# this is a portion of some OAS document
  # what now?

The responses property has two perfectly valid pairs with different keys.
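
A plain JS illustration of why such a pair of keys is troublesome: a numeric key and its string twin collapse into one property on an object, while a Map can tell them apart. The names and values below are made up for this example.

```javascript
// On a plain object, the number 200 and the string '200' are the same property.
const responsesAsObject = {};
responsesAsObject[200] = 'integer key';
responsesAsObject['200'] = 'string key';
console.log(Object.keys(responsesAsObject)); // [ '200' ]
console.log(responsesAsObject[200]); // string key

// A Map distinguishes them, which mirrors the two distinct YAML keys.
const responsesAsMap = new Map([
  [200, 'integer key'],
  ['200', 'string key'],
]);
console.log(responsesAsMap.size); // 2
```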

Interestingly or not, certain specs such as OAS explicitly tell users that keys need to be strings, and so prohibit such usage. The linter @stoplight/spectral I’ve been working on actually yells when non-JSONish keys are present.

Apart from all the above, duplicate keys are strictly prohibited in YAML, while JSON has no such restriction in place.

Instead, RFC8259 recommends that the names within an object be unique, but does not impose it. This means that certain JSON documents may not be valid YAML documents.

{
  "foo": true,
  "foo": false
}

The above is somewhat valid JSON, but not valid YAML.
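
For reference, this is how the native JSON parser in an ES-compliant environment treats such input: it accepts the document, and the later value silently overwrites the earlier one.

```javascript
// Duplicate names are accepted by JSON.parse; the last occurrence wins.
const parsed = JSON.parse('{ "foo": true, "foo": false }');

console.log(parsed.foo); // false
console.log(Object.keys(parsed)); // [ 'foo' ]
```

Other JSON parsers are free to behave differently, which is one more reason to avoid duplicate names altogether.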

Empty node

It’s perhaps worth pointing out that under certain circumstances a lack of value may not actually mean a lack of data.

empty:

In the situation above, the value of the mapping pair with the key "empty" will be null.
This is what a JSON document would look like

{
  "empty": null
}

Sequences are prone to the same.


Such a node can also be used as a key of a mapping pair.

: "empty"

The document above cannot be correctly expressed in JSON, because null cannot be used as a property key.

Tags and Schemas

My initial plan was to include this bit in this article, yet for the sake of keeping its length sane, I’ll postpone it until the next one.

I am leaving a smaller teaser. The next article will explain why the below…

012 # is sometimes an integer, and sometimes not
! 12 # why this is always a string
# why this YAML doc might be sometimes valid, and sometimes not

We’ll learn how to leverage merge keys to keep things DRY and finally… how to make this code work:

import chai from 'chai';
import yaml from 'js-yaml';

import schema from './schema.mjs';

const { expect } = chai;

const document = `
numbers:
- !js/eval Math.PI
- !js/eval Math.abs(Math.cbrt(125) - Math.sqrt(25) + -5) # you should probably just provide the result rather than evaluating the whole expr, but this is for the sake of showing it works.
- !int64 1099511627778 # signed 64bit int
- !int64 9223372036854775808 # overflow
- !js/bigint 20381928192182918291
`;

expect(yaml.load(document, { schema })).to.deep.equal({
  numbers: [Math.PI, 5, 1099511627778n, -9223372036854775808n, 20381928192182918291n],
});

Closing Thoughts

  • With few exceptions JSON can be converted to YAML, while YAML cannot necessarily be converted to JSON, so these two formats are not always interchangeable. This means you should use a JSON parser for processing JSON input, and a YAML counterpart for YAML,
  • YAML is excellent for configuration files and the like thanks to anchors and aliases that prevent you from repeating chunks of text, as well as comments which allow you to clearly explain the logic behind a given setup, etc.,
  • Tooling providing support for both JSON and YAML will most likely expect JSON-ish usage of YAML (strings as keys of mappings’ pairs, using only the data types JSON supports, etc.),
  • YAML usually offers a few ways (notations) to accomplish the same result, while JSON very rarely does so,
  • YAML is presumably closer to TOML (which, by the way, is pretty popular in the Rust world) than to JSON.

Thanks for reading the article! If you enjoyed it please subscribe to 11Sigma LinkedIn to get notified about new content 🙂

See you later, happy YAMLing!

Photo by Aral Tasher on Unsplash