YAMLish formal syntax
From Test Anything Protocol
Notes
Within this grammar certain terms are defined as C style strings using the familiar backslash escape notation. Character classes are represented as regular expressions.
I'm sure this isn't strict BNF but hopefully it's unambiguous.
This specification defines the language a YAMLish consumer must be able to parse. YAMLish producers are, of course, free to confine themselves to an even smaller subset of YAML.
Where the grammar shows a single character of non-newline whitespace it is acceptable for the parser to swallow multiple whitespace characters.
I haven't represented the (important) concept of indentation in this grammar because I can't think of a clean way to express it. For now the indentation requirements are described in plain English. If you can improve on this, please feel free.
Syntax
yamlish ::= plain_header body? footer
| inline_header footer
plain_header ::= "---\n"
inline_header ::= "--- " scalar "\n"
footer ::= "...\n"
body ::= hash | array
scalar ::= bare_scalar
| quoted_scalar
| double_quoted_scalar
| wrapped_scalar
| block_scalar
| undefined_scalar
bare_scalar ::= /[^"'>|\s]/ /[^"'\n]*/
quoted_scalar ::= "'" ([^'\n]|)* "'"
double_quoted_scalar ::= "\"" /(\\([\\aefnrtvz]|[0-9a-fA-F]{2})|[^\\"\n])*/ "\""
wrapped_scalar ::= ">\n" block_scalar_body
block_scalar ::= "|\n" block_scalar_body
undefined_scalar ::= "~"
block_scalar_body ::= block_scalar_line
| block_scalar_body "\n" block_scalar_line
block_scalar_line ::= /^\s+(.+?)\s*$/
hash ::= hash_element
| hash "\n" hash_element
hash_element ::= hash_key ": " value
hash_key ::= double_quoted_scalar
| bare_hash_key
bare_hash_key ::= /\w[^"':\s]*/
array ::= array_element
| array array_element
array_element ::= "- " value
value ::= scalar
| hash
| array
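As a rough illustration, the simpler scalar productions can be transcribed into Python regular expressions. This is only a sketch: the names `BARE_SCALAR`, `BARE_HASH_KEY` and `classify_scalar` are hypothetical, and checking `~` before `bare_scalar` is an assumption, made because `~` on its own also satisfies the `bare_scalar` pattern.

```python
import re

# Patterns transcribed from the bare_scalar and bare_hash_key productions.
BARE_SCALAR = re.compile(r"""[^"'>|\s][^"'\n]*""")
BARE_HASH_KEY = re.compile(r"""\w[^"':\s]*""")


def classify_scalar(text):
    """Guess which scalar production `text` begins (hypothetical helper)."""
    if text == "~":
        # Assumption: tested first, because "~" also fits bare_scalar.
        return "undefined_scalar"
    if text.startswith("'"):
        return "quoted_scalar"
    if text.startswith('"'):
        return "double_quoted_scalar"
    if text == ">":
        return "wrapped_scalar"
    if text == "|":
        return "block_scalar"
    if BARE_SCALAR.fullmatch(text):
        return "bare_scalar"
    return None
```

The function only looks at the text that follows `--- `, `: ` or `- ` on a single line; it does not validate the closing quote of quoted scalars or read the body of wrapped and block scalars.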
Indentation
The above grammar doesn't capture the notion that nested values must be indented at least one space from their parent. Until I work out how to cleanly represent that requirement, here are some examples that should make it clear:
A simple, unquoted scalar
--- Simple scalar
...
A multiline scalar
--- >
  All this text ends up wrapped
  into a single line
A block scalar
--- |
  Newlines are preserved
  in this text
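The difference between the two multiline forms can be sketched in Python. The helper name `join_scalar_lines` is hypothetical. Stripping each line follows the `block_scalar_line` pattern, which captures the text without its surrounding whitespace; joining wrapped lines with a single space is an assumption about what "wrapped into a single line" means.

```python
def join_scalar_lines(lines, style):
    """Combine the body lines of a ">" (wrapped) or "|" (block) scalar."""
    # block_scalar_line captures each line without leading/trailing whitespace.
    stripped = [line.strip() for line in lines]
    if style == ">":
        # Assumption: wrapped lines are joined with one space.
        return " ".join(stripped)
    if style == "|":
        # Block scalars keep one newline between lines.
        return "\n".join(stripped)
    raise ValueError("style must be '>' or '|'")
```

For example, `join_scalar_lines(["  Newlines are preserved", "  in this text"], "|")` keeps the two lines separate, while the same call with `">"` produces one line.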
An array with another array nested within it
---
- 1
- 2
  - 2.1
  - 2.2
- 3
...
A hash containing another hash
---
foo: 'The footure is bright'
bar:
  say: 'A beer please'
  expect: beer
...
Structures may nest arbitrarily
---
-
  name: 'Hash one'
  value: 1
-
  name: 'Hash two'
  value: 2
...
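One way a consumer might apply the indentation rule is to record each line's indent and recurse whenever the line after a bare `-` or `key:` is indented more deeply than its parent. The following Python sketch is not a reference implementation. All function names are hypothetical, and it assumes a plain `---` header, space-only indentation, and bare or single-quoted scalars. Double-quoted escapes, wrapped scalars and block scalars are not handled.

```python
def parse_yamlish(text):
    """Parse a small YAMLish subset into lists, dicts and strings."""
    rows = text.splitlines()
    if len(rows) < 2 or rows[0] != "---" or rows[-1] != "...":
        raise ValueError("expected a plain '---' header and a '...' footer")
    # Keep (indent, content) pairs for the non-blank body lines.
    lines = [(len(r) - len(r.lstrip(" ")), r.strip())
             for r in rows[1:-1] if r.strip()]
    if not lines:
        return None
    value, pos = _parse_block(lines, 0, lines[0][0])
    if pos != len(lines):
        raise ValueError("unexpected indentation")
    return value


def _scalar(token):
    """Convert an inline token: '~' is undefined, single quotes are removed."""
    if token == "~":
        return None
    if len(token) >= 2 and token[0] == token[-1] == "'":
        return token[1:-1]
    return token


def _element_value(lines, pos, indent, rest):
    """Value of an element with inline text `rest`; lines[pos] follows it."""
    if rest:
        return _scalar(rest), pos
    if pos < len(lines) and lines[pos][0] > indent:
        # A more deeply indented line starts a nested hash or array.
        return _parse_block(lines, pos, lines[pos][0])
    return None, pos


def _parse_block(lines, pos, indent):
    """Parse consecutive lines at exactly `indent` as one array or hash."""
    first = lines[pos][1]
    is_array = first == "-" or first.startswith("- ")
    result = [] if is_array else {}
    while pos < len(lines) and lines[pos][0] == indent:
        content = lines[pos][1]
        if is_array:
            if content != "-" and not content.startswith("- "):
                raise ValueError("expected an array element: " + content)
            value, pos = _element_value(lines, pos + 1, indent,
                                        content[1:].strip())
            result.append(value)
        else:
            key, sep, rest = content.partition(":")
            if not sep:
                raise ValueError("expected 'key: value': " + content)
            value, pos = _element_value(lines, pos + 1, indent, rest.strip())
            result[key] = value
    return result, pos
```

Given the hash example above with its nested lines indented by two spaces, `parse_yamlish` returns a dict whose `bar` entry is itself a dict. All scalars come back as strings, so `value: 1` yields `"1"`.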

