YAMLish formal syntax

From Test Anything Protocol

YAMLish is a subset of YAML.

Notes

Within this grammar certain terms are defined as C-style strings using the familiar backslash escape notation. Character classes are represented as regular expressions.

I'm sure this isn't strict BNF but hopefully it's unambiguous.

This specification defines the language a YAMLish consumer must be able to parse. YAMLish producers are, of course, free to confine themselves to an even smaller subset of YAML.

Where the grammar shows a single character of non-newline whitespace it is acceptable for the parser to swallow multiple whitespace characters.
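
As an illustration of this whitespace rule, a consumer might relax the single space after the colon in a hash_element to "one or more spaces or tabs". The Python sketch below assumes the bare_hash_key pattern from the Syntax section; the names HASH_ELEMENT and split_hash_element are hypothetical and not part of this specification.

```python
import re

# Hypothetical pattern for `hash_key ": " value` with a bare key.
# The grammar's single space after the colon is relaxed to "[ \t]+",
# so several non-newline whitespace characters are swallowed.
HASH_ELEMENT = re.compile(r"""^(\w[^"':\s]*):[ \t]+(.*)$""")


def split_hash_element(line):
    """Return (key, value) for a bare-key hash element, or None."""
    match = HASH_ELEMENT.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

With this pattern "expect: beer" and "expect:    beer" both split into the key "expect" and the value "beer", while "expect:beer" is rejected because the grammar requires at least one whitespace character after the colon.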

I haven't represented the (important) concept of indentation in this grammar because I can't think of a clean way to express it. For now the indentation requirements are described in plain English. If you can improve on this, please feel free.

Syntax

yamlish                ::= plain_header body? footer
                         | inline_header footer
               
plain_header           ::= "---\n"
               
inline_header          ::= "--- " scalar "\n"
               
footer                 ::= "...\n"
               
body                   ::= hash | array
               
scalar                 ::= bare_scalar
                         | quoted_scalar
                         | double_quoted_scalar
                         | wrapped_scalar
                         | block_scalar
                         | undefined_scalar
               
bare_scalar            ::= /[^"'>|\s]/ /[^"'\n]*/
               
quoted_scalar          ::= "'" ([^'\n]|)* "'"

double_quoted_scalar   ::= "\"" /(\\([\\aefnrtvz]|[0-9a-fA-F]{2})|[^\\"\n])*/ "\""

wrapped_scalar         ::= ">\n" block_scalar_body

block_scalar           ::= "|\n" block_scalar_body

undefined_scalar       ::= "~"

block_scalar_body      ::= block_scalar_line 
                         | block_scalar_body "\n" block_scalar_line
                         
block_scalar_line      ::= /^\s+(.+?)\s*$/

hash                   ::= hash_element
                         | hash "\n" hash_element

hash_element           ::= hash_key ": " value

hash_key               ::= double_quoted_scalar
                         | bare_hash_key

bare_hash_key          ::= /\w[^"':\s]*/
                         
array                  ::= array_element
                         | array "\n" array_element
                         
array_element          ::= "- " value

value                  ::= scalar
                         | hash
                         | array
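
The scalar alternatives above can mostly be told apart by their first character. The following Python sketch shows one way a consumer might do that. It is an illustration only: the function name classify_scalar is hypothetical, and testing for "~" before bare_scalar is an assumption, made because the bare_scalar pattern as written would also accept "~".

```python
def classify_scalar(text):
    """Name the scalar production suggested by the start of `text`.

    `text` is whatever follows "--- ", ": " or "- " on a line,
    without the trailing newline.
    """
    if text == "~":
        # Assumption: "~" is tested first, because the bare_scalar
        # pattern as written would also accept it.
        return "undefined_scalar"
    if text.startswith("'"):
        return "quoted_scalar"
    if text.startswith('"'):
        return "double_quoted_scalar"
    if text == ">":
        # The wrapped text itself follows on the next lines.
        return "wrapped_scalar"
    if text == "|":
        # The block text itself follows on the next lines.
        return "block_scalar"
    if text and not text[0].isspace() and text[0] not in "\"'>|":
        return "bare_scalar"
    return None
```

This only picks a production; it does not check that a quoted scalar is properly terminated or that a block body follows.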

Indentation

The above grammar doesn't capture the notion that nested values must be indented at least one space further than their parent. Until I work out how to cleanly represent that requirement, here are some examples that should make it clear:

A simple, unquoted scalar

   --- Simple scalar
   ...

A multiline scalar

   --- >
     All this text ends up wrapped
     into a single line
   ...

A block scalar

   --- |
     Newlines are preserved
     in this text
   ...

An array with another array nested within it

   ---
     - 1
     - 2
       - 2.1
       - 2.2
     - 3
   ...

A hash containing another hash

   ---
     foo: 'The footure is bright'
     bar:
        say: 'A beer please'
        expect: beer
   ...
   

Structures may nest arbitrarily

   ---
     -
       name: 'Hash one'
       value: 1
     -
       name: 'Hash two'
       value: 2
   ...
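
The indentation rule illustrated by the examples above can be sketched in Python with a stack of indentation widths. This is only a sketch under an assumption the grammar does not state: a line indented further than the previous one opens a nested value, and a line indented less closes values until an enclosing indentation is reached. The name nesting_depths is hypothetical.

```python
def nesting_depths(lines):
    """Map each non-blank body line to a nesting depth (0 = outermost).

    Assumption: deeper indentation opens a nested value, shallower
    indentation closes values until an enclosing level is found.
    """
    stack = []   # indentation widths of the currently open levels
    depths = []
    for line in lines:
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        # Close levels that are indented deeper than this line.
        while stack and indent < stack[-1]:
            stack.pop()
        # Open a new level if this line is deeper than the current one.
        if not stack or indent > stack[-1]:
            stack.append(indent)
        depths.append(len(stack) - 1)
    return depths


# Body of the nested array example, without the "---" and "..." lines.
example = [
    "  - 1",
    "  - 2",
    "    - 2.1",
    "    - 2.2",
    "  - 3",
]
print(nesting_depths(example))  # prints [0, 0, 1, 1, 0]
```

Because only relative indentation matters here, any increase of at least one space counts as a new level, which matches the "at least one space" wording above.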