Parsing

Now that we have run a lexer on our source code, it's time to assign meaning to the resulting tokens, and to understand how the elements in our program interact with each other.

This step should return what we call an AST - an abstract syntax tree.

An AST is a tree-like representation of our source code, linking related tokens together in a way that makes sense for our language.

[
  { type: 'LineBreak' },
  { type: 'VariableDeclaration' },
  { type: 'Literal', value: 'hello' },
  { type: 'AssignmentOperator' },
  { type: 'String', value: 'world' },
  { type: 'LineBreak' },
  { type: 'Log' },
  { type: 'Literal', value: 'hello' },
  { type: 'LineBreak' }
]

Taking the output from our lexer example, we can infer that there are probably 2 different things going on here.

The first is that we're creating a variable, and then assigning it a value.

  { type: 'VariableDeclaration' },
  { type: 'Literal', value: 'hello' },
  { type: 'AssignmentOperator' },
  { type: 'String', value: 'world' },

You can see we have our VariableDeclaration, then the name of our variable, then the assignment operator, and finally the actual value we wish to assign to it.

The second action our code seems to be taking is logging a value.

  { type: 'Log' },
  { type: 'Literal', value: 'hello' },

Can you see how we can semantically recognise how these tokens should be grouped together? This is what our parser is going to do.
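To make that grouping concrete, here is a minimal sketch. It assumes that every LineBreak token marks the end of a statement, which the token list suggests but which is an assumption about our language. The helper name groupStatements is made up for illustration.

```javascript
// Token list copied from the lexer output above.
const tokens = [
  { type: 'LineBreak' },
  { type: 'VariableDeclaration' },
  { type: 'Literal', value: 'hello' },
  { type: 'AssignmentOperator' },
  { type: 'String', value: 'world' },
  { type: 'LineBreak' },
  { type: 'Log' },
  { type: 'Literal', value: 'hello' },
  { type: 'LineBreak' }
];

// Hypothetical helper: split the flat token list into one group per
// statement, treating each LineBreak as a statement boundary (assumption).
function groupStatements(tokens) {
  const groups = [];
  let current = [];

  for (const token of tokens) {
    if (token.type === 'LineBreak') {
      // Close the current group, skipping empty ones (e.g. a leading line break).
      if (current.length > 0) groups.push(current);
      current = [];
    } else {
      current.push(token);
    }
  }

  // Keep a trailing group if the input did not end with a LineBreak.
  if (current.length > 0) groups.push(current);
  return groups;
}

const groups = groupStatements(tokens);
console.log(groups.map(group => group.map(token => token.type)));
```

This yields two groups, one per action: the four tokens of the variable declaration, and the two tokens of the log.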

Now, back to our AST.

Taking the above 2 actions, and giving them the context they need to operate, we may end up with an AST that looks similar to this:

{
  type: 'Program',
  children: [
    {
      type: 'Assignment',
      name: 'hello',
      value: {
        type: 'String',
        value: 'world'
      }
    },
    {
      type: 'Log',
      children: [
        {
          type: 'Literal',
          value: 'hello'
        }
      ]
    }
  ]
}
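One way to get from the token list to a tree of this shape is sketched below. It is only an illustration under a few assumptions: the function name parse is hypothetical, a declaration is assumed to always be exactly four tokens (VariableDeclaration, Literal, AssignmentOperator, value), and a Log is assumed to take exactly one value.

```javascript
// Token list copied from the lexer output above.
const tokens = [
  { type: 'LineBreak' },
  { type: 'VariableDeclaration' },
  { type: 'Literal', value: 'hello' },
  { type: 'AssignmentOperator' },
  { type: 'String', value: 'world' },
  { type: 'LineBreak' },
  { type: 'Log' },
  { type: 'Literal', value: 'hello' },
  { type: 'LineBreak' }
];

// Hypothetical parser sketch: walk the tokens left to right and build the AST.
function parse(tokens) {
  const program = { type: 'Program', children: [] };
  let position = 0;

  while (position < tokens.length) {
    const token = tokens[position];

    if (token.type === 'LineBreak') {
      // Line breaks only separate statements, so they produce no node.
      position += 1;
    } else if (token.type === 'VariableDeclaration') {
      // Assumed shape: VariableDeclaration, Literal (name), AssignmentOperator, value.
      const name = tokens[position + 1];
      const operator = tokens[position + 2];
      const value = tokens[position + 3];
      if (!name || name.type !== 'Literal' ||
          !operator || operator.type !== 'AssignmentOperator' ||
          !value || value.type === 'LineBreak') {
        throw new SyntaxError('Malformed variable declaration');
      }
      program.children.push({
        type: 'Assignment',
        name: name.value,
        value: { type: value.type, value: value.value }
      });
      position += 4;
    } else if (token.type === 'Log') {
      // Assumed shape: Log followed by exactly one value token.
      const argument = tokens[position + 1];
      if (!argument || argument.type === 'LineBreak') {
        throw new SyntaxError('Log expects a value');
      }
      program.children.push({
        type: 'Log',
        children: [{ type: argument.type, value: argument.value }]
      });
      position += 2;
    } else {
      throw new SyntaxError(`Unexpected token: ${token.type}`);
    }
  }

  return program;
}

console.log(JSON.stringify(parse(tokens), null, 2));
```

Running this on the example tokens prints the same structure as the AST above. Anything that does not match one of the two assumed shapes throws a SyntaxError rather than being silently skipped.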

We end up with 3 main nodes.

  1. A root level Program node (remember, our AST is a tree, so it needs something at the very top)

And then as children of that node,

  1. An Assignment node
  2. A Log node

All 3 of these AST entries carry context as well: they have either a list of children, or other relevant contextual information (like the name and value required for our assignment).
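To see the tree shape for ourselves, we can walk the AST and print one indented line per node. This is a rough sketch: the helper name describe is hypothetical, and it assumes nested nodes live either in a children list or, for an Assignment, in its value.

```javascript
// The AST from above.
const ast = {
  type: 'Program',
  children: [
    {
      type: 'Assignment',
      name: 'hello',
      value: { type: 'String', value: 'world' }
    },
    {
      type: 'Log',
      children: [{ type: 'Literal', value: 'hello' }]
    }
  ]
};

// Hypothetical helper: collect one indented line per node, depth-first.
function describe(node, depth = 0, lines = []) {
  const detail = node.name !== undefined ? ` (name: ${node.name})` : '';
  lines.push(`${'  '.repeat(depth)}${node.type}${detail}`);

  // Nested nodes are either in `children` or in an object-valued `value` (assumption).
  const hasNodeValue = node.value !== null && typeof node.value === 'object';
  const nested = node.children || (hasNodeValue ? [node.value] : []);

  for (const child of nested) {
    describe(child, depth + 1, lines);
  }
  return lines;
}

console.log(describe(ast).join('\n'));
// Program
//   Assignment (name: hello)
//     String
//   Log
//     Literal
```

The Program node sits at the top with the Assignment and Log nodes directly beneath it, and the values they carry as context appear one level deeper.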