Tokens

For reference, our source:


new hello = 'world'
print hello

Let's break the source above down into things.

A thing is any distinct value, property, symbol, etc. that could change how our program is interpreted.

Things

`new`

This is the start of a variable declaration. Now, remember, in our AST we don't really care about what it's declaring, or what its value is, we only care that it exists in this spot.

`hello`

This is what we will call a Literal. A literal is something that isn’t syntactically special, and may be in our program because it designates something like a variable or a function name.

So why is this not just called Variable?

Looking at the source we can tell that this will be a variable name, but when our lexer is doing its thing, it won't know if this is a variable, or a function call, or a file reference, or anything else it could potentially be in our language.

So we come up with a 'higher level' value type that could be any of these.
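To make that concrete, here is a minimal sketch of how a lexer might classify a single bare word with no surrounding context. The `classifyWord` helper and the `VariableDeclaration` token name are assumptions made up for illustration; only the Literal name comes from the text above.

```typescript
// Hypothetical token shapes: only "Literal" is a name taken from the text.
type Token =
  | { type: "VariableDeclaration" }
  | { type: "Literal"; value: string };

// Classify one bare word the way a lexer would: without any context.
function classifyWord(word: string): Token {
  if (word === "new") {
    // `new` is syntactically special, so it gets its own token type.
    return { type: "VariableDeclaration" };
  }
  // `hello` and `print` look identical at this stage, so both become Literals.
  // Deciding which one is a variable and which is a function call happens later.
  return { type: "Literal", value: word };
}

const classified: Token[] = ["new", "hello", "print"].map(classifyWord);
console.log(classified);
```

Notice that `hello` and `print` come out with the same token type. Nothing here knows or cares that one is a variable name and the other a function name.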

`=`

We now have an assignment operator. In our language (as in JavaScript), anything on the right side of an `=` is considered to be a value.

It's entirely possible to remove this if you want, and change our DSL to work with:


new hello 'world'

but for the sake of this book we will keep it in.

`'world'`

This is a string! Fairly simple.

`print`

This is another Literal.

Again, looking at our source we can see that this looks suspiciously like a function call, but our lexer won't know this.

We'll include this in our "standard library" for our language, when we get around to parsing our AST and generating code.

`hello`

Surprise, it's another Literal. This time we're passing it as an argument, and it references a variable that we've created previously.

So...

Some of these values are static, like our "world" string; some are syntactically important, like our `=` assignment character. Some things may eventually control program flow (for example, an if statement), and some may just be Literals.

`LineBreak`

The last tokens that we’re actually missing in the above are the line breaks in our source.

These may or may not be important depending on the semantics of your language, but they’re useful to keep in your list of tokens nevertheless.
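Putting the whole walkthrough together, the sketch below is one possible way to turn our source into a flat list of tokens. It is not necessarily the lexer this book ends up with: the `tokenize` function and the token names `VariableDeclaration`, `AssignmentOperator` and `String` are assumptions chosen for illustration, and strings are assumed to be single-quoted with no escape sequences.

```typescript
// Token names other than Literal and LineBreak are assumptions for this sketch.
type Token =
  | { type: "VariableDeclaration" }
  | { type: "AssignmentOperator" }
  | { type: "String"; value: string }
  | { type: "Literal"; value: string }
  | { type: "LineBreak" };

function tokenize(source: string): Token[] {
  const tokens: Token[] = [];
  let i = 0;

  while (i < source.length) {
    const char = source.charAt(i);

    if (char === "\n") {
      // Keep line breaks: they may matter to the language's semantics.
      tokens.push({ type: "LineBreak" });
      i++;
    } else if (char === " " || char === "\t" || char === "\r") {
      i++; // other whitespace only separates tokens
    } else if (char === "=") {
      tokens.push({ type: "AssignmentOperator" });
      i++;
    } else if (char === "'") {
      // Read up to the closing quote; escapes are not handled here.
      const end = source.indexOf("'", i + 1);
      if (end === -1) throw new Error(`Unterminated string at index ${i}`);
      tokens.push({ type: "String", value: source.slice(i + 1, end) });
      i = end + 1;
    } else {
      // A bare word: read until whitespace, `=` or a quote.
      let end = i;
      while (end < source.length && !" \t\r\n='".includes(source.charAt(end))) {
        end++;
      }
      const word = source.slice(i, end);
      tokens.push(
        word === "new"
          ? { type: "VariableDeclaration" }
          : { type: "Literal", value: word }
      );
      i = end;
    }
  }

  return tokens;
}

const tokens = tokenize("new hello = 'world'\nprint hello");
console.log(tokens.map((token) => token.type).join(" "));
// VariableDeclaration Literal AssignmentOperator String LineBreak Literal Literal
```

The output is a flat list with no nesting. Working out that `print hello` is a function call with one argument is left to a later stage.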
