Processing tokens
Let's modify the `while` loop from the previous chapter to:
```typescript
while (currentPosition < input.length) {
  const currentToken = input[currentPosition]

  // Our language doesn't care about whitespace.
  if (currentToken === ' ') {
    currentPosition++
    continue
  }

  let didMatch: boolean = false

  for (const { key, value } of tokenStringMap) {
    if (!lookaheadString(key)) {
      continue
    }

    out.push(value)
    currentPosition += key.length
    didMatch = true
    // Stop scanning the map once we've consumed a token; the outer
    // while loop restarts the search from the new position.
    break
  }

  if (didMatch) continue

  // (We'll handle anything we didn't match as we cover the rest of the language.)
}
```
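In case `lookaheadString` isn't fresh in your mind from the previous chapter, here's a minimal sketch of the idea - this version assumes `input` and `currentPosition` live in the enclosing scope, and isn't necessarily identical to the earlier implementation:

```typescript
// A sketch of lookaheadString: returns true if the characters
// starting at currentPosition spell out `str` exactly.
// Assumes `input` and `currentPosition` are in the enclosing scope.
function lookaheadString(str: string): boolean {
  for (let i = 0; i < str.length; i++) {
    if (input[currentPosition + i] !== str[i]) {
      return false
    }
  }

  return true
}
```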
Our tokeniser now processes roughly half of our defined language!

Let's step through what we're doing here.
```typescript
const currentToken = input[currentPosition]

if (currentToken === ' ') {
  currentPosition++
  continue
}
```
Here's our first real check.

As our language's semantics don't change on whitespace (unlike, for example, Python's), we can ignore it completely when it's the current token. Whitespace is still significant when we're doing things like lookaheads - the space might be part of a string - so we only skip it when the current token itself is a space.
You might notice that we assign the current input character to `currentToken`; this is just for convenience.
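Before we get to the matching loop, it also helps to recall the shape of `tokenStringMap`. The exact entries were defined in an earlier chapter; an illustrative (not authoritative) version might look like this:

```typescript
// Illustrative only - the real map, Token, and TokenType come from
// an earlier chapter. Each entry pairs a literal string in the
// source with the token we should emit when we see it.
enum TokenType {
  ConstDeclaration,
  Assignment,
  LineBreak,
}

type Token = { type: TokenType }

const tokenStringMap: Array<{ key: string; value: Token }> = [
  { key: '\n', value: { type: TokenType.LineBreak } },
  { key: 'const', value: { type: TokenType.ConstDeclaration } },
  { key: '=', value: { type: TokenType.Assignment } },
]
```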
```typescript
let didMatch: boolean = false

for (const { key, value } of tokenStringMap) {
  if (!lookaheadString(key)) {
    continue
  }

  out.push(value)
  currentPosition += key.length
  didMatch = true
  break
}

if (didMatch) continue
```
Here we iterate over `tokenStringMap` and look for a match. If we find one, we add the corresponding token type to our return array (`out`), increment `currentPosition` by the number of characters we "consumed", and then `continue` on to the next token.
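To see it all in action, suppose the loop above is wrapped in a `tokenise(input: string): Token[]` function (as in the previous chapter) and that we're using the illustrative map from above - a hypothetical run would look like this:

```typescript
// Hypothetical usage - assumes tokenise() wraps the while loop above
// and tokenStringMap is the illustrative version shown earlier.
const tokens = tokenise('const  =')

console.log(tokens)
// -> [
//      { type: TokenType.ConstDeclaration },
//      { type: TokenType.Assignment },
//    ]
```

Both spaces were skipped, and `const` and `=` were each matched and consumed on their own pass of the `while` loop.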