Processing tokens
Let's modify the `while` loop from the previous chapter to:
```typescript
while (currentPosition < input.length) {
  const currentToken = input[currentPosition]

  // Our language doesn't care about whitespace.
  if (currentToken === ' ') {
    currentPosition++
    continue
  }

  let didMatch: boolean = false

  for (const { key, value } of tokenStringMap) {
    if (!lookaheadString(key)) {
      continue
    }

    out.push(value)
    currentPosition += key.length
    didMatch = true
    // Stop scanning the map once we've consumed a token; the outer
    // while loop restarts the search from the new position.
    break
  }

  if (didMatch) continue

  // (We'll handle anything we didn't match as we cover the rest of the language.)
}
```
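In case `lookaheadString` isn't fresh in your mind from the previous chapter, here's a minimal sketch of the idea - this version assumes `input` and `currentPosition` live in the enclosing scope, and isn't necessarily identical to the earlier implementation:

```typescript
// A sketch of lookaheadString: returns true if the characters
// starting at currentPosition spell out `str` exactly.
// Assumes `input` and `currentPosition` are in the enclosing scope.
function lookaheadString(str: string): boolean {
  for (let i = 0; i < str.length; i++) {
    if (input[currentPosition + i] !== str[i]) {
      return false
    }
  }

  return true
}
```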
Our tokeniser now processes roughly half of our defined language!

Let's step through what we're doing here.
```typescript
const currentToken = input[currentPosition]

if (currentToken === ' ') {
  currentPosition++
  continue
}
```
Here's our first real check.

As our language's semantics don't change on whitespace (unlike, for example, Python's), we can ignore it completely when it's the current token. Whitespace is still significant when we're doing things like lookaheads - the space might be part of a string - so we only skip it when the current token itself is a space.
You might notice that we assign the current input character to `currentToken`; this is just for convenience.
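Before we get to the matching loop, it also helps to recall the shape of `tokenStringMap`. The exact entries were defined in an earlier chapter; an illustrative (not authoritative) version might look like this:

```typescript
// Illustrative only - the real map, Token, and TokenType come from
// an earlier chapter. Each entry pairs a literal string in the
// source with the token we should emit when we see it.
enum TokenType {
  ConstDeclaration,
  Assignment,
  LineBreak,
}

type Token = { type: TokenType }

const tokenStringMap: Array<{ key: string; value: Token }> = [
  { key: '\n', value: { type: TokenType.LineBreak } },
  { key: 'const', value: { type: TokenType.ConstDeclaration } },
  { key: '=', value: { type: TokenType.Assignment } },
]
```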
```typescript
let didMatch: boolean = false

for (const { key, value } of tokenStringMap) {
  if (!lookaheadString(key)) {
    continue
  }

  out.push(value)
  currentPosition += key.length
  didMatch = true
  break
}

if (didMatch) continue
```
Here we iterate over `tokenStringMap` and look for a match. If we find one, we add the corresponding token type to our return array (`out`), increment `currentPosition` by the number of characters we "consumed", and then `continue` on to the next token.
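To see it all in action, suppose the loop above is wrapped in a `tokenise(input: string): Token[]` function (as in the previous chapter) and that we're using the illustrative map from above - a hypothetical run would look like this:

```typescript
// Hypothetical usage - assumes tokenise() wraps the while loop above
// and tokenStringMap is the illustrative version shown earlier.
const tokens = tokenise('const  =')

console.log(tokens)
// -> [
//      { type: TokenType.ConstDeclaration },
//      { type: TokenType.Assignment },
//    ]
```

Both spaces were skipped, and `const` and `=` were each matched and consumed on their own pass of the `while` loop.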