Discussion forum for David Beazley

Conditional lexing and start conditions in SLY?


#1

I am not sure if this is the right place to ask, but is there conditional lexing in SLY? Currently I am using this quite a lot in PLY.
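
For reference, what I mean is PLY's lexer states (start conditions). A trimmed-down sketch of how I use them (the state name 'block' and the token names are just for illustration):

import ply.lex as lex

tokens = ('NUMBER', 'PLUS', 'LBRACE', 'RBRACE', 'NAME')

# One extra, exclusive lexing state named 'block'
states = (('block', 'exclusive'),)

t_NUMBER = r'\d+'
t_PLUS = r'\+'
t_ignore = ' \t'

def t_LBRACE(t):
    r'\{'
    t.lexer.begin('block')        # switch to the 'block' state
    return t

# Rules that are only active in the 'block' state
t_block_NAME = r'[a-zA-Z_][a-zA-Z0-9_]*'
t_block_ignore = ' \t'

def t_block_RBRACE(t):
    r'\}'
    t.lexer.begin('INITIAL')      # back to the default state
    return t

def t_error(t):
    t.lexer.skip(1)

def t_block_error(t):
    t.lexer.skip(1)

lexer = lex.lex()
lexer.input('3 + 4 { foo bar }')
for tok in lexer:
    print(tok)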

Many thanks!
Pascal


#2

Ah yes. This is something that I’ve been thinking about for SLY, but the exact API hasn’t been fully determined yet. One way to do it is to force a lexer state change by raising an exception. The following code sample shows how this works.

import sly

class CalcLexer(sly.Lexer):
    tokens = { NUMBER, PLUS, MINUS, TIMES, DIVIDE, LBRACE }
    ignore = ' \t\n'

    NUMBER = r'\d+'
    PLUS = r'\+'
    TIMES = r'\*'
    MINUS = r'-'
    DIVIDE = r'/'
    LBRACE = r'\{'

    def LBRACE(self, t):
        # On '{', switch the active lexer to BlockLexer
        raise sly.LexerStateChange(BlockLexer, t)

class BlockLexer(sly.Lexer):
    tokens = { RBRACE, NAME, VALUE }
    ignore = ' \t\n'

    NAME = r'[a-zA-Z_][a-zA-Z0-9_]*'   # '*' so single-character names also match
    VALUE = r'\d+'
    RBRACE = r'\}'

    def RBRACE(self, t):
        # On '}', switch back to CalcLexer
        raise sly.LexerStateChange(CalcLexer, t)


if __name__ == '__main__':
    lexer = CalcLexer()
    for tok in lexer.tokenize('3 + 4 { foo bar 1234 } * 6'):
        print(tok)

The notion of “inclusive” and “exclusive” states really hasn’t been formalized here. However, I could imagine adding it via some kind of inheritance mechanism. This message is probably a good excuse to go look at this further.


#3

Many thanks for your quick reply.

And in particular, thanks for already implementing this via the begin(cls), push_state(cls), and pop_state() methods in v0.3!
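
For anyone else landing here, the usage I have in mind looks roughly like this. This is an untested sketch: I am assuming push_state() takes the target lexer class (as the signature suggests) and pop_state() returns to whichever lexer pushed it.

from sly import Lexer

class CalcLexer(Lexer):
    tokens = { NUMBER, PLUS, LBRACE }
    ignore = ' \t\n'

    NUMBER = r'\d+'
    PLUS = r'\+'

    @_(r'\{')
    def LBRACE(self, t):
        self.push_state(BlockLexer)   # remember the current lexer, switch to BlockLexer
        return t

class BlockLexer(Lexer):
    tokens = { NAME, RBRACE }
    ignore = ' \t\n'

    NAME = r'[a-zA-Z_][a-zA-Z0-9_]*'

    @_(r'\}')
    def RBRACE(self, t):
        self.pop_state()              # go back to the lexer that pushed us
        return t

if __name__ == '__main__':
    for tok in CalcLexer().tokenize('3 + 4 { foo bar } + 5'):
        print(tok)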

Pascal


#4

I’ve been meaning to add this feature for a while, but first I had to resolve the overall mechanism by which I was going to make “inclusive states” work. SLY does it via inheritance, but there are some tricky facets to it. The current version should be a start in that direction, so I’ll be curious to hear how it works.
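
Roughly, the idea is that an “exclusive” state is a standalone Lexer class like BlockLexer above, while an “inclusive” state would be a subclass of the default lexer that keeps its tokens and patterns and adds new ones. An untested sketch of that direction (assuming inherited tokens and patterns carry over to the subclass):

import sly

class CalcLexer(sly.Lexer):
    tokens = { NUMBER, PLUS, LBRACE }
    ignore = ' \t\n'

    NUMBER = r'\d+'
    PLUS = r'\+'
    LBRACE = r'\{'

# Hypothetical "inclusive" state: the subclass is meant to keep all of
# CalcLexer's tokens and patterns active and merely add RBRACE on top.
# (Assumes SLY carries inherited rules over to subclasses.)
class BlockLexer(CalcLexer):
    tokens = { RBRACE }
    RBRACE = r'\}'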


#5

Dear all,

I tried this code and I am getting an exception, the one that we are raising:

Traceback (most recent call last):
  File "prueba.py", line 31, in <module>
    for tok in lexer.tokenize('3 + 4 { foo bar 1234 } * 6'):
  File "/home/domingo/.local/lib/python3.7/site-packages/sly/lex.py", line 400, in tokenize
    tok = _token_funcs[tok.type](self, tok)
  File "prueba.py", line 15, in LBRACE
    raise sly.LexerStateChange(BlockLexer, t)
sly.lex.LexerStateChange: (<class '__main__.BlockLexer'>, Token(type='LBRACE', value='{', lineno=1, index=6))

The problem is that the lexer stops, so I don’t get the full stream of tokens.
I am using SLY 0.3 on Debian buster.


#6

I wouldn’t even try this unless you’re working off the latest SLY code on GitHub. If it’s some kind of bug in SLY though, submit it as an issue with sample code that illustrates the bug.