Discussion forum for David Beazley

Help required concerning ambiguous grammar

I’m writing an emulator for an old computer. It used a BASIC dialect, which is straightforward to implement, with one exception:

Variable names can consist of 1 or 2 letters (the second character can also be a digit). Now, there’s a command that uses single letters to make the parameters in a comma-separated list more human-readable, like this:

DLOAD <filename>, D<ln>, U<un>

The letters ‘D’ and ‘U’ must literally precede the parameters (which are integers), with or without a space between the letter and the parameter.

Now I can’t find a way to implement that with PLY, since e.g. ‘D8’ is also a valid variable name.
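To make the problem concrete, here is a minimal tokenizer sketch in plain Python (standard library `re`, not PLY) that follows the variable-name rule described above. The token names and the quoted-filename syntax are assumptions for illustration only.

```python
import re

# Hypothetical token rules for a simplified version of the dialect:
# the DLOAD keyword first, then 1-2 character variable names, integers,
# quoted strings, commas and blanks.
TOKEN_RE = re.compile(r'''
    (?P<KEYWORD>DLOAD)
  | (?P<ID>[A-Z][A-Z0-9]?)
  | (?P<INT>\d+)
  | (?P<STRING>"[^"]*")
  | (?P<COMMA>,)
  | (?P<WS>\s+)
''', re.VERBOSE)

def tokenize(text):
    """Return (type, value) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f'Illegal character {text[pos]!r} at {pos}')
        if m.lastgroup != 'WS':
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

print(tokenize('DLOAD "GAME", D8, U9'))
print(tokenize('DLOAD "GAME", D 8, U 9'))
```

With this tokenizer, `D8` comes out as a single `ID` token while `D 8` comes out as `ID` followed by `INT`, so the grammar sees two different token sequences for the same parameter.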

Is there a way to successfully parse such a grammar with PLY?

Thanks for any help!
petro4213

You’re probably going to have to do something clever somewhere in the system to disambiguate variable names from the D and U parts.

One possibility is to use lexer states. For example, upon seeing the ‘DLOAD’ token, the lexer could switch states and tokenize the rest of the statement in a different way.

Another possibility might be some kind of preprocessing step involving a regex or a similar tool. For example, DLOAD commands could be isolated and rewritten or normalized into a structure that is easier to parse (e.g., get rid of the D/U entirely, or replace the D/U with some other kind of token).

You might also be able to write parser rules that accept a variety of possible inputs and disambiguate there, by looking at the actual token values parsed and checking whether they make sense. For example:

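    def p_dln(p):
        '''
        dln : ID INT
            | ID
        '''
        # p[0] is the result and p[1], p[2], ... are the matched symbols,
        # so len(p) is one more than the number of symbols on the right.
        if len(p) == 3 and p[1] == 'D':
            # "D 8": the letter and the number arrived as separate tokens
            p[0] = p[2]
        elif len(p) == 2 and p[1][0] == 'D' and p[1][1:].isdigit():
            # "D8": the lexer matched the whole thing as one variable name
            p[0] = int(p[1][1:])
        else:
            print('Syntax error')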

That might need to be fleshed out a bit.
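The lexer-state idea can be illustrated without PLY as a small hand-rolled tokenizer that swaps its rule set after the DLOAD keyword. This is only a sketch of the concept under some assumptions: the filename is a quoted string, a statement ends at ':' or a newline, and the token names DRIVE and UNIT are made up. In PLY itself, the same effect would come from declaring an exclusive state in the `states` tuple and calling `t.lexer.begin()` from the rule that matches DLOAD.

```python
import re

# Normal state: a 1-2 character name such as D8 is a variable.
INITIAL_RULES = [
    ('DLOAD', r'DLOAD'),
    ('ID', r'[A-Z][A-Z0-9]?'),
    ('INT', r'\d+'),
    ('STRING', r'"[^"]*"'),
    ('COMMA', r','),
    ('EQUALS', r'='),
    ('END', r'[:\n]'),
]

# State entered after DLOAD: a lone D or U is a parameter prefix, and the
# digits after it always form a separate INT, with or without a space.
DLOAD_RULES = [
    ('DRIVE', r'D'),
    ('UNIT', r'U'),
    ('INT', r'\d+'),
    ('STRING', r'"[^"]*"'),
    ('COMMA', r','),
    ('END', r'[:\n]'),
]

def compile_rules(rules):
    # One alternation of named groups; the first rule that matches wins.
    return re.compile('|'.join(f'(?P<{name}>{pattern})' for name, pattern in rules))

STATES = {'INITIAL': compile_rules(INITIAL_RULES), 'dload': compile_rules(DLOAD_RULES)}

def tokenize_with_states(text):
    state = 'INITIAL'
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos] in ' \t':  # blanks are ignored in both states
            pos += 1
            continue
        m = STATES[state].match(text, pos)
        if m is None:
            raise SyntaxError(f'Illegal character {text[pos]!r} in state {state}')
        kind, value = m.lastgroup, m.group()
        tokens.append((kind, int(value) if kind == 'INT' else value))
        if kind == 'DLOAD':
            state = 'dload'    # switch rule sets for the rest of the statement
        elif kind == 'END':
            state = 'INITIAL'  # statement finished: D8 is a variable again
        pos = m.end()
    return tokens

print(tokenize_with_states('DLOAD "GAME", D8, U9'))
print(tokenize_with_states('DLOAD "GAME", D 8, U 9'))
```

Both calls produce the same token sequence, so a parser rule along the lines of `dln : DRIVE INT` (again a hypothetical name) no longer has to inspect identifier values. After the `:` separator, `D8` is lexed as an ordinary variable name again.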

Personally, I might be inclined to try some kind of preprocessor approach: normalize the input first and rewrite it a bit so that it is easier to parse.
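A preprocessing pass of that kind could be a pair of regular expressions that find each DLOAD statement and rewrite its D and U parameters into words that cannot be confused with variable names. This is a naive sketch, assuming the filename is a quoted string and statements are separated by ':' or newlines. DRIVE and UNIT are made-up keywords that the lexer would then need to recognize, and the sketch does not guard against the text DLOAD appearing inside some other string literal.

```python
import re

# A DLOAD keyword plus its quoted filename (group 1), and the rest of the
# statement up to a ':' separator or end of line (group 2).
DLOAD_STMT = re.compile(r'(DLOAD\s*"[^"]*")([^:\n]*)')

# A D or U prefix after a comma, with optional blanks, followed by digits.
PARAM = re.compile(r',\s*([DU])\s*(\d+)')

# Hypothetical replacement keywords: longer than two characters, so they
# can never be mistaken for a variable name in this dialect.
KEYWORDS = {'D': 'DRIVE', 'U': 'UNIT'}

def _rewrite_params(m):
    head, params = m.group(1), m.group(2)
    return head + PARAM.sub(lambda p: f', {KEYWORDS[p.group(1)]} {p.group(2)}', params)

def preprocess(line):
    """Rewrite 'D8' / 'D 8' style DLOAD parameters as 'DRIVE 8' (and U likewise)."""
    return DLOAD_STMT.sub(_rewrite_params, line)

print(preprocess('10 DLOAD "GAME", D8, U9: D8 = 1'))
```

Only the parameter list of the DLOAD statement is touched; the `D8 = 1` assignment after the `:` passes through unchanged.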

Thanks Dave!
I really like the idea of switching lexer states. It looks like the cleanest solution to me. I’ll check the docs on that topic and come back if I need more assistance… :wink: