I'm in lexer hell
Published 2008-10-07 @ 20:13
Tagged thoughts
% ruby -e 'p %=abc='
-e:1: syntax error, unexpected $end
% ruby -e 'x = %=abc=; p x'
"abc"
There are soooo many stupid little edge cases in the ruby language that it is nearly impossible to write a parser for it that isn’t completely convoluted.
I’ve always wondered how much ruby we’d have left if we just cut out the weird and/or overly complicated stuff.
Possible things to remove:
- Nested interpolation.
"blah#{"blah#{"blah"}blah"}blah"
- Emacs keybinding escapes:
"\C-\M-a" # => "\201"
- 1400 extra % thingies:
%s(blah) #=> :blah
There are 28 cases of my lexer checking lex_state and 63 cases where the lexer is setting lex_state. That by itself is absolutely fine. It is the additional 17 cases where the parser TELLS the lexer what the lexer state is that absolutely terrifies me.
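To make the lex_state problem concrete, here is a toy sketch of why the two one-liners at the top behave differently. It is not my lexer and not MRI's: the state names, `state_after`, and `classify_percent` are all made up for illustration, and the real rules have more cases than this. The only point is that the same characters `%=abc=` come out as different tokens depending on what state the previous token left behind.

```ruby
# Toy model only: the state names and both helpers are hypothetical,
# and the real lexer has many more states and special cases.

# Pick a (made-up) lexer state based on the token we just saw.
def state_after(token)
  # After "=", an expression must begin; after an identifier such as
  # "p", we might be looking at an operator or at an argument.
  token == "=" ? :expr_beg : :expr_arg
end

# Decide what a "%" means, given the state and the text that follows it.
def classify_percent(lex_state, rest)
  if lex_state == :expr_beg
    :string_literal   # "%=abc=" is a string delimited by "="
  elsif rest.start_with?("=")
    :op_assign        # "%=" is read as modulo-assignment
  else
    :modulo           # plain "%" operator
  end
end

after_ident  = classify_percent(state_after("p"), "=abc=")
after_assign = classify_percent(state_after("="), "=abc=")

p after_ident   # => :op_assign
p after_assign  # => :string_literal
```

In this toy model, `p %=abc=` turns into `p %= abc =` and runs off the end of the input, while `x = %=abc=` gets its string. Now imagine the parser reaching in and overwriting that state from the outside.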