How does one do lexical analysis (tokenization) of Bash/Perl-style strings, where interpolation in the middle is possible?
Say we have print "Now is {get "time of day"} of {get "current date"}..."
print 'have a nice day!'
And {…} inserts a value into the middle of the string.
How would the lexer know which double quote closes the string and which is part of the string?
Python uses format-like routines because parsing such strings is hard.
Name:
Anonymous 2013-07-15 18:45
Basically you would need to set flags, just as you would if you were parsing it character by character.
So how you parse the double quote depends on what mode you're in.
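A minimal sketch of that mode-flag idea in Python (the token shapes and function name are my own invention, not from any real lexer): a stack of modes decides what a `"` means. In string mode a quote closes the string; in code mode it opens one; `{` and `}` push and pop a nested code mode, so quotes inside the interpolation are handled correctly.

```python
def tokenize(src):
    """Split src into ('code', word), ('str', text) and ('punct', brace)
    tokens.  The whole trick is the `mode` stack: a double quote opens a
    string in code mode and closes one in string mode."""
    tokens, buf, mode = [], [], ["code"]

    def flush(kind):
        if buf:
            tokens.append((kind, "".join(buf)))
            buf.clear()

    for c in src:
        if mode[-1] == "str":
            if c == '"':                      # quote in string mode: closes it
                flush("str")
                mode.pop()
            elif c == '{':                    # interpolation starts: enter code mode
                flush("str")
                tokens.append(("punct", "{"))
                mode.append("code")
            else:
                buf.append(c)
        else:                                 # code mode
            if c == '"':                      # quote in code mode: opens a string
                flush("code")
                mode.append("str")
            elif c == '}' and len(mode) > 1:  # interpolation ends: back to the string
                flush("code")
                tokens.append(("punct", "}"))
                mode.pop()
            elif c.isspace():
                flush("code")
            else:
                buf.append(c)
    flush("code")
    return tokens
```

Running it on the example from the first post, `tokenize('print "Now is {get "time of day"} of ..."')` yields `('code', 'print')`, `('str', 'Now is ')`, `('punct', '{')`, `('code', 'get')`, `('str', 'time of day')`, `('punct', '}')`, and so on: each quote is classified by the mode it was seen in, never by counting.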
Name:
Anonymous 2013-07-15 19:08
>>2
Ugly. My current code just recursively calls the parser from inside the lexer:
(to /string r incut end
! l = nil
! while t
! ! c = $ r peek
! ! unless eq c incut :> $ r next
! ! cond
! ! ! eq c #\\
! ! ! ! setf c ($ r next)
! ! ! ! cond
! ! ! ! ! eq c #\n :> push #\newline l
! ! ! ! ! eq c #\t :> push #\tab l
! ! ! ! ! eq c #\\ :> push #\\ l
! ! ! ! ! or (eq c #\n) (eq c incut) (eq c end) :> push c l
! ! ! ! ! eq c nil :> $ r error "EOF in string"
! ! ! ! ! or t :> $ r error "Invalid escape code: {c}"
! ! ! eq c end :> return-from /string (list (coerce (reverse l) 'string))
! ! ! eq c incut ;interpolate
! ! ! ! l = coerce (reverse l) 'string
! ! ! ! m = cdr (/token r)
! ! ! ! e = /string r incut end
! ! ! ! return-from /string (interp l m e)
! ! ! eq c nil :> $ r error "EOF in string"
! ! ! or t :> push c l)
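For anyone who can't read the notation above, here is a rough Python transliteration of the same idea (the function names, the escape table, and the toy scan_expr stand-in for the real parser are all my own, not the poster's): the string scanner consumes characters, and on the incut marker it hands off to an expression reader, which in turn recurses back into the string scanner for nested string literals.

```python
def scan_string(src, i, incut='{', end='"'):
    """Scan a string body starting at index i (just past the opening quote).
    Returns (parts, next_index), where parts mixes literal strings and
    ('interp', expr) nodes."""
    out, buf = [], []
    while i < len(src):
        c = src[i]; i += 1
        if c == '\\':                   # escape codes, as in the /string code;
            c = src[i]; i += 1          # unknown escapes pass through (the
            buf.append({'n': '\n', 't': '\t'}.get(c, c))  # original errors instead)
        elif c == end:                  # this quote closes the string
            out.append(''.join(buf))
            return out, i
        elif c == incut:                # interpolation: call the "parser"
            out.append(''.join(buf)); buf = []
            expr, i = scan_expr(src, i)
            out.append(('interp', expr))
        else:
            buf.append(c)
    raise SyntaxError("EOF in string")

def scan_expr(src, i):
    """Toy stand-in for calling the full parser: read up to the matching '}',
    recursing into nested strings so their quotes don't confuse us."""
    buf = []
    while i < len(src):
        c = src[i]; i += 1
        if c == '}':
            return ''.join(buf), i
        if c == '"':                    # nested string literal: recurse
            parts, i = scan_string(src, i)
            buf.append('"%s"' % ''.join(p for p in parts if isinstance(p, str)))
        else:
            buf.append(c)
    raise SyntaxError("EOF in interpolation")
```

The mutual recursion is the whole point: no quote-counting is needed, because whichever scanner is currently running knows exactly which delimiter would terminate it.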
Name:
Anonymous 2013-07-15 19:14
>>3
Are you implying that's not ugly? It looks like a corrupted Common Lisp file. Rewrite it in Lisp so I can actually read it.
HMMM LET'S SEE WE CREATE A STATE MACHINE THAT GOES THROUGH EACH CHARACTER IN THE FUQIN STRING AND WHEN WE ENCOUNTER A FUQIN QUOTE, WE INCREMENT A VARIABLE CALLED QUOTE_COUNT OR SOME SHIT LIKE THAT. AND...