
Lexical Analysis: Hard Parts

Name: Anonymous 2013-07-15 16:58

How does one do lexical analysis (tokenization) of Bash/Perl-style strings, where interpolation in the middle is possible?
Say we have
print "Now is {get "time of day"} of {get "current date"}..."
print 'have a nice day!'

And {…} inserts a value in the middle of the string.

How would the lexer know which double quote closes the string and which is part of the string?

Python uses format-like routines instead, because parsing such strings is hard.
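For reference, a quick sketch of the format-style workaround (the `get` function is hypothetical, standing in for the lookups in the example above): the expressions move outside the quotes, so every quote pairs with the very next one and a flat lexer can tokenize the line without recursion.

```python
def get(what):
    # hypothetical lookup, standing in for the thread's `get`
    return {"time of day": "14:25", "current date": "2013-07-15"}[what]

# Interpolation-free: the string literal contains no nested quotes,
# so tokenizing it needs no knowledge of expression syntax.
line = "Now is {0} of {1}...".format(get("time of day"), get("current date"))
print(line)  # Now is 14:25 of 2013-07-15...
```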

Name: Anonymous 2013-07-15 19:08

>>2
Ugly. My current code just recursively calls the parser from inside the lexer.
(to /string r incut end
  ! l = nil
  ! while t
  ! ! c = $ r peek
  ! ! unless eq c incut :> $ r next
  ! ! cond
  ! ! ! eq c #\\
  ! ! ! ! setf c ($ r next)
  ! ! ! ! cond
  ! ! ! ! ! eq c #\n :> push #\newline l
  ! ! ! ! ! eq c #\t :> push #\tab l
  ! ! ! ! ! eq c #\\ :> push #\\ l
  ! ! ! ! ! or (eq c #\n) (eq c incut) (eq c end) :> push c l
  ! ! ! ! ! eq c nil :> $ r error "EOF in string"
  ! ! ! ! ! or t :> $ r error "Invalid escape code: {c}"
  ! ! ! eq c end :> return-from /string (list (coerce (reverse l) 'string))
  ! ! ! eq c incut ;interpolate
  ! ! ! ! l = coerce (reverse l) 'string
  ! ! ! ! m = cdr (/token r)
  ! ! ! ! e = /string r incut end
  ! ! ! ! return-from /string (interp l m e)
  ! ! ! eq c nil :> $ r error "EOF in string"
  ! ! ! or t :> push c l)
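The same recursive structure, sketched in Python (all names illustrative, escape handling reduced to backslash-takes-next-char): when the string lexer hits the incut character `{` it hands off to an expression scanner, and any quote seen inside the expression re-enters the string lexer as a *nested* string. Each quote therefore gets paired at the correct nesting level, which answers the question in the first post.

```python
def lex_string(src, i):
    """Scan a double-quoted string starting at src[i] == '"'.
    Returns (parts, next_index); parts mixes literal text with
    ('interp', tokens) entries for {...} insertions."""
    assert src[i] == '"'
    i += 1
    parts, buf = [], []
    while i < len(src):
        c = src[i]
        if c == '\\':                 # escape: take the next char literally
            buf.append(src[i + 1])
            i += 2
        elif c == '"':                # this quote closes *this* string
            parts.append(''.join(buf))
            return parts, i + 1
        elif c == '{':                # incut: recurse into expression lexing
            parts.append(''.join(buf))
            buf = []
            expr, i = lex_expr(src, i + 1)
            parts.append(('interp', expr))
        else:
            buf.append(c)
            i += 1
    raise SyntaxError("EOF in string")

def lex_expr(src, i):
    """Consume an interpolated expression up to its matching '}'.
    A quote seen here starts a nested string, lexed by recursion,
    so it can never terminate the outer string."""
    out = []
    while i < len(src):
        if src[i] == '}':
            return out, i + 1
        if src[i] == '"':
            inner, i = lex_string(src, i)
            out.append(('string', inner))
        else:
            out.append(src[i])
            i += 1
    raise SyntaxError("EOF in interpolation")

parts, _ = lex_string('"Now is {get "time of day"} of {get "current date"}..."', 0)
```

After this call, `parts` alternates literal text with `('interp', ...)` chunks, which is the same shape the routine above returns via `interp`.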
