
Lexical Analysis: Hard Parts

Name: Anonymous 2013-07-15 16:58

How does one do lexical analysis (tokenization) of Bash/Perl-style strings, where interpolation in the middle is possible?
Say we have
print "Now is {get "time of day"} of {get "current date"}..."
print 'have a nice day!'

And {…} inserts a value in the middle of the string.

How would the lexer know which double quote closes the string and which is part of the string?

Python uses format-like routines instead, because parsing such strings is hard.
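For reference, a quick sketch of the format-style workaround (the `get` function is hypothetical, standing in for the lookups in the example above): the expressions move outside the quotes, so every quote pairs with the very next one and a flat lexer can tokenize the line without recursion.

```python
def get(what):
    # hypothetical lookup, standing in for the thread's `get`
    return {"time of day": "14:25", "current date": "2013-07-15"}[what]

# Interpolation-free: the string literal contains no nested quotes,
# so tokenizing it needs no knowledge of expression syntax.
line = "Now is {0} of {1}...".format(get("time of day"), get("current date"))
print(line)  # Now is 14:25 of 2013-07-15...
```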

Name: Anonymous 2013-07-15 19:08

>>2
Ugly. My current code just recursively calls the parser from inside the lexer.
(to /string r incut end
  ! l = nil
  ! while t
  ! ! c = $ r peek
  ! ! unless eq c incut :> $ r next
  ! ! cond
  ! ! ! eq c #\\
  ! ! ! ! setf c ($ r next)
  ! ! ! ! cond
  ! ! ! ! ! eq c #\n :> push #\newline l
  ! ! ! ! ! eq c #\t :> push #\tab l
  ! ! ! ! ! eq c #\\ :> push #\\ l
  ! ! ! ! ! or (eq c #\n) (eq c incut) (eq c end) :> push c l
  ! ! ! ! ! eq c nil :> $ r error "EOF in string"
  ! ! ! ! ! or t :> $ r error "Invalid escape code: {c}"
  ! ! ! eq c end :> return-from /string (list (coerce (reverse l) 'string))
  ! ! ! eq c incut ;interpolate
  ! ! ! ! l = coerce (reverse l) 'string
  ! ! ! ! m = cdr (/token r)
  ! ! ! ! e = /string r incut end
  ! ! ! ! return-from /string (interp l m e)
  ! ! ! eq c nil :> $ r error "EOF in string"
  ! ! ! or t :> push c l)
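The same recursive structure, sketched in Python (all names illustrative, escape handling reduced to backslash-takes-next-char): when the string lexer hits the incut character `{` it hands off to an expression scanner, and any quote seen inside the expression re-enters the string lexer as a *nested* string. Each quote therefore gets paired at the correct nesting level, which answers the question in the first post.

```python
def lex_string(src, i):
    """Scan a double-quoted string starting at src[i] == '"'.
    Returns (parts, next_index); parts mixes literal text with
    ('interp', tokens) entries for {...} insertions."""
    assert src[i] == '"'
    i += 1
    parts, buf = [], []
    while i < len(src):
        c = src[i]
        if c == '\\':                 # escape: take the next char literally
            buf.append(src[i + 1])
            i += 2
        elif c == '"':                # this quote closes *this* string
            parts.append(''.join(buf))
            return parts, i + 1
        elif c == '{':                # incut: recurse into expression lexing
            parts.append(''.join(buf))
            buf = []
            expr, i = lex_expr(src, i + 1)
            parts.append(('interp', expr))
        else:
            buf.append(c)
            i += 1
    raise SyntaxError("EOF in string")

def lex_expr(src, i):
    """Consume an interpolated expression up to its matching '}'.
    A quote seen here starts a nested string, lexed by recursion,
    so it can never terminate the outer string."""
    out = []
    while i < len(src):
        if src[i] == '}':
            return out, i + 1
        if src[i] == '"':
            inner, i = lex_string(src, i)
            out.append(('string', inner))
        else:
            out.append(src[i])
            i += 1
    raise SyntaxError("EOF in interpolation")

parts, _ = lex_string('"Now is {get "time of day"} of {get "current date"}..."', 0)
```

After this call, `parts` alternates literal text with `('interp', ...)` chunks, which is the same shape the routine above returns via `interp`.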
