
Lexical Analysis: Hard Parts

Name: Anonymous 2013-07-15 16:58

How does one do lexical analysis (tokenization) of a Bash/Perl-style string, where insertion in the middle is possible?
Say we have:
print "Now is {get "time of day"} of {get "current date"}..."
print 'have a nice day!'

Here {…} inserts a value into the middle of the string.

How would the lexer know which double quote closes the string and which is part of the string?

Python uses format-like routines, because parsing such strings is hard.

Name: Anonymous 2013-07-15 18:45

Basically, you would need to set flags, just as you would if you were parsing it character by character.

In C, it would look something like:

int in_string;       /* nonzero while inside "..." */
int in_string_code;  /* nonzero while inside {...} within a string */


In flex, it would look something like this:

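%option stack
%x IN_STRING
%x IN_STRING_CODE

%%

...
<INITIAL,IN_STRING_CODE>\" {
    /* opening quote: remember the current mode, enter the string */
    yy_push_state(IN_STRING);
    ...
}
<IN_STRING>\" {
    /* closing quote: return to whichever mode opened the string */
    yy_pop_state();
    ...
}
<IN_STRING>\{ {
    /* start of interpolated code */
    yy_push_state(IN_STRING_CODE);
    ...
}
<IN_STRING_CODE>\} {
    /* end of interpolated code: back inside the string */
    yy_pop_state();
    ...
}
...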


So how you parse the double quote depends on what mode you're in.
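A minimal sketch of the same mode idea in plain C, assuming a toy syntax of bare words, "..." strings, {...} interpolation and backslash escapes. It keeps a small stack of modes rather than two flags, so a " inside {...} opens a new string instead of closing the outer one. The function lex, the helper emit and the token names are made up for illustration; nested {...} blocks in plain code and real escape decoding are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Lexer modes; CODE plays the role of INITIAL and IN_STRING_CODE. */
enum mode { CODE, STRING };

#define MAX_DEPTH 64

/* Append one token to out as KIND or KIND(text), space-separated.
   snprintf truncates instead of overflowing if out fills up. */
static void emit(char *out, size_t cap, const char *kind,
                 const char *text, size_t n)
{
    size_t len = strlen(out);
    if (n > 0)
        snprintf(out + len, cap - len, "%s%s(%.*s)",
                 len ? " " : "", kind, (int)n, text);
    else
        snprintf(out + len, cap - len, "%s%s", len ? " " : "", kind);
}

/* Tokenize src into a readable token list in out. Returns 0 on success,
   -1 on an unterminated string/interpolation or an unsupported brace. */
int lex(const char *src, char *out, size_t cap)
{
    enum mode stack[MAX_DEPTH];   /* mode stack instead of single flags */
    int depth = 0;
    const char *p = src;

    assert(cap > 0);
    out[0] = '\0';
    stack[depth++] = CODE;        /* outermost level is plain code */

    while (*p) {
        if (stack[depth - 1] == STRING) {
            if (*p == '"') {                  /* closing quote: pop */
                depth--;
                emit(out, cap, "STR_END", "", 0);
                p++;
            } else if (*p == '{') {           /* interpolated code starts */
                assert(depth < MAX_DEPTH);
                stack[depth++] = CODE;
                emit(out, cap, "INTERP_BEGIN", "", 0);
                p++;
            } else {                          /* literal text up to " or { */
                const char *start = p;
                while (*p && *p != '"' && *p != '{')
                    p += (*p == '\\' && p[1]) ? 2 : 1;  /* escapes kept raw */
                emit(out, cap, "TEXT", start, (size_t)(p - start));
            }
        } else {                              /* CODE mode */
            if (*p == '"') {                  /* opening quote: push */
                assert(depth < MAX_DEPTH);
                stack[depth++] = STRING;
                emit(out, cap, "STR_BEGIN", "", 0);
                p++;
            } else if (*p == '}') {
                if (depth == 1)               /* no interpolation to close */
                    return -1;
                depth--;                      /* back to the enclosing string */
                emit(out, cap, "INTERP_END", "", 0);
                p++;
            } else if (*p == '{') {
                return -1;                    /* nested blocks not handled */
            } else if (isspace((unsigned char)*p)) {
                p++;
            } else {                          /* bare word */
                const char *start = p;
                while (*p && *p != '"' && *p != '{' && *p != '}' &&
                       !isspace((unsigned char)*p))
                    p++;
                emit(out, cap, "WORD", start, (size_t)(p - start));
            }
        }
    }
    return depth == 1 ? 0 : -1;
}
```

For the first post's example, print "Now is {get "time of day"} of {get "current date"}...", this yields WORD(print) STR_BEGIN TEXT(Now is ) INTERP_BEGIN WORD(get) STR_BEGIN TEXT(time of day) STR_END INTERP_END TEXT( of ) and so on, ending in TEXT(...) STR_END.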

Name: Anonymous 2013-07-15 19:08

>>2
Ugly. My current code just recursively calls the parser from inside the lexer.
(to /string r incut end
  ! l = nil
  ! while t
  ! ! c = $ r peek
  ! ! unless eq c incut :> $ r next
  ! ! cond
  ! ! ! eq c #\\
  ! ! ! ! setf c ($ r next)
  ! ! ! ! cond
  ! ! ! ! ! eq c #\n :> push #\newline l
  ! ! ! ! ! eq c #\t :> push #\tab l
  ! ! ! ! ! eq c #\\ :> push #\\ l
  ! ! ! ! ! or (eq c #\n) (eq c incut) (eq c end) :> push c l
  ! ! ! ! ! eq c nil :> $ r error "EOF in string"
  ! ! ! ! ! or t :> $ r error "Invalid escape code: {c}"
  ! ! ! eq c end :> return-from /string (list (coerce (reverse l) 'string))
  ! ! ! eq c incut ;interpolate
  ! ! ! ! l = coerce (reverse l) 'string
  ! ! ! ! m = cdr (/token r)
  ! ! ! ! e = /string r incut end
  ! ! ! ! return-from /string (interp l m e)
  ! ! ! eq c nil :> $ r error "EOF in string"
  ! ! ! or t :> push c l)
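A rough C analogue of the recursive approach (not a translation of the Symta code above): the string reader calls a code reader on each {, and the code reader calls the string reader again on each nested ". As an assumed output format, each interpolated string is rewritten as a Perl-style concatenation; desugar, read_string, read_code and put are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Output buffer; overflow is recorded instead of writing past cap. */
struct out { char *buf; size_t cap, len; int overflow; };

static void put(struct out *o, const char *s, size_t n)
{
    if (o->len + n >= o->cap) { o->overflow = 1; return; }
    memcpy(o->buf + o->len, s, n);
    o->len += n;
    o->buf[o->len] = '\0';
}

static const char *read_code(const char *p, struct out *o);

/* p points just past an opening '"'. Emits the string as literal pieces
   joined by " . ", calling read_code for each {...}. Returns a pointer
   just past the closing '"', or NULL on EOF inside the string. */
static const char *read_string(const char *p, struct out *o)
{
    int pieces = 0;
    for (;;) {
        const char *start = p;
        while (*p && *p != '"' && *p != '{')
            p += (*p == '\\' && p[1]) ? 2 : 1;      /* skip escaped chars */
        if (p > start || (pieces == 0 && *p == '"')) {  /* literal piece */
            if (pieces++) put(o, " . ", 3);
            put(o, "\"", 1);
            put(o, start, (size_t)(p - start));
            put(o, "\"", 1);
        }
        if (*p == '"') return p + 1;
        if (*p == '\0') return NULL;                /* EOF in string */
        if (pieces++) put(o, " . ", 3);             /* here *p == '{' */
        put(o, "(", 1);
        p = read_code(p + 1, o);                    /* recurse into code */
        if (!p) return NULL;
        put(o, ")", 1);
    }
}

/* p points just past a '{'. Copies code up to the matching '}', calling
   read_string for every nested '"'. Returns a pointer just past the '}',
   or NULL on EOF. Nested { } inside the code are not handled. */
static const char *read_code(const char *p, struct out *o)
{
    while (*p && *p != '}') {
        if (*p == '"') {
            p = read_string(p + 1, o);
            if (!p) return NULL;
        } else {
            put(o, p++, 1);
        }
    }
    return *p == '}' ? p + 1 : NULL;
}

/* Rewrites every string in src; returns 0 on success, -1 on error. */
int desugar(const char *src, char *buf, size_t cap)
{
    struct out o = { buf, cap, 0, 0 };
    const char *p = src;
    assert(cap > 0);
    buf[0] = '\0';
    while (*p) {
        if (*p == '"') {
            p = read_string(p + 1, &o);
            if (!p) return -1;
        } else {
            put(&o, p++, 1);
        }
    }
    return o.overflow ? -1 : 0;
}
```

Fed the first post's example, it returns print "Now is " . (get "time of day") . " of " . (get "current date") . "..."; the recursion depth tracks the nesting, so no explicit flags are needed.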

Name: Anonymous 2013-07-15 19:14

>>3
Are you implying that's not ugly? It looks like a corrupted Common Lisp file. Rewrite it in Lisp so I can actually read it.

Also
  !
  !
  ! !
  ! !
  ! !
  ! ! !
  ! ! ! !
  ! ! ! !
  ! ! ! ! !
  ! ! ! ! !
  ! ! ! ! !
  ! ! ! ! !
  ! ! ! ! !
  ! ! ! ! !
  ! ! !
  ! ! !
  ! ! ! !
  ! ! ! !
  ! ! ! !
  ! ! ! !
  ! ! !
  ! ! !

Name: Anonymous 2013-07-15 19:15

ugh, perl is ugly

Name: Anonymous 2013-07-15 19:19

>>4
Just get used to it, cuz Symta is going to hit mainstream!

Name: Anonymous 2013-07-15 19:20

>>4
Also
Yeah. I too think that indentation could be expressed as a wave-function.

Name: Anonymous 2013-07-15 19:24

>>3
What's that shit? Gopnik coding?

Name: Anonymous 2013-07-15 20:11

HMMM LET'S SEE WE CREATE A STATE MACHINE THAT GOES THROUGH EACH CHARACTER IN THE FUQIN STRING AND WHEN WE ENCOUNTER A FUQIN QUOTE, WE INCREMENT A VARIABLE CALLED QUOTE_COUNT OR SOME SHIT LIKE THAT.  AND...

WAIT, I DON'T FUCKING KNOW WHAT I'M DOING
FUCK.

Name: Anonymous 2013-07-15 20:16

No, that is a good way to do it >>9-sama.

Name: Anonymous 2013-07-15 20:58

>>9
You don't understand the question.

Name: Anonymous 2013-07-15 21:12

Use a Perl 6 grammar.

Name: Anonymous 2013-07-15 22:25

ok here Is what to does

loop from begin of str til u hit quote
when u hit quote , stop loop, then 2nd loop from end of str BACKWARD til u hit END quote.

recurse for the winnings

Name: Anonymous 2013-07-15 22:26

>>9
>WE INCREMENT A VARIABLE CALLED QUOTE
what if there is an uneven number of quotes?

Name: Anonymous 2013-07-15 22:27

>>14
That would be a syntax error.

Name: Anonymous 2013-07-15 22:28

Name: Anonymous 2013-07-15 22:37

>>15
But it isn't! For example, a heredoc
http://perl.about.com/od/perltutorials/qt/perlheredoc.htm
can contain arbitrary characters, which is very useful for rapid prototyping.
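For comparison, a heredoc body is not delimited by a quote character at all, which is why it can hold unbalanced quotes. A minimal C sketch, assuming simplified Perl rules (the terminator must stand alone on its own line; <<~ indentation, \r\n line endings and interpolation inside the body are ignored); heredoc_body is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* p points at the first line of a heredoc body (the line after the
   <<TAG marker). The body ends at the first line consisting solely of
   tag, so quote characters inside it are never examined and need not
   balance. Returns the body length in bytes (including its final
   newline), or -1 if the terminator never appears (EOF in heredoc). */
long heredoc_body(const char *p, const char *tag)
{
    size_t taglen;
    const char *line = p;

    assert(p != NULL && tag != NULL);
    taglen = strlen(tag);
    while (*line) {
        const char *eol = strchr(line, '\n');
        size_t n = eol ? (size_t)(eol - line) : strlen(line);
        if (n == taglen && memcmp(line, tag, n) == 0)
            return (long)(line - p);    /* everything before this line */
        if (!eol)
            break;                      /* last line, no terminator */
        line = eol + 1;
    }
    return -1;
}
```

Because the scan compares whole lines against the tag, a line like ENDING does not end a body terminated by END.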

Name: Cudder !MhMRSATORI!fR8duoqGZdD/iE5 2013-07-16 6:19

>>17
Obviously he intended that to be applied only in the particular case...
