/prog/ - Lexical Analysis: Hard Parts

Name: Anonymous 2013-07-15 16:58

How does one do lexical analysis (tokenization) of Bash/Perl-style strings, where interpolation in the middle of the string is possible?
Say we have
print "Now is {get "time of day"} of {get "current date"}..." print 'have a nice day!'
where {…} inserts a value in the middle of a string.

How would the lexer know which double quote closes the string and which is part of the string?

Python uses format-like routines, because parsing such strings is hard.

Name: Anonymous 2013-07-15 18:45

Basically you would need to set flags, just as you would if you were parsing it character by character.

In C, it would look something like:

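int in_string;
int in_string_code;

In flex, it would look something like this:

%x IN_STRING
%x IN_STRING_CODE

%%

...

<INITIAL>\" {
    /* opening quote: enter string mode */
    BEGIN(IN_STRING);
    ...
}

<IN_STRING>\" {
    /* closing quote: back to normal code */
    BEGIN(INITIAL);
    ...
}

<IN_STRING>\{ {
    /* start of interpolated code inside the string */
    BEGIN(IN_STRING_CODE);
    ...
}

<IN_STRING_CODE>\} {
    /* end of interpolated code: resume the string */
    BEGIN(IN_STRING);
    ...
}

<IN_STRING_CODE>\" {
    /* a quote here belongs to the interpolated code, not the outer string */
    ...
}

...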

So how you parse the double quote depends on what mode you're in.
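The two fixed modes above cover one level of nesting. For strings inside interpolations inside strings, the same idea works with a stack of modes (flex has yy_push_state/yy_pop_state for this when %option stack is set). Here is a rough Python sketch of that, not anyone's actual lexer: the token names, the tokenize function and the whitespace-separated WORD tokens are all made up for illustration, and escapes and single-quoted strings are ignored.

```python
def tokenize(src):
    """Split src into (kind, text) tokens using a stack of lexer modes."""
    modes = ["CODE"]      # bottom of the stack plays the role of INITIAL
    tokens, buf = [], []

    def flush(kind):
        # Emit whatever has been buffered as one token of the given kind.
        if buf:
            tokens.append((kind, "".join(buf)))
            buf.clear()

    for ch in src:
        if modes[-1] == "STRING":
            if ch == '"':                 # closes the string we are in
                flush("TEXT")
                tokens.append(("STR_END", ch))
                modes.pop()
            elif ch == "{":               # start of interpolated code
                flush("TEXT")
                tokens.append(("INTERP_BEGIN", ch))
                modes.append("CODE")
            else:
                buf.append(ch)
        else:                             # CODE mode
            if ch == '"':                 # opens a (possibly nested) string
                flush("WORD")
                tokens.append(("STR_BEGIN", ch))
                modes.append("STRING")
            elif ch == "}" and len(modes) > 1:
                flush("WORD")             # back to the enclosing string
                tokens.append(("INTERP_END", ch))
                modes.pop()
            elif ch.isspace():
                flush("WORD")
            else:
                buf.append(ch)

    flush("WORD")
    if len(modes) != 1:
        raise SyntaxError("unterminated string or interpolation")
    return tokens
```

Whether a double quote opens or closes a string is decided only by the mode on top of the stack, so tokenize('say "a {get "b"} c"') treats "b" as a nested string and the last quote as the end of the outer one.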

Name: Anonymous 2013-07-15 19:08

>>2
Ugly. My current code just recursively calls the parser from inside the lexer.
(to /string r incut end
! l = nil
! while t
! ! c = $ r peek
! ! unless eq c incut :> $ r next
! ! cond
! ! ! eq c #\\
! ! ! ! setf c ($ r next)
! ! ! ! cond
! ! ! ! ! eq c #\n :> push #\newline l
! ! ! ! ! eq c #\t :> push #\tab l
! ! ! ! ! eq c #\\ :> push #\\ l
! ! ! ! ! or (eq c #\n) (eq c incut) (eq c end) :> push c l
! ! ! ! ! eq c nil :> $ r error "EOF in string"
! ! ! ! ! or t :> $ r error "Invalid escape code: {c}"
! ! ! eq c end :> return-from /string (list (coerce (reverse l) 'string))
! ! ! eq c incut ;interpolate
! ! ! ! l = coerce (reverse l) 'string
! ! ! ! m = cdr (/token r)
! ! ! ! e = /string r incut end
! ! ! ! return-from /string (interp l m e)
! ! ! eq c nil :> $ r error "EOF in string"
! ! ! or t :> push c l)
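For anyone who can't read that, here is roughly the same recursive idea as a Python sketch. It is not a translation of the code above: read_string, read_code, the (parts, pos) return shape and the whitespace-separated words inside {…} are assumptions for illustration. The string reader hands control to the code reader when it sees the interpolation character, and the code reader calls the string reader again for any nested string.

```python
def read_string(src, pos, incut="{", end='"'):
    """Read string text starting at src[pos], just past the opening quote.

    Returns (parts, new_pos). parts mixes literal str pieces with
    ("code", items) entries for interpolated sections.
    """
    parts, buf = [], []
    while True:
        if pos >= len(src):
            raise SyntaxError("EOF in string")
        c = src[pos]
        pos += 1
        if c == "\\":                       # escape sequence
            if pos >= len(src):
                raise SyntaxError("EOF in string")
            esc = src[pos]
            pos += 1
            if esc == "n":
                buf.append("\n")
            elif esc == "t":
                buf.append("\t")
            elif esc in ("\\", incut, end):
                buf.append(esc)
            else:
                raise SyntaxError(f"Invalid escape code: {esc}")
        elif c == end:                      # this quote closes the string
            if buf:
                parts.append("".join(buf))
            return parts, pos
        elif c == incut:                    # interpolate: recurse into code
            if buf:
                parts.append("".join(buf))
                buf = []
            code, pos = read_code(src, pos)
            parts.append(("code", code))
        else:
            buf.append(c)


def read_code(src, pos, close="}"):
    """Read interpolated code up to the matching close character."""
    items, word = [], []
    while True:
        if pos >= len(src):
            raise SyntaxError("EOF in interpolation")
        c = src[pos]
        pos += 1
        if c == close or c == '"' or c.isspace():
            if word:                        # finish the current word
                items.append("".join(word))
                word = []
            if c == close:
                return items, pos
            if c == '"':                    # nested string: recurse again
                inner, pos = read_string(src, pos)
                items.append(("str", inner))
        else:
            word.append(c)
```

Call read_string with pos pointing just past the opening quote. A quote seen by read_code always opens a nested string, and a quote seen by read_string always closes the current one, so no counting or lookahead is needed.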

Name: Anonymous 2013-07-15 19:14

>>3
Are you implying that's not ugly? It looks like a corrupted Common Lisp file. Rewrite it in Lisp so I can actually read it.

Also
! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! ! !

Name: Anonymous 2013-07-15 19:15

ugh, perl is ugly

Name: Anonymous 2013-07-15 19:19

>>4
Just get used to it, cuz Symta is going to hit mainstream!

Name: Anonymous 2013-07-15 19:20

>>4
Also
Yeah. I too think that indentation could be expressed as a wave-function.

Name: Anonymous 2013-07-15 19:24

>>3
What's that shit? Gopnik coding?

Name: Anonymous 2013-07-15 20:11

HMMM LET'S SEE WE CREATE A STATE MACHINE THAT GOES THROUGH EACH CHARACTER IN THE FUQIN STRING AND WHEN WE ENCOUNTER A FUQIN QUOTE, WE INCREMENT A VARIABLE CALLED QUOTE_COUNT OR SOME SHIT LIKE THAT. AND...

WAIT, I DON'T FUCKING KNOW WHAT I'M DOING
FUCK.

Name: Anonymous 2013-07-15 20:16

No, that is a good way to do it >>9-sama.

Name: Anonymous 2013-07-15 20:58

>>9
You don't understand the question.

Name: Anonymous 2013-07-15 21:12

Use Perl 6 grammar.

Name: Anonymous 2013-07-15 22:25

ok, here is what to do

loop from the beginning of str til u hit a quote
when u hit the quote, stop the loop, then do a 2nd loop from the end of str BACKWARD til u hit the END quote.

recurse for the winnings

Name: Anonymous 2013-07-15 22:26

>>9
>WE INCREMENT A VARIABLE CALLED QUOTE
What if there is an odd number of quotes?

Name: Anonymous 2013-07-15 22:27

>>14
That would be a syntax error.
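Note that an even count doesn't settle which quote closes which string. A small Python sketch of pairing quotes purely by parity (naive_strings is a made-up name, and the input line is a shortened variant of the example in >>1):

```python
def naive_strings(src):
    """Pair quotes by parity: odd-numbered quotes open, even-numbered close."""
    pieces, start = [], None
    for i, ch in enumerate(src):
        if ch == '"':
            if start is None:
                start = i + 1               # treat this quote as an opener
            else:
                pieces.append(src[start:i])  # treat this quote as a closer
                start = None
    if start is not None:
        raise SyntaxError("odd number of quotes")
    return pieces
```

For the line print "Now is {get "time of day"} today" there are four quotes, so no error is raised, but the pieces that come back are the text before the inner string and the text after it. The quote inside {…} gets taken as the end of the outer string, which is exactly the problem >>1 is asking about.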

Name: Anonymous 2013-07-15 22:28

>>13
>loop from end of str BACKWARD
see http://en.wikipedia.org/wiki/Look-ahead_%28backtracking%29

Name: Anonymous 2013-07-15 22:37

>>15
But it isn't! For example, a Perl heredoc
http://perl.about.com/od/perltutorials/qt/perlheredoc.htm
can contain arbitrary characters, which is very useful for rapid prototyping.

Name: Cudder !MhMRSATORI!fR8duoqGZdD/iE5 2013-07-16 6:19

>>17
Obviously he intended that to be applied only in the particular case...
