
Lexical Analysis: Hard Parts

Name: Anonymous 2013-07-15 16:58

How does one do lexical analysis (tokenization) of Bash/Perl-style strings, where a value can be interpolated in the middle of the string?
Say we have:
print "Now is {get "time of day"} of {get "current date"}..."
print 'have a nice day!'

Here {…} inserts a value into the middle of the string.

How would the lexer know which double quote closes the string and which belongs to the embedded {…} expression?

Python uses format-style routines instead, because parsing such strings is hard.

Name: Anonymous 2013-07-15 18:45

Basically you would need to set flags, just as you would if you were parsing it character by character.

In C, it would look something like:

int in_string = 0;       /* inside a "..." literal */
int in_string_code = 0;  /* inside a {...} embedded in that literal */
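Fleshed out, the flag approach might look like the following minimal sketch. The function name find_string_end() is hypothetical, and it adds a third flag, in_code_string (not in the post), for a quoted string inside {…}, which the two flags alone do not track. It handles only one level of nesting and ignores escapes such as \".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the index of the '"' that closes the first
   string literal in s, or -1 if it is unterminated. Handles one level of
   {...} interpolation; escapes such as \" are not handled. */
long find_string_end(const char *s)
{
    int in_string = 0;       /* inside a "..." literal */
    int in_string_code = 0;  /* inside a {...} embedded in that literal */
    int in_code_string = 0;  /* extra flag: a "..." inside the {...} */

    assert(s != NULL);
    for (size_t i = 0; s[i] != '\0'; i++) {
        char c = s[i];
        if (in_code_string) {
            if (c == '"')
                in_code_string = 0;   /* nested string ends, back in the code */
        } else if (in_string_code) {
            if (c == '"')
                in_code_string = 1;   /* quote inside {...} opens a nested string */
            else if (c == '}')
                in_string_code = 0;   /* back to plain string text */
        } else if (in_string) {
            if (c == '{')
                in_string_code = 1;   /* interpolation starts */
            else if (c == '"')
                return (long)i;       /* this quote closes the string */
        } else if (c == '"') {
            in_string = 1;            /* outside any string: quote opens one */
        }
    }
    return -1;
}
```

On the first example line, find_string_end() returns the index of the final quote: the quotes around "time of day" and "current date" are seen while in_string_code is set, so they only toggle in_code_string.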


In flex, it would look something like this:

%x IN_STRING
%x IN_STRING_CODE

%%

...
<INITIAL>\" {
    BEGIN(IN_STRING);
    ...
}
<IN_STRING>\" {
    BEGIN(INITIAL);
    ...
}
<IN_STRING>\{ {
    BEGIN(IN_STRING_CODE);
    ...
}
<IN_STRING_CODE>\} {
    BEGIN(IN_STRING);
    ...
}
<IN_STRING_CODE>\" {
    ...
}
...


So how you parse the double quote depends on what mode you're in.
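Flat BEGIN() transitions only remember the current mode, so a string opened inside {…} has no built-in way to get back to IN_STRING_CODE when it closes, and each further level of nesting needs more states. Flex's %option stack, with yy_push_state() and yy_pop_state(), exists for this. The same idea in plain C is a stack of modes; below is a minimal sketch with a hypothetical function name, ignoring escapes and single-quoted strings:

```c
#include <assert.h>
#include <stddef.h>

/* Lexer modes, loosely mirroring INITIAL, IN_STRING and IN_STRING_CODE. */
enum mode { CODE, STRING, STRING_CODE };

#define MAX_DEPTH 64

/* Hypothetical helper: return the index of the '"' that closes the first
   string literal in s, or -1 if it is unterminated. Interpolations may
   nest to any depth up to MAX_DEPTH; escapes such as \" are not handled. */
long find_string_end_nested(const char *s)
{
    enum mode stack[MAX_DEPTH];
    int sp = 0;              /* stack[sp] is the current mode */
    stack[sp] = CODE;

    assert(s != NULL);
    for (size_t i = 0; s[i] != '\0'; i++) {
        char c = s[i];
        switch (stack[sp]) {
        case CODE:
            if (c == '"') {
                assert(sp + 1 < MAX_DEPTH);
                stack[++sp] = STRING;      /* like yy_push_state(IN_STRING) */
            }
            break;
        case STRING:
            if (c == '{') {
                assert(sp + 1 < MAX_DEPTH);
                stack[++sp] = STRING_CODE; /* like yy_push_state(IN_STRING_CODE) */
            } else if (c == '"') {
                sp--;                      /* like yy_pop_state(): back to the opener */
                if (sp == 0)
                    return (long)i;        /* the outermost string just closed */
            }
            break;
        case STRING_CODE:
            if (c == '"') {
                assert(sp + 1 < MAX_DEPTH);
                stack[++sp] = STRING;      /* a string nested inside {...} */
            } else if (c == '{') {
                assert(sp + 1 < MAX_DEPTH);
                stack[++sp] = STRING_CODE; /* keep braces in the code balanced */
            } else if (c == '}') {
                sp--;                      /* back to the enclosing mode */
            }
            break;
        }
    }
    return -1;
}
```

Pushing on every opening quote or brace and popping on the matching close means each closing quote returns to whichever mode opened its string, at any depth up to MAX_DEPTH.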
