
Parsec (parser combinator)

Name: Anonymous 2008-01-23 19:31

I think this is quite cool. (Parsing an IRC 'NICK foo' message)

special = oneOf "-[]\\`^{}_" -- Taken from RFC1459, except '_',
                             -- which I added (generally accepted character (are there others?)).

nick = do { string "NICK"; space;
            first <- letter; rest <- many $ alphaNum <|> special; eof; -- 'many', not 'many1': a one-letter nick is valid
            return $ Nick (first:rest) }


Not quite sure what I think of it in comparison to regular expressions, yet.
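
For anyone who wants to load the snippet above into GHCi, here is a minimal self-contained sketch. It assumes the parsec 3 package (`Text.Parsec`). The `Nick` newtype and the `parseNick` helper are assumptions, since the post does not show how `Nick` is defined. The sketch uses `many` for the tail so that a one-letter nick parses. Note that Parsec's `letter` and `alphaNum` also accept non-ASCII letters, which is looser than the a-z/A-Z/0-9 ranges in the RFC.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical wrapper type; the post uses a 'Nick' constructor
-- without showing its definition.
newtype Nick = Nick String deriving (Show, Eq)

-- The <special> characters from RFC 1459, plus '_' as in the post.
special :: Parser Char
special = oneOf "-[]\\`^{}_"

-- "NICK", one whitespace character, a letter, then zero or more
-- letters, digits or specials, and finally end of input.
nick :: Parser Nick
nick = do
  _     <- string "NICK"
  _     <- space
  first <- letter
  rest  <- many (alphaNum <|> special)
  eof
  return (Nick (first : rest))

-- Convenience runner (the name is an assumption).
parseNick :: String -> Either ParseError Nick
parseNick = parse nick "<irc>"
```

In GHCi, `parseNick "NICK foo"` evaluates to `Right (Nick "foo")`, while `parseNick "NICK 1foo"` gives a `Left` with a parse error, because a nick must start with a letter.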

Name: Anonymous 2008-01-23 19:35

RFC1459 defines a nick to be the following, by the way:


<nick> ::=
    <letter> { <letter> | <number> | <special> }

<letter> ::=
    'a' ... 'z' | 'A' ... 'Z'
<number> ::=
    '0' ... '9'
<special> ::=
    '-' | '[' | ']' | '\' | '`' | '^' | '{' | '}'


I know clients and servers don't strictly adhere to the RFCs, but I think they are a good starting point.

Name: Anonymous 2008-01-23 19:37

What.

Name: Anonymous 2008-01-23 19:56

Why bother? Just accept any non-whitespace character

Name: Anonymous 2008-01-23 19:57

>>1
It seems much more readable than regexps.

Also, http://www.parsec.org/

Name: Anonymous 2008-01-23 19:58

>>4
You don't want to accept @, # or + as they have special meaning.

Name: Anonymous 2008-01-23 20:03

>>6
Or ':' and so forth which would fuck up any IRC messages that used that nickname.

Name: Anonymous 2008-01-23 20:03

Why bother? Just accept any non-whitespace character and anything that appears in 005 PREFIX

Same thing

Name: Anonymous 2008-01-23 20:05

I think it is better to clearly state which characters are allowed.

Name: Anonymous 2008-01-23 20:06

I got a bit excited until I realised that it was Haskell, which is dead. So I mourned for a while instead.

Name: Anonymous 2008-01-23 20:07

I don't see why, really. It's only important when you're parsing something like the NAMES reply.

Name: Anonymous 2008-01-23 20:10

>>11
If you're writing an IRC server

Name: Anonymous 2008-01-23 20:14

OP here; yes, I am writing an IRC server.

Name: Anonymous 2008-01-23 20:15

Oh, my bad then.

Name: Anonymous 2008-01-23 20:18

Question: are characters outside a-zA-Z0-9 typically allowed in nicknames on today's IRC servers? Like ø, æ, ß, ł, ħ etc.? I don't think they are, but I will have to research that.

Name: Anonymous 2008-01-23 20:18

>>1
/^NICK ([a-zA-Z][-\w`{}[\]\\\^]*)$/
much more readable, and works in lots more languages.

Name: Anonymous 2008-01-23 20:19

Name: Anonymous 2008-01-23 20:20

>>16
That's what I thought. I assume Haskell has regex support, but I figured I'd give this parser combinator idea a go and see what all the fuss was about.

Name: Anonymous 2008-01-23 20:23

Name: Anonymous 2008-01-23 20:23

>>18
Parsec has no advantage in such simple cases, but when parsing something substantial (like source code) it is pretty nice.

Name: Anonymous 2008-01-23 20:25

>>17
I am not interested in using Perl. I think it is a horrible language. I am writing this mostly as a way to try my hand at concurrency in Haskell.

But thank you for the link. I may peep at it to compare my code, as I will be doing with Freenode's Hyperion code (written in C, or maybe C++? I will see).

Name: Anonymous 2008-01-23 20:27

>>20
Ah, well, that makes sense. I may just use regexes in this case. IRC is pretty trivial to parse, especially from the server end.

Name: Anonymous 2008-01-23 20:27

>>15

I know UnrealIRCd supports additional charsets. It's probably a bit tricky to do right, because not all clients use the same character encoding (Shift_JIS, GBK, W1251/W1251 and whathaveyou).

Name: Anonymous 2008-01-23 20:37

>>23
That does sound tricky. I won't implement any additional support initially. Having a look at UnrealIRCd's source code might prove helpful when I do, however.

Name: Anonymous 2008-01-24 0:31

>>23-24
Plus it's not within the IRC spec. Some clients will fuck up royally when they encounter stuff that's out of the norm.

XChat's nick list goes slightly haywire when you have someone named '\' and someone named '|' in the same channel and one of them leaves. (or at least it used to last year-ish, can't be bothered checking it now)

Name: Anonymous 2008-01-24 5:13

Some clients appear to support UTF-8 encoding, at least in the chat.

Name: Anonymous 2008-01-24 5:13

>>26
I mean, in PRIVMSG

Name: Anonymous 2008-01-24 5:21

>>15
Anything over 32 that's not an already reserved char like @, #, :, etc. should be OK, this includes the high range (128+)
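
The rule in this post can be sketched as a plain predicate. This is only an illustration under stated assumptions: the reserved set below contains just the characters named in this thread ('@', '#', '+', ':'), and since the post ends its list with "etc." the set is knowingly incomplete. The names `reservedChars`, `isLooseNickChar` and `isLooseNick` are made up for the example, and the check works on Haskell `Char` code points rather than raw bytes.

```haskell
import Data.Char (ord)

-- Characters the thread names as having special meaning in IRC.
-- Assumption: deliberately incomplete, the post says "etc.".
reservedChars :: String
reservedChars = "@#+:"

-- True for any character above code point 32 that is not reserved.
-- This also admits the high range (128 and up), as the post suggests.
isLooseNickChar :: Char -> Bool
isLooseNickChar c = ord c > 32 && c `notElem` reservedChars

-- A nick passes if it is non-empty and every character passes.
isLooseNick :: String -> Bool
isLooseNick s = not (null s) && all isLooseNickChar s
```

Compared with the RFC 1459 grammar, this accepts far more (a leading digit, '|', 'ø'), which is exactly the trade-off being argued about here.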

Name: Anonymous 2008-01-24 6:18

>>26-27
Indeed. In messages. mIRC sort of supports it, last I checked. And XChat seems to support it pretty well. As does Irssi. (Haven't tried other clients.)

>>28
I thought this, but you might potentially fuck up IRC clients that follow the spec strictly. So I think the best course of action is to follow the spec initially, and then investigate extensions later on (after I have a stable IRC server on the go). What do you think?

Name: Anonymous 2008-01-24 7:07

Do notation considered harmful.

nick = (Nick .) . (:) <$ string "NICK" <* space <*> letter <*> many1 (alphaNum <|> special)

Name: Anonymous 2008-01-24 7:10

>>29
You may be able to discern the type of client from an immediate CTCP VERSION just after the client has sent its USER and NICK, and then enable/disable extensions accordingly.
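
A rough sketch of the two string-level pieces this would need, assuming the usual CTCP convention: the request is a PRIVMSG whose text is wrapped in \001 characters, and the reply comes back as a NOTICE wrapped the same way. The function names are made up for illustration, and deciding which extensions to enable for which client is left out entirely.

```haskell
import Data.List (stripPrefix)

-- CTCP delimiter: the character with code 1.
ctcpDelim :: Char
ctcpDelim = '\001'

-- Build the line a server would send to ask 'target' for its version.
-- 'serverName' is whatever name the server uses as its message prefix.
ctcpVersionRequest :: String -> String -> String
ctcpVersionRequest serverName target =
  ":" ++ serverName ++ " PRIVMSG " ++ target ++ " :"
    ++ [ctcpDelim] ++ "VERSION" ++ [ctcpDelim] ++ "\r\n"

-- Given the trailing parameter of a NOTICE, extract the version text
-- if it is a well-formed CTCP VERSION reply, otherwise Nothing.
ctcpVersionReply :: String -> Maybe String
ctcpVersionReply trailing = do
  body <- stripPrefix (ctcpDelim : "VERSION ") trailing
  case span (/= ctcpDelim) body of
    (version, [_]) -> Just version  -- exactly one closing delimiter left
    _              -> Nothing
```

The server would still have to cope with clients that never answer, so any extension switch based on this probably needs a timeout and a spec-only default.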

Name: Anonymous 2008-01-24 7:38

I briefly thought about writing my own ircd in C, but then I checked out the Hybrid code I was already using and decided to just modify it to my needs instead.

Name: Anonymous 2008-01-24 9:00

>>30
You missed 'eof', but I will forgive you. This notation looks nice.
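
With the missing `eof` put back, the applicative version could look like the sketch below. It assumes parsec 3 (`Text.Parsec`), which has the `Applicative` instance the operators need. The `Nick` newtype is an assumption, as its definition never appears in the thread, and an explicit lambda stands in for the point-free `(Nick .) . (:)`.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical wrapper type, not shown anywhere in the thread.
newtype Nick = Nick String deriving (Show, Eq)

special :: Parser Char
special = oneOf "-[]\\`^{}_"

-- Same grammar as the do-notation version: the results of 'string',
-- 'space' and 'eof' are discarded, 'letter' and the tail are combined.
nick :: Parser Nick
nick = (\c cs -> Nick (c : cs))
         <$  string "NICK" <* space
         <*> letter
         <*> many (alphaNum <|> special)  -- 'many' so a one-letter nick parses
         <*  eof
```

Without the final `<* eof`, input such as "NICK foo bar" would parse successfully as `Nick "foo"` and silently ignore the rest.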

Name: Anonymous 2008-01-24 10:03

Name: Anonymous 2009-03-06 8:57


The practical implications Please.

Name: Anonymous 2010-12-25 17:32

Name: Anonymous 2013-01-19 0:17

/prog/ will be spammed continuously until further notice. we apologize for any inconvenience this may cause.
