C, strtok

Name: Anonymous 2010-02-16 13:01

Hello /prog/

Could anyone help me with using the C function strtok and strtok_r?

I've read the man pages and they don't provide much help. Try compiling this code example taken from the man pages:

http://pastebin.com/d11d44ccf

Name: Anonymous 2010-02-16 13:14

strtok is a horribly unsafe function: it keeps its state in an internal static variable. It would be much safer if it returned a structure holding the state, or if you used an object or (pseudo-)closure to do the same thing (C has no native support for closures, but they're easy to emulate). If you still want to use strtok, that's okay too. What exactly don't you understand about it?
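
A minimal sketch of the "structure to keep the state" idea from this post, assuming a non-destructive design that hands back a pointer and a length instead of writing NULs. The names tokenizer, tokenizer_init and tokenizer_next are made up for illustration and are not from any library:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type: all tokenizer state lives in the caller's struct,
   so there is no hidden static variable. */
typedef struct {
    const char *pos;     /* next unread character */
    const char *delims;  /* set of delimiter characters */
} tokenizer;

void tokenizer_init(tokenizer *t, const char *s, const char *delims)
{
    t->pos = s;
    t->delims = delims;
}

/* Stores the token length in *len and returns a pointer to the token's
   first character inside the original string, or NULL when no tokens
   remain. The input is never written to, so string literals are fine. */
const char *tokenizer_next(tokenizer *t, size_t *len)
{
    const char *tok;

    t->pos += strspn(t->pos, t->delims);   /* skip leading delimiters */
    if (*t->pos == '\0')
        return NULL;

    tok = t->pos;
    *len = strcspn(tok, t->delims);        /* token runs to the next delimiter */
    t->pos = tok + *len;
    return tok;
}
```

Since the tokens are not NUL-terminated, print them with something like printf("%.*s\n", (int)len, tok). Two tokenizers can run at once because each has its own struct.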

Name: Anonymous 2010-02-16 13:19

>>2
His code segfaults.

Name: Anonymous 2010-02-16 13:21

IIRC

strtok(STRING, TOKENS);
returns a pointer to the first substring of STRING that contains no characters from TOKENS
strtok(NULL, TOKENS)
returns a pointer to the next substring that contains no characters from TOKENS, keeps the old string internally

if a call to strtok returns NULL then there are no more substrings that contain no characters from TOKENS

strtok modifies the original string by adding \0s so make a copy if you need to keep it.
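
The calling pattern described above could look roughly like this. It is only a sketch: split_in_place is a made-up helper name, and the buffer passed in has to be writable.

```c
#include <stddef.h>
#include <string.h>

/* split_in_place is a hypothetical helper. It stores up to max token
   pointers in out and returns how many it stored. buf must be writable:
   strtok overwrites the delimiter that ends each token with '\0'. */
size_t split_in_place(char *buf, const char *delims, char **out, size_t max)
{
    size_t n = 0;
    char *tok;

    /* The first call passes the string; later calls pass NULL to continue. */
    for (tok = strtok(buf, delims); tok != NULL && n < max;
         tok = strtok(NULL, delims))
        out[n++] = tok;

    return n;
}
```

The pointers stored in out point into buf itself, so they stay valid only as long as buf does.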

Name: Anonymous 2010-02-16 13:24

I've never used strtok_r, but it seems from the man page that it uses an additional variable to store the state for the next string instead of modifying the original string.

Name: Anonymous 2010-02-16 13:25

>>4
I just realised how confusing it is that I used TOKENS for the second parameter. Don't think TOKENS, think DELIMITERS.

Name: kinghajj !kiNgHAJjDw 2010-02-16 13:46

strtok() is shit, as >>2 said. Use gettok() instead.


/*******************************************************************************
 * gettok.c -- an improved strtok().                                           *
 * Copyright (c) 2008, Samuel Fredrickson <kinghajj@gmail.com>                  *
 * All rights reserved.                                                        *
 *                                                                             *
 * Redistribution and use in source and binary forms, with or without          *
 * modification, are permitted provided that the following conditions are met: *
 *     * Redistributions of source code must retain the above copyright        *
 *       notice, this list of conditions and the following disclaimer.         *
 *     * Redistributions in binary form must reproduce the above copyright     *
 *       notice, this list of conditions and the following disclaimer in the   *
 *       documentation and/or other materials provided with the distribution.  *
 *                                                                             *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDER ``AS IS'' AND ANY EXPRESS *
 * OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED           *
 * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE      *
 * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER BE LIABLE FOR ANY        *
 * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES  *
 * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR          *
 * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER  *
 * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT          *
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY   *
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH *
 * DAMAGE.                                                                     *
 ******************************************************************************/

/* strtok() is a neat little function, but it's somewhat awkward to use, and
 * it's not thread-safe. this one is thread-safe, and (I think) makes more
 * sense.
 */

#include <string.h>

#ifdef _GETTOK_DEMO_
#include <stdio.h>
#include <stdlib.h>
#endif

/**
 * @param start  A pointer to a string; keeps track of current position in
 *               original string.
 * @param delims Characters that will delimit tokens.
 * @return A pointer to the first found token, or NULL if none found.
 *
 * gettok() works by taking in a pointer to a string, searching through that
 * string for delimiters, then updating that pointer so that on the next call,
 * it will return the next token.
 *
 * Like strtok(), gettok() modifies the original string by inserting NULs where
 * it finds delimiters.
 *
 * @code
 * void foo(char *str)
 * {
 *     char *start = str, *tok;
 *
 *     while((tok = gettok(&start, " ")))
 *         processToken(tok);
 * }
 * @endcode
 */
char *gettok(char **start, char *delims)
{
    char *token = NULL;

    if(start && *start && **start && delims)
        // Find the first occurrence of a delimiter.
        if(*start = strpbrk(token = *start, delims))
            // Nullify consecutive delimiters.
            while(**start && strchr(delims, **start))
                *(*start)++ = '\0';

    // if token is NULL, let it pass. if token is not NULL but points to a '\0',
    // then the token hasn't really been found, so recurse to find it.
    return (!token || *token) ? token : gettok(start, delims);
}

#ifdef _GETTOK_DEMO_
int main()
{
    static char s[] = "   ;; This is a;,;  ;string;with; ;delims.";
    static char output[64];
    char *start = s, *tok;

    while((tok = gettok(&start, " ;,"))) {
        printf("tok: \"%s\"\n", tok);
        strcat(output, tok);
        strcat(output, " ");
    }

    printf("Output: %s\n", output);

    return 0;
}
#endif

Name: Anonymous 2010-02-16 13:52

>>5
correction, both modify the original string.
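
To illustrate the correction, here is a rough sketch of strtok_r in use. count_tokens_r is a made-up name, and strtok_r itself is POSIX rather than standard C, so the feature-test macro at the top is an assumption about the build environment:

```c
#define _POSIX_C_SOURCE 200809L  /* ask <string.h> to declare strtok_r */
#include <stddef.h>
#include <string.h>

/* count_tokens_r is a hypothetical helper that counts the tokens in buf.
   The caller-visible difference from strtok is the save pointer, which
   replaces the hidden static. buf is still modified in place. */
size_t count_tokens_r(char *buf, const char *delims)
{
    char *save = NULL;
    size_t n = 0;
    char *tok;

    for (tok = strtok_r(buf, delims, &save); tok != NULL;
         tok = strtok_r(NULL, delims, &save))
        n++;

    return n;
}
```

After the call, buf reads as just its first token, because every delimiter that ended a token has been overwritten with '\0'.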

Name: Anonymous 2010-02-16 13:53

>>7
also, "kinghajj", this is not reddit, drop the name

Name: Anonymous 2010-02-16 14:41

>>9
I just wanted to make clear that I was the author, and therefore one should not just take my word that my work (gettok.c) was good; rather, the work should be independently judged, and hopefully others will like it as I do. (gettok() is one of my favorite functions because it's so simple, and recursive at that.)

Name: >>2 2010-02-16 14:53

>>3
Not after I'm done debugging it, and I don't code that much C either. I've also written my own strtok variant which does not have this problem.

Name: Anonymous 2010-02-16 17:33

Here's a tokenize function I wrote years ago. It's not great and not really finished, but it works for simple stuff.


/*
Tokenization function - Builds an arg[cv] style array from a string.  (C) Anonymous /prog/rider.
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
tokp = positions?
tokd = delimiters?
*/

int tok(const char *str, const char *delim, int *tokc, char ***tokv, int **tokp, char ***tokd) {
  const char *s; /* pointer to rest of string */
  const char *t;
  int n;

  char **v;
  int c;

  c = 0;
  s = str;
  n = 0;

 /* first pass: n used as token length counter */
  while(1) {
    if(strchr(delim,*s) || !*s) {
      if(n) {
        c++;
        n=0;
      }
    } else {
      n++;
    }
    if(!*s)
      break;
    s++;
  }
  v = malloc((c+1)*sizeof(char*));

  /* second pass: n is now array index */
  s = t = str;
  n = 0;

  while(1) {
    if(strchr(delim,*s) || !*s) {
      if(s-t) {
        v[n] = malloc((s-t)+1);
        memcpy(v[n],t,s-t);
        v[n][s-t] = 0;
        t = s + 1;
        n++;
      }
      else {
        t++;
      }
    }
    if(!*s)
      break;
    s++;
  }

  v[c] = NULL;
  *tokc = c;
  *tokv = v;
  return 0;
}

#ifdef TEST
int main(int argc, char **argv) {
  int tokc;
  char **tokv;
  int i;

  if(argc<3) {
    fprintf(stderr,"usage: tok STRING DELIMITERS\n");
    return 1;
  }

  tok(argv[1],argv[2],&tokc,&tokv,NULL,NULL);

  printf("%d tokens\n",tokc);

  for(i=0;i<tokc;i++) { /* tokv[tokc] is the NULL terminator; don't print it */
    printf("%02d: %s\n",i,tokv[i]);
  }
  return 0;
}

#endif

Name: Anonymous 2010-02-16 17:45

>>7
copyright comp sci 101 assignments?

Name: >>2 2010-02-16 18:11

Here's a "MADNESS LIES HERE" version I wrote when I was bored a year ago:


toknobj.h:

#ifndef _TOKNOBJ_H_
#define _TOKNOBJ_H_

/* ## may only paste identifier pieces together; pasting onto "(" or "->" is not valid C */
#define OOPINVOKEBASE(obj,type,method) (((const C##type##VTable*)((obj)->VTable))->method(obj))
#define OOPINVOKEBASEVAR(obj,type,method,...) (((const C##type##VTable*)((obj)->VTable))->method((obj),__VA_ARGS__))
#define OOPDESTRUCT(obj,type) do { ((const C##type##VTable*)((obj)->VTable))->Destroy##type(obj); (obj)=NULL; } while(0)

typedef struct _TokenObject
{
        const void *VTable;

        char *separators; // public read-only
        char *string; // public mutable
        char *current; // pointer to current token in string, private mutable
        char *last; // pointer to end-of-string, private read-only
} TokenObject;

void DestroyTokenObject(TokenObject *to);
char *GetToken(TokenObject *to);
TokenObject *CreateTokenObject(char *s, char *seps);

typedef struct _CTokenObjectVTable
{
        void ( * DestroyTokenObject)(TokenObject*) ;
        char* ( * GetToken)(TokenObject*);
} CTokenObjectVTable;


#endif

toknobj.c:

#include <stdlib.h>
#include <string.h> /* strdup, strlen, strrchr */
#include "toknobj.h"

const CTokenObjectVTable TokenObjectVTable = { DestroyTokenObject, GetToken };

TokenObject *CreateTokenObject(char *s, char *seps)
{
        TokenObject *to;

        to = (TokenObject*)malloc(sizeof(TokenObject));
        to->separators = (char*)strdup(seps);
        to->string = (char*)strdup(s);
        to->current = to->string;
        to->last = (to->string)+strlen(s);
        to->VTable = &TokenObjectVTable;

        return to;
}

void DestroyTokenObject(TokenObject *to)
{
        free(to->separators);
        free(to->string);
        free(to);
}

char *GetToken(TokenObject *to)
{
        char *s;
        char *end;
        char *last;
        char *seps;

        s = to->current;
        seps = to->separators;
        last = to->last;

        if (last <= s ) return NULL;

        for(;*s&&strrchr(seps,*s);s++); // skip separators at the beginning of the token

        if (last <= s ) return NULL;

        for(end=s;*end&&!strrchr(seps,*end);end++);
        if(*end) *end=0;
        end++;
       
        to->current = end;

        return s;

}



// Usage is like this (copy pasted from some real code I wrote, I'm too lazy to write a real example):

char *GetMkvSplitString(char *cutlist, int *splitcount, FRAMERATE *fr)
{
        char *tok;
        char *splits;
        int parity = 0;

        TokenObject *to = CreateTokenObject(cutlist,",");
        splits = calloc(1,1);

        while (tok = OOPINVOKEBASE(to,TokenObject,GetToken))
        {
                int frame = atoi(tok)+parity;
                if (frame)
                {
                        ReallocStrCat(&splits,GetTimeStrFromFrames(frame,fr));
                        ReallocStrCat(&splits,",");
                        (*splitcount)++;
                }
                parity = parity?0:1;
        }

        OOPDESTRUCT(to,TokenObject);
        if (*splits) splits[strlen(splits)-1]=0; // clear last , (skip if nothing was appended)

        return splits;
}

// Irrelevant, but in case you're wondering, in util.c:
char *ReallocStrCat( char *dst[], char src[] ) // outputs new pointer to *dst
{
        char *newstr;

        newstr = malloc(strlen(*dst)+strlen(src)+1);
        strcpy(newstr,*dst);
        strcat(newstr,src);

        free(*dst);

        *dst = newstr;
        return newstr;
}

Name: >>2 2010-02-16 18:37

And I just wrote this right now:

(defun make-tokenizer (string delimiters &aux (delimiters
                                               (coerce delimiters 'list)))
  "MAKE-TOKENIZER returns a tokenizer function. Each call
   returns the next token in STRING, up to a delimiter,
   or NIL if no delimiter remains.
   STRING is the input string.
   DELIMITERS is a string of delimiter characters to look for."
  #'(lambda ()
      (dolist (delimiter delimiters)
        (let ((pos (position delimiter string)))         
          (when pos
            (return
              (prog1 (subseq string 0 pos)
                (setf string (subseq string (1+ pos))))))))))

;Usage:
;CL-USER> (make-tokenizer "sup /prog/? I'm bored again." " .")
;#<COMPILED-LEXICAL-CLOSURE (:INTERNAL MAKE-TOKENIZER) #x8F5A5D6>
;CL-USER> (funcall #<COMPILED-LEXICAL-CLOSURE (:INTERNAL MAKE-TOKENIZER) #x8F5A5D6>)
;"sup"
;CL-USER> (funcall #<COMPILED-LEXICAL-CLOSURE (:INTERNAL MAKE-TOKENIZER) #x8F5A5D6>)
;"/prog/?"
;CL-USER> (funcall #<COMPILED-LEXICAL-CLOSURE (:INTERNAL MAKE-TOKENIZER) #x8F5A5D6>)
;"I'm"
;CL-USER> (funcall #<COMPILED-LEXICAL-CLOSURE (:INTERNAL MAKE-TOKENIZER) #x8F5A5D6>)
;"bored"
;CL-USER> (funcall #<COMPILED-LEXICAL-CLOSURE (:INTERNAL MAKE-TOKENIZER) #x8F5A5D6>)
;"again"
;CL-USER> (funcall #<COMPILED-LEXICAL-CLOSURE (:INTERNAL MAKE-TOKENIZER) #x8F5A5D6>)
;NIL
;CL-USER> (funcall #<COMPILED-LEXICAL-CLOSURE (:INTERNAL MAKE-TOKENIZER) #x8F5A5D6>)
;NIL

Having closures makes this problem trivial as you can see.

Name: Anonymous 2010-02-16 20:19

>>14
ENTERPRISE C

Name: Anonymous 2010-02-16 22:47

>>13
No, I wrote >>7 on my own because I needed it for a personal project. I later extracted it to its own file because I thought it useful for other C projects, current and future.

Name: Anonymous 2010-02-17 1:24

>>1
So, since no one else has posted the answer yet, it's because string literals cannot be modified, and strtok modifies its string argument (it replaces the delimiters with null bytes). If you attempt to modify a string literal, poof, segfault.

Wrap it up in a strdup(), like this:

char *buf = strdup("5/90/45");

That gives you a mutable copy. Then free() it when you're done. Adding strdup() to both calls fixes it.
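
Roughly what the strdup() fix looks like once wrapped in a function. This is a sketch: count_fields is a made-up name, and strdup is POSIX rather than standard C, hence the feature-test macro.

```c
#define _POSIX_C_SOURCE 200809L  /* ask <string.h> to declare strdup */
#include <stdlib.h>
#include <string.h>

/* count_fields is a hypothetical helper. It tokenizes a private copy of
   text, so callers may pass string literals. Returns the number of
   fields, or -1 if the copy could not be allocated. */
int count_fields(const char *text, const char *delims)
{
    char *copy = strdup(text);   /* writable heap copy for strtok to chew on */
    char *tok;
    int n = 0;

    if (copy == NULL)
        return -1;

    for (tok = strtok(copy, delims); tok != NULL; tok = strtok(NULL, delims))
        n++;

    free(copy);
    return n;
}
```

If the string is known at compile time, declaring it as an array, char buf[] = "5/90/45";, also gives a writable buffer without the malloc/free.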

Also, building with g++ would give you warning: deprecated conversion from string constant to ‘char*’ (gcc warns similarly if you pass -Wwrite-strings). String literals must not be modified, and strtok takes a non-const char *.

And for everyone who complains about strtok(), it's only unsafe in a multi-threaded environment, and only on some platforms. AFAIK some C runtimes (Microsoft's, for one) store strtok()'s internal state per thread, though others, glibc included, use a plain static.

Name: Anonymous 2010-02-17 1:33

>>18
Actually now that I think about it, strtok() can still break if e.g. you use it multiple times in a nested loop. Yeah it's not terribly safe. Here's an implementation of strtok_r() that uses only strchr(), which you can use in place of strtok():

http://www.koders.com/c/fid1EA2B77BE11BE3D41E745290C9424DA8A7768D07.aspx
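
For reference, a reentrant tokenizer built on strchr() alone might look something like the sketch below. This is not the code behind the link: my_strtok_r and count_nested are made-up names, written to follow the same calling convention as POSIX strtok_r.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for strtok_r. Pass the string on the first call
   and NULL afterwards; *save carries the position between calls. */
char *my_strtok_r(char *s, const char *delims, char **save)
{
    char *tok;

    if (s == NULL)
        s = *save;                /* continue where the last call stopped */
    if (s == NULL)
        return NULL;

    while (*s != '\0' && strchr(delims, *s) != NULL)
        s++;                      /* skip leading delimiters */
    if (*s == '\0') {
        *save = NULL;             /* nothing left; later calls return NULL */
        return NULL;
    }

    tok = s;
    while (*s != '\0' && strchr(delims, *s) == NULL)
        s++;                      /* advance to the end of the token */
    if (*s != '\0')
        *s++ = '\0';              /* terminate the token, step past the delimiter */
    *save = s;
    return tok;
}

/* The nested-loop case that breaks plain strtok: records are split on ';'
   and fields on ','. Each loop owns its own save pointer, so the inner
   loop cannot clobber the outer loop's position. */
size_t count_nested(char *buf)
{
    char *outer_save, *inner_save, *rec, *field;
    size_t n = 0;

    for (rec = my_strtok_r(buf, ";", &outer_save); rec != NULL;
         rec = my_strtok_r(NULL, ";", &outer_save))
        for (field = my_strtok_r(rec, ",", &inner_save); field != NULL;
             field = my_strtok_r(NULL, ",", &inner_save))
            n++;

    return n;
}
```

With plain strtok() the inner loop would overwrite the single hidden position and the outer loop would stop after the first record.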

Name: Anonymous 2010-12-21 1:22

Name: Anonymous 2011-01-31 19:53

<-- check em dubz
