/prog/ - A better tr

Name: Anonymous 2010-08-22 14:39

private static readonly HashSet<char> badChars = 

    new HashSet<char> { '!', '@', '#', '$', '%', '_' };



public static string CleanString(this string str)

{

    var result = new StringBuilder(str.Length);

    for (int i = 0; i < str.Length; i++)

    {

        if (!badChars.Contains(str[i]))

            result.Append(str[i]);

    }

    return result.ToString();

}

This algorithm makes use of the .NET 3.5 'HashSet' class to give O(1) look up time for detecting a bad char. This makes the overall algorithm O(n) rather than the O(nm) of your posted one (m being the number of bad chars); it also is lot a better with memory usage, as explained above.

Name: Anonymous 2010-08-22 14:47



char badChars[256] = { /* the contents of 

 this are left as an exercise for the reader */ };

void CleanString(char *s) {

 char *d=s;

 do

  if(badChars[*s])

   *d++ = *s;

 while(*s++);

}

Guaranteed O(1) time with a lower constant.

Name: Anonymous 2010-08-22 15:05

>>1
Thanks man, I think you forgot to factor in the two minute start up time though.

Name: Anonymous 2010-08-22 15:07

>>2
I never thought I'd see this approach used on /prog/, and in fact I've avoided mentioning it in other contexts (to avoid tedious explanations.) Props.

Name: Anonymous 2010-08-22 16:04

>>4
That's because we don't want to actually HELP THEM!!!

Name: Anonymous 2010-08-22 16:12

#include "void.h" // A better tr
void cleanstr(char* src,char* badchars,char fillchar){
int j,srcl=strlen(src),badcl=strlen(badchars);
while(srcl--){j=badcl;
while(j--){if(src[srcl]==badchars[j]){src[srcl]=fillchar;}}}}
int cleanstring_j,cleanstring_srcl,cleanstring_badcl;
#define cleanstring(string,badchars,fillchar) ;cleanstring_srcl=strlen(string),cleanstring_badcl=strlen(badchars);while(cleanstring_srcl--){cleanstring_j=cleanstring_badcl;while(cleanstring_j--){if(string[cleanstring_srcl]==badchars[cleanstring_j]){string[cleanstring_srcl]=fillchar;}}};
mainstart;uquad start,end;
//cleanStr
if(!argv[1]) exit(puts("No arguments"));
char* args1=malloc(strlen(argv[1]));strcpy(args1,argv[1]);
cputime(start);
cleanstr(argv[1],"~!@#$%^&*()",' ');
cputime(end);printf("Clean str:`%s` in %llu cycles\n",argv[1],end-start);
cputime(start);cleanstring(args1,"~!@#$%^&*()",' ');
cputime(end);printf("Clean strInline:`%s` in %llu cycles\n",args1,end-start);
;mainend;

Name: Anonymous 2010-08-22 16:43

>>2
Pointers considered harmful!

Name: Anonymous 2010-08-22 17:38

>>5
That's but a small part of my aversion to tedious explanation. AFAIK to the average /prog/lodyte, >>2 might as well be using a Y-combinator--which I suspect is better understood around here... and at least it has an evocative name and possibly bewildering implementation to wield against people who would write it off.

Name: Anonymous 2010-08-22 18:41

>>8
Hm I'm actually intrigued. How would you implement a filter+lambda using the Y combinator? HIBT?

Name: Anonymous 2010-08-22 18:58

>>9
You seem confused. The purpose of the Y combinator is to show that lambda is the only thing necessary to implement recursion; it cannot be used to implement lambd.

Name: Anonymous 2010-08-22 19:07

>>2
Well, not with Unicode. That said it is a nice solution and reminds me of something I read the other day^[1]

>>8
How so? It just looks like memoisation to me, and I'd hope that was part of any functional programmers toolbelt^[2].

_{--

1. http://jaspervdj.be/posts/2010-07-11-the-dwarfs-and-the-fast-marking-algorithm.html I was suckered in by the title thinking it was something else

2. This assumes that the people on /prog/ who claim to prefer functional programming actually do it, and don't just spend all day jerking it to loeb.}

Name: Anonymous 2010-08-22 19:14

>>9
You haven't been trolled, just uh... look, >>8 is saying that /prog/lodytes aren't any more (or even as) likely to understand what's going on in >>2 than they are to understand code that uses a Y-combinator. It has nothing to do with implementing >>2 with a Y-combinator. (I have no idea where the lambda part in your post comes from.)

Name: >>11 2010-08-22 19:19

actually memoisation is an incorrect description, the phrase I was looking for is "lookup table"

Name: Anonymous 2010-08-22 19:25

>>11
How so?
A lot of languages memoize for you, and my biggest problem is when it's relevant to portable assembly discussions. You can safely bring up the Y-combinator with HASKALites because, at worst, that's what they think fix does.

Here at least, if you bring memoization into a discussion about using memoization in C you'll probably end up facing someone who will not believe it's possible to solve any kind of search in O(1). That is my fear at least. It's only been relevant when this kind of prejudice has been already presented in the discussion so I've never brought it up. Might as well argue religion with someone who firmly believes whatever you don't.

Name: Anonymous 2010-08-22 19:27

>>13
It's close enough. The best examples of this kind of LUT use are to cache searches as needed.

Name: Anonymous 2010-08-22 21:55

>>2
Okay, I think I know what happens but I have a lingering question. After you've changed the contents of the array s points to how do you know the length of the valid array? what happens to the /0 character?

Name: Anonymous 2010-08-22 22:26

>>16
Read the code again, carefully.

Name: Anonymous 2010-08-22 22:49

>>2
> if(badChars[*s])
ಠ_ಠ

Name: Anonymous 2010-08-23 0:09

>>18
Don't stare at it too long, you wouldn't want to learn something by accident.

Name: Anonymous 2010-08-23 1:42

>>2
Now in unicode please.

Name: Anonymous 2010-08-23 1:53

>>20
Same deal, but with a larger (mostly unallocated) array and direct access to a very cooperative MMU. Or, possibly doable with your virtual memory system, who knows?

Name: Anonymous 2010-08-23 6:56

>>20,21
Doable with a two-level lookup.

(Left as an exercise to the reader)

Name: Anonymous 2010-08-23 9:27

>>19
I retract >>18; I failed to notice the *s++ terminating condition of the while loop.

Name: Anonymous 2010-08-23 12:46

Hashtable lookup would be worth it if the number of items you check against is large enough, and those items were large/complex enough (not chars). For item counts like yours, the character compare is faster. An even faster method would be to use a 256character array (for ASCII) or bitvector to mark the characters that should be filtered, lookup would be as fast as one memory access there (or shift+memory access), and it would be much faster than your hashtable lookup. Oh wait, I should have read the thread before I posted this >>2 already gave you this idea.

Name: Anonymous 2010-08-23 13:47

Use JIT.

Name: Anonymous 2010-08-23 15:43

>>22
Two level bloom filter if you want to squeeze the last bit of juice out of your cache lines.

VROOM VROOM

A better tr

1 Name: Anonymous 2010-08-22 14:39

2 Name: Anonymous 2010-08-22 14:47

3 Name: Anonymous 2010-08-22 15:05

4 Name: Anonymous 2010-08-22 15:07

5 Name: Anonymous 2010-08-22 16:04

6 Name: Anonymous 2010-08-22 16:12

7 Name: Anonymous 2010-08-22 16:43

8 Name: Anonymous 2010-08-22 17:38

9 Name: Anonymous 2010-08-22 18:41

10 Name: Anonymous 2010-08-22 18:58

11 Name: Anonymous 2010-08-22 19:07

12 Name: Anonymous 2010-08-22 19:14

13 Name: >>11 2010-08-22 19:19

14 Name: Anonymous 2010-08-22 19:25

15 Name: Anonymous 2010-08-22 19:27

16 Name: Anonymous 2010-08-22 21:55

17 Name: Anonymous 2010-08-22 22:26

18 Name: Anonymous 2010-08-22 22:49

19 Name: Anonymous 2010-08-23 0:09

20 Name: Anonymous 2010-08-23 1:42

21 Name: Anonymous 2010-08-23 1:53

22 Name: Anonymous 2010-08-23 6:56

23 Name: Anonymous 2010-08-23 9:27

24 Name: Anonymous 2010-08-23 12:46

25 Name: Anonymous 2010-08-23 13:47

26 Name: Anonymous 2010-08-23 15:43