Return Styles: Pseud0ch, Terminal, Valhalla, NES, Geocities, Blue Moon.

Pages: 1-

small string challenge

Name: Anonymous 2012-05-04 10:39

hi /prog/

I need to make a function that takes two strings, and removes all characters from the first string that are in the second string; we get marked on how efficient our code is. This is what I got so far, it's pretty shit and apparently theres a much better way to do it.


#include <iostream>
#include <string.h>

void rmFromString(std::string & toBeProcessed, const std::string & toBeIgnored)
{
        size_t found;
      std::string newString;

      found = toBeProcessed.find_first_not_of(toBeIgnored);

      while (found != std::string::npos)
      {
            newString += toBeProcessed[found];
            found = toBeProcessed.find_first_not_of(toBeIgnored, found + 1);
      }

    toBeProcessed = newString;

}

int main()
{
    std::string toBeProcessed = "she sells sea shells on the sea shore",
        toBeIgnored = "se";

    rmFromString(toBeProcessed, toBeIgnored);

    std::cout << toBeProcessed << std::endl;
}


anyone want to point me in the right direction?

Name: Anonymous 2012-05-04 10:51

{i]see strtok[/i]

Name: Anonymous 2012-05-04 11:02

thanks bro

Name: Anonymous 2012-05-04 11:46

>>1
Define "efficient".
Small code size? Time the program takes to execute? Big O notation of the algorithm used?

Name: Anonymous 2012-05-04 15:03

literally time taken

Name: Anonymous 2012-05-04 18:59

If you only get marked on time taken and not on correctness, that opens up a few avenues.

Name: Anonymous 2012-05-04 19:17

Sepples considered harmful.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

char table[256];

int main(int argc, char **argv)
{
    int i, j;
    char *out;

    if (argc < 3)
        return 1;

    for (i = 0; argv[2][i]; ++i)
        table[(int)argv[2][i]] = 1;

    out = malloc(strlen(argv[1]) + 1);

    for (i = 0, j = 0; argv[1][i]; ++i) {
        if (!table[(int)argv[1][i]])
            out[j++] = argv[1][i];
    }
    out[i] = 0;

    puts(out);
    return 0;
}


$ gcc dicks.c -Wall -Wextra -pedantic -ansi -o dicks && ./dicks "she sells sea shells on the sea shore" "se"
h ll a hll on th a hor

Name: Anonymous 2012-05-04 19:19

>>7
Oh. out[i] = 0; should be out[j] = 0;, obviously.

Name: Anonymous 2012-05-04 22:52

>>7
You should use unsigned for table indices. Otherwise sign extension will fuck you up.

Here's a solution that gets rid of the condition in the main loop, only inline asm can be more efficient.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int table[256];

int main(int argc, char **argv)
{
    unsigned int i, j;
    char *out, *pout;

    if (argc < 3)
        return 1;

    for (i = 0; i<256; ++i)
        table[i] = 1;
    for (i = 0; argv[2][i]; ++i)
        table[(unsigned int)argv[2][i]]--;
    out = malloc(strlen(argv[1]) + 1);
    for (pout=out,i=0; argv[1][i]; ++i) {
        j = argv[1][i];
        *pout = j;
        pout += table[j];
    }
    out[j] = 0;
    puts(out);
    return 0;
}

Name: Anonymous 2012-05-04 23:23

>>9
Good point, but if the input is in ASCII, sign extension won't be an issue and the only problem is that the table is twice as big as it needs to be. If it's not, the algorithm is likely broken anyway. (Are there any single-bit ASCII extensions still in common use?)
Getting rid of the branch is probably a good idea, but you've introduced new problems. The loop through the table is unnecessary, table[(unsigned int)argv[2][i]]--; should really be table[(unsigned int)argv[2][i]] = 0; (or repeated characters in the second string break things), and out[j] = 0; forgets that j has changed meaning.
So:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int table[256];

int main(int argc, char **argv)
{
    unsigned int i, j;
    char *out, *pout;

    if (argc < 3)
        return 1;

    for (i = 0; argv[2][i]; ++i)
        table[(unsigned int)argv[2][i]] = -1;

    out = malloc(strlen(argv[1]) + 1);

    for (pout=out,i=0; argv[1][i]; ++i) {
        j = argv[1][i];
        *pout = j;
        pout += 1 + table[j];
    }
    *pout = 0;

    puts(out);
    return 0;
}


It's possible this could be further improved by replacing the assignment to our output string with a putchar, since we don't actually need to save the string; that way we don't even need to malloc anything and we can omit the call to strlen, which saves a pass through the first string, but it does reintroduce the branch.
I don't know if that's a net savings and I'm not about to find out.

Name: Cudder !MhMRSATORI!FBeUS42x4uM+kgp 2012-05-05 7:09

>>10
Are there any single-bit ASCII extensions still in common use?)
ISO-8859-1 and Windows-1252. C'est utilisé pour écrire le français.

In fixing >>9's bug and removing the first loop, you've introduced another operation inside the loop; one extra add ain't too bad (probably becomes a LEA) but since the first loop. For short strings getting rid of the first loop might be faster, but since this is O(1) and the second loop is O(n) the second loop time will dominate for n >>> 256.

Asm attempt using dord table. Main loop is only 5 instructions.


 xor eax, eax
 mov ecx, 256
 mov edi, table
 inc eax
 mov edx, argv
 push edi
 rep stosd
 pop edi
 push edx
 mov esi, [edx+8]
 cdq
 jmps loop1t
loop1c:
 mov [edi+eax*4], edx
loop1t:
 lodsb
 or al, al
 jnz loop1c
 pop edx
 mov esi, [edx+4]
 mov ebx, pout
 jmps loop2t
loop2
 mov [ebx], al
 add ebx, [edi+eax*4]
loop2t:
 lodsb
 or al, al
 jnz loop2
 mov [ebx], al


I don't know if using byte table would be faster/shorter. There's no addsx but xlatb might be useful in some way...

Name: Cudder !MhMRSATORI!FBeUS42x4uM+kgp 2012-05-05 7:29

For comparison, OP's function is around 200 bytes, and that does not include the rest of the C++ string shit which amounts to ~1.5KB. (And eventually calls into memchr/memcmp/strlen anyway.) >>11 is <64 bytes, and >>9,10 are in the 70~100 bytes range.

Name: Anonymous 2012-05-05 7:40

>>11
I love how Windows character encoding is stuck in 1985

Name: Anonymous 2012-05-05 7:46

Prelude> filter (\x -> not $ x `elem` "se") "she sells sea shells on the sea shore"
"h ll a hll on th a hor"

Lookup should be replaced by a HAMT or at least a balanced red-black tree. For very small filter strings though, a simple list is likely faster

Name: Anonymous 2012-05-05 8:06

>>10

Hoo hain't red no scipper 2day?


#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

static int table[UCHAR_MAX];

int main(int argc, char *argv[])
{
    const char *ptr;
    char *dst = argv[1];

    if (argc != 3)
        return EXIT_FAILURE;
    for (ptr = argv[2]; *ptr; ptr++)
        table[(unsigned char) *ptr] = 1;
    for (ptr = dst; *ptr; ptr++) {
        *dst = *ptr;
        dst += !table[(unsigned char) *ptr];
    }
    *dst = '\0';
    puts(argv[1]);
    return 0;
}

Name: Anonymous 2012-05-05 8:08

>>15
UCHAR_MAX + 1

ouch!

Name: 10 2012-05-05 17:40

>>11
Moi, j'utilise UTF-8 pour écrire le français.

Keep in mind that we're passing strings from the command line; few if any are going to be longer than 256. It's an assumption I should have made explicit, but I think it's a fair one.

Half-assed Informal testing with the input provided in >>1 suggests the putchar way (with the branch) is faster than the malloc way, so in light of >>15, I'd like to propose the following as our final result:

#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

int table[UCHAR_MAX + 1];

int main(int argc, char **argv)
{
    if (argc != 3)
        return EXIT_FAILURE;

    for (; *argv[2]; ++argv[2])
        table[(unsigned int)*argv[2]] = 1;

    for (; *argv[1]; ++argv[1])
        if (!table[(unsigned char)*argv[1]])
            putchar(*argv[1]);

    putchar('\n');
    return EXIT_SUCCESS;
}


The advantage of this over x86 assembly (aside from portability and readability) is that it's also perfectly valid Sepples, so >>1-kun can hand it in and get a pat on the head from his teacher. I hope he relays said pat to us here.

Name: Anonymous 2012-05-05 18:45

>>17
b-b-but, you reintroduced the conditional into the main loop...

Name: Anonymous 2012-05-05 19:52

ALGOL code:
PROC string filter = (STRING source, STRING remove) STRING : (
    STRING result;
    FOR i FROM LWB source TO UPB source DO
        IF NOT char in string(source[i], LOC INT, remove)
        THEN result +:= source[i] FI
    OD;
    result);
print(string filter("she sells sea shells on the sea shore", "se"))

The char in string function is like strchr in C.

Name: Anonymous 2012-05-05 19:59

>>18
Yes, after testing to see if it was faster. It was.

Name: Anonymous 2012-05-05 20:04

>>19
Slowest one yet, until someone does a Ruby version.

Name: Anonymous 2012-05-05 20:14

>>21
s = ARGV[0].dup
ARGV[1].each_char { |c| s.delete!(c) }
puts s


I don't have an ALGOL compiler so I can't verify.

Name: Anonymous 2012-05-05 20:45

>>22
Here's a comparison to >>17, anyway:

$ gcc -o dicks dicks.c
$ time for i in $(seq 10000); do ./dicks "she sells sea shells on the sea shore" "se" >/dev/null; done

real    0m19.632s
user    0m1.624s
sys    0m4.416s
$ time for i in $(seq 10000); do ruby dicks.rb "she sells sea shells on the sea shore" "se" >/dev/null; done

real    1m21.144s
user    0m28.106s
sys    0m30.582s


That's much better than I expected from Ruby, especially considering the dumbness of the algorithm.
I assume C's poor performance is mostly down to the overhead associated with spawning so many processes. This may make this a pretty meaningless test.

Name: Anonymous 2012-05-05 22:33

rmFromString str [] = str
rmFromString str (x:xs) =
    rmFromString (filter (/= x) str) xs


Consider this code UNTESTED

Name: Anonymous 2012-05-05 22:49

#FIOC
rmFromString = lambda s, f : filter(lambda c : c not in f, s)

Name: VIPPER 2012-05-06 4:59

Anti-shit bump

Name: Cudder !MhMRSATORI!FBeUS42x4uM+kgp 2012-05-06 7:15

All you timers should keep in mind that OP wanted "efficient", not "fast". Scale the times by memory usage and a very different picture can emerge.

Name: Anonymous 2012-05-06 7:27

>>27
However, >>4,5.

Name: Anonymous 2012-05-06 8:12

>>26
data:text/png;base64,iVBORw0KGgoAAAANSUhEUgAAASwAAAD7BAMAAAAiH7RlAAAAAXNSR0IArs4
c6QAAABhQTFRFAwYCHiAeNDUzWVpYmJqX/5WD0cvH/f/8KAcFagAADPdJREFUeNrtnUlzozzXhg9DOVu
SdLW3dJJKtnioZkvHlPkFfr8tsf2gbWxUuv/+t2Aw2IwCkSyiVSqxowvp6EwSR4Rv2egH6wfrB+sHa1S
s4+YPJU1b+dG3wIr8JRFp92kjIn29ZV+JJTar1dIiIt+Psnb0N3+Intfsq7DEu5XOHL1eDeA7kfbCvgT
rQEREL77vb6xfN3/dWKT9nR5LvFPziIjNH3qMJsbiDhE9b1skj/S/k2Jxi4jauxQOvUyIJSwio5NE/yN
zMizhUOdROFgGmwZLOKRtuxsA2wgmwfpHWp+OuKWzCbDOpPXrZhSuNixOet9J4fZw+WrBEjKiwivMwLh
Yjibz4Ef6rRTLI1fqv+4lv9cN6yCttf9pTBlWTNKyKxxDFZawSV4z8munbDQsT0Jwj77v+ywRL6YE60B
G5r13jCOOmfu6ZgAcUwUWt5IpFBubiOi5/dkP9PC09n3fX5IRAHzIaqzFcmiWuARkbKPjglotSky/ko+
8P/mWFgB7fXys1BTGpG/LU1ov42b2A82ShWjPx8YSFrkA+MWn4VazrNhGPmpkQthvQCxvs6luFRoAhKM
T/cotSpO6OOcInMgEzjoA79e4WDFRAOCDVkT5J8JZk5maXVksywW4tOKjGnk3AQgyY6Jc//Am6bIvAHw
NAOEdAGc2JhZPBivUAl7AElaDwGtZhB2lH49nGCBdVC1ZMwCwf0NYhRXo1E9JnA7L5v7+MVkk3AQA520
8LJ6MUKwx4GhdWLx6/Xi+S2VS+2NpLgAIAwBifTwsL1FB4bVJDKsffcOy0eLWDICT6JZklCWHi2p1Fpx
ULsRxtY0AwKvugtwEQhxT3bWgv9ExlfYPYyyscypO9wmWWBCR9sxK81l8CjIYYD+vFpT6/eKdKDNWQk5
HUJV2SEbFZgmVto38BelPVkK7v/JY97r9G4iJtDWD+G+3Y8BmlefhvPk4WDxToGEylcnjcit7/ivnkNP
8rDNAMABit9vt/lf+d/o4WCGliz3Wfd93MvdERJmG8sxyxAZksyt2u91ut2N1inYIVj4awiEiYwtARJe
ezjNOl0zp0TYY4L0kHmklVjgbA4tf9KdIHWBRmBluu9hTqjJxsGgOwFuvHoI6LKlZpNs5vJbRpLP/S35
+dgJgT6SvAbEgY28C8ALwvwDw32632+3qzaU8lnMTGiR9JcMVs/AVOK2Xidv+kjgJuVUSURRF19+XmUW
60UI33t6ugBVhqQfwXAht9bS2GAQZq2WzbuLGcKzTbWDwX2ESETNPfyIX0BngBOC6TfTSHExIzOI1lnc
bpZeU0RrCJssFiAGOidjkW4ZNcyZjPhjLrvDYxUVehAuI7acLQa84aPRoz4ULxI3jcTaGYomW4C5mABA
aLNb0Ff0+WAbDHYDG8RD98yR07cS3ZIhSJaLbv//RI8M+AH4D8JtnMRiI9dkSoSfd+9xK1QgPgHWOW9c
+7wZieZ28tldgkyAJF9jmk1urIsyBWFYXKRB3wGuChDRdJJq/YA3DijutmdhNVqSLZAbbmxMMwupmJ07
sgvXeqZdwPgir21N94IL10amXszkES2jdEpCAeEuxRLdvGEOw4u62/i1ZiV1b9VLir52wwu5BHes6UNc
K9dhpdmiw292pfb7lzm96akI0dkiDg5ROLRePkIhoncVVtdq7iHUylWHlMu/ovr8gFzjdATjNOmDJRZq
dmrAzOXkDsNdSg1S7QmkS0co1okhCdecNQk+yAqwNq6PWkmupFAnKjZz99ERUdxKggBWbCrE+EylKsl4
QOoND2mq7b8f6nCvESp85c3E0lthf0T6JnkLRyoQ7UxQ6S8evLmfW09eSbkm+Jx0toWdO1OmuBUsYKqm
ybFkyabGZ/cjdFiylEp+LiPM715BOmehovVZh9Y8DerUkiYc9bQGeDBQrZ4q0Qs7gguW5SrFO6TpfEFH
Vtm5osA+zAstRKvH5GhSb5VPVAUc7KIa5OZa4V0pVE5TlPpvQSyqK5GO5nsa6sNCznW+xMAC+fA5Sbz/
3yi5Y8UwxVj4bYkF6gKPvL0l7gbCIKEh0/ml+g/X5phYLmWuDf9raMeAR0UsExPQ38gyEs6C46uhqASt
0bVJ5PpMLToFY+RaQ7CwJ2/0kbb2gW5HvHfhKelxwXgHs3wDxB0jSd/iYncunf3Ose6YYK50hoZfGL7G
MXOe08re3Wl7oiqmyaK+w4j0398P+sPI+F8nGvf2x5tdY4Sx3bBxWDiRoGkONdE8d4Hqeuz+ZFwvOykF
QhnWaqcZKexB2utGQzFAqcRorTxeVn0VhO6cPflytVqsVuYniDw2WBM7ioRJLtTa9EpMPE4AOcNJ9dhs
40zRuTRGL32UmUmPAnojoJu1OU2nTy1rnRqaQFgzAcbO8f6xzmhV7W0UXQtgv0XGjF9yam021HMuG8pY
r7Jjqw+kyVv8U9QAsHKzH5m2PHEu57enXBU0SJF6NVlMLilj8S7EKyeHUZNI0nnyz61TQ5em0ZVgz9Vh
2JdYGl3xflOcqUqz4Tj3WlWo8bjM9HutpMKTn8F+EJfxFqrnCOcIZnPuHIJnmxNOZEisT7CUORPQYpLq
V2QEsIiOxkdNjZaPlCMvKjsNA2LoOoTNhu9BY9qkvwYrWH5e1x60ZziYQ3kFnXylbDiv1dzARzoFwDp1
lamRCrDwf4wQlPclNhC4QznHPvkBB6IVhKxpIbiJ8AxwXzhbcAA5sQixRwrJLma/Q8N8N4KyvvRlw/DK
sohI734GnfrNHFExtfPSiAnMCIAzyIeGrRGFkAf+EplovBvkbdkn+sVp/a1LH5uC2hjXTuYG3XTSENRn
WvXKsmwkR1O40q498bpIvTYIzXZx4gyXcDljB5FgdIp8JchBnGazeZ4Zk81v9sBTvj6FnBi3DOiu3Pr3
mgzCV9el27PAKS30SoteioubgUkXg0wtL6XGD3gqbKvIAimJ9yGCdJ9tP7IU15e5rD6yv2atuxVJtrPt
tOhfOQai1iv1iqwvWSa1w9dsLp07O4hgm0ZXDUnxQqp+6JkmjpVbJF7HUKlRdFourPEnZMxCd6txpTyN
SOqWr0HHuKSFFrFihiugZwpQCblKnInouc5J2t1Xqh6uXHpTZn77bqOVXRLoo+qPUQjQGYHXyIqSSKH3
P0pWxurxJ6MjYqL5n6a5SXx3esJfKVvSNq66wOryt9SljOvtGoVdYHeyizHLtHbNf5y/by0rIuIuxORC
rg9BLZCt6v/F9k+1tdyMkDjf2Xr03WHtj7FXVUU03Ywmr7cn6H5ztL463KXuv7X/0P2bc3x2vKuoRjBm
IygUvFRscbcMl8ToyGwGr7QX53ttW8eB39gG0lsTpfQRNok5F5S6V7Y7q2kj4HJVYLfWpeqacuESd8Zq
yYI2FDHu6/KfZWFhxo5LouREj4zfWlZwzR9OOXKaYZ22Bvr9jKa4PmXCqThz3Dc/YT3FJ+f71xR8bHrJ
PUkguD9RQKvN1FM9cqnJTY2HRYASPa7xqZXnntXMVBmoFvqVorTnY4xK2OzYW4jotce6sT8esG1gQr2q
u7seq7DcFWPCqi6V39s3PsuWa24ptV4p9+eWXJsmSLbrdXpq8iqvjTuqHdDK2tZB7JVc3j0u+Cmt72Xt
eVXG7WygzoFJzu+cYV9Qc7pTj+hiQuO7g0Fbc3tDF4VRTPrrQjvb1RQEdEkPcHlKSn7ot9Kt7XtoV18B
K7t2iEuGU67K3ZveEM6zufdcbapbl+2daOhWO5mICLGBj6euuEaxwaItpsCAcegg6OYLcHkrV666od6L
soqGm1Nuh320og7HAl6StI6D40v+NNlnQI8OkWMBxSdqzz2r3cI5L0se4Zat32uK4ILp/3szq/jbODVs
S2RSxJCLt+epSNOEvLaJRLvOB7B13xyURET2snp7XURRtnlY2ET08b4GvxIKwI3+1IrrP7rp7WI16pZz
sjYAOAxD+jaKjv1pXFND9IizPhcpCILJY4RzAef7dsE4zqHw7SBYrNqGyyIwsFje+JRb074l1z1S+4iW
N5QQqT/ZKY4VBXW736Pv+0PtNpbE+3yqwRJRfvUrPQ4yRNNb57hbruCTSnle+7/urJQ3xvKSx+OwaSyy
I1tt89iLfpof11FjCuMI63LqAB6LZ1FhWKYoV74W4qPBbd2IsOKxcgWzUi3LlsbzgkhUUDo15Ge0QrPA
NcDLEoTcAjod1muUx/37ksRqCFZsZ1mF0qgFYwkgnsWkvbXos6CzZwbMVGOwBWA6Dw5o20r4GKwzgsGG
ZWxVYpzm8ALYSV3AAVjyD55409s2whA7vzX7FN8OCzbwHRSf/h2B5c0+JvA/EOpmeqtckhmBx3Zl9Qyz
Y6UVq3wwrVCVaw7A+vydW/D2xpC+TVIylrFzDQKy7b4n1+S31lrqasj9YP1g/WD9YP1g/WD9YP1j92/8
Dw8W84AjjdpUAAAAASUVORK5CYII=

Name: Anonymous 2012-05-06 12:23

>>27
Hello Cudder, I am the author of >>25 and my contribution is neither ``efficient'' or ``fast'', what do you think of it?

Name: Anonymous 2012-05-07 3:15

>>26
You bump this thread as "anti-shit?"  I'd hate to see what you consider a shit thread.

Name: Cudder !MhMRSATORI!FBeUS42x4uM+kgp 2012-05-07 4:30

>>30
I don't care, but do post the generated Asm.

(It probably won't fit in 1 post.)

Name: Anonymous 2012-05-07 4:35

So nobody on /prog/ has heard about strpbrk(3)?

Name: Anonymous 2012-05-07 6:17

>>33
Code using that will be both messier and much slower than >>17. The fact that it's in the standard library and sort of looks relevant doesn't mean it's magic (see also >>2).

Name: Anonymous 2012-05-07 9:46

one extra add ain't too bad
sounds like some polecat kebab country or ballad rock song

Name: bampu pantsu 2012-05-29 4:45

bampu pantsu

Don't change these.
Name: Email:
Entire Thread Thread List