Show me your strcpy

Name: Anonymous 2010-05-13 20:16

Here is my string copy function: http://pastebin.com/f7gBfsDY

<Nola> it took agner fog 3 years to write a strcpy
<Nola> why are you guys all lamers
<Nola> would you write some lame ass thing that loops and copies Chars
<Nola> that version is gonna be 10x faster than your C loop


Three years, motherfuckers.  Have you optimized your hand-coded assembly strcpy() today?

Name: Anonymous 2010-05-13 21:01

Hello Agner I write Python software on my quad-core computer so I do not use strcpy.

Name: Anonymous 2010-05-13 21:09

My performance sensitive copies use DMA. ~sigh~

Name: Anonymous 2010-05-13 21:20

meh, I'm just fine with the repne scasb / rep movsd / rep movsb variant (length/4 dwords, then the length&3 leftover bytes). That's what I use when I write it in asm, and it's also what msvc usually uses. I've seen the mmx/sse strcpy/memcpy/memset versions used in some other libraries, but IMO they're much more complex and a lot more bloated. I'd like to see some benchmarks/cycle counts (on a few CPUs), so I'd know if these overly complex versions are really worth it.
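
For readers who don't think in x86, the dword-then-bytes split described above can be sketched in C. This is only an illustration, not the poster's code: the function name `copy_dwords_then_bytes` is made up, and it assumes the length is already known (a strcpy would have to find the terminator first).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name. Mirrors the rep movsd / rep movsb split:
   copy n/4 dwords first, then the n&3 leftover bytes. */
void *copy_dwords_then_bytes(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);

    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t dwords = n / 4;   /* the count rep movsd would get in ecx */
    size_t tail = n & 3;     /* the count rep movsb would get in ecx */

    while (dwords--) {
        uint32_t w;
        memcpy(&w, s, sizeof w);   /* memcpy sidesteps alignment/aliasing issues */
        memcpy(d, &w, sizeof w);
        d += 4;
        s += 4;
    }
    while (tail--)
        *d++ = *s++;
    return dst;
}
```

Whether a compiler turns this into anything resembling `rep movsd` depends on the compiler and flags; the sketch only shows the arithmetic.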

Name: Anonymous 2010-05-13 22:09

my strcpy uses copy-on-write, so it probably beats any other strcpy in existence in stupid benchmarks.

Name: Anonymous 2010-05-13 23:32

>>5
I think that's a naive assumption.

Name: Anonymous 2010-05-13 23:39

>>6
It depends on his implementation. I can imagine such a thing being actually slower for the first write, but I won't say anything until I see the implementation. Why would strcpy even use copy-on-write on a string-to-string basis? Copy-on-write is used on a segment/memory-area basis.

Name: Anonymous 2010-05-13 23:58

>>7
I would expect it to be slower for small strings even without doing any writes.

Name: Anonymous 2010-05-14 0:10

All my strings are lists; it takes over 9000 cycles to copy them regardless of the algorithm I use.

Name: Anonymous 2010-05-14 0:13

>>4
This. movsX is around 10 bytes or so, I'm not going to count the version in OP but it looks to be at least a few hundred. More than 10x the size for only 2-6x improvement is a diminishing return.

Name: Anonymous 2010-05-14 0:24

>>7
most people trying to benchmark a strcpy function will just copy a string millions of times and never write to any of the copies. a copy-on-write strcpy uses a lot less memory and runs a lot faster in that kind of benchmark.

>>8
it's slower for strings that are 32 bytes or less. for anything longer, it's faster. unless you write to a string, in which case the first write probably takes longer than >>4's strcpy.

of course, if you really care about the performance of strcpy, you'll make sure all your strings are 16-byte-aligned and multiples of 16 bytes in length.
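
The thread never shows the copy-on-write strcpy, so the following is a guess at the general idea rather than that poster's implementation: a reference-counted buffer where "copying" only bumps a counter and the first write to a shared string pays for the real copy. All names (`cow_str`, `cow_copy`, `cow_set_char`, ...) are hypothetical, and the sketch is not thread-safe.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted buffer; none of these names come from the thread. */
typedef struct {
    size_t refs;
    size_t len;
    char data[];          /* flexible array member, NUL-terminated */
} cow_buf;

typedef struct {
    cow_buf *buf;
} cow_str;

static cow_buf *cow_alloc(const char *s, size_t len)
{
    cow_buf *b = malloc(sizeof *b + len + 1);
    assert(b != NULL);    /* sketch only: no real error handling */
    b->refs = 1;
    b->len = len;
    memcpy(b->data, s, len);
    b->data[len] = '\0';
    return b;
}

cow_str cow_new(const char *s)
{
    return (cow_str){ cow_alloc(s, strlen(s)) };
}

/* The "strcpy": O(1), it only shares the buffer. */
cow_str cow_copy(cow_str s)
{
    s.buf->refs++;
    return s;
}

/* The first write to a shared string does the real copy. */
void cow_set_char(cow_str *s, size_t i, char c)
{
    assert(i < s->buf->len);
    if (s->buf->refs > 1) {
        cow_buf *b = cow_alloc(s->buf->data, s->buf->len);
        s->buf->refs--;
        s->buf = b;
    }
    s->buf->data[i] = c;
}

void cow_free(cow_str s)
{
    if (--s.buf->refs == 0)
        free(s.buf);
}
```

A benchmark that calls `cow_copy` a million times and never writes measures a counter increment, which is the point being made about stupid benchmarks.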

Name: Anonymous 2010-05-14 4:46

>will just copy a string millions of times and never write to any of the copies.
We have something like that already. It's called a shared pointer.

Name: Anonymous 2010-05-14 4:51

>it took agner fog 3 years to write a strcpy
Did he get even a dollar for this?

Name: Anonymous 2010-05-14 5:32

>>9
>over 9000
Back to /b/, please

Name: Anonymous 2010-05-14 6:30

>>14
BACK TO YOUR ASPERGERS SUPPORT FORUM

Name: Anonymous 2010-05-14 6:58

>>15
I don't think so, anon!

Name: Anonymous 2010-05-14 6:59

>>15
*Assburgers

Name: Anonymous 2010-05-14 7:01

>>17
* Aspergers

Name: Anonymous 2010-05-14 7:25

>>18
* Assburgers

Name: Anonymous 2010-05-14 7:31

>>15-19
Your asspergers acting up, sir!

Name: Anonymous 2010-05-14 7:33

Also, the aspergers forum now has a quality news piece coming up:
http://news.slashdot.org/story/10/05/13/183221/Wikipedia-Is-Not-Amused-By-Entry-For-xkcd-Coined-Word

Name: Anonymous 2010-05-14 7:57

>>21
Yet another piece of evidence for the "Randalls retard slaves should be executed" pile

Name: Anonymous 2010-05-14 8:14

>>13
Probably not, since he bitched about Intel's C/C++ compiler not generating fast code for the inferior and cheapo AMD CPUs.  Who would complain except those who can't afford a superior GenuineIntel CPU?

Name: Anonymous 2010-05-14 8:35

>>19-20
* Aspergers

Name: Anonymous 2010-05-14 9:06

* Arsepurgers

Name: Anonymous 2010-05-14 9:10

* Assburgers

Name: Anonymous 2010-05-14 9:18

>>24-26
You know only people with AS would dedicate so much time to a single triviality.

Name: Anonymous 2010-05-14 9:54

>>27
the fuck on

Name: Anonymous 2010-05-14 14:39

>>23
your crappy intel doesn't have SSE4a.

Name: Anonymous 2010-05-14 14:57

And ECC Memory for non-Xeon models.

Name: Anonymous 2010-05-14 16:08

>>23
It's hardly the compiler alone. The Intel Performance Primitives also have a "Genu" "ineI" "ntel" CPUID check of their own.

Why the fuck would they do that anyway? A string check, are you serious? That's pathetic, as expected from a company that only makes half-decent processors and then tries to bundle all their other failures with them (graphics and wireless, both terrible jokes).

How's that Larrabee going, Intel? Oh yeah, it's not a desktop x86 processor with few cores, therefore it's going bad. It's like I'm cruising aboard the Itanium again.

Name: Anonymous 2010-05-14 17:09

>>31
>Larrabee
Sadly I sat through a presentation of a guy hyping the Larrabee a few weeks back. He was completely surprised that it was going the way of the Itanium. Upon receiving the news you could actually hear his spirit break.

Name: Anonymous 2010-05-14 20:03

>>31
At least their wireless has free open-source drivers!

Name: Anonymous 2010-05-15 2:16

>>25-26
* Aspergers

>>27
No, I just have extreme OCD.

Name: Anonymous 2011-02-13 3:23

Bumping this thread because I was doing some reversing and found a memcpy implementation that takes a total of 6,878 bytes. It starts by issuing cpuid, then makes a ton of other decisions such as whether to use MMX, SSE, etc. before finally moving data around with either movsd/movsw/movsb, MMX/SSE, the FPU, or just move/store instructions, all complete with a bunch of subcases to handle unaligned data. What the fuck.

Name: Anonymous 2011-02-13 3:34

pastebin doesn't work in lynx

Name: Anonymous 2011-02-13 3:43

>>36
Fuck. This is why we need the LLVM system everywhere.

Name: Anonymous 2011-02-13 4:43

>>36
1337 Optimized MemCPY 2011  Enterprise Plus

Name: Anonymous 2011-02-13 5:24

>>36
Fuckity. Using

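  ; eax = CPU feature bits (assumed to be filled in from cpuid beforehand)
  test eax, ME_HAS_SSE4       ; check the best variant first
  jz no_sse4
  mov dword [memcpy], memcpy_sse4
  jmp bye
no_sse4:
  test eax, ME_HAS_MMX
  jz bye                      ; no MMX either: keep the default memcpy
  mov dword [memcpy], memcpy_mmx
bye: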

and then

call [memcpy]

looks easier

Name: Anonymous 2011-02-13 6:11

>>40
choosing which one to use at compile time would be even better.
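
A rough sketch of what compile-time selection could look like in C, using the preprocessor. It assumes a GCC/Clang-style compiler that defines `__SSE2__` when SSE2 is enabled (other compilers use different macros), and `my_memcpy`, `copy_impl` and `COPY_IMPL_NAME` are invented names. The chosen variant is baked into the binary, so it only fits the CPUs the build targets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#define COPY_IMPL_NAME "sse2"

/* 16 bytes per iteration with unaligned SSE2 loads/stores, then a byte tail. */
static void copy_impl(unsigned char *d, const unsigned char *s, size_t n)
{
    while (n >= 16) {
        __m128i v = _mm_loadu_si128((const __m128i *)s);
        _mm_storeu_si128((__m128i *)d, v);
        d += 16;
        s += 16;
        n -= 16;
    }
    while (n--)
        *d++ = *s++;
}
#else
#define COPY_IMPL_NAME "generic"

/* Fallback: plain byte loop. */
static void copy_impl(unsigned char *d, const unsigned char *s, size_t n)
{
    while (n--)
        *d++ = *s++;
}
#endif

/* Hypothetical wrapper; which body it calls was decided by the compiler flags. */
void *my_memcpy(void *dst, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    copy_impl(dst, src, n);
    return dst;
}
```

There is no branch and no function pointer at run time, but a binary built this way with SSE2 enabled will fault on a CPU without it.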

Name: Anonymous 2011-02-13 6:42

>>41
Too bad C has no Lisp macros.

Name: Anonymous 2011-02-13 7:00

>choosing which one to use at compile time would be even better.
Not everyone uses gentoo.

Name: Anonymous 2011-02-13 7:00

>>36
IHBT

Name: Anonymous 2011-02-13 7:03

>>43
If you're not compiling the OMG OPTIMIZED one yourself, you'd use the >>36 one; otherwise too bad, slow as fuck.
Also, go back to /g/entoo.

Name: Anonymous 2011-02-13 8:26

>>41
Not every processor has MMX, SSE19, Power8D Later, so it's impossible to choose the best way at compile time.

Name: Anonymous 2011-02-13 8:31

Name: Anonymous 2011-02-13 14:41

Jesus fucking Christ, how fucking hard is it?

Just have the function pointers to strcpy/memmove/etc. patched at run-time with the appropriate ones for the given CPU. Are you people all fucking retarded or have I been trolled constantly?
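
A minimal C sketch of that run-time patching, under loud assumptions: `cpu_has_fast_path` is a stand-in stub (a real one would query cpuid, for example through GCC's `__builtin_cpu_supports` on x86), `strcpy_bulk` stands in for a hand-tuned SSE variant, and every name here is hypothetical. The pointer starts at a resolver that patches it on the first call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef char *(*strcpy_fn)(char *dst, const char *src);

/* Baseline: the classic byte loop. */
static char *strcpy_bytes(char *dst, const char *src)
{
    char *d = dst;
    while ((*d++ = *src++) != '\0')
        ;
    return dst;
}

/* Stand-in for an optimized variant (imagine the SSE version here). */
static char *strcpy_bulk(char *dst, const char *src)
{
    memcpy(dst, src, strlen(src) + 1);
    return dst;
}

/* Hypothetical CPU probe; hard-wired for the sketch. */
static int cpu_has_fast_path(void)
{
    return 1;
}

static char *strcpy_resolve(char *dst, const char *src);

/* Starts out pointing at the resolver, gets patched on first use. */
static strcpy_fn my_strcpy_ptr = strcpy_resolve;

static char *strcpy_resolve(char *dst, const char *src)
{
    my_strcpy_ptr = cpu_has_fast_path() ? strcpy_bulk : strcpy_bytes;
    return my_strcpy_ptr(dst, src);
}

char *my_strcpy(char *dst, const char *src)
{
    return my_strcpy_ptr(dst, src);
}
```

After the first call every later call is one indirect jump, which is the C equivalent of `call [memcpy]`. The unsynchronized first-call patch is fine for a sketch but would need care in threaded code.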

Name: Anonymous 2011-02-13 15:20

>>48
But those functions are frequently inlined!
Luckily, all my code runs on virtual machines. CLI, bitches.

Name: Anonymous 2011-02-15 9:42

>>49
No. Optimized versions are not inlined. What gets inlined is the trivial
while(*a++=*b++)
