Merging thousands of txts

Name: Anonymous 2009-04-06 16:17

Hello gentlemen,

I'm no programmer so forgive me if I use incorrect terms or fail to address important points.

I have tens of thousands of txt files, the result of, among other things, a ton of OCR scanning of different texts. The naming scheme is A-1.txt, with each subsequent page adding 1 to the number.

What I need to do is put all the txt files together into one. I'm on a Linux machine and have failed to put them together in order. I've tried 'cat A-*.txt > new.txt', but the pages are merged out of order. The pages in new.txt begin with every file that matches A-1*, then every file that matches A-2*, etc. I'm trying to find a way to put all these in order so that they go A-1.txt, A-2.txt, A-3.txt, etc. Basically, I want them in Dewey Decimal order (I think).

I'm sure this could be easily done with a simple bash script, but I'm pressed for time in getting this together and don't have enough time to learn/experiment with it.
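
For anyone following along, here is a small sketch of why the glob comes out that way. It assumes a POSIX shell with sort available; the helper name lexical_order is made up for illustration. The shell expands A-*.txt in character-by-character order, roughly like a plain byte-wise sort, so A-10.txt lands before A-2.txt:

```shell
#!/bin/sh
# lexical_order: hypothetical helper that sorts names read from stdin
# byte by byte (LC_ALL=C), with no notion of numeric value.
lexical_order() {
    LC_ALL=C sort
}

# Four sample page names, sorted character by character.
printf '%s\n' A-1.txt A-2.txt A-10.txt A-11.txt | lexical_order
# Prints:
#   A-1.txt
#   A-10.txt
#   A-11.txt
#   A-2.txt
```

The "1" in A-10.txt is compared against the "2" in A-2.txt before the length of the number is ever considered, which is the out-of-order merge described above.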

Name: Anonymous 2009-04-06 16:18

find -name \*.txt | sort -n | xargs cat > out.txt

Name: Anonymous 2009-04-06 16:21

> I'm no programmer
Look at the top of the page. What does it say?
Programming

Name: Anonymous 2009-04-06 16:51

> find -name \*.txt | sort -n | xargs cat > out.txt

Thanks a lot. The command seems as though it should work, but I still get the same order as I started out with. I gave the filenames incorrectly in my original post. They actually start at A-01.txt, A-02.txt.... A-111.txt...A-699.txt, etc.

Name: Anonymous 2009-04-06 17:43

>>4
Shouldn't matter. Check the order of just 'find -name \*.txt | sort -n'.
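
One likely reason 'sort -n' changes nothing here: find prints each name as ./A-71.txt, and a numeric sort only reads a number from the very start of the line (or key). With none there, every line counts as 0 and sort falls back to ordinary text comparison. A minimal sketch, assuming all the files sit in one directory and the only hyphen in each path is the one before the page number, is to sort on the field after the hyphen. The helper name sort_pages is hypothetical:

```shell
#!/bin/sh
# sort_pages: hypothetical helper. Reads names such as ./A-71.txt from stdin
# and orders them by the number that follows the hyphen.
#   -t -      use "-" as the field separator
#   -k 2,2n   compare field 2 ("71.txt") numerically; parsing stops at the "."
sort_pages() {
    sort -t - -k 2,2n
}

# Sample names in the style find would print them.
printf '%s\n' ./A-111.txt ./A-02.txt ./A-6999.txt ./A-71.txt ./A-01.txt | sort_pages
# Prints:
#   ./A-01.txt
#   ./A-02.txt
#   ./A-71.txt
#   ./A-111.txt
#   ./A-6999.txt
```

With the real files the same key options would slot into the pipeline as 'find -name \*.txt | sort -t - -k 2,2n'. A directory name containing a hyphen would throw the field count off, so this is only a sketch for the flat layout described in the thread.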

Name: Anonymous 2009-04-06 18:03

>>5
For some reason that spits them out in the same order as a lone 'ls' command, but in a single column, with A-71.txt coming after 6999.txt.

I did however figure something out with the 'sort' command you introduced me to. I ran 'ls | sort -n' and for some reason that spit out the order I was looking for. I put the output into a txt file, used emacs to replace the newlines with spaces, added 'cat' to the beginning and '> out.txt' to the end, and it worked like a charm!

Thanks a bunch!
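
The emacs step can probably be skipped by letting xargs pass the sorted names straight to cat. A rough sketch, assuming every page matches A-*.txt, the names contain no spaces, and the output file does not itself match that pattern. The function name merge_pages and its two arguments are made up for illustration:

```shell
#!/bin/sh
# merge_pages: hypothetical helper.
#   $1  directory holding the A-*.txt pages
#   $2  file to write the merged text to
merge_pages() {
    src_dir=$1
    out_file=$2
    (
        cd "$src_dir" || exit 1
        # List the pages one per line, order them by the number after the
        # hyphen, then let xargs hand the sorted names to cat.
        printf '%s\n' A-*.txt | sort -t - -k 2,2n | xargs cat
    ) > "$out_file"
}

# Example call (paths are placeholders):
#   merge_pages ./scans out.txt
```

xargs may split tens of thousands of names across several cat invocations, but it keeps them in the order it received them, and all of the output goes to the same file.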

Name: Anonymous 2009-04-06 18:13

>>6
why do you use emacs if your not a programmer? (and don't say that its a good general purpose text editor because thats a bunch of horseshit)

Name: Anonymous 2009-04-06 18:16

>>7
your

Name: Anonymous 2009-04-06 18:28

>>8
[quote] you're [/quote]

Name: Anonymous 2009-04-06 19:25

What about merging billions of txts?

Name: Anonymous 2009-04-06 19:34

>>10
British or American billions?

Name: Anonymous 2009-04-06 19:41

>>7
Because it's a good OS.

Name: Trollbot9000 2009-07-01 10:44

Xargs cat out txt Thanks a lot  I am also  gay But I  invented RMS M  Stallman All I  am suggesting is  that it be  that way I  know I can  focus on what  you XOR with.
