is there any way to do it that's not slow as fuck?
Name:
Anonymous2009-01-20 3:56
yes there is.
Name:
Anonymous2009-01-20 3:57
>>1
Yes. Iterate through the file, each time choosing whether to keep the current choice or replace it with the current line. The first line is picked with probability 1, the second with probability 1/2, the third with probability 1/3, the fourth with probability 1/4, and the nth with probability 1/n. When you run out of lines, return the current choice.
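To see why every line ends up equally likely: line k is chosen at step k with probability 1/k, and it then survives each later step j with probability 1 - 1/j, so its overall chance is (1/k) * (k/(k+1)) * ((k+1)/(k+2)) * ... * ((n-1)/n) = 1/n, the same for every k.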
Whether this is slow as fuck is a matter of opinion, of course. Disk access is slow as fuck.
_________________________
orbis terrarum delenda est
Name:
Anonymous2009-01-20 4:35
>>3
i tried that. it was about 10 times as slow as reading the whole file into a list, generating a random number between 0 and the length of that list - 1, and then indexing the list.
and even that was about 100 times as slow as this simple perl script:
#!/usr/bin/perl
my ($i, $line) = (0, '');
open FILE, '<file.txt';
while (<FILE>) { $line = $_ if rand(++$i) < 1; }  # keep the nth line with probability 1/n
close FILE;
print $line;
Name:
Anonymous2009-01-20 4:43
Look at the C that your Haskell generates. And post it here.
haskell:
import System.IO
import System.Random

randomLine :: Handle -> String -> Int -> IO String
randomLine f l n = do
  eof <- hIsEOF f
  random <- getStdRandom (randomR (0, n))
  if eof then return l else do
    line <- hGetLine f
    if random < 1 then randomLine f line (n + 1) else randomLine f l (n + 1)

main = do
  file <- openFile "file.txt" ReadMode
  randomLine file "" 0 >>= putStrLn
perl:
my ($i, $line) = (0, '');
open FILE, '<file.txt';
while (<FILE>) { $line = $_ if rand(++$i) < 1; }
close FILE;
print $line;
c code that i wrote:
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void){
    char line[LINE_MAX], ret[LINE_MAX] = "";
    FILE *file = fopen("file.txt", "r");
    if(file){
        /* keep the ith line with probability 1/i */
        for(long long i = 1; !feof(file); ++i)
            if(fgets(line, LINE_MAX, file) && arc4random() % i < 1)
                strcpy(ret, line);
        fclose(file);
    }
    fputs(ret, stdout);
    return 0;
}
time results (averaged over 10 runs, with a 178690 line file):
haskell: 2.251s user 0.109s system
perl: 0.516s user 0.02s system
c: 0.093s user 0.015s system
Name:
Anonymous2009-01-20 6:58
Try using ByteString.
Name:
Anonymous2009-01-20 7:10
>>9
that's a tiny bit faster (2.06s user 0.093s system), but still a lot slower than perl.
args=$(getopt u $*)
if [ $? -ne 0 ]
then
echo 'Usage: rac [-u] [file ...]'
exit 1
fi
set -- $args
for i
do
case "$i" in
-u) buff="-r";;
--) break;;
esac
shift
done
shift
for file
do
$randomcmd $buff -f $file
racstdin=false
done
>>27
consider a file where half the lines are about 900 characters and the other half are about 100 characters. your method would choose a long line 9 times out of 10.
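a rough C sketch of what that method presumably is (guessing from this post that >>27 meant: seek to a uniformly random byte offset and print the line containing it; the file name and buffer size here are made up):
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

int main(void){
    FILE *file = fopen("file.txt", "r");
    if(!file) return 1;

    /* pick a uniformly random byte offset inside the file */
    fseek(file, 0, SEEK_END);
    long size = ftell(file);
    if(size <= 0) return 1;
    srand(time(NULL));
    long pos = (long)((double)rand() / ((double)RAND_MAX + 1) * size);

    /* walk backwards to the start of the line containing that byte */
    while(pos > 0){
        fseek(file, pos - 1, SEEK_SET);
        if(fgetc(file) == '\n') break;
        --pos;
    }

    fseek(file, pos, SEEK_SET);
    char line[4096];
    if(fgets(line, sizeof line, file)) fputs(line, stdout);
    fclose(file);
    return 0;
}
every byte is equally likely to be hit, so a line's chance of being printed is proportional to its length, which is where the 9-in-10 figure comes from.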
Name:
Anonymous2009-01-20 20:34
>>30
So >>27's method fails on some pathological input-- so what? It's still better than every other suggestion put forward so far.
Name:
Anonymous2009-01-20 20:36
Use a real database system
Name:
Anonymous2009-01-20 20:37
>>31
That's not true. Your method is biased on almost EVERY input, not just pathological ones. You can't sacrifice correctness for optimization.
Name:
Anonymous2009-01-20 21:57
>>31
input where every line is the same length is the pathological case. in the real world most text files have lines of different lengths.
>>32
that would be slow as fuck and overkill when the only thing i'm ever going to do with this text file is pick a random line from it each time the program is run... but now that i think about it, maybe it'd be a good idea to just make an index file containing the byte positions of all the newlines as fixed-size integers. then i can seek to a random multiple of the integer size in that index file, read the position stored there, and seek to the right place in the text file...
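a rough C sketch of that index idea, with some liberties: it stores line-start offsets (offset 0, then the byte after each newline) instead of the newline positions themselves, since those are what you actually seek to, and the index file name file.idx is made up. in practice you'd build the index once and reuse it every run instead of rebuilding it like this:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* write one fixed-size long per line: the byte offset where that line starts */
static long build_index(const char *textpath, const char *indexpath){
    FILE *in = fopen(textpath, "r"), *out = fopen(indexpath, "wb");
    if(!in || !out) exit(1);
    long off = 0, count = 0;
    int c, at_line_start = 1;
    while((c = fgetc(in)) != EOF){
        if(at_line_start){
            fwrite(&off, sizeof off, 1, out);
            ++count;
            at_line_start = 0;
        }
        ++off;
        if(c == '\n') at_line_start = 1;
    }
    fclose(in); fclose(out);
    return count;
}

int main(void){
    long count = build_index("file.txt", "file.idx");
    if(count == 0) return 1;

    /* pick a random index entry, read the stored offset,
       and jump straight to that line in the text file */
    srand(time(NULL));
    long k = rand() % count;

    FILE *idx = fopen("file.idx", "rb"), *txt = fopen("file.txt", "r");
    if(!idx || !txt) return 1;
    long off = 0;
    fseek(idx, k * (long)sizeof off, SEEK_SET);
    if(fread(&off, sizeof off, 1, idx) != 1) return 1;
    fseek(txt, off, SEEK_SET);

    char line[4096];
    if(fgets(line, sizeof line, txt)) fputs(line, stdout);
    fclose(idx); fclose(txt);
    return 0;
}
once the index exists, picking a line is two seeks and two small reads, so it stays fast however big the text file gets.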
Name:
Anonymous2009-01-20 21:58
>>33 Prop. I. I am not >>27. Prop. II. The OP never said he wanted unbiased results.
>>36
it'd be a lot slower than making a file containing the positions of all the newlines as fixed-size integers, seeking to a random position in that file (rounded down to a multiple of the integer size), reading the number, and seeking to the right place in the text file...
>>37-39
$ shuf --version
zsh: command not found: shuf