I'm a few years behind here, but:
In 'Edit 4/5/6' of the original post, you are using the construction:
$ /usr/bin/time cat big_file | program_to_benchmark
This is wrong in a couple of different ways:
1. You're actually timing the execution of cat, not your benchmark. The 'user' and 'sys' CPU usage displayed by time are those of cat, not of your benchmarked program. Even worse, the 'real' time is not necessarily accurate either. Depending on the implementation of cat and of pipelines in your local OS, it is possible that cat writes a final giant buffer and exits long before the reader process finishes its work.
2. Using cat is unnecessary and in fact counterproductive; you're adding moving parts. On a sufficiently old system (i.e. one with a single CPU and, in certain generations of computers, I/O faster than the CPU), the mere fact that cat was running could substantially color the results. You are also subject to whatever input and output buffering and other processing cat may do. (This would likely earn you a 'Useless Use Of Cat' award if I were Randal Schwartz.)
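The 'real' time problem in item 1 is easy to reproduce. Here is a minimal sketch, assuming bash (its built-in time keyword, captured from a brace group, stands in for /usr/bin/time) and a pipe buffer larger than the test file; the small file and the sleeping reader are hypothetical stand-ins for big_file and a slow program_to_benchmark. Timing only the first element of the pipeline reports almost nothing, even though the pipeline as a whole takes over a second.

```bash
#!/usr/bin/env bash
# Sketch: timing only the first element of a pipeline misses the reader's work.
# Assumes bash, plus a pipe buffer larger than the small test file (64 KiB on Linux).
set -eu
export LC_ALL=C               # make bash print times with a '.' decimal point
TIMEFORMAT='%3R'              # bash: print only elapsed real seconds

tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
seq 1 1000 > "$tmp/small_file"   # about 4 KB, fits in the pipe buffer

# Time only cat, the way `/usr/bin/time cat small_file | ...` does. The reader
# sleeps for a second before counting, standing in for a slow program.
{ time cat "$tmp/small_file"; } 2> "$tmp/cat_time" | { sleep 1; wc -l; } > "$tmp/count"

echo "lines counted: $(tr -d ' ' < "$tmp/count")"
echo "time for cat:  $(cat "$tmp/cat_time")s"
```

With a file larger than the pipe buffer, cat blocks until the reader catches up, so its 'real' figure tracks the reader more closely; the 'user' and 'sys' figures, however, are still cat's own.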
A better construction would be:
$ /usr/bin/time program_to_benchmark < big_file
In this statement it is the shell which opens big_file, passing it to your program (well, actually to time, which then executes your program as a subprocess) as an already-open file descriptor. 100% of the file reading is strictly the responsibility of the program you're trying to benchmark. This gets you a real reading of its performance without spurious complications.
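One way to see who opens the file is to ask a program what its own standard input refers to. This sketch is Linux-only, since it relies on /proc; the temporary file is a hypothetical stand-in for big_file. Under redirection, readlink names the file itself; behind cat, it names a pipe.

```bash
#!/usr/bin/env bash
# Sketch (Linux-only, relies on /proc): what does the program's stdin refer to?
set -eu

f=$(mktemp)                   # stand-in for big_file
trap 'rm -f "$f"' EXIT
printf 'line 1\nline 2\n' > "$f"

# Redirection: the shell opens the file and passes the open descriptor as fd 0.
redirected=$(readlink /proc/self/fd/0 < "$f")

# Pipeline: fd 0 is the read end of a pipe that cat writes into.
piped=$(cat "$f" | readlink /proc/self/fd/0)

echo "prog < file:     $redirected"
echo "cat file | prog: $piped"
```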
I will mention two possible, but actually wrong, 'fixes' which could also be considered (but I 'number' them differently as these are not things which were wrong in the original post):
A. You could 'fix' this by timing only your program:
$ cat big_file | /usr/bin/time program_to_benchmark
B. or by timing the entire pipeline:
$ /usr/bin/time sh -c 'cat big_file | program_to_benchmark'
These are wrong for the same reasons as #2: they're still using cat unnecessarily. I mention them for a few reasons:
- they're more 'natural' for people who aren't entirely comfortable with the I/O redirection facilities of the POSIX shell
- there may be cases where cat is needed (e.g. the file to be read requires some sort of privilege to access, and you do not want to grant that privilege to the program to be benchmarked: sudo cat /dev/sda | /usr/bin/time my_compression_test --no-output)
- in practice, on modern machines, the added cat in the pipeline is probably of no real consequence.
But I say that last thing with some hesitation. If we examine the last result in 'Edit 5':
$ /usr/bin/time cat temp_big_file | wc -l
0.01user 1.34system 0:01.83elapsed 74%CPU ...
this claims that cat consumed 74% of the CPU during the test; and indeed (0.01 + 1.34) / 1.83 is approximately 74%. Perhaps a run of:
$ /usr/bin/time wc -l < temp_big_file
would have taken only the remaining 0.49 seconds! Probably not: cat here had to pay for the read() system calls (or equivalent) which transferred the file from 'disk' (actually buffer cache), as well as the pipe writes to deliver them to wc. The correct test would still have had to do those read() calls; only the write-to-pipe and read-from-pipe calls would have been saved, and those should be pretty cheap.
Still, I predict you would be able to measure the difference between cat file | wc -l and wc -l < file and find a noticeable (2-digit percentage) difference. Each of the slower tests will have paid a similar penalty in absolute time, which would, however, amount to a smaller fraction of its larger total time.
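A comparison like this could be scripted roughly as follows. This is a sketch under stated assumptions: bash (its time keyword times the entire pipeline, and TIMEFORMAT gives millisecond resolution), a generated file standing in for a real big file, and a best-of-3, warm-cache methodology. best_of is a hypothetical helper, not a standard tool.

```bash
#!/usr/bin/env bash
# Sketch: compare `wc -l < file` against `cat file | wc -l`, best of N runs.
# Assumes bash; the generated test file is a stand-in for your own big file.
set -eu
export LC_ALL=C               # make bash print times with a '.' decimal point

f=$(mktemp)
trap 'rm -f "$f"' EXIT
seq 1 2000000 > "$f"          # roughly 15 MB of test input

runs=3
TIMEFORMAT='%3R'              # elapsed real seconds, millisecond resolution

# best_of CMD: run CMD (a shell command string) $runs times and print the
# fastest elapsed time. Hypothetical helper, not a standard tool.
best_of() {
    local i t best=''
    for ((i = 0; i < runs; i++)); do
        # bash's `time` keyword reports on the whole pipeline inside CMD;
        # the brace group lets us capture that report from stderr.
        t=$( { time eval "$1" > /dev/null; } 2>&1 )
        if [[ -z $best ]] || awk -v a="$t" -v b="$best" 'BEGIN { exit !(a + 0 < b + 0) }'; then
            best=$t
        fi
    done
    echo "$best"
}

cat "$f" > /dev/null          # prime the page cache before measuring

redirect=$(best_of "wc -l < '$f'")
pipeline=$(best_of "cat '$f' | wc -l")

echo "wc -l < file:     ${redirect}s"
echo "cat file | wc -l: ${pipeline}s"
```

To benchmark your own program, substitute it for wc -l in both command strings and point f at your real input file.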
In fact I did some quick tests with a 1.5 gigabyte file of garbage, on a Linux 3.13 (Ubuntu 14.04) system, obtaining these results (these are actually 'best of 3' results; after priming the cache, of course):
$ time wc -l < /tmp/junk
real 0.280s user 0.156s sys 0.124s (total cpu 0.280s)
$ time cat /tmp/junk | wc -l
real 0.407s user 0.157s sys 0.618s (total cpu 0.775s)
$ time sh -c 'cat /tmp/junk | wc -l'
real 0.411s user 0.118s sys 0.660s (total cpu 0.778s)
Notice that the two pipeline results claim to have taken more CPU time (user+sys) than real wall-clock time. This is because I'm using bash's built-in 'time' keyword, which is cognizant of the pipeline; and I'm on a multi-core machine where separate processes in a pipeline can use separate cores, accumulating CPU time faster than realtime. Using /usr/bin/time, I see smaller CPU time than realtime, showing that it can only time the single pipeline element passed to it on its command line. Also, the shell's output gives milliseconds while /usr/bin/time only gives hundredths of a second.
So at the efficiency level of wc -l, the cat makes a huge difference: 407 / 280 ≈ 1.45, or about 45% more realtime, and 775 / 280 ≈ 2.77, or a whopping 177% more CPU used! At least on my random it-was-there-at-the-time test box.
I should add that there is at least one other significant difference between these styles of testing, and I can't say whether it is a benefit or fault; you have to decide this yourself:
- When you run cat big_file | /usr/bin/time my_program, your program is receiving input from a pipe, at precisely the pace sent by cat, and in chunks no larger than those written by cat.
- When you run /usr/bin/time my_program < big_file, your program receives an open file descriptor to the actual file. Your program, or in many cases the I/O libraries of the language in which it was written, may take different actions when presented with a file descriptor referencing a regular file. It may use mmap(2) to map the input file into its address space, instead of using explicit read(2) system calls. These differences could have a far larger effect on your benchmark results than the small cost of running cat.
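The decision described in the second item can be imitated in a few lines. In this sketch (bash assumed; describe_stdin and its messages are hypothetical, and a C program would usually make the equivalent check with fstat(2) and the S_ISREG/S_ISFIFO macros), the same function takes a different path depending on whether its standard input is a regular file or a pipe.

```bash
#!/usr/bin/env bash
# Sketch: pick an input strategy based on what kind of object stdin is.
# describe_stdin and its messages are hypothetical illustrations only.
set -eu

describe_stdin() {
    # In bash conditional expressions, /dev/stdin is checked as file descriptor 0.
    if [[ -f /dev/stdin ]]; then
        echo "regular file: size known up front, could mmap(2) or read in large chunks"
    elif [[ -p /dev/stdin ]]; then
        echo "pipe: must read(2) sequentially, as data arrives"
    else
        echo "other (terminal, socket, device...)"
    fi
}

f=$(mktemp)                   # stand-in for big_file
trap 'rm -f "$f"' EXIT
printf 'some input\n' > "$f"

from_file=$(describe_stdin < "$f")
from_pipe=$(cat "$f" | describe_stdin)

echo "my_program < file:     $from_file"
echo "cat file | my_program: $from_pipe"
```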
Of course it is an interesting benchmark result if the same program performs significantly differently between the two cases. It shows that, indeed, the program or its I/O libraries are doing something interesting, like using mmap(). So in practice it might be good to run the benchmarks both ways, perhaps discounting the cat result by some small factor to "forgive" the cost of running cat itself.