[Beowulf] C vs C++ challenge (awk version)
Selva Nair
selva.nair at utoronto.ca
Thu Jan 29 17:59:13 EST 2004
On Thu, 29 Jan 2004, rgb wrote:
> On Thu, 29 Jan 2004, dc wrote:
>
> > > I still guarantee you two things:
> > > 1) Your code will be longer
> > > 2) Your program will be slower
> > >
> > > As always, I love to be proven wrong ;)
> >
> >
> > Here is another try at that, this time in Java.
> >
> > file          size      C++        j client   j server
> ...
> > shaks12.txt   5582655   0m4.476s   0m3.321s   0m2.842s
>
> And here is a version in C. It is longer, no question. It does its own
> memory management, in detail, in pages (which should be nearly optimally
> efficient). It is moderately robust, and smart enough to recognize all
> sorts of separators (when counting words, separators matter -- hence
> this program will find more words than e.g. wc because it splits things
> differently).
But this one does not count unique words, does it?
Here is my version in awk. It is one line shorter than the C++ version
and about 1.5 times faster (1.86s versus 2.83s elapsed time) with
shaks12.txt as input.
[selva@scala distinct_words]$ wc shaks12.txt
124456 901325 5458199 shaks12.txt
This copy of shaks12.txt has been filtered by dos2unix.
Timings:
First, my awk script (with GNU awk 3.1.0):
[selva@scala distinct_words]$ /usr/bin/time ./dwc.awk shaks12.txt
Number of distinct words = 67505
1.82user 0.04system 0:01.86elapsed 99%CPU
Now the original C++ code (compiled with g++ 2.96):
[selva@scala distinct_words]$ /usr/bin/time ./dwc < shaks12.txt
Words: 67505
2.79user 0.04system 0:02.83elapsed 100%CPU
Here is the script:
#!/bin/awk -f
# Count the distinct whitespace-separated words in the input.
{
    for (i = 1; i <= NF; i++) {
        if (words[$i]) continue;    # already seen this word
        words[$i] = 1;
        ++nwords;
    }
}
END {
    printf "Number of distinct words = %i\n", nwords;
}
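
The script above relies on awk's default field splitting, which breaks
lines on whitespace only, so "be," and "be:" count as different words. As
rgb notes, the choice of separators matters. The following is only a
sketch of that effect, not part of the original benchmark: the function
names count_ws and count_alpha are made up for this illustration, and
lowercasing plus splitting on runs of non-letters is just one assumed
definition of a "word".

```shell
#!/bin/sh
# Sketch: how the choice of separators changes the distinct-word count.
# count_ws and count_alpha are hypothetical names used only here.

# Whitespace-separated fields, as in the script above. The "in" operator
# tests membership without first creating an empty array entry.
count_ws() {
    awk '{
        for (i = 1; i <= NF; i++)
            if (!($i in words)) { words[$i] = 1; ++nwords }
    }
    END { printf "%d\n", nwords }'
}

# Assumed alternative: lowercase each line and treat any run of
# non-letters as a separator. Empty pieces from split() are skipped.
count_alpha() {
    awk '{
        n = split(tolower($0), w, /[^a-z]+/)
        for (i = 1; i <= n; i++)
            if (w[i] != "" && !(w[i] in words)) { words[w[i]] = 1; ++nwords }
    }
    END { printf "%d\n", nwords }'
}

printf 'To be, or not to be: that is the question.\n' | count_ws     # prints 10
printf 'To be, or not to be: that is the question.\n' | count_alpha  # prints 8
```

Either function reads standard input, e.g. "count_alpha < shaks12.txt".
Counts from the second variant are not comparable with the 67505 reported
here, since it merges words that differ only in case or punctuation.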
Selva