[Beowulf] C vs C++ challenge

Robert G. Brown rgb at phy.duke.edu
Wed Jan 28 00:14:32 EST 2004


On Mon, 26 Jan 2004, Andrew Shewmaker wrote:

> Jakob Oestergaard wrote:

>  > I still guarantee you two things:
>  > 1) Your code will be longer
>  > 2) Your program will be slower
>  >
>  > As always, I love to be proven wrong  ;)
> 
> Here's a similar program...word frequency...instead of word count.
> 
> http://www.cs.bell-labs.com/cm/cs/pearls/sec151.html
> 
> His C++ code is shorter, but his C code is faster.
> 
> With a trivial modification, his C code can meet your challenge too.
> 
> http://www.cs.bell-labs.com/cm/cs/pearls/wordfreq.c

There are two or three very interesting observations to make about both
Jakob's challenge and this response.  One is that the code is indeed
longer.  Or is it?  One obvious feature of the C++ code is that it
relies heavily on the STL.  Is this cheating?  Perhaps not exactly, or
not quite, but neither is it totally fair WHEN COMPARING CODE LENGTH.
One could encapsulate all the code in ANY such program into a single
subroutine call and then say "look how short my program is", and
although perhaps the library IS a standard feature of C++, it is also
very much a library whose functionality could, in all probability, be
pretty much duplicated in C.  And of course as expected C++ appears to
lose the speed challenge (which should be no great surprise, as the C
code that DOESN'T encapsulate a solution is pretty close to as efficient
as assembler and is in any event dominated by I/O).

So let's give C++ points for having a powerful library.  Let's take
points away because it runs slower and because HAVING the library,
especially a POWERFUL library, tends to channel your program design in
ways that can take advantage of the library but that might not be
horribly efficient.  Let's also not give it too MANY points for shorter
program length, both because the library hides a LOT of code, you
betcha, and because using the library renders the program totally opaque
to somebody that doesn't know C++ and the library.

As in maybe it's just me, but when I look at Jakob's code I have no idea
how it works.  What are these routines?  What do they do?  Where's the
logic?  Oh, I can read the comments and see the keywords and sort of
guess that all sorts of dark magic is being done; I just have no idea
how it does it, any more than I would if one took Jakob's code or my
code or anybody's code for the same task and wrapped it up in a single
routine:

#include <stdio.h>

int count_words_from_stdin(void);   /* defined elsewhere, out of sight */

int main(void)
{

 int count;
 count = count_words_from_stdin();
 printf("I counted %d words\n",count);
 return 0;

}

Simple code -- pretty readable without any comments at all.  However,
not terribly understandable or useful, without being able to see what
the routine count_words_from_stdin() in fact does in however many lines
of code it hides.  It could even be written IN assembler, and take a LOT
of lines and be VERY efficient.

It would be my hope and expectation that C++ is more than just a big
library with some nifty stuff added on top of the already fairly hefty
(but incrementally includeable, you only link to what you need) and
relatively low level library set one has to work with in C.  In fact, to
convince me to try C++ you'd have to show me code that is NOT this
compact -- code that in fact defines all the various class/objects used
the hard way, uses (and I mean really USES) inheritance and protection
and so forth, in ways that cannot be trivially duplicated in C code (and
that, in fact, doesn't look very much LIKE the associated C code).  The
fundamental issue is whether or not "objects" are somehow managed better
in C++.  Beyond that, even its relatively strong typing and powerful
library are not language issues, they are features added to C++ as a
more recent revision of C and could probably be added to C as well, if
anybody really cared.  As the K&R to ANSI transition shows, perhaps
somebody one day will care, so this isn't an entirely moot point.

You might argue, perhaps with complete accuracy, that WRITING such a
high level library in C would be difficult because it exploits many of
the language features of C++.  I'd argue back that projects such as the
GSL suggest otherwise -- the GSL code is certainly written in a very
"object-oriented" way, and there are constructors and destructors and so
forth where they make sense, but underlying those entities are ordinary
C structs as the essential objects, and you can read the GSL headers and
work directly with the structs if you prefer to work with the naked data
for reasons of speed even in cases where they do provide a get/set
method for accessing components of an object.  In my own code I
typically use lots of objects, and when it is reasonable to do so (when
they are very complex or have multiple layers of pointers that have to
be freed in a particular order) I use constructors and destructors for
those objects.  Lots of times I don't.  I NEVER use "methods" for
objects -- if I want to operate on the contents of a struct I operate on
the contents of a struct, with a subroutine or without it as the needs
of the code dictate.

I think such an argument would eventually end up back where I left us.

C++ a) shapes code in certain ways that go beyond the mere use of
"objects" or "object oriented methods" in programming, which is ALWAYS
an option in any language that supports some sort of implementation of
a struct or union or (in the worst possible case) even a raw data block
with suitable offsets (fortran, anyone:-); b) provides strong typing,
which I'm willing to acknowledge could be a real advantage (although one
CAN cast void types in C and accomplish very subtle things, one does
have to be a programming god to do it and not occasionally get bitten by
it); c) provides library extensions (quite possibly powerful extensions)
that encapsulate a variety of useful tasks so that they can be done with
relatively few lines of YOUR code (plus all the hidden lines of library
code, of course); d) at the cost of transparency and user-controllable
logic and user access to some components of the code.  This is just
restating a) again, and around we'd go.

If I were to write code to read lines from stdin, parse them into words
on the basis of a set of separators e.g. space, tab, comma, period, put
them into a vector of strings (uniqued or not) and count and print the
final result on EOF it would certainly be longer than the C++, no doubt,
but it would also be literally as fast as it could be at execution
(probably comparing decently to assembler) and any programmer, even
someone that didn't program in C but minimally understood the concept of
the malloc family for memory management, could likely make out the
logic.

Preferring this to writing a program that just calls STL functions or
somebody else's count_words_from_stdin() is doubtless a matter of taste,
and there are doubtless programmers that span the range (taste-wise)
from "real programmers only program in assembler" to perl or python
(where the code would also be very short -- a single transparent loop
and an output statement in perl) to "I like LISP".  So tastes differ,
experiences differ, and needs differ as well.  Is the C++/C debate about
taste, experiences, needs?

When I talk to CPS faculty, they make no bones about it -- they teach
C++ so that their students can learn structured programming in an
environment with strong typing and lots of structure, and they feel that
in today's programming world performance is less important than ease of
programming and structure and having a powerful library.  They
compromise helping students develop skill in the "details" of procedural
programming for the ability to quickly put together projects using big
building blocks in a collegial/corporate programming environment.  They
KNOW (in many cases) that they are doing this; they just consider
ability to be a good, diligent programmer that follows the rules and
colors within the lines to be more important than knowing how pointers
and malloc work.  Is this a good tradeoff?  I dunno.

This is a bit different from when I learned to code -- there WERE no
powerful libraries, and a whole lot of our programming exercises were
what amounted to writing code to process lists of words or numbers in
various ways, to sort, to build little code machines on a microscopic
level within the confines of the language.  In one course we even wrote
an assembler simulator/interpreter and a compiler interpreter WRITTEN in
our assembler interpreter (all punched out on cards for our IBM 370 in
the upper level language du jour, which I think was PL/1 at the time).
Useless for developing programming skills, unless you think that good
programming skills are developed writing a really complex project that
forces you to duplicate the way a computer really really works, and then
write a complex program that works on your simulated computer...

A lot of the kids learning programming now don't learn "how can one sort
a list"; they learn how to call a routine named "sort", and extensions
thereof (at least until they are in a relatively upper level class).  This
works just marvey, of course, as long as the entity they need to sort
and the sort criterion can be fed to the subroutine successfully in a
form that it can work with... but one cannot help but feel that
something is lost, just as something was lost even with C when people
who needed random numbers just naively called "rand()" and were a bit
surprised when, um, the results weren't horribly random...

So perhaps my problem is that I'm just an Old Guy(tm) and punched too
many cards in too many mediocre languages in my day.  For me learning C
was liberating, exhilarating, empowering, and I continue to discover
clever little things it can do to this very day.  Portable assembler,
that's what C is, plus a bit of upper-level dressing (some of which
COULD be a lot nicer, no doubt).  But I will, just maybe, give C++
another try some day in the future when I have time and have found a
really good book on it.  Every time in the past I was working without a
decent text and with a questionable (Microsoft or Borland) compiler.  If
g++ is, as you suggest, a pure superset of gcc (a few extra keywords
don't matter) I can always give using a class a try sometime when I want
to make a new struct, and see just how many hoops I have to jump through
to do what I want to do.

   rgb

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu



