[OT] statistical calculations
r.grenyer at imperial.ac.uk
Mon Nov 24 10:53:29 EST 2003
Likewise, as a heavy-ish R user, I'd say go and look, immediately. R is a
stunning piece of software anyway, but I suspect the breadth of its
interfaces to most major languages (as the previous poster said, it
talks both ways to Tcl, Perl and Python, to name but a few) and to
database implementations would alone make it your first stop. *Most*
statisticians would love you for it, too.
On Monday, Nov 24, 2003, at 14:29 Europe/London, Andrew Piskorski wrote:
>> From: "Robert G. Brown" <rgb at phy.duke.edu>
>> To: Martin WHEELER <mwheeler at startext.co.uk>
>> On Sun, 23 Nov 2003, Martin WHEELER wrote:
>>> I have to process a group of several thousand acquired datasets, each
>>> containing well over one hundred numerical items; and eventually, I'm
>>> going to have to work with a statistician to pull some meaningful
>>> figures out of it all.
>>> In other words, the data have to be massaged in some pretty fancy
>>> For various reasons outwith my control this is being done principally
>>> via a spreadsheet (wouldn't have been an obvious choice for me, but
>>> I only know about words, not numbers). Can anyone on this list used
>>> to doing this stuff point me towards a GPLed spreadsheet with built-in
>>> statistical functions? Or an add-in to Gnumeric / OpenOffice etc.?
>>> (I believe such exist.) Or maybe a library of GPLed spreadsheet
>>> Please correct me if I'm barking up a wrong tree here.
>> Ask on the GSL (GNU Scientific Library) list. There have been discussions
>> on the list of people wrapping/encapsulating GSL functions in various
>> ways, but I can't remember offhand if any of them were inside a
>> spreadsheet per se. It also depends to some extent on what you mean by
>> "built-in statistical functions": GSL has the basic functions but is
>> not a package like R, which is the second thing you should probably
>> look at: www.r-project.org. R is a full-service stats suite with a
>> variety of interfaces, including web; hopefully somebody has wrapped it
>> up into a spreadsheet of some sort.
> Martin, R should definitely do whatever statistical stuff you want.
> There is also an R plugin for the Gnumeric spreadsheet, and some stuff
> to let MS Excel call R. I've never tried either of those plugins, but
> they might be good if you don't want to use R directly:
> For general vendor data clean-up and conversion issues, well, that
> depends. :) You didn't say enough for me to know whether you need to
> worry about that or not, but most of the vendor data I've seen (not in
> linguistics) has always needed cleanup of some sort!
> In my own line of work, for that sort of thing (which means for
> financial/market data), I mostly write Tcl code to read and manipulate
> the files, shove all the data into an RDBMS like Oracle or PostgreSQL,
> then sometimes do additional processing in the database. This works
> well, but if you're not already using an RDBMS, you probably do NOT
> want to get into that just for this one application.
> Most likely, as long as your data all fits (or almost fits?) into RAM,
> and you don't need the many-readers many-writers (concurrency,
> atomicity, etc.) support that a real RDBMS provides, stuffing all your
> data into R's built-in matrix or data frame types should be fine.
> Depending on what the vendor files look like to begin with, you may
> want to pre-process them a bit with a Tcl, Perl, Python, or whatever
> script first to make them easier to get into R via R's read.table().
> Andrew Piskorski <atp at piskorski.com>
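The pre-processing step suggested above (a small script that tidies the vendor files before R's read.table() sees them) might look something like the following Python sketch. It is only a sketch under stated assumptions: it assumes a hypothetical layout of one plain-text file per dataset, with one "name: value" (or "name = value") line per numerical item, and it assumes any comma inside a number is a thousands separator. The function names, the *.txt file pattern and the set of missing-value markers are all made up for illustration. It builds one wide table (one row per dataset, one column per item), writes missing or unparseable values as NA, and saves the result as tab-separated text.

```python
import csv
import io
import re
import sys
from pathlib import Path

# Hypothetical markers a vendor might use for "no value"; adjust to your data.
MISSING_MARKERS = {"", "-", "?", "n/a", "na", "null"}

# Plain decimal or scientific notation only, so strings such as "nan",
# "inf" or "1_000" (all accepted by Python's float()) are rejected.
NUMBER = re.compile(r"[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)?")

# Assumed (hypothetical) line layout: "item name: value" or "item name = value".
ITEM_LINE = re.compile(r"\s*(.+?)\s*[:=]\s*(.*?)\s*$")


def safe_name(name: str) -> str:
    """Collapse runs of non-word characters to '_' so names hold no spaces, tabs or quotes."""
    return re.sub(r"\W+", "_", name).strip("_")


def clean_value(raw: str) -> str:
    """Normalise one field: missing markers and junk become NA, and commas
    are dropped on the assumption that they are thousands separators."""
    value = raw.strip()
    if value.lower() in MISSING_MARKERS:
        return "NA"
    value = value.replace(",", "")
    return value if NUMBER.fullmatch(value) else "NA"


def parse_dataset_text(text: str) -> dict:
    """Parse one dataset file's contents into {item_name: cleaned_value}.
    Blank lines, '#' comments and lines without ':' or '=' are skipped;
    a repeated item name keeps its last value."""
    items = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        match = ITEM_LINE.match(stripped)
        if match is None:
            continue
        name = safe_name(match.group(1))
        if name:
            items[name] = clean_value(match.group(2))
    return items


def to_tsv(datasets: dict) -> str:
    """Build a wide tab-separated table: a header row, then one row per dataset
    (sorted by name) and one column per item seen in any dataset; items a
    dataset lacks are written as NA."""
    columns = sorted(set().union(*(items.keys() for items in datasets.values())))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["dataset"] + columns)
    for name in sorted(datasets):
        items = datasets[name]
        writer.writerow([name] + [items.get(col, "NA") for col in columns])
    return out.getvalue()


def main(in_dir: str, out_file: str) -> None:
    """Read every *.txt file in in_dir (one dataset per file) and write one table."""
    datasets = {}
    for path in sorted(Path(in_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        datasets[safe_name(path.stem)] = parse_dataset_text(text)
    Path(out_file).write_text(to_tsv(datasets), encoding="utf-8")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Run it as, say, "python clean_vendor.py raw_dir clean.tsv" (script and file names hypothetical); the output should then load in R with read.table("clean.tsv", header = TRUE, sep = "\t"), whose default na.strings = "NA" picks up the missing values. Rejecting anything that is not a plain number, rather than guessing, keeps the judgment calls about odd values visible to whoever does the statistics.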
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf