The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

IA and Data Elements

Are multi-dimensional arrays better than arrays of objects? From an IA perspective, it may be easier to think of things as objects and then dive in for whatever data is desired. From a programming perspective, are there any real advantages to using multi-dimensional arrays?
nonymous
Tuesday, June 07, 2005
 
 
Um, some context for your question would be nice. First off, which of the many expansions of the acronym IA do you mean? Second, what programming language(s)/environment(s)/application domain(s) are you talking about?
Exception guy Send private email
Tuesday, June 07, 2005
 
 
IA for Information Architecture, the discipline loosely speaking. My background is mainly in Java and C++. I usually end up grouping data in objects, tossing them in a Java collection, and using a comparator if need be to get the results I want. It works, but I wonder if multi-dimensional arrays might offer some speed or retrieval advantages in some situations.
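For readers who want the shape of the pattern being described, here is a minimal sketch in modern Java; the Quote class, its fields, and the sort key are hypothetical, not from the post:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical value class standing in for "grouping data in objects".
class Quote {
    final String symbol;
    final float close;
    Quote(String symbol, float close) { this.symbol = symbol; this.close = close; }
}

public class ComparatorDemo {
    // Returns a copy of the list sorted ascending by closing price,
    // using a Comparator as the poster describes.
    static List<Quote> sortByClose(List<Quote> quotes) {
        List<Quote> copy = new ArrayList<>(quotes);
        copy.sort(Comparator.comparingDouble(q -> q.close));
        return copy;
    }
}
```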
nonymous
Tuesday, June 07, 2005
 
 
Without knowing your particular situation, the main advantage of arrays over what you describe is lower memory use, especially when working with primitive types.

Memory overhead aside, I would expect that access time and sort time on an ArrayList are comparable to using arrays (assuming the same data), with the Java array solution being slightly faster from ditching the object-creation overhead, and the C++ array solution faster than that thanks to fewer CPU instructions.  The differences may be negligible on your data.  Also, the underlying platform will make for some differences.

Any collection that doesn't implement RandomAccess will potentially be much slower for retrievals than arrays.  The relative complexity of the compare functions between the Collection and array solutions will make a difference, too.
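The RandomAccess point can be checked directly: in the standard library, ArrayList implements the RandomAccess marker interface (constant-time indexed get) while LinkedList does not. A small sketch:

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

public class RandomAccessCheck {
    // Returns true if indexed get(i) on this list is expected to be O(1),
    // as signalled by the RandomAccess marker interface.
    static boolean hasFastRandomAccess(List<?> list) {
        return list instanceof RandomAccess;
    }

    public static void main(String[] args) {
        System.out.println(hasFastRandomAccess(new ArrayList<Integer>()));  // true
        System.out.println(hasFastRandomAccess(new LinkedList<Integer>())); // false
    }
}
```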
Matt Brown Send private email
Tuesday, June 07, 2005
 
 
There have been cases (in Java code) where I've preferred to use arrays of primitives rather than collections of objects. The results can be orders of magnitude faster, depending on what types of operations you'll need to perform on the data.

For example, I've written an analysis and simulation package for historical stock market data. On every trading day, each security has five pieces of data: the open, high, low, and closing prices, as well as the trading volume. From an object-oriented design perspective, I was tempted to create a TradingDayBean object, wrapping these five values together and associating them with a TradingDay object. Then, to create a stream of data, I'd create a List containing TradingDayBean objects.

As it turns out, though, that would be a pretty inefficient design. The system would need to create tens of millions of those TradingDayBean objects, and then during the analysis process, it would need to access each of them multiple times performing statistical computations on their values. The resultant system would be very slow.

Instead, I created a class called QuoteStream, which contains five different float arrays:

float[] openPrices;
float[] highPrices;
float[] lowPrices;
float[] closePrices;
float[] volumes;

If any of the other classes want to grab a TradingDayBean from this data, they can call a getTradingDayBean() method, which constructs the bean upon request by accessing the appropriate entries in the primitive arrays. However, other classes that iterate over the data sequentially by date can bypass the one-bean-at-a-time request model and operate directly on the arrays of pricing data.
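Filling in the outline above, a minimal sketch of that design might look like this; the class and method names follow the post, but the constructors and field details are assumptions:

```java
// Sketch of the parallel-array design described above. Names follow the
// post (QuoteStream, TradingDayBean, getTradingDayBean); details assumed.
public class QuoteStream {
    final float[] openPrices, highPrices, lowPrices, closePrices, volumes;

    QuoteStream(float[] open, float[] high, float[] low,
                float[] close, float[] volume) {
        openPrices = open; highPrices = high; lowPrices = low;
        closePrices = close; volumes = volume;
    }

    // Constructs a bean on demand from the parallel primitive arrays,
    // so beans exist only while a caller actually needs one.
    TradingDayBean getTradingDayBean(int day) {
        return new TradingDayBean(openPrices[day], highPrices[day],
                                  lowPrices[day], closePrices[day],
                                  volumes[day]);
    }
}

class TradingDayBean {
    final float open, high, low, close, volume;
    TradingDayBean(float open, float high, float low, float close, float volume) {
        this.open = open; this.high = high; this.low = low;
        this.close = close; this.volume = volume;
    }
}
```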

It's incredibly fast. By using this type of design, I can iterate over all of the pricing data in the system (constructing moving averages, standard deviations, relative strength indices, and other financial indicators), performing over a hundred million calculations in less than twenty minutes.
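As a sketch of the kind of sequential pass this enables, a simple moving average can be computed straight off a primitive price array, with no bean construction at all; the windowing details here are my assumption, not taken from the post:

```java
public class MovingAverage {
    // Simple moving average over a primitive price array, illustrating
    // why sequential analysis can skip per-day object creation entirely.
    // Assumes 1 <= window <= prices.length; returns one value per full window.
    static float[] sma(float[] prices, int window) {
        float[] out = new float[prices.length - window + 1];
        float sum = 0f;
        for (int i = 0; i < prices.length; i++) {
            sum += prices[i];                       // add newest value
            if (i >= window) sum -= prices[i - window]; // drop oldest value
            if (i >= window - 1) out[i - window + 1] = sum / window;
        }
        return out;
    }
}
```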

Anyhow, the point of all this is to choose the data model that best fits the work. Think carefully through the types of tasks that you'll ask your application to perform and determine the best way of structuring the design. Generally, there will be a data model that's elegant from both a structural perspective and a performance perspective.
BenjiSmith Send private email
Tuesday, June 07, 2005
 
 
Benji is using the APL / K / J data model (which, in some sense, can even be attributed to Fortran). It works better than OO for large homogeneous data sets. In my experience, 10 times the performance and significantly shorter and simpler code is NOT uncommon.
Ori Berger
Wednesday, June 08, 2005
 
 
Thanks for your help and insight. I used a similar structure in a program last year and had no idea it was based on the APL data model. I decided to use it so I wouldn't have large numbers of if/else statements in the programming logic. All I had to do was load up parallel arrays, sort them, and I got the data I wanted. Very simple and fast. I hadn't considered the advantages of the same structure for large amounts of data the way Benji explained. With all the OOP hype it's tempting to use objects for everything. I looked up the APL language (http://www.users.cloud9.net/~bradmcc/APL.html) because I didn't know what Ori was talking about; it's very intriguing.
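One common way to do the "load up parallel arrays, sort them" step is to sort an index permutation by one key array, then apply that permutation to every parallel array; this sketch is an illustration of that technique, not code from the thread:

```java
import java.util.Arrays;
import java.util.Comparator;

public class ParallelSort {
    // Returns the permutation that would sort `keys` ascending.
    static int[] sortedOrder(float[] keys) {
        Integer[] idx = new Integer[keys.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> keys[i]));
        int[] order = new int[idx.length];
        for (int i = 0; i < idx.length; i++) order[i] = idx[i];
        return order;
    }

    // Applies the same permutation to any parallel array, keeping
    // all the arrays aligned after the sort.
    static float[] reorder(float[] values, int[] order) {
        float[] out = new float[values.length];
        for (int i = 0; i < order.length; i++) out[i] = values[order[i]];
        return out;
    }
}
```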
nonymous
Wednesday, June 08, 2005
 
 

This topic is archived. No further replies will be accepted.

Other recent topics
 
Powered by FogBugz