The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

Multiple Inheritance and Constructors

In C++ I sometimes like to skip the constructors completely and just cast some data straight into a class. I find this entertaining to do with struct data that is dumped straight to a disk file and then read back in. I have functions that act on the raw binary data, and damned if I don't find it convenient to call them as class member functions, which I can do through the magic of the coercive cast operation.

OK, I know that some C++ folks are having strokes right now and firing up vi to rant at me about this practice, but hey, it works great and I get to skip the nasty steps of writing serializers for these classes - I just dump them straight out.

Now of course, you simply can NOT have any virtual functions in any classes because the class can NOT have a vtable since that will never get initialized using this positively MAD method that I, the deranged scientist, have devised. (insert evil laugh)

So I was just idly wondering here. So far these things are really structs with functions that operate on them. But what if I were to use multiple inheritance? Would that work as well? I am pretty sure that there are no vtables just because you use multiple inheritance unless you have some virtual members. Right? Or is there some initialization that a class constructor has to perform on non-virtual multiply inherited classes?

I'm going to find out by experiment, but I thought I'd toss this out in case anyone has any guesses in advance.

Thanks for listening and I look forward to hearing your thoughts.
Scott
Tuesday, November 21, 2006
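The question in this post (does non-virtual multiple inheritance add hidden state?) can also be put to the compiler directly. The following is only a sketch, assuming a C++11 compiler; the types Header, Payload, Record and VirtualRecord and the helper is_dumpable are hypothetical names, not anything from Scott's code.

```cpp
#include <type_traits>

// Hypothetical record types, not taken from the thread.
struct Header  { unsigned magic; unsigned version; };
struct Payload { unsigned value; };

// Non-virtual multiple inheritance: no virtual functions, no virtual bases.
struct Record : Header, Payload
{
    unsigned checksum;
    unsigned total() const { return value + checksum; }  // plain member function
};

// Same shape, but with one virtual function added.
struct VirtualRecord : Header, Payload
{
    virtual ~VirtualRecord() {}
    unsigned checksum;
};

// A type is only treated as a candidate for raw dumping here if it is
// trivially copyable and has no virtual functions.
template <typename T>
constexpr bool is_dumpable()
{
    return std::is_trivially_copyable<T>::value && !std::is_polymorphic<T>::value;
}

static_assert(is_dumpable<Record>(), "Record should be safe to copy bytewise");
static_assert(!is_dumpable<VirtualRecord>(), "virtual members add hidden state");
```

The traits say nothing about where the compiler places each base class inside Record, so they answer the "is there a vtable" question but not the layout question.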
 
 
It depends on platform, compiler, compiler version, and moonphase. The C++ standard doesn't specify anything about the internal structure of objects, beyond (possibly) what's specified for C structs. Your compiler manual might, but I doubt it.

My guess is that most compilers won't insert a vtable unless you have virtual functions. However, there might be a table of offsets to the start of each base class's members or something similar.

Code like you've described and the data files that it produces are inherently unportable. The next compiler upgrade could completely break it. I'm sure you already know that, but it's worth pointing out for the benefit of others who are reading.
clcr
Wednesday, November 22, 2006
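One way to make the "next compiler upgrade could break it" risk visible is to pin the expected layout at compile time. This is a sketch under assumptions: a C++11 compiler, 8-bit bytes, and a hypothetical DiskRecord type whose field names are invented for illustration.

```cpp
#include <cstddef>   // offsetof
#include <cstdint>   // fixed-width integer types

// Hypothetical on-disk record; the field names are illustrative only.
struct DiskRecord
{
    std::uint32_t magic;
    std::uint32_t version;
    std::uint32_t value;
};

// If a compiler upgrade changes the layout, the build fails
// instead of the program silently writing incompatible files.
static_assert(sizeof(DiskRecord) == 12, "unexpected padding in DiskRecord");
static_assert(offsetof(DiskRecord, magic) == 0, "magic moved");
static_assert(offsetof(DiskRecord, version) == 4, "version moved");
static_assert(offsetof(DiskRecord, value) == 8, "value moved");
```

These checks cover size and field offsets only. Byte order differences between platforms would still need separate handling.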
 
 
Yes, it's definitely unportable.

However, it is encapsulated, and it can be made somewhat portable if necessary: the classes always come with a version-id at the top, so converters can be written if they are ever needed.

This is all of course an old unportable C technique to read straight into a struct, but the cool part is that with C++ I now have accessor functions so if the binary format does change, the change is isolated from the rest of the program. And yet I have the advantages of streaming to and from disk with absolutely the least possible overhead, which is mostly why I do it.
Scott
Wednesday, November 22, 2006
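The combination described here (a version-id at the top plus accessor functions that hide the stored format) might look roughly like the sketch below. The Sample type, its fields and load_sample are hypothetical. Note also that the sketch copies bytes with memcpy rather than casting the buffer pointer, which is a swapped-in technique and not the cast Scott describes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical record with a version id as its first field.
struct Sample
{
    std::uint32_t version;
    std::uint32_t raw_millis;   // stored representation; may change per version

    // Accessor: callers see seconds, whatever the stored field looks like.
    double seconds() const { return raw_millis / 1000.0; }
};

static_assert(std::is_trivially_copyable<Sample>::value,
              "Sample must be bytewise copyable");

// Copy a record out of a raw byte buffer (for example, bytes read from a file).
// Returns false if the buffer is too small or the version is not understood.
inline bool load_sample(const unsigned char* bytes, std::size_t size, Sample& out)
{
    if (size < sizeof(Sample)) return false;
    std::memcpy(&out, bytes, sizeof(Sample));
    return out.version == 1;    // only version 1 is understood in this sketch
}
```

If the stored field ever changed, only seconds() and the version check would need to know about it.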
 
 
Running this program on your system might be informative.

#include <stdio.h>

class foo
{
public:
    unsigned x;
};

class bar
{
public:
    unsigned y;
};

class baz : public foo, public bar
{
public:
    unsigned int z;
};

int main(void)
{
    baz obj;
    printf("object %p, x %p, y %p, z %p\n",
        (void *)&obj, (void *)&obj.x, (void *)&obj.y, (void *)&obj.z);
    printf("next object %p\n", (void *)(&obj + 1));
    return 0;
}


Output when built with G++ 4.0.0 on OS X:

object 0xbffff818, x 0xbffff818, y 0xbffff81c, z 0xbffff820
next object 0xbffff824
clcr
Wednesday, November 22, 2006
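The output above shows y sitting 4 bytes into the object, which matters for the cast technique: a pointer to the bar part of a baz is not necessarily the same address as the baz itself. A small sketch of that, reusing clcr's class names; the helper bar_is_offset is hypothetical, and the order of base subobjects is a compiler decision, so the return value is typical rather than guaranteed.

```cpp
#include <cassert>

class foo { public: unsigned x; };
class bar { public: unsigned y; };
class baz : public foo, public bar { public: unsigned z; };

// Returns true when converting a baz* to bar* changed the address.
inline bool bar_is_offset(baz& obj)
{
    bar* as_bar = &obj;   // implicit derived-to-base conversion adjusts the pointer
    // The bar subobject starts at its first member, y.
    assert(static_cast<void*>(as_bar) == static_cast<void*>(&obj.y));
    return static_cast<void*>(as_bar) != static_cast<void*>(&obj);
}
```

A raw cast from a buffer pointer performs no such adjustment, so a buffer holding a baz has to be cast to baz first and only then converted to bar.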
 
 
Thanks, that's a great idea.
Scott
Wednesday, November 22, 2006
 
 
I made the member variables longs to avoid any alignment issues and then did this in main:

  int overhead = sizeof(baz) - sizeof(foo) - sizeof(bar) - sizeof(long);
  printf("Multiple non-virtual inheritance has %d bytes overhead.\n", overhead);

This enables a portability sanity check, which I can even enforce with an assertion:

assert(0 == sizeof(baz) - sizeof(foo) - sizeof(bar) - sizeof(long));
Scott
Wednesday, November 22, 2006
 
 
Actually my last post is plain wrong in terms of detecting overhead. If you stick a virtual function in bar, you still get 0 bytes difference because the vtable pointer is inside of bar, and hence already counted in the total for baz. So it reports 0 bytes overhead, but there's still a vtable pointer in there.

Now, if I write out baz, it'll be the data along with this hidden vtable pointer in the middle. But I could read it into a buffer, and then use placement new to pass the buffer into the constructor and it should init the vtable and accept the data that's already there. Come to think of it, that's probably exactly what placement new is really for!

  baz *newbaz = new(buffer) baz;

I'll just start doing that then.

But wait... can I use placement new in an array allocation???

How would that go?
  baz *newbazarray = new (buffer) baz[15]; // no way, right?
Scott
Wednesday, November 22, 2006
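Since the sizeof subtraction cannot see a vtable pointer that lives inside a base class, a type trait may be a more direct check. The sketch below assumes a C++11 compiler and reuses the thread's class names with long members; the touch function and the size_overhead constant are hypothetical additions.

```cpp
#include <type_traits>

struct foo { long x; };
struct bar { long y; virtual void touch() {} };   // virtual function added here
struct baz : foo, bar { long z; };

// The sizeof arithmetic from the post. Any hidden pointer is counted inside
// sizeof(bar), so it drops out of the subtraction.
constexpr long size_overhead =
    static_cast<long>(sizeof(baz)) - static_cast<long>(sizeof(foo))
    - static_cast<long>(sizeof(bar)) - static_cast<long>(sizeof(long));

// The type traits look at the class itself rather than at its size.
static_assert(std::is_polymorphic<baz>::value, "baz inherits a virtual function");
static_assert(!std::is_trivially_copyable<baz>::value, "so it is not bytewise copyable");
```

The value of size_overhead depends on the platform's padding rules, so the sketch does not assert anything about it.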
 
 
And I just remembered (for those following along) that you have to manually call the destructor and I *think* that when you do that, the destructor only goes through its code but doesn't actually do any deallocation.

newbaz->~baz();

and maybe:

newbazarray->~baz(); // where the heck does the [] go? guh? This is advanced C++ or maybe deranged C++, but placement new exists in the language so there must be something about whether it works on arrays or not...
Scott
Wednesday, November 22, 2006
 
 
It's gotta be this...

for (int i=0; i<15; i++)
  newbazarray[i].~baz();
Scott
Wednesday, November 22, 2006
 
 
OK, g++ refused to compile that, so I had to go and look it up. Turns out I got it all right and g++ is just a backwards compiler that doesn't implement C++ properly.

However, I did get the destructor wrong - you have to destruct backwards to get the C++ semantics right because destruction must happen in the reverse order of construction. So it should have been:

  for (int i=15; i;)
    {
    newbazarray[--i].~baz();
    }

Unless there is some hidden way to turn on placement new for arrays in g++, I'm just putting this on the list of ways in which Richard Stallman has personally failed me.
Scott
Wednesday, November 22, 2006
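For the array case, one alternative to the array form of placement new is to construct each element individually and destroy them in reverse, as in the loop above. This is a sketch, not the thread's code: Tracked, construct_all and destroy_all are hypothetical names, and the caller is assumed to supply storage that is large enough and suitably aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <new>      // required for placement new

// Hypothetical element type that counts live objects, for illustration only.
struct Tracked
{
    static int live;            // number of currently constructed objects
    Tracked()  { ++live; }
    ~Tracked() { --live; }
};
int Tracked::live = 0;

// Construct `count` objects one at a time inside caller-provided storage.
// Element-wise placement new avoids the array form, which is allowed to
// reserve extra bookkeeping space in front of the elements.
inline Tracked* construct_all(void* storage, std::size_t count)
{
    Tracked* first = static_cast<Tracked*>(storage);
    for (std::size_t i = 0; i < count; ++i)
        new (first + i) Tracked;
    return first;
}

// Destroy in reverse order of construction; the storage itself is not freed.
inline void destroy_all(Tracked* first, std::size_t count)
{
    for (std::size_t i = count; i != 0; )
        first[--i].~Tracked();
}
```

A caller could declare the storage as `alignas(Tracked) unsigned char buffer[sizeof(Tracked) * 15];` and pass it to construct_all with a count of 15.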
 
 
Wait a minute, this has nothing to do with arrays. g++ doesn't even like a plain single-item placement new. That can't be right. There must be a compiler switch I'm forgetting or something?
Scott
Wednesday, November 22, 2006
 
 
Oh this is such a pain! There are plenty of references on the net to placement new working fine in g++, yet I get:

error: no matching function for call to 'operator new(long unsigned int, char*&)'

 from:

baz *newbazarray = new (buffer) baz;

GRR!
Scott
Wednesday, November 22, 2006
 
 
Problem solved:

#include <new>
Scott
Wednesday, November 22, 2006
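Putting the pieces from the last few posts together, a complete single-object example might look like the sketch below. The baz here is a hypothetical stand-in with one virtual function, and build_and_read is an invented helper. Its constructor sets z itself; the sketch makes no claim about bytes that were already in the buffer surviving construction.

```cpp
#include <cassert>
#include <new>   // declares the placement form of operator new

struct baz
{
    unsigned z;
    baz() : z(42) {}
    virtual unsigned get() const { return z; }
    virtual ~baz() {}
};

// Construct a baz in caller-provided storage, use it, and destroy it again.
inline unsigned build_and_read(void* storage)
{
    baz* p = new (storage) baz;   // runs the constructor, allocates nothing
    unsigned result = p->get();   // virtual call works on the constructed object
    p->~baz();                    // explicit destructor call; storage is not freed
    return result;
}
```

The storage passed in needs the right size and alignment, for example `alignas(baz) unsigned char buffer[sizeof(baz)];`.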
 
 
What will you do if you upgrade your compiler and find that the new version lays the classes out differently, and thus you can't read your old data files?
clcr
Wednesday, November 22, 2006
 
 
My version numbers start with a magic number, so even if it gets rearranged, I can still identify the chunks and manually break it all apart if needed for conversion.

However, I see this as unlikely but if it happens it's no big deal. I can write converters to swap chunks in 15 minutes or less.
Scott
Wednesday, November 22, 2006
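The idea of finding chunks by their magic number can be sketched as a walk over a byte buffer. Everything specific here is an assumption: the ChunkHeader layout, the kRecordMagic value and the count_chunks helper are hypothetical and not Scott's actual format.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical chunk header: the magic value and field names are assumptions.
struct ChunkHeader
{
    std::uint32_t magic;     // identifies the chunk type
    std::uint32_t version;   // layout revision of the payload
    std::uint32_t size;      // payload size in bytes, header excluded
};

const std::uint32_t kRecordMagic = 0x52454331u;

// Walk a buffer of back-to-back chunks and count those with the given magic.
// Stops at the first header or payload that would run past the end.
inline std::size_t count_chunks(const unsigned char* data, std::size_t length,
                                std::uint32_t magic)
{
    std::size_t found = 0;
    std::size_t pos = 0;
    while (length - pos >= sizeof(ChunkHeader))
    {
        ChunkHeader h;
        std::memcpy(&h, data + pos, sizeof h);
        pos += sizeof h;
        if (h.size > length - pos) break;   // truncated payload
        if (h.magic == magic) ++found;
        pos += h.size;
    }
    return found;
}
```

A converter would do the same walk but dispatch on the magic and version fields to rewrite each payload instead of counting.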
 
 
It all seems like a whole lot of effort to re-implement what the compiler ought to be doing for you anyway, for no benefit other than being able to dump memory to disk. Which seems to be a relatively minor benefit, to me, if you claim to be able to write the serialisation/deserialisation code in "15 minutes or less" anyway.

If you're a hobbyist, do whatever the hell you like - portability and reliability on customers' computers hardly matter if you don't have customers. If you're writing commercial software, the serialisation/deserialisation code is such a small portion of what's needed to ensure actual long-term reliability that doing it the quick'n'dirty way strikes me as a very bad idea.

Wednesday, November 22, 2006
 
 
It's all so simple for you isn't it? I knew this thread would come to these holier-than-thou posts.

You have no idea what I am implementing so you just make grandiose sweeping statements about what is the best way to do everything.

Yes, this is commercial all the way baby, and yes I wipe my ass with my competitors.
Scott
Wednesday, November 22, 2006
 
 
I really hope you're the only developer who ever has to work on this code base, because the first new developer that comes along is almost guaranteed to screw this up, regardless of their experience.

Thursday, November 23, 2006
 
 
"It's all so simple for you isn't it? I knew this thread would come to these holier-than-thou posts.

You have no idea what I am implementing so you just make grandiose sweeping statements about what is the best way to do everything.

Yes, this is commercial all the way baby, and yes I wipe my ass with my competitors."

Good holier-than-thou post.

Seriously, without details, of course any statement will be general.  We trust you have the smarts to evaluate them with the specific data you have.

Sincerely,

Gene Wirchenko
Gene Wirchenko
Thursday, November 23, 2006
 
 
>> However, I see this as unlikely but if it happens it's no big deal. I can write converters to swap chunks in 15 minutes or less.

So if it will only take you 15 minutes or less to do it the right way, why not just do it the right way?
SomeBody
Thursday, November 23, 2006
 
 
Somebody, I think you misunderstood the discussion.
Scott
Friday, November 24, 2006
 
 
The hypothetical situation was 'what if a meteor hits and blows up a house'. My answer was 'then we'll fix the house'. You then asked 'why not rebuild the house to begin with.' That doesn't make sense in the context of the discussion.
Scott
Friday, November 24, 2006
 
 
No he didn't, by your analogy he actually asked "why not rebuild the house _so it is meteor proof_ in the first place". Which is where your analogy breaks down.
Don't step on the troll.
Friday, November 24, 2006
 
 
Scott,

This helps you learn a lot about the internals of a C++ compiler, but it does not help you with building software.

Imagine you have a team of 20 developers who work on some 2-3 million lines of code across 5-10 supported product versions and 3 different OSs and 100 or so customers.

In this scenario your approach is a complete nightmare due to portability issues such as compiler version differences, alignment, and little/big endian problems. It is also not easy to read and understand. Simply a no-go.
Dino
Friday, November 24, 2006
 
 
I guess it must just be blind luck that our software is ten times faster at file access than that of our nearest competitor.

And no, there is nothing conceptually difficult about this at all.

Don't tell me, you guys saying this is an unmaintainable nightmare are huge fans of template metaprogramming?
Scott
Friday, November 24, 2006
 
 
Scott,

it is not blind luck your file access is so fast. Obviously it is your optimization which makes it fast. But I suppose you already knew that :)

But in the context I gave you, your approach would simply not work. I'm working with 3 different UNIX flavors, and no OS or compiler upgrade goes by without compile problems. And that's with standard C++ that doesn't rely on compiler implementation details.

PS I'm not a fan of templates. I simply try to use the right tool for the job.
Dino
Friday, November 24, 2006
 
 
One more thing,

think how much your "compiler should optimize this" guy would understand of your work :)
Dino
Friday, November 24, 2006
 
 
Why do you disallow virtual functions? The vtable pointer makes a handy space for your type ID. If you map each type ID to a placement new thunk (you can use templates to generate this for you automatically) that calls a special post-load constructor, you can call the thunk to set up the vtables for you, and any post-load processing (fix up pointers, register objects, etc.) can go in the constructor.

This goes a long way to making the system work well, because it increases the number of things that can be placed in the file in the format they will eventually be used, and allows you to put each per-type setup code into the definition of each class.

For those suggesting that this stuff is unportable and unmaintainable, I've not found it so on either count, though of course it's only as maintainable as you make it, and I haven't tried every single possible combination of OS, CPU and C compiler. (I'd be interested to hear from anybody who has, though, and found this sort of thing not to work. What were the issues?)
Tom_
Saturday, November 25, 2006
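The type-ID-to-thunk mapping Tom_ describes might be sketched as follows, assuming a C++11 compiler. Loadable, Circle, Square, construct_in_place and thunk_for are hypothetical names, and the sketch keeps the type id in an ordinary constant rather than in the vtable pointer slot. Any post-load processing would go in each class's constructor.

```cpp
#include <cassert>
#include <cstdint>
#include <new>

// Hypothetical base class for loadable objects.
struct Loadable
{
    virtual ~Loadable() {}
    virtual int kind() const = 0;
};

struct Circle : Loadable
{
    static const std::uint32_t type_id = 1;
    int kind() const { return 1; }
};

struct Square : Loadable
{
    static const std::uint32_t type_id = 2;
    int kind() const { return 2; }
};

// A thunk constructs one concrete type in caller-provided storage.
typedef Loadable* (*Thunk)(void* storage);

// The template generates one thunk per type automatically.
template <typename T>
Loadable* construct_in_place(void* storage)
{
    return new (storage) T;   // constructing T sets up its virtual dispatch
}

// Map a type id read from the file to the matching thunk.
inline Thunk thunk_for(std::uint32_t type_id)
{
    switch (type_id)
    {
    case Circle::type_id: return &construct_in_place<Circle>;
    case Square::type_id: return &construct_in_place<Square>;
    default:              return nullptr;   // unknown type id
    }
}
```

A loader would read the id from the file, look up the thunk, and call it on the buffer that will hold the object, treating a null result as an unknown chunk.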
 
 
Hi Tom, I agree with what you're saying, though I'd rather not have the id overloaded on top of the vtable stuff.

As to virtuals, at the start of the thread, I note that I've been using this technique with simple classes without vtables or multiple inheritance for its various advantages, including to sort of provide an API wrapper for raw binary data from a file. I didn't use virtuals because I didn't construct the class, but just cast into it, to be able to use the class's API.

In this thread, clcr and I were discussing whether this would work for objects with inheritance and virtual functions. I remembered placement new and had this realization that this was a perfect example of when to use it, which answered all my questions, as you've suggested in your response. I then went off on some of the various implications and answered clcr's concerns about how the system I already have (ids in every struct) would prevent data from being untranslatable in the event of switching architectures, or major compiler changes, or whatever. I think much of that is not too likely to be an issue in the first place (compiler deciding to lay out data differently), but if it was, it would be easy enough to solve that it doesn't concern me.
Scott
Saturday, November 25, 2006
 
 
Ah yes, so you did. Sorry. I automatically skip over multiple consecutive posts from the same author. This normally works to good time-saving effect, but not this time.
Tom_
Monday, November 27, 2006
 
 
I've worked on some software that does this (without the multiple inheritance) to write and read to a network socket.

My view is that there's always a tradeoff between maintainability and performance optimisation. What you're doing breaks encapsulation of the class - you now have non-class code which just 'knows' the layout of the class internals. You very probably also have subclasses inside your class that are effectively also cast, and are therefore also not encapsulated.

If performance dictates that you HAVE to do something like this, well I guess you have to do it. But if it doesn't, I'd suggest you don't. One day you might want all the benefits of things like polymorphism, STL, and so forth in those classes. Maybe you'll even want to read and write the file format for previous versions of your own product, or those of your competitors. In those cases you'll usually be better off with a good serialisation scheme than with a bunch of casts.
Duncan Sharpe
Tuesday, November 28, 2006
 
 
Did some reading, and here is what I've seen as potential problems:

1) First _vtables are stored differently between UNIX and Windows; Windows seems to store the vtables first then data, UNIX the other way around.

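class A { public: virtual ~A() {} };  // polymorphic, so dynamic_cast can be used
class B : public virtual A {};
class C : public virtual A {};
class D : public B, public C {};

2) With multiple inheritance (especially virtual) things get pretty hairy for 2 reasons:
a) multiple _vtables; think cross-casting

B* pB = new D();
C* pC = dynamic_cast<C*>(pB);  // cross-cast; a C-style cast would only reinterpret the address
assert((void*)pB != (void*)pC);  // pB and pC point at different subobjects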

b) Memory mapping of virtual inheritance is implemented differently by different compilers.
Dino
Wednesday, November 29, 2006
 
 
I already said in the first post that you couldn't have a vtable, dumbass.
Scott
Thursday, November 30, 2006
 
 
And why the name calling?
Dino
Thursday, November 30, 2006
 
 
That wasn't me posting the rude remark, but an imposter.
Scott
Saturday, December 09, 2006
 
 
Not a problem.

Back to the topic, as long as you stick to one platform and the memory layout generated by the compiler stays backward compatible your approach works. After all, no matter what the object layout is, you are reading / writing a data structure to a file. And as long as you don't try to initialize the _vtables you don't need to really know that layout - you're absolutely right about that part.

However, if you need to handle data migration between platforms / compiler versions that may prove a serious problem.

Finally, I would wrap the whole thing with a clean C-like API (just in case I need to replace it later and / or to make sure it's used correctly by the team).
Dino
Monday, December 11, 2006
 
 
