The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

language design: closure semantics

I'm making a block-structured, function-oriented, imperative programming language.  You might think of it as C combined with APL and some random stuff.  This is not a high-level, garbage-collected language.  Objects are destroyed when they go out of scope.  None of that "unlimited extent" stuff where function call frames are on the heap rather than the stack.

Functions are first class objects, stored in variables.  They have a way to get to variables of the blocks that they are nested in, even if intervening stack frames are gone because the function returned.  Additionally they will be able to capture variables explicitly, making them closures.  So a function object is stored as three pointers:

code
environment
captured environment
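
As a rough illustration only, the three-pointer layout above could be sketched in C++ as follows. Every name here (FunctionObject, Frame, Captured, add_b_to_a, call) is a made-up assumption for the sketch, not part of the language being described, and the real representation may well differ.

```cpp
#include <cassert>

// Hypothetical layout: a function object stored as three pointers.
struct FunctionObject {
    void (*code)(void* environment, void* captured);  // entry point of the body
    void* environment;  // variables of the enclosing blocks
    void* captured;     // variables captured explicitly
};

// Illustrative stand-ins for an enclosing frame and a captured-variable record.
struct Frame { int a; };
struct Captured { int b; };

// Body of a function that does "a += b", with a found through the
// environment pointer and b found through the captured pointer.
void add_b_to_a(void* environment, void* captured) {
    static_cast<Frame*>(environment)->a += static_cast<Captured*>(captured)->b;
}

// Calling a function object passes both pointers to the code.
void call(const FunctionObject& f) { f.code(f.environment, f.captured); }

int three_pointer_demo() {
    Frame frame{5};
    Captured cap{6};
    FunctionObject f{add_b_to_a, &frame, &cap};
    call(f);
    assert(frame.a == 11);
    return frame.a;
}
```

In this sketch the open question is what copying a FunctionObject should do with the third pointer: copy the pointer (sharing) or copy what it points to (separate state).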

I know exactly how I am going to implement the code and environment portions.  I am unclear how I should handle the captured environment.

Actually let me make this clearer with an example:

int a=5;
int b=6;

f ()->() new(b);
{
    a+=b;
};

b=2;

g ()->() new(b);
{
    a+=b;
};

f(); //a+=6 so a==11
g(); //a+=2 so a==13

You see, because of the new(b) syntax (actually this syntax is likely to change, but it is fine for this example) the value of b is captured, but a is not.  So the same a is modified by both functions.
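
For comparison, roughly the same behavior can be sketched with C++ lambdas, where a capture list of [&a, b] shares a by reference and copies b at the point the lambda is created. This is only an analogy to the new(b) idea, not the language's own mechanism, and the function name capture_demo is made up for the sketch.

```cpp
#include <cassert>

int capture_demo() {
    int a = 5;
    int b = 6;

    // b is copied into the closure here (value 6); a is shared by reference.
    auto f = [&a, b]() { a += b; };

    b = 2;

    // A second closure copies the current b (value 2); a is the same variable.
    auto g = [&a, b]() { a += b; };

    f();
    assert(a == 11);  // 5 + 6
    g();
    assert(a == 13);  // 11 + 2
    return a;
}
```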

Also, f and g are not the names of functions; they are variables of type ()->() (a function that takes no parameters and returns nothing).  So now what if we do this:

h ()->() = f;

This copies f into h.  Here is the question: which is more intuitive, sharing the same b or copying b?

You would think that this is simply an implementation issue, but what if the function modifies its captured variables?  Now it is a semantics issue.

b int=0;

f ()->() new(b)
{
    b+=2;
    print(b);
}

f(); //print 2
f(); //print 4
h=f;
f(); //print 6
h(); //print 6? print 8?

I am inclined to think that they should be separate, and you can use references or pointers if you want them to share.
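
Both answers to the question can be sketched in C++, purely as an analogy. In copy_demo the closure owns its copy of b, so copying the closure copies b, and the last call returns 6. In shared_demo the closure holds a pointer to the counter, so the copy shares it, and the last call returns 8. The function names and the use of std::shared_ptr are assumptions made for this sketch.

```cpp
#include <array>
#include <cassert>
#include <memory>

// Separate state: each copy of the closure carries its own b.
std::array<int, 4> copy_demo() {
    int b = 0;
    auto f = [b]() mutable { b += 2; return b; };
    int first = f();    // 2
    int second = f();   // 4
    auto h = f;         // copies the closure, including its b (currently 4)
    int third = f();    // 6
    int from_h = h();   // 6, because h advanced its own copy
    assert(b == 0);     // the outer b was never touched
    return {first, second, third, from_h};
}

// Shared state: the closure captures a pointer, so copies share one counter.
std::array<int, 4> shared_demo() {
    auto counter = std::make_shared<int>(0);
    auto f = [counter]() { *counter += 2; return *counter; };
    int first = f();    // 2
    int second = f();   // 4
    auto h = f;         // copies the pointer, not the counter
    int third = f();    // 6
    int from_h = h();   // 8, because both closures update the same counter
    return {first, second, third, from_h};
}
```

C++ defaults to the first behavior and makes sharing opt-in through a pointer or reference, which matches the inclination stated above.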

So am I making the right decision for my language?  Yes it is a strange language and it seems like I am bending over backwards, but that is what you have to do to bring powerful features to a statically-typed imperative language.
Tom C
Sunday, September 03, 2006
 
 
You'll need to add garbage collection if your closures capture shared environment frames.  The case where a lambda value is stored in the very environment frame it has captured will come up all of the time.
Kalani
Sunday, September 03, 2006
 
 
What do you see the "new" facility being used for?

(If you have it simply share the name, you don't need the facility at all, because the name is already shared by dint of being in the environment that the closure has captured. So I guess it would have to make a fresh copy for each closure. But I can't think of a situation in which I, at least, have needed such a thing, except when the name in question was a parameter to the function that constructed the closure (and therefore unique for each closure anyway). Hence my question.)

And, out of interest, how will the language cope with things like this?

  f(int x)->()->int
  {
    int y=x+2;
    g()->int
    {
      return y;
    }
    return g;
  }

Unless you do restrict the ways in which closures can be given names, go the whole hog and heap-allocate the frames, or copy the values in the environment into each closure (which is not efficient), it strikes me this will be a potential source of confusing bugs.
Tom_
Sunday, September 03, 2006
 
 
Agh! By "share the name", I mean "share the value", so that you in effect have two views of the same value. This is just what you get if defining a closure puts in it a copy of the environment pointer that happens to be in effect when it is made, which is how these things usually work. (Because this sort of thing usually goes hand in hand with garbage collection, you just do this, and let the garbage collector work out the rest.)
Tom_
Sunday, September 03, 2006
 
 
You see, that is the difference: I am not capturing environments.  Function calls put their local variables on the stack.  When a function returns, the destructor for every local variable is called, just like in C++.

However, there is a secret tree being built in the background that mirrors the tree that the environments would make.  This secret tree has pointers to stack frames.  The secret tree is garbage collected as needed, both in the destructors of functions and in the epilogues of functions.

So essentially I am going with a hybrid of the stack-frame and heap-environment approaches.

This may lead to confusing bugs, where you try to access a variable in a frame that has returned.  The variable will have been written over.  Well, I can probably throw some sort of exception or interrupt automatically, but I'm not at that stage yet.

So anyway that is why I have the "new" syntax for now.  It is to explicitly capture variables, because there is no implicit capture as in probably every other language.

OK, here is your example, changed a little:

 f (int x)->(g ()->(int) )
  {
    y int = x+2;
    g = ()->(r int)
    {
      r=y;
    };
  };

Yes, this will be a bug.  The function held in g works fine until f returns, at which point y disappears.  If you add "new(y)" then the returned function keeps its own copy of y, which it can return whenever it needs to.
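
The same distinction shows up in C++, which makes for a convenient sketch of the fix. Capturing y by reference ([&y]) would leave the returned function pointing at a dead stack frame, while capturing by value ([y]) gives it its own copy that outlives the call. The names make_getter and getter_demo are made up for this sketch.

```cpp
#include <cassert>
#include <functional>

std::function<int()> make_getter(int x) {
    int y = x + 2;
    // Capturing with [&y] here would dangle once make_getter returns.
    // Capturing with [y] copies the value into the closure instead.
    return [y]() { return y; };
}

int getter_demo() {
    std::function<int()> g = make_getter(3);
    int r = g();  // safe: the closure carries its own copy of y
    assert(r == 5);
    return r;
}
```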

Or maybe the correct syntax will be something like this:

 f (int x)->(g ()->(int) )
  {
    y int = x+2;
    g = ()->(r int) y int = y
    {
      r=y;
    };
  };

so that you can use the feature in more ways.  You wouldn't have to name the inner variable y of course, I sure wouldn't.
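
C++14 init-captures are a close analogue of this "declare a fresh variable in the closure and initialize it from the outer one" form, so here is a small sketch under that assumption. The inner name need not match the outer one, and the initializer can be any expression. The names make_saved_getter, make_doubled_getter, saved and doubled are hypothetical.

```cpp
#include <cassert>
#include <functional>

// The closure gets its own variable, named differently from the outer y.
std::function<int()> make_saved_getter(int x) {
    int y = x + 2;
    return [saved = y]() { return saved; };
}

// The initializer is an arbitrary expression evaluated when the closure is made.
std::function<int()> make_doubled_getter(int x) {
    int y = x + 2;
    return [doubled = y * 2]() { return doubled; };
}

int init_capture_demo() {
    int a = make_saved_getter(3)();    // 5
    int b = make_doubled_getter(3)();  // 10
    assert(a == 5);
    assert(b == 10);
    return a + b;
}
```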

So yeah, this is tricky, but this is a value-oriented language, not like Java with its reference model.  I'm trying to make something that gives you the feel of an old-fashioned, straightforward language.  So no treating "Objects" and primitives differently.  Everything is consistent.  Think of pointers in C++.  In C++ you can make a pointer to a local object.  No bug.  You can pass it to a function, no bug.  You can return it.  That's a bug.  However, what if you are passed a pointer and return it?  No bug.  Now, there is no way to track where a pointer originally came from, so there is no way to check.  I want analogous behavior for my function objects.  That way they will be simple to understand.
Tom C
Sunday, September 03, 2006
 
 
Lua is a very different language, but they did some interesting things with closures...
noname
Sunday, September 24, 2006
 
 

This topic is archived. No further replies will be accepted.
