The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

help with my programming language please

Hi, I'm making an expression-oriented programming language.  Every symbol is an operator, including the ; and , and even adjacency (there is no symbol for it; it is implied).  I have the parser done and I was working on semantic analysis when I ran into a problem.  The next four paragraphs give the relevant details:

Unlike other operators, the ; always evaluates its left operand first and then its right operand.  It returns the right operand (a value that is usually ignored).

The , operator evaluates its arguments in either order and returns both as a tuple.  So f(x,y,z) calls f with the tuple created from comma-ing x, y, and z together.  The , operator is generally used to create tuples but another use is to create non-deterministic ordering of statements.

The ( ) grouping symbol pair evaluates and returns its contents.  It is used to control operator precedence.  It is not used for special syntax.  For example f 5 and f(5) are the same but f 5, 6 and f(5,6) are different.

There are no C or C++ style header files but I like how everything in the global scope is executed before main and is sort of non-deterministic.  For example among various cpp files you can have:

int x;
int* p = &x;
int main() {return 0;}

Ok, here is how the problem comes about.  I want my top level to be like C in that you can have a bunch of code that just sort of executes, so I figured that the , operator would be the one to use.  Equivalent to the code above I would have:

int x, //line 1
int@ p x@, //line 2
()->() main {} //line 3

ok so now I was thinking that there are several steps to semantic analysis:

step 1: find classes
step 2: find variables
step 3: everything else

In the case of ordered expressions (;) it would be easy: apply the steps to each statement in order.  So if the code above had semicolons it would be: apply steps 1, 2, and 3 to line 1, then to line 2, and then to line 3.  I further figured that if the , operator was used instead, I would apply step 1 to lines 1, 2, and 3, then apply step 2 to lines 1, 2, and 3, and then step 3 to all of them.  That seems nice and good.
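The two orderings described above can be sketched in C++. This is only an illustration, not the poster's compiler: Node, Kind, Trace, runStep, analyze and example are hypothetical names, and the three steps are reduced to recording which step visited which line. It also assumes that a ; nested under a , simply has each step run through its operands in order.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical parse-tree node: a leaf statement, or a ';' / ',' node with operands.
enum class Kind { Leaf, Serial, Parallel };

struct Node {
    Kind kind;
    std::string name;            // label for a leaf, e.g. "line 1"
    std::vector<Node> children;  // operands of ';' or ',' (empty for a leaf)
};

using Trace = std::vector<std::string>;  // visit order, recorded for inspection

// Run one step over a whole subtree, visiting leaves left to right.
void runStep(const Node& n, int step, Trace& out) {
    if (n.kind == Kind::Leaf) {
        out.push_back("step " + std::to_string(step) + " on " + n.name);
        return;
    }
    for (const Node& c : n.children) runStep(c, step, out);
}

// ';' finishes all three steps on one operand before starting the next;
// ',' (and a lone leaf) runs step 1 everywhere, then step 2, then step 3.
void analyze(const Node& n, Trace& out) {
    if (n.kind == Kind::Serial) {
        for (const Node& c : n.children) analyze(c, out);
    } else {
        for (int step = 1; step <= 3; ++step) runStep(n, step, out);
    }
}

void example() {
    Node line1{Kind::Leaf, "line 1", {}};
    Node line2{Kind::Leaf, "line 2", {}};
    Trace serial, parallel;
    analyze(Node{Kind::Serial, ";", {line1, line2}}, serial);
    analyze(Node{Kind::Parallel, ",", {line1, line2}}, parallel);
    assert(serial[1] == "step 2 on line 1");    // line 1 is finished before line 2 starts
    assert(parallel[1] == "step 1 on line 2");  // step 1 covers every line first
}
```

The sketch covers only the flat cases; it does not track which declarations are visible from which statement.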

Ok here is how it would work:

(my_class var),
(class my_class = (/*implementation*/))

no problems, step 1 finds my_class before step 2 finds var.

class my_class = (/*implementation*/); //line 1
(my_class var) //line 2

no problems, all of line 1 is executed before line 2

ok now what about this:

(my_class var1),
    int x;
    class my_class = (/*implementation*/);
    my_class var2

well that seems crazy but it seems like the logical extension.  So how would I implement this?  Looks like I have to go through both sides to find the class definition, then go through both sides to find the variables, then do the other stuff.  However I need to keep track of where things are defined.  The int declaration is the only line that can't know about my_class which seems quite strange but necessary.  Ok now another complication:

    int x; //line 1
    class my_class (/*implementation*/); //line 2
    my_class@ p mc@ //line 3
    my_class mc //line 4

ok this code works too.  In the first tuple we make a class and we make a pointer to a variable of that class.  In the second tuple we make that variable.  Everything works fine but consider line 1.  At line 1 the variable mc is declared (because of step 2) but my_class is not declared because the definition for my_class is on the following line.  Doesn't it seem strange that a variable is in scope but its type is not?  It would then be unusable, I assume.  So now the time dependency is getting trickier.  I haven't thought of any, but I am assuming that there are stranger scenarios.

These examples are somewhat contrived so hopefully they won't be an issue with regards to normal usage but I'm just concerned.  Also it won't be as easy to implement.
Tom C
Saturday, February 03, 2007
Sunday, February 04, 2007
that's better than disagreeing!

Well I have mostly solved it.  I have a Timestamp type (in the compiler implementation, not the finished language):

#include <vector>

struct Expression; //parse tree node, defined elsewhere in the compiler

struct Timestamp {
    std::vector<Timestamp*> past;
    std::vector<Timestamp*> future;
    Expression* present;
};

//is in the past of
bool operator< (Timestamp& left, Timestamp& right);
bool operator> (Timestamp& left, Timestamp& right);

//is in the future of
bool operator>= (Timestamp& left, Timestamp& right);
bool operator<= (Timestamp& left, Timestamp& right);

bool operator== (Timestamp& left, Timestamp& right);
bool operator!= (Timestamp& left, Timestamp& right);

//returns true if neither is in the future or past of the other
bool operator^ (Timestamp& left, Timestamp& right);

Timestamp* SerialSplit(Timestamp* right); //;
Timestamp* ParallelSplit(Timestamp* top); //,

I will have a function go through the parse tree and assign timestamps.  Whenever it encounters a ; it will do a serial split and whenever it encounters a , it will do a parallel split.  I guess actually all non-semicolon operators have to be a parallel split otherwise it becomes almost impossible to deal with splits at lower than the highest level.  I just don't like it.
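One possible reading of the serial and parallel splits, sketched as a small graph of "happens before" edges. The post only gives declarations, so everything here is an assumption: TimeNode, TimeGraph, make, order, serialSplit, parallelSplit, isPastOf, isUnordered and example are hypothetical names, and "is in the past of" is taken to mean "reachable by following future edges".

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for the post's Timestamp: a node in a graph of ordering edges.
struct TimeNode {
    std::vector<TimeNode*> past;    // nodes that happen before this one
    std::vector<TimeNode*> future;  // nodes that happen after this one
};

// Owns every node so that the raw pointers stay valid while the graph lives.
struct TimeGraph {
    std::vector<std::unique_ptr<TimeNode>> nodes;

    TimeNode* make() {
        nodes.push_back(std::make_unique<TimeNode>());
        return nodes.back().get();
    }

    // Record that `earlier` happens before `later`.
    void order(TimeNode* earlier, TimeNode* later) {
        earlier->future.push_back(later);
        later->past.push_back(earlier);
    }

    // ';' : the new node is strictly after `left`.
    TimeNode* serialSplit(TimeNode* left) {
        TimeNode* right = make();
        order(left, right);
        return right;
    }

    // ',' : two new nodes that both follow `top` but have no order between them.
    std::pair<TimeNode*, TimeNode*> parallelSplit(TimeNode* top) {
        TimeNode* a = make();
        TimeNode* b = make();
        order(top, a);
        order(top, b);
        return {a, b};
    }
};

// True if `a` is strictly in the past of `b`, i.e. `b` is reachable via future edges.
bool isPastOf(const TimeNode* a, const TimeNode* b) {
    for (const TimeNode* next : a->future)
        if (next == b || isPastOf(next, b)) return true;
    return false;
}

// True if neither node is in the past or future of the other.
bool isUnordered(const TimeNode* a, const TimeNode* b) {
    return a != b && !isPastOf(a, b) && !isPastOf(b, a);
}

void example() {
    TimeGraph g;
    TimeNode* root = g.make();
    auto branches = g.parallelSplit(root);             // root , root
    TimeNode* later = g.serialSplit(branches.first);   // first branch ; later
    assert(isPastOf(root, later));
    assert(isUnordered(branches.second, later));
}
```

The recursive reachability check is the simplest thing that works on an acyclic graph; a real compiler would probably want to cache or number the nodes instead of re-walking the graph on every comparison.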

So now that every expression will have a timestamp and a pointer to the scope that it is in, all interaction with the scope will involve passing the timestamp.  So if you want to add a new variable to the scope you send in the timestamp, and if you want to look up an identifier you send in the timestamp.  Then it will give you the declaration that is not in your future.
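A minimal sketch of a scope that takes a timestamp on every call, assuming "not in your future" means the declaration's timestamp cannot be reached by following future edges from the timestamp doing the lookup. Stamp, isFutureOf, Declaration, Scope, declare, lookup and example are hypothetical names, not the poster's code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical timestamp: only the future edges are needed for lookups.
struct Stamp {
    std::vector<const Stamp*> future;  // stamps that happen strictly after this one
};

// True if `later` can be reached from `earlier` by following future edges.
bool isFutureOf(const Stamp* later, const Stamp* earlier) {
    for (const Stamp* next : earlier->future)
        if (next == later || isFutureOf(later, next)) return true;
    return false;
}

struct Declaration {
    std::string name;
    const Stamp* when;  // the point at which the name comes into existence
};

class Scope {
public:
    void declare(const std::string& name, const Stamp* when) {
        decls_.push_back({name, when});
    }

    // Returns the first declaration of `name` that is not in the future of `at`,
    // or nullptr if there is none. The pointer is invalidated by a later declare().
    const Declaration* lookup(const std::string& name, const Stamp* at) const {
        for (const Declaration& d : decls_)
            if (d.name == name && !isFutureOf(d.when, at)) return &d;
        return nullptr;
    }

private:
    std::vector<Declaration> decls_;
};

// Mirrors the earlier example: (my_class var1), int x; class my_class = (...); my_class var2
void example() {
    Stamp var1, intX, cls, var2;
    intX.future.push_back(&cls);   // int x ; class my_class
    cls.future.push_back(&var2);   // class my_class ; my_class var2
    Scope scope;
    scope.declare("my_class", &cls);
    assert(scope.lookup("my_class", &var1) != nullptr);  // parallel branch can see it
    assert(scope.lookup("my_class", &intX) == nullptr);  // still in the future of int x
}
```

Under this reading the int declaration is the only statement that cannot see my_class, which matches the behavior described for that example.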

I guess you get complications like this when you stray from the beaten paths and try to combine features that make implementing each other harder.
Tom C
Sunday, February 04, 2007
Sunday, February 04, 2007
Are you really sure that's the best way to approach that?  I think you might want to look at your assumptions again.
Art
Monday, February 05, 2007
well it seems like the only way.  Also for other operators I am doing a serial split followed up by a parallel split, so that the subexpressions occur before the operator.  I kind of had to do it this way in case there are things nested inside of other operations:

(int x, int y) = DivMod(10,3)

If I don't do timestamps all of the way down, I was going to have to make the left side its own scope, because writing the implementation any other way was too hard (I don't remember why, my brain was about to explode), and that would result in crazy behavior.  So I think that timestamps for everything are quite workable.  New scopes will create new timestamp graphs.

One of the advantages of timestamp graphs is that I think they will make code generation easier.  So it is actually a win, I think.
Tom C
Monday, February 05, 2007
You think your difficulty here might be telling you something about how difficult it will be to actually write programs in this language?
Mark Jeffcoat
Monday, February 05, 2007
no, but that is something to look out for.  The part that is causing the complication is that I want the benefits of header files on a grander scale without actually having them.

Programming in the language will actually be pretty easy.  Elegance is my #1 design goal.  So far I only have one part of my language that I consider inelegant, and really all it is is an = symbol in the class definition syntax.  I expect that a few more will sneak in, but so far so good.

The reason that coding it is hard is because you have to deal with everything that could be written, not just idiomatic code.  Also I expect that once I work on my solution more and more it will become simpler and in the end seem like the obvious and simple solution that I could have typed out in five minutes.  I have never written a compiler before and the language definition is in my head so that complicates things too.

The more I think about this timestamp plan, the more it seems that the nodes aren't so much timestamps as logic gates.  I am turning the parse tree into a directed graph.  So now that I have a new metaphor I think that the solution will start to make more sense and become easier to implement.
Tom C
Tuesday, February 06, 2007

This topic is archived. No further replies will be accepted.
