The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

what do pointers need to be able to do?

Hi, I'm designingand implementing a new language.  I have finished tokenization and now I am working on the parsing/syntax tree stage.  Much like Ruby, the goal of the language is to make programming joyful for its creator (me).

So now I was thinking what pointers need to be able to do.  C++ pointers seem to do three things:

1) represent an assignable reference to an object
2) are null or not null
3) act as iterators for arrays

the first is the motivation for having pointers in a language.  The key part here is "assignable", which is what separates pointers from C++ references.

the second allows you to pass null.  For example you might have a function that takes a log file.  You can pass it a pointer to a file and the function will log stuff to it.  Or if you aren't interested in the log you just pass null.  I don't like this, as it seems to be confusing two different concepts.  C# has a nullable language feature which I think that I will use.  So for passing that log file you don't pass a pointer (unless you are handing off ownership), you pass a nullable reference.

the third thing that pointers do, act as iterators, is really not needed.  I'll just have arrays be able to create their own iterators.  Which of course the compiler can implement with C++ style pointers, or whatever.  Arrays will be in version 2 so I can hold off for now and think about whether I want bounds checking or not or whatever.

Ok, so as far as I know, that is what pointers do in C++.  2/3 of their job can be sent elsewhere.  So now for operators I will still need assignment, equality, dereference, and ...that's it right?  I won't need pointer arithmetic or the greater than/less than operators as that is part of the iterator role will is being handed off.  I won't need conversion to boolean as that role is being handled by the nullable construct.

So now that that is all dealt with (please correct me if I am missing anything), is there ever a reason to have a pointer to a stack allocated variable?  Unless there is I will disallow it.

A lot of people think that pointers are bad.  I think that the problem is that they do too many semi-related things.
Tom
Monday, May 08, 2006
 
 
"You can pass it a pointer to a file"

You can't point to a file. A file is an abstract conception in some file system, thus you point to a memory object holding all informations necessary to describe and access a file.

"Or if you aren't interested in the log you just pass null.  I don't like this, as it seems to be confusing two different concepts."

So you don't need to set a pointer to a value that means "not pointing to anything"?

"is there ever a reason to have a pointer to a stack allocated variable"

If all your variables you have to point to are in the global data area or dynamically allocated on the heap, no, there is no reason.

Besides that I don't have the impression you really understood what pointers are and how they are used, I recommend that you should implement them "all or nothing". Half-pointers are pretty useless. Depending on what you want to archive with your language that existing languages don't provide, references may just be enough.
Secure
Monday, May 08, 2006
 
 
s/archive/achieve/

I'll NEVER learn this... ;)
Secure
Monday, May 08, 2006
 
 
"is there ever a reason to have a pointer to a stack allocated variable?"

I'm assuming from your description that this is a relatively high-level language.  Pointers, honestly, are really only needed if your building a low-level language.  They're useful for building all those data structures (linked lists, hashes, arrays, etc) when you don't have any at all.  If you're building a high-level language you can abstract out pretty much every need for pointers.  You might still want to have *some* concept of a pointer to interface with libraries (For example, Visual Basic has the AddressOf operator for getting the pointer of an item into an Integer).
Almost H. Anonymous Send private email
Monday, May 08, 2006
 
 
"So you don't need to set a pointer to a value that means "not pointing to anything"?"

There won't be a value meaning not pointing to anything.  Instead I will be adopting the C# nullable operator.  For example:

i int; //there is a variable named i, it contains an int
j int?; //there is a variable named j, it may contain an int or it might be null.

p int@; //p is a pointer to an int
q int@?; //q may be a pointer to an int, or it might just be null.

See what I mean?  Null is not a pointer value, indeed it has nothing to do with pointers anymore.  The two ideas are entangled in C and derived languages, including java.  In assembly integral values are used to represent integers, characters, and addresses.  In higher level languages the three concepts are separated (well, ints and chars are still pretty connected).  However in C++ pointers still cover a lot of area.  They are used for relationships between objects and as iterators for arrays.  Those are two different ideas, and thus should be supported by two different language features.

"I'm assuming from your description that this is a relatively high-level language. Pointers, honestly, are really only needed if your building a low-level language. They're useful for building all those data structures (linked lists, hashes, arrays, etc) when you don't have any at all."

No, this is more of a medium-level language.  It is higher level than C and lower level than Python.  Hmm it is hard to compare how high/low it is compared to C++ or java.  It will be a little lower level than C#.  So I do need pointers but they don't need to be used for everything.  A pointer will be used for making a relation between two objects.
Tom
Monday, May 08, 2006
 
 
This is what pointers do that makes them different from references:

pointer1 - pointer2 = a number
pointer2 + a number = pointer2
Scott
Monday, May 08, 2006
 
 
Does a pointer not used as an array iterator need that feature?

Basically what I am asking in the thread is:

I have enumerated three things that pointers do in C++, are those the only three things that they do?
Tom
Monday, May 08, 2006
 
 
What about function pointers?

And BTW it would be a mistake to assume that pointer math is only used for iteration.
S. Tanna
Monday, May 08, 2006
 
 
>  is there ever a reason to have a pointer to a stack allocated variable? 

Because if you write a function that operates on a pointer to memory, any memory at all, why should be required to write a 2nd function for memory that happens to be on the stack.
S. Tanna
Monday, May 08, 2006
 
 
Tom, what you're describing seems a lot more like "references" that "pointers".

Not that there's anything wrong with that.
BenjiSmith Send private email
Monday, May 08, 2006
 
 
"I have enumerated three things that pointers do in C++, are those the only three things that they do?"

"1) represent an assignable reference to an object
2) are null or not null
3) act as iterators for arrays"

Function pointers are a subset of 1) when we consider functions as objects. Note: An "object" in C (not C++) refers to memory storage, *not* to OOP.

3) is only a subset of pointer arithmetics, which is a lot more than mere array iteration. Pointer arithmetics doesn't work on functions for obvious reasons.
Secure
Monday, May 08, 2006
 
 
"What about function pointers?"

Well I'm going to have functions as first-class objects, which will be implemented using function pointers, at least for function literals (the functions that the programmer writes using the function definition syntax).  So they will be able to be passed around efficiently.  Hmm... it will be kind of like the java String class.  Immutable (afterall this is a compiled language, not lisp, so you can't have your functions be mutable).  You could do this f = g+h where f,g, and h all have the same parameter and return types.  The compiler will handle everything, in the same way that String handles the char array.

"And BTW it would be a mistake to assume that pointer math is only used for iteration."

Yes, that's exactly what I am asking!  When would you use pointer math for something other than iteration? 

"is there ever a reason to have a pointer to a stack allocated variable?"

"Because if you write a function that operates on a pointer to memory, any memory at all, why should be required to write a 2nd function for memory that happens to be on the stack."

Well in most cases you would use a reference.  What I'm wondering is if I can replace the "most" with "all".

Ok in C you have your swap function:

void swap(int* left, int* right)
{
    int temp=*left;
    *left=*right;
    *right=temp;
}

in C++ you have:

void swap(int& left, int& right)
{
  int temp=left;
  left=right;
  right=temp;
}

So references take up a lot of the pointer role.  The main reason that you need pointers is this:
consider the linked list:
A->B->C
now you delete B,
A->C

of course if you attempt to implement this with references when you assign C to A.next you call the assignment operator of the node itself, not the pointer, and you end up with this:
A->C->C
which is wrong.
So pointers are needed, but where are they needed?
Tom
Monday, May 08, 2006
 
 
You lost me on the whole linked list thing. Why can't I do this with references?

A.Next = B
B.Next = C
C.Next = null

// Delete B somehow.

A.Next = C

// C.Next is still null. Isn't this correct?
anon
Monday, May 08, 2006
 
 
well when I say references I am referring to C++ references which have almost nothing to do with Java references other than that both are implemented using pointers.  A C++ reference is an alias for another variable.  A Java reference is a pointer that automatically dereferrences itself except when the assignment, equality, and inequality operators are called.  I wrote a swap function above using C++ references.  Note that the function simply would not work using java references.  If you have C++ references r1 and r2, then r1=r2 copies the value referred to.  After the assignment r1 and r2 still point to different locations in memory.

So if I attempt to use C++ references to make a linked list the links between the nodes are unmodifiable.  Once a given node is constructed its links to other nodes would be fixed.  Indeed you would have to know the size of the list that you wanted at construction time and furthermore since C++ references cannot be null the list would have to be of infinite size, or perhaps you could make it circular.  Either way it wouldn't be good.

Java references are basically pointers, but the problem is that they aren't explicit.  Instead of letting you choose whether dereference or not they make the decision for you.  Sometimes I want to assign one pointer to another so that they both point to the same memory location.  Othertimes I want to call the pointee type's assignment operator so that the two pointers point to separate memory locations that now have equal contents.
Tom
Monday, May 08, 2006
 
 
Ok. I wasn't following your logic because I was thinking in terms of Java references. Thanks for clearing that up. Carry on!  :)
anon
Monday, May 08, 2006
 
 
> Yes, that's exactly what I am asking!  When would you use pointer math for something other than iteration? 

A big point of using a pointer (!) is that it allows you to access locations in memory by whatever criteria you choose.

Obviously not everything you might choose, is iteration.

And it's impossible to iterate through (!), sorry enumerate, all the non-iteration possibilities.

Here's an example to illustrate some of the kinds of things that pointers are for.

You might want to treat memory or part of memory as some kind of random access array. That memory may already contain stuff that was put in some other way. 

Consider the case of reading some kind of binary file format into memory.  You might want to read the whole file into memory as bunch of bytes, step thru the memory containing it using pointers. When you get to a block of data which contains an ASCIIZ string, use the start location (pointer) of that string as a parameter to a string function.  When you get to a block of memory containing 32 bit ints, treat that block as a block of 32 bits ints, pointers to ints, etc. When you have an offset command in the file format (saying item X is at file offset Y), you jump a new location in the memory, by doing pointer arithmetic, and so on.
S. Tanna
Monday, May 08, 2006
 
 
S. Tanna, most of those examples are still basically covered by random array access. 

"When you get to a block of data which contains an ASCIIZ string, use the start location (pointer) of that string as a parameter to a string function."

Of course, that pretty much requires a C-like language with C-style strings.  It's also a very bad idea.  Outside of that, I can't really see of an example where you'd need pointer arthmetic.  If you can covert any memory region into a "byte array" than I think pretty much covers it.

I tried myself to think of some examples that aren't covered by what Tom is thinking but I really can't think of any.
Almost H. Anonymous Send private email
Monday, May 08, 2006
 
 
'Java references are basically pointers, but the problem is that they aren't explicit.  Instead of letting you choose whether dereference or not they make the decision for you.'

I'm not sure when this matters... Perhaps I just can't think of scenario.  The only thing java references *can't* do, is  pointer arithmetic as someone above mentioned.  Unless I'm mistaken... (been a while since i've done c++).
Vince Send private email
Monday, May 08, 2006
 
 
By the way, Tom, what's the purpose of your new language?

Not that you need a purpose. Maybe you're just implementing it for the experience of implementing a new language. If that's the case, then bravo to you for working hard to learn new things.

But the fact that you're creating a new language begs the question of why the previous languages don't meet your needs. If something about C++ pointers (or Java references (or Perl references for that matter)) rub you the wrong way, what is it that you'd most like to change?
BenjiSmith Send private email
Monday, May 08, 2006
 
 
Almost H.

I didn't say you couldn't do it another way

But an array is not quite the same, because it's an array of whatever (bytes, chars, whatever).  Whereas a pointer is allows a block of memory, which (as a whole or in parts) can be viewed at one moment in one way (e.g. as chars), and at another moment in another way (e.g. as 32 bit ints), WITHOUT copying.

As for ASCIIZ, that was just an example of the type of thing that you might see within a block of memory, and might view the memory as. I didn't opine at all whether ASCIIZ strings are a good or bad thing.  And FWIW, if you were parsing a file which contained ASCIIZ strings, whether you personally felt they were good or bad things, wouldn't remove the need to parse them.
S. Tanna
Monday, May 08, 2006
 
 
That's a good point..  I didn't read it that way originally.  It could also be an file containing say images and you want to be able to pass a pointer to the various images to various functions (say for display) or I could be, as you said, as simple as saying "these 8 bytes are a float" and working on that in the program.
Almost H. Anonymous Send private email
Monday, May 08, 2006
 
 
S. Tanna:
I have thought about bytes somewhat.  I am going to have a byte type.  I can see a conversion that takes a byte iterator and simply casts it to a pointer of the appropriate type.  Not sure on the details but anything like that would be part of the interface of the byte module rather than a central part of the language.  If you want to cast from A to B where A and B are unrelated you might have to do something like this:

b = from_bytes(to_bytes(a));

This language will be able to do low level stuff but it will take more typing, it is sort of hamming encoded, to steal a phrase.  Consider the following three operators:

& logical and
&& set intersection
&&& bitwise and

Vince:
Java references frusterated me to no end, and I started off by learning java so it isn't like I was coming from C.  For one thing they pretty much require that java have the primative/object dichotomy.  You can't have a reference to an int.  You can have one to an Integer but those are immutable.  You can't have two objects share an int.  You can't easily assign (copy) one object onto another.  I haven't used java lately but I recall their being some Copy function in Object that wasn't really pleasent to use

You can't write a swap function.  With C++ references you can do this:

int a=4;
int b=5;
swap(a,b); //now a = 5, b = 4

A few times I started using single element arrays to simulate pointers :(

BenjiSmith:
Doesn't every programmer want to make his own language? I always thought that no language really suited my needs. 

C++ sort of does.  However some of the syntax is rather awkward, it really seems random.  Almost as if someone took an imperfect language and threw a bunch of great features onto it :)  Just imagine trying to teach a beginner the syntax for a pointer to an array of function pointers.  It makes no sense.

So my first goal was to regularize syntax.  All definitions follow this pattern:

name type {initializer};
examples:
i int{4}; //declares integer i, initializes it to 4

f ()->(){ /*blah blah*/ }; //declares f as a function taking no parameters and returning nothing, initializes it to whatever the code in the braces is.

C class { /*blah blah*/ }; //C is a class initialized to whatever, classes don't have an assignment operator and thus are all const.

p int@{new int{4}};//p points to a 4
a int[]{1,2,3};//array, but this will change, arrays are going to be reworked

also tuples are going to be introduced.  In version 1 of my language all functions take one parameter and return one value.  Once tuples are added I get multiple returns for free.

since functions are written like this:

DivMod (numerator int, denominator int)->(quotient int, remainder int)
{
    quotient{numerator/denominator};
    remainder{numerator%denominator};
    //it is a compile-time error to not initialize a return value
};

you can change it to this (just came up with this today, well I came up with the motivation):

DivMod (n int, d int)->(q int{n/d}, r int{n%d}){};

As in C++ function arguments can be evaluated in any order.  There are no statements that need to be ordered.  So if I restrict what can appear in the intializers somewhat it becomes a nonimperative function.

Then there is multiple dispatch.  Why should the only virtual parameter be "this"?

Now it may sound like I am crazy and just throwing all of the great ideas into one big mess, but I am being very careful to keep the syntax simple.  Afterall, I have to write the compiler and I am not anywhere close to a master compiler writer.  I've placed lots of restrictions, such as all operators and type stuff is postfix, semicolons after every } and so forth.  Not ever feature is going to be in this language.  It won't be as fast as C++ in every situation.

Ok so I want to make my dream language but there are always distractions.  Well not lately.  I'm deployed to Iraq for a year, I'm about halfway through and I'm on 2 weeks leave right now, which ends saturday.  So in Iraq there aren't many distractions.  You get off work and you go back to live in your "can" as we call them.  It is just a shipping container but with a normal door and an air conditioner.  No family, lines to use the internet, really just nothing but work and sleep.  So I figured I would spend an hour here and there on my language and come back to the US with a program that makes syntax trees.  Then I'd study up on assembly and generate some code.

Also I get out in 2.5 years and plan on getting a real job.  I'll still have my CS degree and the fundamentals never change but from what I hear hiring folks don't know that.  They think that things change a lot.  Well some stuff does, but it isn't like I was current back then either.  I just know the fundamentals pretty well and I can learn the other stuff in a week.  I'm also short on experience.  I'm incredible at powerpoint though, as somehow "Intelligence Analyst" is the army term for "Powerpoint Monkey"  I've caught myself thinking in slides :(  So I have my degree and my TS/SCI clearance but I'd also like to have the compiler to show that I can still program and do some cool stuff.

P.S. never ever, ever let anyone you don't hate join the army for any reason.  I have a pretty good imagination.  I could not have imagined that army.  So I really can't explain it to people.
Tom
Monday, May 08, 2006
 
 
You might be interested in checking out the D programming language. Like everyting else, it's not perfect. But it offers a lot of interesting features:

http://digitalmars.com/d/overview.html
BenjiSmith Send private email
Tuesday, May 09, 2006
 
 
"Doesn't every programmer want to make his own language?"

Yup.  I want to -- just don't have the time.  I've actually written a language/byte-code interpreter/compiler for a university class so I know a little about what goes into it.  I even have a good idea of what tools I'll be using and what features I want. 

There's actually some overlap in ideas of what you want in your language and what I want in mine.  I also had the idea of functions taking one parameter and having one return value combined with tuples.  I'm not keen on references & pointers -- I'm leaning more towards what .NET has with specific value and reference types.

My 'ideal' language would have a very complicated type system.  I want a language where I can define a specific integer variable or parameter to be between 0 and 5 and the compiler/runtime will guaruntee that for me.  However, I also want to be able to define something as 'variant' and not have any typing at all.
Almost H. Anonymous Send private email
Tuesday, May 09, 2006
 
 
Tom, you might consider compiling into C, rather than into assembly.

Your then kick off gcc (for example), and hey presto you have a compiler for tons of different platforms.
S. Tanna
Tuesday, May 09, 2006
 
 
BenjiSmith:
I look at the D language every now and then but I think that it is going in the opposite direction from mine.

Almost H. Anonymous:
I really don't like the idea of value vs reference types.  If it makes it anymore userfriendly I am thinking of renaming "pointer" to "relationship" and "reference" to "alias" to give the programmer a sense of how they are meant to be used.  I will probably stick with "nullable" but I could turn it into "optional".  Iterator will stay iterator.  I think with these four language features I can get at least 99% of what you can do with a C pointer.  If I restrict pointers to pointing only to things that are allocated by new or returned by system calls then I think I would still have 95%.

Elaborate per-object subtypes (such as an int with range 0-5) is a lot fancier than I am looking to make.  I think that I'll keep that burden fully on the programmer for now.

A variant type goes against my religion (afterall programming language preference is mostly religion right?).  I will have the equivilent of the new C++ "auto" keyword (which infers types) by simply omitting the type:

begin {container.begin()}; //begin's type is Container.Iterator
end {container.end()};
sort(begin,end);

S. Tanna:
yeah I might do that.  Managing registers doesn't appeal to me.  I guess that I'll decide when the time comes.

All:
Thanks for all of the help.  You've helped me really get at what each feature does and disentangle them from each other.

Here are the results:

i int{3}; //and int, holds 3

a int:{i}; //a1 is an alias for i

r int@{new i}//allocates memory, copies i into it, r holds address to the allocated memory

r: = 5; //the : is the ToAlias operator, 5 is then assigned to the memory that used to hold 3

k int; //not initialized, holds random junk
//it is a compile-time error to use k

k{2}; //now you can use k

n int?; //n can hold an int but holds null
n{6}; now it holds a value

i=n; //compile time error, n could be null
i = n ?? -1; //if n holds a value assign it to i, else supply - 1 as the default value

btw, variable name first was decided on for ease of parsing, particularly in regard to functions.
Tom
Tuesday, May 09, 2006
 
 
Have you been working on a grammar?

I highly recommend keeping your lexical analysis, your syntactic analysis, and your semantic analysis completely separate from one another.

I recently implemented a domain-specific language for a project at work, and keeping these three stages of the parser completely separate from one another made my life a whole lot easier.
BenjiSmith Send private email
Tuesday, May 09, 2006
 
 
nah, I know my work habits.  I would never finish if I did it that way.  I will find it easier if I laborously write the parser bit by bit.  I can test each step along the way which keeps me motivated.  I did design my language so that it is easy to parse without defining a fancy grammar. I had to eliminate all prefix operators but that is no big deal.

I have a list of all of the tokens in the program
I have a SeparateStatements function that takes two iterators into the list.  It breaks it up using semicolons.  It has a nesting variable so that it does it correctly.  So the result is a container containing iterator pairs, each pair representing a statement.

Each statement is then processed.  The exact details of the processing depends on the context.  If you are in a class definition each statement defines a member variable.  If you are in a function each statement is an imperative command.  It is all so general so that you can define a class inside of a function, but of course that class goes out of scope when the function ends, and the LookUpName virtual function handles this.

So anyway I have begun writing the ProgramDefinition and ClassDefinition nodes of the syntax tree.  Right now the program definition separates the statements and then it scans through them looking for class definitions.  The second token of a class definition is always the CLASS token.  Then it takes the value of the first token (the class name), adjusts the pair of iterators so that the range is inside of the braces and calls the ClassDefinition constructor.  I had a working version but I decided to revise it.

Next I'm going to look for functions, again from the top down.  I'll scan (considering nesting of course) for the -> token, then parse the arguments, then the return value, then the statements.

So basically I'll have this series of things that I am looking for and recurse down the tree.  I've been thinking about how to deal with "closed" vs "open" scopes as I am calling them.  A closed scope is like the body of a function or loop.  You can't get at the stuff inside.  An open scope is like a class, you can access the members through the . operator.

So yeah, maybe if I were some guy who could handle large products I would do one of those fancy grammers and then pull out flex/bison and stick all of those semantic actions on.  I just find it easier to do it bit by bit.  I don't have a very good work ethic.
Tom
Tuesday, May 09, 2006
 
 
"I did design my language so that it is easy to parse without defining a fancy grammar."

You're out of your goddamn mind.

Ya know, if you take a little time and *learn* how to use an EBNF parser-generator, it will actually be *easier* than rolling your own. I've done it both ways, and it's a night-and-day difference.

About four years ago, I wrote an XML parser from scratch. It was a scanning parser, with a finite state machine (and two different stacks!!) to keep track of parse states and element names. Semantic analysis happened at the same time as lexical analysis, and it performed rudimentary validation on the fly. It was just a little hobby project of mine, and it was actually pretty cool. At the time, I was very proud of it.

But it took a long long long time to write (and debug), and the parser involved an enormous while looop, full of swich/case statements. Probably the most important factor in my sucess was the fact that the XML language was already very well-specified, so I didn't ever need to go back and make changes to the parser once it was working correctly.

Over the last few months, though, in my actual job, I've been developing a very domain-specific language to use as an internal development tool for some of our products.

The langauge has evolved significantly, and the development of the parser has been an iterative process. I'll make a tiny change to the language spec and then change the grammar to match that change. Then I'll regenerate all of the lexical/syntactic components of the parser from the updated grammar and hook it up to the semantic analyzer (which usually doesn't need to change at all). Then I'll write some sample programs with the new grammar and discover that there are some things I don't like about it, and I'll go back to change the langauge spec.

Anyhow, once the parser completes lexical, syntactic, and semantic analysis, it emits a symbol table. All references between objects in the symbol table are checked for type-safety and for cyclic dependencies. And then the whole symbol table is handed off to a series of "handlers" that emit the various structues that plug into our application.

Anyhoo...

I could see hand-crafting the parser once the language spec is nailed down. But during the prototyping stage? No way. Generating the parser from a grammar makes it so much easier to go back and tweak things.

Another thing to keep in mind is that a parser-generator can help you find lexical ambiguities in your grammar that you didn't know were there. (And trust me, the first few drafts of your grammar will have lexical ambiguities.) Also, during syntactic analysis, you'll probably have ambiguities than can only be solved using lookahead. Do you know how to implement lookahead on your own? Are you ***sure*** your grammar is LL(1)? If it's actually an LALR grammar, writing the parser yourself will be excruciatingly difficult.
BenjiSmith Send private email
Wednesday, May 10, 2006
 
 
"You're out of your goddamn mind."

Seconded.

However, as anybody who has followed a real course on compilers/grammars knows, writing a grammar by hand is really really easy.

The code produced by generators is simply not very good. As grammars become more complex, those generators tend to build large lookup tables, and the code will become a mess of table references. If you write a parser yourself, you can usually do a much better job. Not to mention that you're more likely to write a simple grammar if you do it by hand. If a grammar is hard to write, it's hard for people to understand. And if a grammar is hard for people to understand, they won't be able to guess how statements are represented in the AST, which is confusing as hell.

o Keep the grammar simple.
o Write the grammar down, semi-formally. (EBNF for instance). But make sure you know the parsing complexity of the grammar. LL(1) is not always possible.
o Write the parser by hand.

That's what I'd do.
L. Lort
Wednesday, May 10, 2006
 
 
What should a pointer do?

I would have thought that pointing to a memory address would be about the limit of it.  If that address is stored as an unsigned integer of sufficient length to represent a memory address on the system the necessary arithmetic comes for free.

The pointer data type should probably support a dereferencing operation akin to C's * operator.

The ability to take the address of any object is highly desirable.  You don't realize it's value until you are using a language that doesn't have the feature.
Clay Dowling Send private email
Wednesday, May 10, 2006
 
 
"The ability to take the address of any object is highly desirable."

The address of your object is 0x03FE3E22. What does that tell you? Absolutely nothing. Unless you're writing device drivers (or similarly low-level code) I contest that you need to know the actual address of any object.
L. Lort
Wednesday, May 10, 2006
 
 
L. Lort,

he doesn't say he needs to know it - he just wants the ability to take it. Except when debugging, I also don't care about the exact bit pattern in my pointers, as long as they point to the objects I want them to point to.
Secure
Wednesday, May 10, 2006
 
 
I read Tom's post last night just before bed so I didn't get a chance to comment but..

"You're out of your goddamn mind."

+1 for this.

I used Flex/Bison for my first language (school assignment) and it was basically a complete procedural language (even had nested functions).  It could not have been easier to do lexer and parser.  Rolling one by hand is extremely foolish -- comparitively it's like coding a windows application in assembler.

I've been looking into ANTLR ( http://www.antlr.org/ ) for a few years now.  However Flex and Bison have been around since the dawn of time.
Almost H. Anonymous Send private email
Wednesday, May 10, 2006
 
 
And what I'm saying is that pointers, or addresses, are never a goal in itself. Pointers are a means to an end. So when you design a language you should think about what one might want to accomplish, and introduce language features where necessary to make sure code can be written in a straightforward manner.

In my experience, when you stop thinking about pointers and start thinking about the end goal you'll notice that the lack of pointers in some languages doesn't hold you back.

Taking the address of an object doesn't have an inherent value. Being able to take the address of an object does have at least one downside: it makes writing a garbage collector more difficult. Therefore, I'd say that pointers should be added to a language only as a last resort.
L. Lort
Wednesday, May 10, 2006
 
 
L. Lort,

The address is wonderfully helpful.  When I read a double from the user I now know where to put it.  Later, when I'm storing the user input in the database, I know where to take it from.  I don't have to worry about plucking it off the stack before it goes out of scope, which might not be a viable option before storage.

Maybe it's not something you're interested in.  Maybe it's not what the original poster is interested in doing.  When I'm using pointers though, it's what I want to do.  It sounds like you aren't very comfortable with using pointers, and that's fine, because you're probably not solving the same problems that I am, or not solving them in the same way.  But for what I do with a programming language, it's exactly what I need from a pointer.
Clay Dowling Send private email
Thursday, May 11, 2006
 
 
Clay,

I think L.Lort has a point here, though it should be viewed from different levels of abstraction, without mixing addresses and pointers.

The "address" of an object is the lowest level. Any object in practical implementations is stored somewhere in the memory as a sequence of bytes, and it must be uniquely identified to access it: The absolute memory address (absolute without regarding virtual memory addressing). The value of this address is of absolutely no interest, except under special circumstances when writing device drivers or working on embedded systems, and maybe when debugging.

A "pointer" is already an abstraction from this concept. It allows to hold an address and access the object by dereferencing it, without the need to know the value. Though you are confronted with it when referencing objects and dereferencing the pointer, which you have to do explicitely by hand for both.

A "reference" is an even higher level of abstraction, mostly hiding the referencing and dereferencing, and caring for more, e.g. garbage collection.

But in the end, in machine code, both pointers and references will result in exactly the same: The absolute address, since it is the only useful thing to identify and access the object stored somewhere in memory.

There are no problems with pointers. Since they are already an abstraction, they could be further extended to support the garbage collection. But this is the wrong way to go, IMHO, because then we go the abstraction levels upward. It is better to only go them downward, as in any computer hierarchy. When your app uses an API, most of the time you adapt the app to the API; you won't adapt the API to optimally support your app. DLL hell, anyone?

So to the OP, this may be the way to go: Only use references, but include methods to retrieve and maybe set the absolute address of the referenced object for the rare occassions where you really need it. You could even support garbage collection for both cases, or you can leave the programmer on his own when doing such low level stuff.

As for pointer arithmetics and array iterations: Is there anything wrong with foreach?
Secure
Thursday, May 11, 2006
 
 
When you allow pointer arithmetic the garbage collector cannot free any objects. Even if no 'pointers' to a specific chunk exist the programmer may still rely on the memory in the chunk because it can be accessed by adding some integer to a pointer he saved earlier.

In a language with pointers the concept of 'const' doesn't even exist. The data may be changed by anybody who, through pointer arithmetic, gets a generic pointer to the const variable.

In a language with pointers, anything can happen. In a multithreaded program with 3rd party libraries you might find memory suddenly gets corrupted. In a managed language no such thing can happen.

In conclusion: pointers have clear disadvantages. And modern languages get by just fine without them. So you should ask yourself again: why?

And Clay, the fact that you need pointers in some languages doesn't mean you need them in all languages. The fact that you're thinking about "variables going out of scope" already implies that you have a hard time letting go of such concepts where they do not apply. It is exactly in managed languages where you don't have to consider memory at all. Pointers give you lots of power, but the price is, once again, significant.
L. Lort
Thursday, May 11, 2006
 
 
> In a language with pointers the concept of 'const' doesn't even exist. The data may be changed by anybody who, through pointer arithmetic, gets a generic pointer to the const variable.

I must have imagined the const keyword in C and C++ ?

I think what you are trying to say is that const can't be relied on when something else may access the memory


> In conclusion: pointers have clear disadvantages. And modern languages get by just fine without them. So you should ask yourself again: why?

I think you mean those "Modern languages" that don't get used for system programming, don't have to interact with hardware, don't have efficiency as primary consideration, don't have to call machine code/assembler, or don't ever have to call existing APIs which contain pointer parameters.
S. Tanna
Thursday, May 11, 2006
 
 
> I think what you are trying to say is that const can't be
> relied on when something else may access the memory

Hence the quotes. The keyword const exist, but it doesn't guarantee the variable is actually constant. In other languages such guarantees can be made. Const is just a compiler hint, like the keyword inline, or register. C++ programmers often rely heavily on the static type checking, even though the type checking makes no guarantees towards correctness at all. I say again, without pointer arithmetic actual correctness can be enforced. Look at Haskell: type checking has improved magnificently in the past decade.

> I think you mean those "Modern languages" that
Well, yes. That's what I said "unless you write drivers (or similarly low-level code" in a post earlier.

If the audience of your language is kernel hackers, then yes, you will need pointer arithmetic. But a language doesn't need to be the ultimate language for everybody. You can't make a language that is equally suitable for kernel hacking and spreadsheet macros. So decide who your audience is and adjust your language to suit that audience.

Try to please everybody, and you'll please nobody.
L. Lort
Thursday, May 11, 2006
 
 
Well last day in America for a while so I won't be able to reply.  Back to the land of middle-managers.  The team that I am on has five managers and three workers.  Oh and I'm transitioning to become a manager too, so we'll have a 3:1 ratio of managers to workers soon.

I understand that writing a parser by hand is the hard way.  However I will not have internet access and so I cannot be dealing with leaky abstractions.  I really don't like spanning multiple layers at the same time.  Plus a handwritten parser is naturally written incrementally, which makes for easier testing.  Easy testing = success.

As for pointers, addresses, references, etc...
I've attempted to figure out all of the purposes that people put these features to and have come up language features for those purposes: relationships, aliases, iterators, and nullables.  I could probably combine relationships (pointers without arithmetic) with aliases (C++ style references) but haven't thought elegant syntax that fits the rest of the language.  I'm going more for "classical building" than "apartment building with air conditioners hanging out of the windows".

My language won't prevent the programmer from doing bad stuff.  He will be able to access memory that he shouldn't.  He will be able to do all sorts of bad stuff.  I'm not attempting to prevent malice.  I assume that the programmer is a grownup and an outstanding citizen.  Realistically no one will ever use this language but me and maybe a few people that I teach.  I really don't have to worry about Mr. Bad Coder ruining stuff.

What the army has taught me is that any system that allows you to use evil and/or stupid people effectively isn't worth it.  It grinds down their souls and minds making them stupider and eviler and at the same time does the same to the good and smart people until they escape.  So when it comes to language design I am assuming and intelligent and mannered gentleman who wants to do some coding.  He is fallible and knows it.  He likes for the language to catch his mistakes and help him out.  He doesn't like for it to make his choices.
Tom
Friday, May 12, 2006
 
 
"So when it comes to language design I am assuming and intelligent and mannered gentleman who wants to do some coding.  He is fallible and knows it.  He likes for the language to catch his mistakes and help him out.  He doesn't like for it to make his choices."

Yikes! Are you aware that you won't have any control about who will use your language once it it finished and released in any way? At least you have a nice philosophy for it. Just make sure you won't be too disappointed when faced with reality...
Secure
Saturday, May 13, 2006
 
 
L.Lort,

"When you allow pointer arithmetic the garbage collector cannot free any objects. Even if no 'pointers' to a specific chunk exist the programmer may still rely on the memory in the chunk because it can be accessed by adding some integer to a pointer he saved earlier."

No, there is no real problem. Even in C, pointer arithmetic is only defined within a given memory object (e.g. array or struct), including a pointer to the first byte behind the object, which is not allowed to be dereferenced. Thus if you allocate a chunk of memory, all your pointer arithmetic have to be done within this chunk, with further restrictions by the object types you've stored into this chunk.

Making an inside pointer within this chunk leaving it, or making an outside pointer to repoint into this chunk is undefined behaviour by the C Standard. Thus you can safely assume that you have no more references when no pointer points into this chunk. For anything else your program is already in an illegal pre-crash state.

The reference control for the pointers becomes a bit more complex, of course, because they can point to any byte within the chunk, not only to the beginning.
Secure
Saturday, May 13, 2006
 
 
It's not quite that simple.

In C, you know that arrays are always allocated as one consecutive chunk of memory. So if you decide to put a bunch of structures inside an array you can traverse the array using pointer arithmetic. Then, by dereferencing said pointer you can access specific bytes within the structure. So a type** pointer can't be treated as 'just a pointer'.

So that one pointer gives you access to a list of structures. So intuitively speaking, when you still have a pointer to an array of objects, the objects inside the array shouldn't go out of scope.

But how can the garbage collector tell whether the pointer to one of the structures you have is *also* an array iterator? It can't. So either it deallocates the entire array except for the one object you have a pointer to, or,  it keeps the entire array alive until none of the elements inside the array can be accessed anymore. The first solution might be frustrating for the programmer: "Where'd my array go!", the second scenario is basically a memory leak.

This might be a contrived example, but I think it illustrates that it's very hard to predict the meaning of a pointer.

That said, if you allow a garbage collector to silently corrupt the memory of a program when the pointer arithmetic laws are not followed to the letter, then it can be done. But writing secure software is pretty hard as it is, and an unpredictable garbage collector is probably worse than doing your memory management yourself.

Even simple functions are strncpy are abused regularly by programmers, because they don't read the specs carefully. Can you expect programmers to understand complex garbage collecting rules? No way. Yes, strictly speaking the language isn't to blame when a program doesn't work when you break the rules. But shouldn't the compiler writers take at least some responsibility? Because a garbage collector runs only occasionally, incorrect code will *SEEM TO WORK*.

Do we need more code in this world that seems to work, but is in fact horribly incorrect? Hell no.
L. Lort
Saturday, May 13, 2006
 
 
A pointer is a variable whose value is a reference.
Most everything else you need follows from that.

Saturday, May 13, 2006
 
 
That was very enlightening. Thanks a million.
L. Lort
Saturday, May 13, 2006
 
 
L.Lort,

The second option is the way to go: Base the garbage collection on the single memory chunks, not on the object structure. Whenever an absolute address within this chunk is assigned to a pointer of any type, it is an additional reference. When this pointer is overwritten with an address outside, the reference is deleted. Any 'reference' to a single object within this chunk is basically another pointer with an assigned inside address.

When a pointer crosses the boundaries by pointer arithmetic, it is undefined behaviour, and since it already needs to be detected for the garbage collection, the language can throw an appropriate error (we are still talking about a new language on the example of C). That's it.

And I don't see in any way why this should be considered a memory leak. The complete chunk is freed when the last reference is gone. It is exactly the way it is manually handled in C. The complete memory chunk is deallocated when you free() it, regardless if you only use a single byte within an 8MB block and don't need the rest.

If you need partial deallocation, don't allocate an array of objects; allocate an array of pointers to objects instead, with any object in its own chunk of memory. You can even use your pointer arithmetic with pointers to the first array; you only need an additional *dereference.
Secure
Saturday, May 13, 2006
 
 
"If you need partial deallocation, don't allocate an array of objects; allocate an array of pointers to objects instead"

This is *manual* memory management. When you have to worry about issues like these the benefits of a garbage collector are mostly negated.

Besides, I think it's an unintuitive solution. For instance:

function declaration
{
  declare local variable
  array of strings
  iterate through array of strings
    if condition return dereferenced iterator (a string)
  return "no result"
}

Now, because of the garbage collector you know that the local variable has gone out of scope when the function returns. You know that if the function returns "no result" that the array has gone out of scope. But if a result *is* found, then, according to your garbage collection policy, the entire array would survive as long as that one string remains in scope. However, the rest of the array *cannot* be accessed.

This is a memory leak.

Now, the programmer had to write:
if condition return (dereferenced iterator) copy

This would have the desired result, because now the array goes out of scope and a fresh string is returned.

So what good is a garbage collector if you still get memory leaks? Avoiding memory leaks this way might even be harder than consistently using RAII and smart/auto pointers.
L. Lort
Saturday, May 13, 2006
 
 
L.Lort,

as far as C is concerned, a "string" is usually a pointer to the memory where the characters are stored, thus your array of strings is in fact an array of pointers, with any single string in its own memory chunk. Thus all strings will be garbage collected, except the single string that is returned, because the return value takes a second reference to the string. And even if not so,

"This is a memory leak."

No, it still isn't. The returned pointer value is a reference into the complete memory chunk where the array of strings is allocated. When this pointer value becomes invalid, the complete chunk will be garbage collected. A "memory leak" is when you have a chunk of memory still allocated, though no more pointers are pointing to it, so you are not able to free it, automagically or manually.

The garbage collector still knows that there is a complete chunk of memory behind the pointer pointing in the midst of it, even if you as the programmer can't access it by the language semantics.

At most it could be called a memory waste, but not a memory leak.
Secure
Saturday, May 13, 2006
 
 
Raymond Chen: "A cache with a bad policy is another name for a memory leak."

http://blogs.msdn.com/oldnewthing/archive/2006/05/02/588350.aspx

This is no different.
L. Lort
Saturday, May 13, 2006
 
 
L.Lort,

thanks, nice link. I won't argue against Raymond Chen; the probability that he is right is much larger than the probability that I'm right. Just note he's talking about "waste" all the time, too. But there is a point, let's find a compromise.

Viewing it this way, then storing a boolean into a 32 bit int is a memory leak, too, since 1 bit of information is stored together with 31 useless bits. It is just a question of scale. Try to allocate memory for 256.000.000 booleans. It is a serious difference if you try to allocate 32MB or 1GB of memory.

Memory waste on the small scale is no problem and even may be a good thing. Just imagine the effects on the performance when all variables are optimally stored into memory, so any single access needs bit masking and shifting.

Memory waste on the large scale can turn into a memory leak, because the effects on the machine will be the same as for a true memory leak, when you've lost control over the allocated memory.

Thus the string array example is no problem when the function is called 10 times. It IS a problem when it is called 1.000.000 times.

But it's handable. There is already memory control for the garbage collector, thus if the allocated memory is broken up to parts, the control is adapted and errors can be thrown. If the programmer tries to use pointer arithmetic on the array though only a single string is returned and the array is an internal of the function - BANG.

If the memory is not freed, the programmer can handle the leaky abstraction by making a copy of the string. There is one simple rule for him: Allocated memory is only handled in the original chunks. If this is too much to be asked from the programmer, well, then all hope is lost, anyway. Nothing in ANY language hinders him from allocating a big array of big objects which he never uses, but only keeps the reference alive.
Secure
Sunday, May 14, 2006
 
 
Well put.

I just hope I'll be at a safe distance when something goes BANG!
L. Lort
Sunday, May 14, 2006
 
 
Tom, OT:
I wouldn't worry about getting a job when you get out. With a TS/SCI and a CS degree you should have defense contractors fighting over you. At least that is the case here in the DC area.

Stay safe, and thanks.
JustNaCl Send private email
Friday, June 02, 2006
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz