The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Random thoughts on macros

This is going to be a bit weird & not relevant to anything specific, but this is the only place where I can find enough smart people to kick stuff like this around :-)

I've been playing with lisp macros & feeling more & more envy... But, while it's true that lisp is a programmable programming language, even lisp's macros are syntaxless. What I'd *really* like to do is to be able to create syntax with macros. (There's a reason why all human & most computer languages have syntax--it's much less rough on the eye).

For example, if the underlying language were lisp, I'd like this:
lst = [f(x+y), g(z[i])]
to expand to this:
(setq lst (list (f (+ x y)) (g (aref z i))))
If I haven't screwed up the lisp, the two are equivalent, but the first one is much easier to look at.

In the most general sense, a macro is an operation which takes a chunk of text, performs an arbitrary transformation on it, and returns the resulting chunk of text. This need not be language-dependent. e.g., I have always wished that C had a switch that could operate on strings. Why not write a macro that transforms this:
strsw(pS) {
case "Alice": /* do something */ break;
case "Bob": /* do something else */ break;
}

into this:

if(strcmp(pS, "Alice") == 0) {
  /* do something */
} else if(strcmp(pS, "Bob") == 0) {
  /* do something else */
}
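
For comparison, here is a rough sketch of how far plain C gets without such a macro: look the string up in a table, then use an ordinary switch on the index. This is a swapped-in workaround, not the proposed strsw macro, and the names str_index and handle_name are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical lookup: returns the index of pS in names[], or -1 if absent. */
static int str_index(const char *pS, const char *const names[], size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        if (strcmp(pS, names[i]) == 0)
            return (int)i;
    }
    return -1;
}

/* Illustrative dispatcher: an ordinary switch over the looked-up index. */
int handle_name(const char *pS)
{
    static const char *const names[] = { "Alice", "Bob" };

    switch (str_index(pS, names, sizeof names / sizeof names[0])) {
    case 0:  return 1;   /* "Alice": do something */
    case 1:  return 2;   /* "Bob": do something else */
    default: return 0;   /* no case matched */
    }
}
```

The cost is that the case labels are bare indices that must be kept in step with the table by hand, which is exactly the bookkeeping a strsw macro would hide.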

Or how about these, to save some typing:
loop(i, N) -> for(i=0; i<N; i++)
loop(i, N, K) -> for(i=N; i<K; i++)
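
A sketch of how both forms could share one name with the existing C preprocessor, assuming C99 variadic macros. LOOP_PICK, LOOP2 and LOOP3 are hypothetical helper names, and SumBelow and SumRange exist only to exercise the macro. Some preprocessors expand __VA_ARGS__ differently, so treat this as an illustration rather than portable code.

```c
/* Selects the fourth argument; the arguments in front of it shift
   depending on how many were passed to loop(). */
#define LOOP_PICK(_1, _2, _3, NAME, ...) NAME

#define LOOP2(i, N)    for ((i) = 0; (i) < (N); (i)++)
#define LOOP3(i, N, K) for ((i) = (N); (i) < (K); (i)++)

/* loop(i, N) expands to LOOP2(i, N); loop(i, N, K) expands to LOOP3(i, N, K). */
#define loop(...) LOOP_PICK(__VA_ARGS__, LOOP3, LOOP2, 0)(__VA_ARGS__)

/* Sums 0 .. N-1 using the two-argument form. */
int SumBelow(int N)
{
    int i, Sum = 0;

    loop(i, N)
        Sum += i;
    return Sum;
}

/* Sums N .. K-1 using the three-argument form. */
int SumRange(int N, int K)
{
    int i, Sum = 0;

    loop(i, N, K)
        Sum += i;
    return Sum;
}
```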

So a macro, as I propose it, would be a lightweight lex/yacc that would operate at preprocess-time. (Note: is it possible to make a recursive-descent parser that's easy to use?) Bad programmers would abuse this horribly, because they would use it to create crappy ad-hoc languages that compile into C (or whatever). But for good programmers, you could (a) create a language that describes the problem, (b) use that language to describe the solution. Which is exactly what lisp does, but without syntax.

Thoughts?
John Foxx
Wednesday, May 04, 2005
 
 
It's a small task to grok and use Lisp syntax as opposed to writing a robust macro processor. You could start here:

http://www.livejournal.com/users/fare/77480.html

Believe me, if I'd had web (or any) access to the information available now as opposed to what was accessible in the mid-80's I'd never have written an infix to prefix translator - based on Ratfor syntax but emitting Lisp, not Fortran.

It would have been more elegant to write the translator in Lisp of course, but if you did so you probably would not need your macros any more :-) However the constraints of the target Lisp environment were so severe it was written in Ratfor. Besides, all objects look like a nail to a hammer. And the clincher was I had users who initially found infix more comfortable to read.

But it always was a bit of a crutch. The users soon got used to Lisp because that was what broke in the machine so we all eventually abandoned the translator when better Lisp tools arrived.

The big danger in any macro environment is that it rapidly becomes too personally customised and the source code is one more step removed from what is executing. It's SOP to become proficient in a macro assembler's syntax. Using #defines in C or C++ to name magic numbers is OK but nesting parameterised #defines is far too tricky by half and a maintenance nightmare. Corralling Lisp is folly IMHO.

In a weak attempt to bolster my case I offer this in support:
http://www.ariel.com.au/jokes/The_Evolution_of_a_Programmer.html

my 2c.
trollop
Wednesday, May 04, 2005
 
 
I've always wanted to try this out but never seem to have the time...

http://spirit.sourceforge.net/

If you get time to play with it, let me know what you think.
Jeff Mastry
Wednesday, May 04, 2005
 
 
John,

your first example can at least be made somewhat more readable by using macros (untested; I remember that "" can cause problems in specific circumstances):

#define equalstr(s1,s2) ( strcmp((s1), (s2)) == 0 )

if( equalstr(pS, "Alice") ) {
  /* do something */
} else if( equalstr(pS, "Bob") ) {
  /* do something else */
}

The second example is perfectly doable with macros, but I *strongly* recommend not misusing the macro feature to simulate the language structure itself. Otherwise, if we have a Pascal fan:

#define loop(i,N)  for((i)=0; (i)<(N); (i)++)
#define begin  {
#define end  }

loop(i, 100)
begin
  dosomething(i);
  dosomethingmore();
end


However, I currently have another problem in C for which I wish there were an elegant macro solution. I'm extending my libraries to fully support wide chars (and therefore Unicode). The wide char extension to the old C standard was made fully compatible by implementing every string- and char-related function *again* with wide chars:

size_t strlen(const char *s)
size_t wcslen(const wchar_t *s)

Now taking a function like StrReverse, I found myself duplicating the code to do the same thing twice (both versions should be implemented; the wide one is optionally included):

char *pStrReverse (char *pStr)
{
    char Tmp, *pBeg, *pEnd;
    size_t Length;

    Length = ( pStr ? strlen(pStr) : 0 );
    if (Length > 1)
    {
        pBeg = pStr;
        pEnd = pStr + (Length - 1);

        while (pBeg < pEnd)
        {
            Tmp = *pBeg, *pBeg = *pEnd, *pEnd = Tmp;
            pBeg++, pEnd--;
        }
    }

    return (pStr);
}

#ifdef FLAG_ENABLEWIDE

wchar_t *pWStrReverse (wchar_t *pStr)
{
    wchar_t Tmp, *pBeg, *pEnd;
    size_t Length;

    Length = ( pStr ? wcslen(pStr) : 0 );
[...]

#endif

While I could live with it when the functions are this small, because both versions are in the same place and within sight, it becomes a pain for the more complex functions. E.g. I have a function to match a string against another with the wildcards * and ?.

int StrMatchWild (const char *pTest, const char *pRegExp)

It matches "*pic0??.gif" against "pic010.gif" and "tn_pic011.gif".
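
To give an idea of the size of such a function, here is a minimal sketch of a matcher with that signature. It is an assumption about how StrMatchWild might work (backtracking to the most recent *), not the actual implementation, and it returns 1 on a match and 0 otherwise.

```c
#include <stddef.h>

/* Sketch of a wildcard matcher: '*' matches any run of characters
   (including none), '?' matches exactly one character.
   Returns 1 if pTest matches the pattern pRegExp, 0 otherwise. */
int StrMatchWild(const char *pTest, const char *pRegExp)
{
    const char *pStar = NULL;   /* position of the last '*' seen in the pattern */
    const char *pBack = NULL;   /* where in pTest that '*' currently stops */

    while (*pTest) {
        if (*pRegExp == '*') {
            pStar = pRegExp++;      /* remember the star, first try matching nothing */
            pBack = pTest;
        } else if (*pRegExp == '?' || *pRegExp == *pTest) {
            pRegExp++;              /* single character matches */
            pTest++;
        } else if (pStar) {
            pRegExp = pStar + 1;    /* mismatch: let the star swallow one more char */
            pTest = ++pBack;
        } else {
            return 0;               /* mismatch and no star to fall back on */
        }
    }

    while (*pRegExp == '*')         /* trailing stars match the empty rest */
        pRegExp++;
    return *pRegExp == '\0';
}
```

Every comparison here works on a single char, so a wide version would need the same body again with wchar_t.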

I thought about using macros, but it becomes really ugly with all the line-connecting \, and color-highlighting in the editor is another problem.
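
For illustration, this is roughly what the single-macro variant looks like for the small reverse function. DEFINE_STR_REVERSE is a hypothetical name; the body follows pStrReverse from above.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Generates one in-place reverse function per character type.
   Name, CharType and GetLength are substituted by the preprocessor. */
#define DEFINE_STR_REVERSE(Name, CharType, GetLength)            \
    CharType *Name(CharType *pStr)                               \
    {                                                            \
        CharType Tmp, *pBeg, *pEnd;                              \
        size_t Length = (pStr ? GetLength(pStr) : 0);            \
                                                                 \
        if (Length > 1) {                                        \
            pBeg = pStr;                                         \
            pEnd = pStr + (Length - 1);                          \
            while (pBeg < pEnd) {                                \
                Tmp = *pBeg; *pBeg = *pEnd; *pEnd = Tmp;         \
                pBeg++; pEnd--;                                  \
            }                                                    \
        }                                                        \
        return pStr;                                             \
    }

/* One definition, two instantiations. */
DEFINE_STR_REVERSE(pStrReverse, char, strlen)
DEFINE_STR_REVERSE(pWStrReverse, wchar_t, wcslen)
```

It works, but the whole body is one logical preprocessor line, so compiler diagnostics point at the instantiation rather than the offending statement.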

I thought about using includes, predefining all the relevant macros:

inc.c:

[...]
CharType *ReverseName (CharType *pStr)
    CharType Tmp, *pBeg, *pEnd;
    Length = ( pStr ? GetLength(pStr) : 0 );
[...]


main.c:

#define ReverseName  pStrReverse
#define CharType  char
#define GetLength  strlen
#include "inc.c"
#undef ReverseName
#undef CharType
#undef GetLength

#ifdef FLAG_ENABLEWIDE

#define ReverseName  pWStrReverse
#define CharType  wchar_t
#define GetLength  wcslen
#include "inc.c"
#undef ReverseName
#undef CharType
#undef GetLength

#endif

At least there is no code duplication to maintain, but I would not be completely happy with this version, since there are a LOT of defines for larger modules.

And I thought about writing a pre-preprocessor (e.g. a Perl script) to automatically produce the code duplication:

<ProduceWide>

char *pStrReverse (char *pStr)
[...]

</ProduceWide>

The script would take anything between the tags and output an additional wide version by substituting the types and function names, and by extending the name with a "W". It can be put into the make process as a pre-compilation step. This is the best solution I see so far: no duplication to maintain (DRY principle), and one straightforward piece of code without any obfuscation. I only have to take some care to use a format expected by the script (e.g. the W is put before the first uppercase letter of the function name).


If anyone has better ideas, doable with native language features and macros, I'll be glad to hear about them.

Wednesday, May 04, 2005
 
 
Better idea: if you want Unicode enabled, stay with 8-bit characters and switch to a UTF-8 encoding. Changes to the code: on modern Unix-like systems (Linux, FreeBSD), setting a locale environment variable (LC_ALL or LC_CTYPE to a UTF-8 locale such as en_US.UTF-8, IIRC). In Windows, adding a call to set the codepage to UTF-8.

wchar is the wrong solution, and microsoft's TCHAR doubly so.
Ori Berger
Wednesday, May 04, 2005
 
 
Ori,

yes, I thought about this, too. I'm not yet sure about the overall consequences - the given pStrReverse will fail hopelessly, since the bytes of UTF-8 multibyte sequences get reversed, too. The same goes for StrMatchWild, which examines single characters. I have to read somewhat deeper into the matter to see which of my basic functions have to be substituted or extended, and how far the whole thing depends on the specific platform and system, since full portability is definitely an issue.
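
A sketch of what a UTF-8 substitute for the reverse function might look like, assuming the input is valid UTF-8. pUtf8Reverse and ReverseBytes are hypothetical names. It reverses all bytes first and then flips each multibyte sequence back into the right order.

```c
#include <stddef.h>
#include <string.h>

/* Reverses the bytes in the inclusive range [pBeg, pEnd]. */
static void ReverseBytes(char *pBeg, char *pEnd)
{
    while (pBeg < pEnd) {
        char Tmp = *pBeg;
        *pBeg++ = *pEnd;
        *pEnd-- = Tmp;
    }
}

/* Hypothetical UTF-8 variant: reverses code points, not bytes.
   Assumes pStr holds valid UTF-8. */
char *pUtf8Reverse(char *pStr)
{
    size_t Length = (pStr ? strlen(pStr) : 0);
    size_t i, j;

    if (Length < 2)
        return pStr;

    /* Step 1: reverse every byte; multibyte sequences are now backwards. */
    ReverseBytes(pStr, pStr + Length - 1);

    /* Step 2: a reversed sequence is a run of continuation bytes
       (10xxxxxx) ending in its lead byte; flip each such run back. */
    for (i = 0; i < Length; i = j + 1) {
        j = i;
        while (j + 1 < Length && ((unsigned char)pStr[j] & 0xC0) == 0x80)
            j++;
        if (j > i)
            ReverseBytes(pStr + i, pStr + j);
    }
    return pStr;
}
```

This keeps each code point intact but still treats every code point as an independent character, so it only covers the encoding half of the problem.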

Wednesday, May 04, 2005
 
 
Why exactly is wchar the wrong solution?
Questions
Wednesday, May 04, 2005
 
 
Jeff,
I try not to do too much in C++ these days, but there's a lightweight version of the same idea called the Toy Parser Generator: http://christophe.delord.free.fr/en/tpg/
It's a Python class that you inherit from & then put a Lex/Yacc-like grammar description into the derived class's doc string. I wouldn't try to use it for anything too complex (like the thing I just proposed), but for writing parsers for config files, or simple telnet command-shell daemons, it could be just the thing. I haven't had a chance to try it either but it looks cool.
John Foxx
Wednesday, May 04, 2005
 
 
To Trollop,

You may be right, but here's the fundamental question: do I need to give up syntax to get the power of customization? I think syntax is more than just a bad habit: it's fundamental to the way our minds work--by pattern-matching.

Consider mathematicians, who are not constrained by implementation issues. What kind of notation do they create for their own use? Syntax--terse, yes, but very symbolic. Why? Because you can see, at a glance, that
f(x) is a function
x[i] is an array subscript
[x,y,z] is a list
{a,b,c} is a set
{f(x), g(x)} is a set of functions
...and so forth. So I wonder if there isn't some way to have your cake & eat it, too.

I recently saw a blog with a proposal about a change to Python (adding a "for x from list" as distinct from "for x in list"). I forget the details but the new construct would save a line of code. What ensued was a s**tstorm of controversy, ranging from "Great idea!" to "Why change the grammar to save one line of code?" (Here's a hint: because the guy who proposed it has to do this 10,000 times in his program & stands to save 10,000 lines of code).

I can't read this & not wonder: must the language be utterly non-extensible & under the exclusive control of a priesthood? Shouldn't a hacker be able to add a construct for his own special case without imposing it on the global hacker population? And must we give up syntax in order to have these things, when our brains are so hard-wired for pattern-matching?

So these are some thoughts on the whole thing... Not very useful, perhaps, but thinking about them will keep you sharp :-)
John Foxx
Wednesday, May 04, 2005
 
 
For the char/wchar_t debacle, for best results use C++. Create one C++ file that has the code as a static inline function templated on the character. Then create 2 extern "C" functions in that file that forward to the relevant templated function. (This level of templating should not stretch even the lamest of modern C++ compilers -- not even the famously template-challenged VC++5.)

I've not found a neater way of doing this kind of thing without resorting to other tools (or endless #define ugliness, which I've never even bothered to try as it looks so horrible) and have used just the above method for this very problem.

UTF-8 might be a better way round it, particularly as you're not limited by the wchar_t range, but working char code->equivalent working wchar_t code is a pretty mechanical operation, and there's plenty to be said for that.
Tom_
Wednesday, May 04, 2005
 
 
Take a look at the LOOP macro for a well-known example of how you can introduce syntax to Lisp.

http://www.lispworks.com/documentation/HyperSpec/Body/m_loop.htm#loop

From the examples (I can tell already this is going to look great):

(loop for n from 1 to 10
      when (oddp n)
        collect n)


(loop as n = (progn (format t "~&Number: ")
                      (parse-integer (read-line) :junk-allowed t))
        while n
        do (format t "~&The square of ~D is ~D.~%" n (* n n)))
Art
Wednesday, May 04, 2005
 
 
wchar_t takes 2-4 times as much memory as UTF-8 for Latin languages, and usually slightly more for Far East languages (although if you use the latest additions to Unicode, the UTF-8 representation might actually be slightly bigger).

Contrary to common belief, wchar_t does NOT allow you to reverse the string, match single chars, or otherwise break it arbitrarily and expect meaningful answers -- for the simple reason that chars often influence each other.

Arabic letters change their graphic form depending on context. Thai has "vowel" style letters which modify other letters, such that if you reverse the *characters* in the string you get a string that makes no sense. Hebrew has similar problems. And even European languages get them with accents, if the string uses decomposed characters.
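
A small sketch of the decomposed-accent case, assuming the string stores "e" followed by the combining acute accent U+0301. NaiveWideReverse and IsCombiningMark are hypothetical helpers, and the range check covers only the basic Combining Diacritical Marks block.

```c
#include <stddef.h>
#include <wchar.h>

/* Naive reversal of wide characters, one wchar_t at a time. */
wchar_t *NaiveWideReverse(wchar_t *pStr)
{
    size_t Length = (pStr ? wcslen(pStr) : 0);
    size_t i;

    for (i = 0; i < Length / 2; i++) {
        wchar_t Tmp = pStr[i];
        pStr[i] = pStr[Length - 1 - i];
        pStr[Length - 1 - i] = Tmp;
    }
    return pStr;
}

/* Returns 1 if wc lies in the Combining Diacritical Marks block
   (U+0300..U+036F); other combining ranges are ignored in this sketch. */
int IsCombiningMark(wchar_t wc)
{
    return wc >= 0x0300 && wc <= 0x036F;
}
```

Reversing the three elements x, e, U+0301 yields U+0301, e, x: the accent now leads the string with no base letter in front of it, and the "e" has lost its accent, even though every wchar_t survived intact.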

Microsoft's 16-bit WCHAR (or whatever it's called) is even worse, because it's still a variable-length format (UTF-16), and you get all of the problems of space use/compatibility/etc., and none of the benefits of one char = one memory cell.

If you want text processing to work for all human languages, you can't inspect individual chars, and you can't even simply concatenate strings. Thinking that you can makes you work a lot for little or no gain.
Ori Berger
Thursday, May 05, 2005
 
 
John,
it's true staring at a printed page of Lisp can lead to the "Lost in Sodding Parentheses" or "Look, It's Slowly Permuting" sensation.

But a pretty printer aids comprehension (I wrote one in Lisp for Lisp) as does an editor/debugger and a symbol crossreferencer (blush, ditto) and by then Lisp was pretty easy to read or at least not as hard to read as a piece of unformatted dump.

I like your point about syntax in that brackets / braces / parentheses convey meaning and I still think X=2 travels more quickly into the cortex than (setq X 2) HOWEVER by adopting a preprocessor you are setting yourself apart from the mainstream and working at least at one remove from the target environment. Unless there are hooks back to the source for error analysis you can be executing fairly blind.

Then there is the development environment. Much has changed since the raw text edit/load/run/debug loop of 20 years ago - there are IDEs now that do most of the dogwork for you and may be flexible enough to accommodate a preprocessor and even run it for you - a great thing if it cuts out repetitious, error-prone coding. But addressing your last point, the Lisp syntax is so simple that extensions ARE simple to implement once you see that the whole deal is essentially (print (eval '( ....))), that code and data share the same structure, and that whenever you feel the itch to instantiate a macro expansion you often really should consider calling a function instead.
trollop
Thursday, May 05, 2005
 
 
Common Lisp does allow that kind of syntax through customizing the "reader," the thing which slurps in characters to turn into code. (In other languages, it's called the lexer, tokenizer, etc.) But I've only done minimal things with it; you can ask around Usenet about more extreme stuff.

The guys behind IntelliJ (the Java IDE) are pursuing "language oriented programming." I suspect it is more conducive to what you want than Lisp. Unfortunately, when I tried speaking with them, it went to hell with misunderstandings... so I was unable to communicate some potentially useful lessons from Lisp. But maybe that's for the best, dunno.
Tayssir John Gabbour
Thursday, May 05, 2005
 
 
Trollop says: "...by adopting a preprocessor you are setting yourself apart from the mainstream..."

I've been apart from the mainstream for so long now I can't remember what it's like to be within shouting distance of the mainstream :-) Time to go tinker with my Harley & rattle some windows...heh heh
John Foxx
Friday, May 06, 2005
 
 
Ditto, just minus the hog.
trollop
Friday, May 06, 2005
 
 
More random thoughts:
- Syntax is only so important
- What's the relationship between the following?
  - text-based macro/code-generation languages like m4 or cog
      http://www.nedbatchelder.com/code/cog
  - syntactic macros that know about the target language. This is better than text-based substitution, but don't try to retrofit it on an existing language.
- Macro language == Target language? If not, the body of work on integrating relational database access into OO languages is relevant.
- Syntactic macros for arbitrary syntax (=, {, }) are to my knowledge an unsolved problem. There doesn't seem to be anything usable anyway.
- Forget arbitrary syntax, let's stick with the uniform syntax of lisp. Should we use position-in-argument-list or keyword to convey meaning? The loop macro cited above can do either. I prefer keywords:
  http://www.cs.utexas.edu/~akkartik/feed.cgi?lisp.html
because conveying semantics using nested lists can get quite ridiculous. (e.g. lisp's let or loop) I'm not confident of this, though. Are there drawbacks to keywords?
Kartik Agaram
Wednesday, May 11, 2005
 
 
Clarification on this:
"Macro language == Target language? If not the body of work in integrating relational database access into OO languages is relevant."

The word 'macro' implies that you preprocess one language then another. (In lisp they're both the same language, but eval and macroexpand are still 2 distinct stages)

Alternatively you can try to have the two languages coexist, in which case you run into all the problems that people have had with code like this:
  for i in db.exec("SELECT name FROM emptab WHERE ..")
    ..

So this point is not about whether there are multiple languages, but whether  they need to be aware of each other.
Kartik Agaram
Wednesday, May 11, 2005
 
 

This topic is archived. No further replies will be accepted.
