The Joel on Software Discussion Group (CLOSED)

A place to discuss Joel on Software. Now closed.


Perl vs Python - GC and Reference Counting

We're developing a project that will be getting data from a web service, interacting with the high-performance "core" of the software (written in C++), and displaying data to web users. 

The Sr. Dev has been a Perl dev for 10 yrs, and likes it, but he agreed to give Python a try. 

It seems like we're going back to Perl because Python doesn't have explicit destructors - that is, you can't rely on a block of code being called when an object falls out of scope.  Since our program will be interacting with a database constantly, we need to have "cleanup" code that will run in a predictable way. 

I'm not against Perl, but I was especially attracted to Python.  Before I get completely disillusioned, could any developers who know more about this than me offer some perspective of the pro/con re garbage collection in Perl and Python (and Ruby, for that matter)?

I've been doing the academic research, but I'm hoping some personal perspectives will be more illustrative.
A. Skeptic
Wednesday, March 22, 2006
 
 
It's best to read the docs and get info directly from the developers and maintainers. I recall from the Perl documentation that it uses explicit reference counting on objects. When the reference count goes to zero, the memory is reclaimed on the spot (with maybe some special exceptions). You must be careful obviously to avoid circular reference islands. I imagine the Python documentation contains similar information about how Python handles memory.
Ian Boys
Wednesday, March 22, 2006
 
 
From Python 2.4 docs

Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An implementation is allowed to postpone garbage collection or omit it altogether -- it is a matter of implementation quality how garbage collection is implemented, as long as no objects are collected that are still reachable. (Implementation note: the current implementation uses a reference-counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references. See the Python Library Reference for information on controlling the collection of cyclic garbage.)

Note that the use of the implementation's tracing or debugging facilities may keep objects alive that would normally be collectable. Also note that catching an exception with a `try...except' statement may keep objects alive.

Some objects contain references to ``external'' resources such as open files or windows. It is understood that these resources are freed when the object is garbage-collected, but since garbage collection is not guaranteed to happen, such objects also provide an explicit way to release the external resource, usually a close() method. Programs are strongly recommended to explicitly close such objects. The `try...finally' statement provides a convenient way to do this.
Bart Park
Wednesday, March 22, 2006
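
For reference, the try...finally pattern the docs recommend might look something like the sketch below. It assumes an in-memory sqlite3 database as a stand-in for the real one; the table and the fetch_names helper are made up for illustration.

```python
import sqlite3

def fetch_names():
    # Hypothetical helper: in-memory sqlite3 stands in for the real database.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (name TEXT)")
        conn.execute("INSERT INTO users VALUES ('alice')")
        return [row[0] for row in conn.execute("SELECT name FROM users")]
    finally:
        # Runs whether the block returns normally or raises.
        conn.close()

names = fetch_names()
```

The cleanup does not depend on reference counting or on any particular garbage collector, so it should behave the same on any Python implementation.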
 
 
It looks like CPython has deterministic destruction, since it uses reference counting and not GC:

http://tinyurl.com/zuwd7
sloop
Wednesday, March 22, 2006
 
 
Ian,
That's pretty much how I understand it works (with the reference counting), with the downside being the circular reference thing. 

Bart,
The "external resources" is exactly the reason why our Sr. Dev wants predictable destructors - we're hitting the database constantly, and need to release db resources asap.  We've tried putting together try/finally blocks, but it seems kludgy.  We've also tried building an RAII class to register and close the resources, but it seems like a lot of support code for every little thing. 

Basically, I'm wondering if the downsides of reference counting (circular references, introspection is harder/impossible) are significant in ways I haven't thought of. 

Also, does this mean that places like Google (which uses a lot of Python) have some homegrown way of handling the resources problem?  I can't imagine our situation is that rare - how does one address that problem unless the language uses reference counting for GC?

(At the risk of being too verbose: One of the points the Sr. Dev made is that he feels that GC is largely about memory management, not resource management, but you still have to handle your other resources. He suggests that a language is lacking without a predictable way of calling cleanup code.  This makes total sense, so how is Ruby/Python supposed to work in a production environment like ours?)
A. Skeptic
Wednesday, March 22, 2006
 
 
Bart is correct.  CPython uses reference counting _and_ cycle detection GC.

The python equivalent of a destructor is the __del__ method, which always gets called if an object's refcount hits zero. 

Now, if you have an object in a cycle, there are some corner cases to be aware of with __del__, but if you're coming from a Perl background, Perl can't detect cycles _at all_ so your co-worker really has nothing to complain about. :)
Jonathan Ellis
Wednesday, March 22, 2006
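
A small sketch of the two cases described above, as they behave on CPython. The Resource class and the events list are invented for the demo. One caveat: on the Python 2.x releases current in this thread, a cycle whose objects define __del__ is not collected at all and ends up in gc.garbage; since Python 3.4 the cycle detector does finalize such objects, which is what this sketch relies on.

```python
import gc

events = []

class Resource:
    """Made-up class that records when its finalizer runs."""
    def __init__(self, name):
        self.name = name
        self.other = None

    def __del__(self):
        events.append(self.name)

gc.disable()  # keep the cycle collector from running on its own during the demo

a = Resource("plain")
del a                      # last reference gone: CPython calls __del__ right away
after_plain = list(events)

b = Resource("b")
c = Resource("c")
b.other, c.other = c, b    # build a reference cycle
del b, c                   # the refcounts never reach zero
after_cycle_del = list(events)

gc.collect()               # the cycle detector finds and finalizes the pair
after_collect = sorted(events)
gc.enable()
```

The order in which the two objects in the cycle are finalized is not something to rely on, which is one reason cleanup code in __del__ gets tricky once cycles are involved.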
 
 
> It seems like we're going back to Perl because
> Python doesn't have explicit destructors

Does that mean you can't do something like "ref = undef" in python and have it go away?
son of parnas
Wednesday, March 22, 2006
 
 
Oh, and as for Ruby, they use a "pure" mark-and-sweep GC mechanism; no reference counting.  So the situation there is the same as other languages w/ a similar GC approach: you can define a "finalizer" but there is no guarantee when it will be called.  (In Java, it may not be called _at all_ if the VM terminates in certain ways.  I don't know if similar situations exist for Ruby.)
Jonathan Ellis
Wednesday, March 22, 2006
 
 
"Does that mean you can't do something like "ref = undef" in python and have it go away?"

you could write "ref = None" but in this case it's better to just write "del ref," which removes it entirely instead of implying that maybe you have a use for this new reference to None.
Jonathan Ellis
Wednesday, March 22, 2006
 
 
... in case that wasn't clear, __del__ is called whenever the refcount hits zero; "del ref" is just one way that could happen.
Jonathan Ellis
Wednesday, March 22, 2006
 
 
a = someobject()
a = None

In my experience, this python code always garbage collects quickly.

Wednesday, March 22, 2006
 
 
---
One of the points the Sr. Dev made is that he feels that GC is largely about memory management, not resource management, but you still have to handle your other resources.
---

That's my opinion as well.  Which makes me wonder: since he made that point, why is he trying to tie resource management (db connection cleanup) to the memory management part of the application (garbage collection)?

Even in Java you can't rely on finalize() being called, so you have to manage your resources another way.  Personally I'm more of a fan of a more predictable method for managing database resources, even in perl.
Andrew Hurst
Wednesday, March 22, 2006
 
 
> Dev made is that he feels that GC is largely about
> memory management, not resource management

This is the beauty of C++ destructors in a block scope.

I don't see how perl helps you though. Resources need to be explicitly freed and in a server I always explicitly free memory. Python sounds the same to me.
son of parnas
Wednesday, March 22, 2006
 
 
CPython (the C implementation of the Python language) does use reference counting (plus GC). However, Python is unique in that it has multiple implementations. Jython (on the JVM) and IronPython (on .NET) both implement the same language independently.

The language spec requires some sort of GC, but it does NOT require deterministic destruction. So code that works "correctly" under cpython may perform differently under one of these other implementations.

Unfortunately, this caveat also applies to future versions of CPython. There's no guarantee that refcounting will always be in the interpreter.

The upshot is that, unfortunately, try-finally is the best bet for guaranteed deterministic destruction at this time.
Chris Tavares
Wednesday, March 22, 2006
 
 
Hmmmmmm. Why not just call connection.close() ?

The garbage collector is for managing memory. I don't like giving the GC responsibility for closing database connections, closing sockets, etc.
BenjiSmith
Wednesday, March 22, 2006
 
 
-----
Even in Java you can't rely on finalize() being called, so you have to manage your resources another way.  Personally I'm more of a fan of a more predictable method for managing database resources, even in perl.
------

True. I code in Perl and I cannot rely on 'DESTROY' being called when a variable goes out of scope. The main source of the problem is closures, which can keep a variable alive after its scope ends.

I prefer to use a wrapper for external resources like databases. It encapsulates when the connection and disconnection is done, and it can cache them as well.
badaiaqrandista
Wednesday, March 22, 2006
 
 
For a project like this, wouldn't you normally implement some sort of resource pooling, in which the initialization and destruction are easily handled explicitly and deterministically?  If the resources you're talking about are db connections, a resource pool is a natural fit for both management and performance reasons.  Object pooling may be a bit more of a reach, but if the system is going to have any sort of scalability, this is the obvious step, isn't it?
Mediocre Coder
Wednesday, March 22, 2006
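
A rough sketch of what explicit pooling could look like in Python. Everything here is an assumption made for illustration: the ConnectionPool class and its method names are hypothetical, and in-memory sqlite3 connections stand in for the real database driver.

```python
import queue
import sqlite3

class ConnectionPool:
    """Hypothetical fixed-size pool; sqlite3 stands in for the real driver."""
    def __init__(self, size):
        self._idle = queue.Queue()
        for _ in range(size):
            self._idle.put(sqlite3.connect(":memory:"))

    def acquire(self):
        return self._idle.get()      # blocks if every connection is in use

    def release(self, conn):
        self._idle.put(conn)

    def idle_count(self):
        return self._idle.qsize()

    def close_all(self):
        # Close every connection currently sitting idle in the pool.
        while not self._idle.empty():
            self._idle.get().close()

pool = ConnectionPool(2)
conn = pool.acquire()
idle_while_borrowed = pool.idle_count()
try:
    value = conn.execute("SELECT 1 + 1").fetchone()[0]
finally:
    pool.release(conn)   # explicit, deterministic return to the pool
idle_after_release = pool.idle_count()
pool.close_all()
```

Creation and teardown happen in exactly two places (the constructor and close_all), so nothing depends on when the garbage collector gets around to an object.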
 
 
Even if you do pooling it's extremely useful to have deterministic destruction so that your resources are returned to the pool automatically when they go out of scope.
sloop
Wednesday, March 22, 2006
 
 
And for DB connections, the pooling may occur at a lower level than your application, so you may not have to worry about it. For example, ODBC does pooling for you.
sloop
Wednesday, March 22, 2006
 
 
"It seems like we're going back to Perl because Python doesn't have explicit destructors - that is, you can't rely on a block of code being called when an object falls out of scope."

This isn't a failing of Python.  Some languages (C++, perl) have deterministic finalization; others don't (Java, Python).  C# is the only language I know that tries to have both (objects are finalized when the GC runs, whenever that may be -- but you can also implement IDisposable and then use the "using" keyword, and the Dispose() function will be called at the end of the using scope).

Lack of deterministic finalization doesn't mean that Python is unusable as a general-purpose language.  It just means you have to code yourself what another language would have done for you.  99 times out of ten (as an old lead used to say), you don't care when an object is finalized, so it's fair to say that if you DO care, explicitly calling "Dispose()" is not a huge burden.

All this means is that your team lead can't program the idiom he finds most comfortable.  Big whoop.  If you ever get THAT comfortable in a language where lack of feature X impairs your ability to appreciate the other language's improved support of A, B, and C, I'd say you've stagnated as a software engineer.
Alyosha`
Wednesday, March 22, 2006
 
 
Like Benji said, just add a close method.  You can't do RIAA in python.  I know this doesn't help you now, but 2.5 (out in August) adds the 'with' statement, which allows you to control resources when you need to.  i.e.

with open("a.txt") as f:
    f.readlines()

would automatically close the file when the block goes out of scope.

http://www.python.org/dev/peps/pep-0343/
Grant
Wednesday, March 22, 2006
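
To show what PEP 343 allows for a user-defined resource, here is a rough sketch. The Connection class and its close method are invented for the example; only the __enter__ and __exit__ protocol methods come from the PEP. (On Python 2.5 itself the statement also needs "from __future__ import with_statement" at the top of the module.)

```python
class Connection:
    """Made-up resource; only __enter__/__exit__ come from PEP 343."""
    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False   # do not swallow exceptions raised in the block

# Normal exit: close() runs when the block ends.
with Connection() as ok:
    pass

# Exceptional exit: close() still runs before the error propagates.
failed = Connection()
try:
    with failed:
        raise RuntimeError("query failed")
except RuntimeError:
    pass
```

This gets close to C++-style scoped cleanup without relying on reference counting: the cleanup is tied to the block, not to the lifetime of the object.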
 
 
Make that RAII.
Grant
Wednesday, March 22, 2006
 
 
In Ruby, for external resources, we simply call the destructor methods and let C or C++ free their resources. Sometimes it's a "close" method, sometimes it's "destroy", sometimes a "delete" works. :-) But the best part of Ruby is closing the resource automatically when we no longer need it. Example:

def use_some_resource
  r = get_some_resource_somewhere
  yield(r)
ensure
  # close even if the block raises
  r.close if r
end

use_some_resource{|r|
  r.do_something_with_it
}

And then it's closed as soon as it's not needed anymore. That way you could return the object to the pool, for instance.

It's interesting how Perl can be "deterministic", though. Is Perl 6 going to be "deterministic" still? As people have shown, many languages prefer to handle such things at a time of their choosing, in batches, rather than always incrementally.

Anyway, if you are going to access external extensions from a high level language, Ruby is a very good option, as it makes 90% of the process automatic:
http://www.onlamp.com/pub/a/onlamp/2004/11/18/extending_ruby.html
http://www.rubycentral.com/book/ext_ruby.html
Lostacular
Wednesday, March 22, 2006
 
 
Thanks for all the responses so far.  I should probably clarify that he's really pining for C++ style "when it falls out of scope the destructor is called" style cleanup code.  Memory management is not a concern at the moment, so we're not trying to combine the two, we just want a deterministic way of calling cleanup code.
A. Skeptic
Wednesday, March 22, 2006
 
 
"99 times out of ten (as an old lead used to say), you don't care when an object is finalized, so it's fair to say that if you DO care, explicitly calling "Dispose()" is not a huge burden."

It is a huge burden if you want to write correct code. Cleaning up properly without the help of something like C++'s RAII idiom or C#'s "using" is not trivial.
sloop
Wednesday, March 22, 2006
 
 
"You can't do RAII in Python."

Huh?  I already posted exactly how you can do just that.

Here's a concrete example. 

/crosses fingers that the forum doesn't de-format it

import socket
class Foo:
    def __init__(self):
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    def __del__(self):
        self.sock.close()
Jonathan Ellis
Wednesday, March 22, 2006
 
 
... when an instance of Foo goes out of scope, its socket will be closed cleanly.
Jonathan Ellis
Wednesday, March 22, 2006
 
 
... and yes, that's CPython-dependent, but if you think the OP or anyone else who says "Python" without qualification doesn't mean CPython, you're a fool. :)
Jonathan Ellis
Wednesday, March 22, 2006
 
 
Please do not confuse memory management with other resource management. The correct technique to release DB resources is to use RAII (either in the form of C++ local objects or "finally" declarations). Memory management is a totally different beast.
Achilleas Margaritis
Thursday, March 23, 2006
 
 
Jonathan,

The thing is you're assuming that there's only one reference to the Foo class.  Calling 'del x' doesn't guarantee that the __del__ method is called.  That happens when the reference count = 0, which may not happen when you expect.
Grant
Thursday, March 23, 2006
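
A minimal sketch of that point, as it plays out on CPython. The closed list is just a made-up way to observe when __del__ runs.

```python
closed = []

class Foo:
    def __del__(self):
        closed.append("closed")

x = Foo()
y = x            # second reference to the same object
del x            # only removes the name x; one reference is still left
after_del_x = list(closed)
del y            # last reference gone; CPython runs __del__ here
after_del_y = list(closed)
```

So 'del' unbinds a name; it is the refcount reaching zero that triggers the cleanup.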
 
 
"The thing is you're assuming that there's only one reference to the Foo class.  Calling 'del x' doesn't guarantee that the __del__ method is called.  That happens when the reference count = 0, which may not happen when you expect."

Why does this matter? This should be no different than a pointer in C or C++, you need to make sure nobody else is using it before you delete it.
sloop
Thursday, March 23, 2006
 
 
Actually, you could know how many client objects point to a target object.  Just wrap it in a container.  Or, use weak references.  Or, both.

This problem is no reason to switch languages, IMO.
oh, the pain
Thursday, March 23, 2006
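
A small sketch of the weak reference idea, assuming CPython's immediate reclamation when the refcount hits zero. The Target class is a placeholder; weakref.ref and sys.getrefcount are standard library calls.

```python
import sys
import weakref

class Target:
    pass

t = Target()
watcher = weakref.ref(t)       # observes t without keeping it alive
alive_before = watcher() is t

# The reported count is generally one higher than expected, because the
# call's own argument temporarily holds a reference.
count = sys.getrefcount(t)

del t                          # CPython frees the object immediately
alive_after = watcher() is not None
```

A holder that only needs to observe the resource can keep a weak reference, so it never delays the cleanup.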
 
 
The standard java way to do this is with close or dispose methods. You can write unit tests using mock connections which test that they are getting closed everywhere.

You can also reduce the amount of cleanup code by using an approach like Spring's JDBC support: You write classes which just set up the query and handle a resultset they are given, and the framework takes care of opening and closing everything for you.
Asd
Friday, March 24, 2006
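
The same mock-connection testing idea carries over to Python. The sketch below is only an illustration: run_query is a hypothetical function under test, and the fake connection is a plain unittest.mock.Mock rather than any real driver object.

```python
from unittest import mock

def run_query(connect):
    # Hypothetical code under test: must close the connection it opens.
    conn = connect()
    try:
        return conn.execute("SELECT 1")
    finally:
        conn.close()

fake_conn = mock.Mock()
fake_conn.execute.side_effect = RuntimeError("boom")   # simulate a failing query

try:
    run_query(lambda: fake_conn)
    raised = False
except RuntimeError:
    raised = True

# Raises AssertionError if close() was not called exactly once.
fake_conn.close.assert_called_once_with()
```

A check like this catches a forgotten close() on the error path, which is the case that is easiest to miss by hand.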
 
 

This topic is archived. No further replies will be accepted.
