The Joel on Software Discussion Group (CLOSED)

A place to discuss Joel on Software. Now closed.

Is the shuttle software really that great?

Remember the recent thread about how much better the shuttle software supposedly is, compared to other software?
Sunil Tanna
Monday, November 06, 2006
Hm, that's sort of a y2k type problem, being scared of January 1st.

It's not that the software is buggy, it's that they are scared something bad 'might' happen if the shuttle is in space on the 1st.
Monday, November 06, 2006
My read is the ground software will get out of sync with the shuttle software and that *might* have bad consequences.  I'll call it a software flaw.  They can't launch the shuttle.
Monday, November 06, 2006
OK, but it's more of a design or project management flaw in that they don't know if it is a problem or not and that somebody in the past decided that the clocks at ground control would work on a different standard from the shuttle, which is really silly since I know that they have stuff on the shuttle that makes time corrections to take into account effects of relativity!!!
Monday, November 06, 2006
And when people talk about the shuttle software being "great", they generally mean extraordinarily low defect rates. Not that it is great in every sense that software can be (for example, I expect that the shuttle software is absolutely terrible on initial usability).

And it's probably splitting hairs, but this particular problem seems to me to be a requirements specification failure, not a construction failure. For this problem the shuttle software is "broken as designed".
Bill Tomlinson
Tuesday, November 07, 2006
One of the claims in the articles we read before, was that the code and specifications were both fantastically good, because of the process used. Here we see something major that seems to have gone wrong in the specification and process.

Another way to look at it is that eliminating variables (I mean things that vary, not programming language variables) and features is a simple way to reduce bug count.

The more you nail down the functionality, reduce flexibility, and standardize the usage of the software, the fewer possible bugs.

Would this approach perhaps explain some of the shuttle software's supposedly lower bug count?
Sunil Tanna
Tuesday, November 07, 2006
It's impressive that they know about it and are taking precautions. It's not like they had the shuttle up and noticed something wasn't working which is the typical approach to software bugs.
Tuesday, November 07, 2006
Nope, I'm with Sunil on this.  Not allowing the Shuttle to be up over New Years is a hugely expensive manual work-around for something that SHOULD be trivial.

As has been pointed out, this "Y2K" type bug is well understood, and is well handled by 99% of systems out there.  For a CMMI Level 5 organization, SOMEWHERE in the process this issue should have been worked out in excruciating detail.

Which is to say, yes, the Shuttle software really IS that great, and yes, even CMMI Level 5 generated software can have some pretty expensive bugs in it.
Tuesday, November 07, 2006
"Here we see something major that seems to have gone wrong in the specification and process."

Not really.

You have to remember that the Space Shuttle was not originally intended to be operational 25 years after its maiden flight, nor was it intended to fly so few flights, nor was it intended to be so expensive to maintain.

Also keep in mind that it wasn't until the Challenger disaster that it became politically "unacceptable" for astronauts to die.

The Shuttle software has worked remarkably well considering that its requirements and specifications were drafted in the late 70's.
Tuesday, November 07, 2006
Sorry, TheDavid, but it's ALWAYS politically unacceptable for Astronauts to die.  In the US, anyway.

The Apollo program was delayed by a year by a fire in the Apollo 1 capsule on the launchpad.  Those were the first American Astronauts to die, and it was a big hairy deal.

Challenger, and then Columbia, were both tragedies.  You are correct in that Shuttle has always been more expensive to operate than originally assumed.  But 9-month extended Shuttle missions were contemplated at one point -- this "year rollover" bug would have been a major problem in that case.
Tuesday, November 07, 2006
Perhaps I misspoke.

It's always politically unacceptable for soldiers to die on the battlefield - you will never find someone saying "Let's send little Timmy to Iraq" but instead, they will always say something like "I have no choice but to put Timmy at risk."

The Mercury, Gemini and Apollo programs had the latter philosophy in the sense that the astronauts knew they were placing their lives at risk. Politicians believed that if we didn't take the chance and risk our lives, we could very well lose the space program to the Russians. While Apollo 1 was unfortunate and regrettable, the nation still largely believed in the mission to put a man on the moon.

In contrast, after Challenger and Columbia, I heard a lot of people openly questioning whether we belonged in space. They went as far as to say that we should cancel the whole program unless we found a way to make it 100% safe, so that no astronaut ever died again. It was...  stretching the analogy a bit... like saying that Detroit shouldn't build cars unless they can guarantee your safety.

However, I will cede the point about the 9 month extended missions. I don't remember hearing about that but as soon as they ramped up to such long duration stays, yes, it should have gone into the design specifications and they should have fixed that bug.
Tuesday, November 07, 2006

You're either trolling in this thread or you're just not thinking. You tell me which.

The shuttle has been in service for more than 20 years.

Not being able to see issues nearly 30 years in the future is not a failure of a spec. A failure of a spec is not being able to see 30 days, or even 30 weeks. But 30 years?

Face it, best practices have worked very well for the shuttle software, but nothing is perfect. Chances are that codebase is far more perfect than anything you or I have ever worked on. But that doesn't mean that pervasive errors like this don't exist.

This means nothing. And I think you know that...
Shane Harter
Tuesday, November 07, 2006
Shane, watch out, I can see brice sneaking up on you.
J.B.
Tuesday, November 07, 2006
Shane,  you need to actually read the article rather than telling us what you think it says.

There isn't anything in the article that says this is a new problem, or a new problem that appeared after Y2K, or that this has anything to do with the shuttle being designed in the 1970s and being 30 years old.  The article simply says the problem became more important to NASA after 2003 (the implication being that this is because of greater safety concerns and a pressing schedule).

The underlying glitch is the system is possibly unable to handle the transition Dec 31 to Jan 1.  And Shane, perhaps you know something that mere mortals like me don't, but as far as I know this transition happens/happened every year - even in the 1970s.

So, as far as I can see, the fundamental problem, requiring a hugely expensive manual workaround, appears to have existed since the 1970s.  Now that's quality!
Sunil Tanna
Tuesday, November 07, 2006
What I don't understand is why NASA can't just simulate the issue and see what happens.  They have so many simulators that, I thought, were very much the same as the real thing.  This doesn't seem like the type of bug that would be very hard to artificially reproduce on the ground.

It's also kinda scary that they're not sure what would happen.  It's almost like some higher up got spooked by the y2k (non) bug and overreacted.
Tuesday, November 07, 2006
There was an earlier thread on this forum discussing the PERFECTION of code that was being utilized by NASA, for which they not only strove but did in fact (to some degree) achieve - it was equivalent to some craziness involving only being able to write 5 lines of code per day.....

X-Ref Discussion Post:

The SHUTTLE code may be approaching perfection….to which I agree….perfectly incapable of handling a single date change between years?????????

You have got to be kidding me.....if that software is so complex that a single date change between years causes a concern for the NASA staff, then someone needs to disband the Shuttle program and work on simplifying that level of complexity because IMO there should NEVER be in existence this kind of reluctance over a shuttle mission based on a damn single date?
Brice Richard
Tuesday, November 07, 2006

I did read the article--it's all of 1/2 page long--and the funny thing is, you're the one that seems to be jumping to conclusions.

Like, for example, what is the "hugely expensive manual workaround" that you spoke of?

And secondly, why do you assume this has ever been an issue before today?

At least you answered my question. Now I *know* that you're just trolling.
Shane Harter
Tuesday, November 07, 2006
This thread = A confederacy of Dunces.

The shuttle has no problem with date changes. None. The thing will be fine. The software won't care.

The problem is interoperability between two systems. Two applications can be perfect (although, I don't think that word is used anywhere in the original article) and still not work well with one another.
Shane Harter
Tuesday, November 07, 2006
As I understand it, the real problems are...

1) The ground computers in Houston, running software written by contractor X, assume January 1st is the first day of the year, without manual intervention.

2) The shuttle's onboard computers, running software written by contractor Y, assume January 1st is the 366th day of the previous year, without manual intervention.

3) NASA would really really like to not have to rely on that manual intervention.
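
To make points 1 and 2 concrete, here is a minimal sketch of how two day-of-year conventions could disagree at the rollover. Everything in it is an assumption for illustration: the function names are hypothetical, the "count days from January 1 of the launch year" rule is just one way to get "day 366 of the previous year", and none of this is NASA's actual code.

```python
from datetime import date


def ground_day_of_year(d):
    """Hypothetical ground convention: the day counter resets on January 1."""
    return d.year, d.timetuple().tm_yday


def onboard_day_of_year(d, launch_year):
    """Hypothetical onboard convention: keep counting days from
    January 1 of the launch year, never resetting the counter."""
    return launch_year, (d - date(launch_year, 1, 1)).days + 1


launch_year = 2006  # assumed launch year for the example
for d in (date(2006, 12, 31), date(2007, 1, 1)):
    # Both conventions agree on December 31 and disagree on January 1.
    print(d, ground_day_of_year(d), onboard_day_of_year(d, launch_year))
```

Each function is correct by its own rule; the disagreement only shows up when the two timestamps are compared, which is the interoperability point made earlier in the thread.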

The apparent causes are...

A) Having two separate systems that require manual intervention to synchronize, gives NASA the opportunity to catch errors.

B) The calendar year doesn't match Earth's revolution around the sun exactly, which can create problems within the realm of orbital mechanics.

C) Computers then were not powerful enough or efficient enough that NASA had the luxury of calculating orbital mechanics "on the fly". It was cheaper to do everything in advance and transmit the data as needed.

I'm sure the engineers back then did propose several solutions, some of which may have been optimal. Given all of the contradictory constraints and requirements, it's hard to believe there was a single "right" answer.

Again, this is all 70's era thinking.

Today, you can buy a PDA for $100 that has more computing power than the first generation Shuttle computers. If they were actually taking an IBM z90 mainframe up into space with them every time they left the pad, I'm sure the software would be more robust as far as the year changeover issue was concerned.
Tuesday, November 07, 2006
Yes, this is a 'problem' that has appeared as a mainly political issue since 2003. Before 2003, there was no problem flying the shuttle across the Jan 1 date since there were no known problems with doing so. In 2003, they adopted the policy of taking no unnecessary risks whatsoever, at which time they adopted the 'no jan 1' policy, 'just in case'. It's not that there are any known bugs. It's that there might be, so why take a chance. There are no bugs about this in the shuttle codebase, or at mission control, but when you combine the two the possibility is there of some concurrency like issues that no one could possibly have foreseen.
Tuesday, November 07, 2006
> what is the "hugely expensive manual workaround" that you spoke of?

When you have ideal flight conditions and you have to delay the launch until the next flight window, there can be an opportunity cost of millions of dollars, increased risk, pushing the whole schedule back, etc. That's the manual workaround - not launching even though it would be a good time to because of the weather. Also, late December has fewer frosty days than late January, so you also load on some risk regarding ice build up, damaged tiles, and so forth as you push into winter.
Tuesday, November 07, 2006
"The problem is interoperability between two systems. Two applications can be perfect (although, I don't think that word is used anywhere in the original article) and still not work well with one another."

A big problem I've faced with data warehousing. We had one system where the key identifier they were using didn't match the company standard. It worked fine within itself, yet was wrong when it went outside the silo.

Everything seems to be a matter of perspective.
Steve Hirsch
Tuesday, November 07, 2006

You can't have it both ways.

Either this was a "problem" since the 1970s when the software was designed, meaning they just planned to never fly the shuttle thru a calendar change, OR it's a new problem that wasn't foreseeable 30 years ago and is only now being discovered.

If the problem really is 30 years old, then the "opportunity cost" (btw- that term means something entirely different than how you're using it) doesn't apply. It works as designed.

If the problem is new, you can't blame them for not being able to see 30 years into the future.
Shane Harter
Tuesday, November 07, 2006
You're an idiot Shane.

> Like, for example, what is the "hugely expensive manual workaround" that you spoke of?

Use some common sense for a change.  You think it costs nothing to delay the mission a month, having thousands of people sitting on their hands, and miss a perfectly good launch date?

As for the design v. coding issue... your "Oh, it works as designed" BS.  Any way you slice it, it's a big problem.

In fact, by the original criteria from when the shuttle was originally designed, it was probably a bigger problem then than it is now:

- When the shuttle was originally supposed to be flying lots of missions per year, all year round -- as well as being able to fly military missions on short notice -- scrubbing most of December from the calendar would hardly have fit either of those design goals.

- It's only later (now), with the shuttle flying few missions, and no short-notice military missions, that crossing a month off the calendar isn't quite as big a deal.
Sunil Tanna
Tuesday, November 07, 2006
Side note: I went through all the shuttle missions and none have straddled the new year to date:

Just to save anyone else the time of looking through them.
Tuesday, November 07, 2006

The fact that YOU think I'm an idiot reassures me. If you'd shown the slightest competence in forming opinions I might have taken offense to it, but based on your impressive display of ineptitude, I think I can take it as a compliment.

Like I told Scott, you can't have it both ways.

If this is a 30 year old problem--and it appears to be--then it's not so much a problem as it's "by design." Even if you think it is a defect, it's been part of the system since day 1. Therefore, they don't DELAY ANYTHING. It's not as if they've scheduled a launch that now must be pushed back.

It's like taking a vacation. If you scheduled a vacation to cross the calendar year, and you had to push it back, that would have a COST. But, if you know in June that you can't schedule your vacation thru a calendar change, that doesn't cost you ANYTHING.

Sunil, it's clear you have some sort of prejudice over this subject--although I can't for the life of me understand why--but you're not really making any sense.

It's funny how some people take so much issue with the proposition that NASA creates better software than you've created. I don't get it. Why does that elicit such personal knee-jerk reactions? Insecurity? Can someone explain it to me?
Shane Harter
Wednesday, November 08, 2006
A defect "as designed" is still a defect.

Being unable to fly over a year-end is a defect, and it must have been a problem from the start, for all the reasons already in this thread:  9 month extended missions, short-notice military missions, flights around the year.  I also remember the same issue being discussed in previous years (prior to 2003).  The defect is LESS of a problem now than it was originally, because the environment has changed (the shuttle is now hardly used), but it's still a defect.

Your vacation analogy that there is no cost for this defect is one of the dumbest things that I've heard.  It's not like your vacation where you can simply reschedule your personal work to fit.  Use some common sense.

- By the original goals of the shuttle, the year-end defect is enough to stop the shuttle fulfilling its original mission goals.

- By the current operational process, it still costs a huge amount of money  to have thousands of people sitting around.  What's more if they are planning to do a launch in November or early December, there's lots of prep work done for that - and if they slightly miss the launch date, and then have to add delay to go into the new year, some of that prep work has to be repeated. 

NASA vs My Software.    You're on the wrong lines here. There's no comparison.  I wouldn't even try to make a comparison.  They are so different, it would be ridiculous to make a comparison.

Right now, I'm selling standardized Windows/Web server for < $100. I am personally responsible for more KLOC than  the whole NASA team of 260.  Fortunately if my programs crash or have bugs, nobody is killed.  And I'd agree that they have less defects than me.
Sunil Tanna
Wednesday, November 08, 2006
"and it must have been a problem from the start"

This is the basis of your entire argument. If it was really such a problem, don't you think it would've been fixed over 30 years?

Could it be that you just don't understand the requirements, and you're blowing this entire thing out of proportion?

And a "defect by design" is nearly a contradiction in terms. If it's BY DESIGN, and THEY DON'T HAVE AN ISSUE WITH IT, then it doesn't seem like a problem to me.

For some reason you've latched onto this is a big problem that, you claim, costs lots of money, when all of this is just pure speculation on your part.

So really, let it go. My argument the entire time has been that nobody claimed the software is perfect, that this has nothing to do with the software itself and instead with interoperability between systems, and that it's not a big deal.

Your argument is nasa developers can't be that good because this is a big defect that costs lots of money and is somehow indicative of the quality of the software or the priorities of the software team.

You tell me: Who is jumping to conclusions based on the tiny shred of information that is available here?

And you call me the idiot.

I love it.
Shane Harter
Wednesday, November 08, 2006
I didn't even read your whole post before replying. Just your first paragraph was enough to display the flaw in your thinking.

But you once again harp on the "people sitting around" thing.

Do you think that if they're not flying a shuttle they're just sitting around? If that's the case, then you understand that they must all "sit around" for, oh, 40 weeks a year.

You say "common sense" like your argument makes ANY SENSE, let alone common sense.

The vacation analogy was apt. It just doesn't support your ill-informed argument. 

How about this: When you stop jumping to wild conclusions about things you can't possibly know, then I'll take your argument seriously. Until then, you're just another guy on the Internet talking out of his ass.

One more thing: "A defect by design" is not called a defect. At worst, it's called a trade-off. And trade offs have to be made sometime.

Finally, you've mentioned over and over that this prevents the shuttle from reaching its original goal of many dozens of missions each year. Do you realize that this was never, ever realized? It's just that NASA never built a reusable craft like this. They figured you could just turn these things around like an airplane. They learned it doesn't work that way. Many weeks of reconditioning must be done between flights. It's not as if this defect has any effect at all on NASA being able to meet the original 1970 goal of dozens of missions.

That is a big problem with the shuttle program, but it has nothing to do with software. And the fact that the shuttle program has problems doesn't mean it's not a success. The sole goal of NASA is exploration and advancement of science. Next time we build a reusable craft we'll have a lot of lessons that we can apply. That sounds like an advancement of science to me.

You never did tell me: What weird motivation is inside your head to belittle something that you know so little about?
Shane Harter
Wednesday, November 08, 2006
Sunil......I am reassuring your position vis-a-vis Shane Harter.....he is not an idiot but he thinks he knows all....he's full of shit most of the time but let him rant over his hack philosophies in software development....

Just call him Shane for Brains....
Brice Richard - Not a Shane for Brains Supporter
Wednesday, November 08, 2006
This is a quote from Shane for Brains:

<<<And the fact that the shuttle program has problems doesn't mean it's not a success.>>>

Okay slick you be a part of a program that's run not ONE but TWO shuttles into the ground killing all of its crew members and tell me that is a success?????

From what planet were you hatched?
Brice Richard - Not a Shane for Brains Supporter
Wednesday, November 08, 2006
So you measure a successful space program by "no one dies?"

Sorry, Brice. Most problem domains are not centered on creating lightweight Microsoft Access apps using VBA.

Not many things in this world are a success by your measure, Brice. Nearly every high technology has killed people. That's a price of progress.

What kind of fantasy world do you live in?

And please, please keep using the phrase "Shane for Brains." It's not at all clever, and in fact, it's pretty stupid. It makes people scratch their heads and wonder what the hell this dude brice is talking about. If my name sounded, even slightly, like "shit" it would work. Since it doesn't, it reads more like a compliment than the bad phonetic-pun that it is.
Shane Harter
Wednesday, November 08, 2006
Shane, you're full of it.

Your argument boils down to: (1) There is nothing wrong with a space vehicle that cannot fly over a year-change, because (2) perhaps the designers intended it.

But your argument falls apart, because if you look at the history of the shuttle, there is absolutely no way that the original mission design goals (i.e. the shuttle program designers) intended it.  So the year-end limitation was introduced at some level below the shuttle program designers,  perhaps software designers, or system integration, or something else.

Shane, stop looking down your nose at the rest of the world. 

When I comment on what is undoubtedly a flaw in the shuttle (the software doesn't work as intended), I'm apparently "belittling it".  Meanwhile, you're allowed to criticize the program any way you want - what puts you in such a privileged position?

And btw, your vacation analogy really is a bunch of crap.  NASA themselves say otherwise (each delay to launch has an estimated cost).

If a launch is delayed: Sure, _some_ people at NASA can probably do other stuff.  But some can't.  And there is also wasted launch prep work, as I've already told you.

It's a fact, it doesn't matter how much you deny it: If a launch is delayed, there is a cost.  NASA acknowledges as much.

Just a couple of examples I found right at the top of Google - there are many more going back over the years.

$2 million cost for delaying Discovery launch:

$616,000 cost for delaying an Atlantis launch:

Seems you don't know half as much as you think you do.
Sunil Tanna
Wednesday, November 08, 2006

I don't "look down my nose" at the rest of the world. I do look down my nose at you.

I swear, maybe you're ESL or something, because you apparently don't understand basic concepts.

If they cannot schedule a launch around a calendar change, that doesn't mean there's a delay. A delay is when they wheel the shuttle out to the pad, fuel it up, get it ready to fly, and then they have to undo everything they did for weather or something. *THAT* costs money. *THAT* is a waste.

But since they know that they cannot fly over the calendar year, there is *ZERO* chance that they would wheel it out on 12/29, fuel it up, etc, and then realize they couldn't fly.

Do you realize that if the shuttle misses its landing strip, it has to glide to Africa, and if it misses that, it will crash land? It's true. It's by design, but by your measure, that would be a defect. After all, it would cost a lot of money to ditch a shuttle in the Indian Ocean.

Or how about the fact that if a Shuttle flies to the Hubble it cannot later dock w/ the ISS. They're in different orbits, and the shuttle doesn't have the ability to change its orbit in such a way. Again, this is a design decision. But you'd call it a defect. After all, if they needed to service both the Hubble & the ISS, they'd have to do multiple missions, costing lots of money.

Your entire argument is based on assumptions you've made. And you accuse me of being arrogant. Why? Because you & Brice Richard (who BRAGS ABOUT HIS IGNORANCE! He wears his lack of education like a badge!)

I'm not afraid of telling someone on a fricken internet message board that they're wrong just because they might try to spin it as me being arrogant. Of course, the tone of your posts is no different than mine, but for some reason you are not "looking down your nose?"

You're a bucket full of contradictions, Sunil.
Shane Harter
Wednesday, November 08, 2006
You're still full of crap.

There is absolutely no fucking way that the shuttle, when it was conceived at the high level, was intended to be unable to fly over the year end.  The fact it can't was something that was introduced somewhere between the high-level vision and the implementation.  Whether you want to call it a defect or a limitation, it's still something where the shuttle doesn't meet its original design goals.

Nobody's suggesting they wheel the thing out on 29/12 and then wheel it back.  It isn't quite as bad as that.  But they are building up to launch in early December, and if they miss the date (say for some other reason), they then have to wait until January.  And some of that build up will be wasted, and there will be extra work and extra costs because of that rework.  Not all the work that is done in preparation for a launch is done on the pad or the way to the pad... you are clearly a moron to even (implicitly) assume as much.
Sunil Tanna
Wednesday, November 08, 2006

This topic is archived. No further replies will be accepted.
