The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Thinking of a good way to redesign a complicated hack of a DB

Hello everyone.

I'm currently looking at redesigning my company's existing database (actually I'm looking at redesigning the entire system).  The current thing is a mishmash hackjob put together by a glorified, self-taught Access developer who evidently never heard of anything but tables and queries; it seems like every time some new analysis was needed, he'd hack it together in Access and then export it to SQL Server as a table.  There are currently 96 tables (see below, however)

There's a slight snag in this process, though.  We're an e-commerce company, and we also take care of the product/etc information for another e-commerce company (we have a partnership, but we don't own them).  There are two databases, one for each company, with 90% of the functionality duplicated (the products are the same; the only difference is the list of customers and the prices of the individual items).  Thus, changes made to one (or to the web site, or to the back-end administration site) need to be made to the other or things will break and not work.

The only way I can think of to consolidate this data and differentiate between companies is to combine the Customers table, and then give each company an ID so I can include a lookup table ("CustomersLookup") that relates customers (e.g. SiteID 1 and CustomerID 32343, SiteID 2 and CustomerID 23433), so I can then provide a View that segregates the customers by company.  I would have to do similar with the pricing; include a table that contains the prices for each company, and join it on a lookup table so I can extract prices for the right company.  In addition I'm going to have to strip out redundant fields (Current databases break all sorts of Normal Forms).

There's also the issue of parsing the vast amounts of data from suppliers and vendors which we receive; right now everything is done pretty much manually, with analysis done in Excel and/or Access; I want to move away from this.  I'm pretty sure that I can use SQL Server Integration Services (SSIS) to take care of this for me and parse/clean the data and put it into the proper table.  I might need to do some Data Mining, but I'm not sure (I'm totally new to that).  For example, sometimes my manager (the COO) wants to see a report of our listings on Amazon; the way we do this is I run the most recent data in Access and run a query to compare it to the existing data, or something like that.  I'd love to use Reporting Services or something to allow him to just browse and click, instead of my doing it manually and emailing him an Excel file with the data.

Is there an easier (or better) way to do what I need to, or is my solution pretty spot-on?  I don't claim to be a database guru, but this stuff is bad; really bad.  It uses some open-source e-commerce solution that's pretty wretched (lots of Classic ASP/VBScript spaghetti code with cryptic variable names).

Thoughts, comments and advice are greatly appreciated.
Newbie IT Director
Monday, November 26, 2007
This has been debated here before a few times. In general, most people (I amongst them) seem to prefer keeping separate databases for each company, all sharing a common code base.

This has many obvious advantages, and a few disadvantages. In general, the larger the user base is and the more complex the application, the more it favours one-db-per-company (IMO).

I don't know much about Access (I am happy to say), and with only two companies to worry about, your situation is obviously different, but there's my $.02 anyway.
Greg Send private email
Monday, November 26, 2007
On the data end.  If you combine things then you will use memory more efficiently.  For example, if the two companies use the same product lists but just duplicated in two dbs then both tables are cached in memory. (since they are frequently joined to or accessed)  if they are in one table then just that 1 table (which is smaller than the combined 2 tables) is cached in memory.  So you would get better use of caching capabilities. (reduce IO improve scalability)

As for a data warehouse I recommend looking at Kimball (eg The data Warehouse Toolkit.  It isn't database specific and you could use the principals in just about any database.)  A data warehouse has different requirements than a OLTP system.  The joins and reports can be done from an OLTP system but are usually not as eficient or as easy to craft.  (Just as you wouldn't want an OLTP system designed like a data warehouse, that would be a nightmare)

Don't worry as much about the technology to address a solution, make sure you understand the problem first.  I am not a MS SQL Server guru so I can't tell you what MS SQL Server technology to use.
JimK Send private email
Monday, November 26, 2007
There may be very good reasons why the data is separate. The isolation can be a very good thing in terms of security. You didn't give us the whole story either. Is the codebase exactly the same for both companies or do they have some custom software that you are currently maintaining?

Either way I personally wouldn't combine the databases since there are only two companies. Don't fall into the trap of believing that because you've heard "duplication is bad" over and over again on the Internet that it is actually so. Sometimes a little duplication is just the ticket to provide the flexibility you need. Now if you had 10 companies then it may be a different story.

You've been posting a lot out here lately and the common theme seems to be that you are caught up on aging technology. If it ain't broke then don't fix it.
Monday, November 26, 2007
IMO it is broke, though.  Things have to be done manually when it should be automated or made easier.  For example, we manually change prices in an Excel file and prepare it for upload to Amazon; in my view this should be automated at least in part (if not totally).  There is no way to run any real reports; my predecessor just would pull data into Access and spit out a new table containing the results of the query.  The tables are little more than run queries; any number of tables will have repeat columns (e.g. Orders table will include all the customer information, despite this already being in the Customers table).  Many fields in these tables contain either a value, a blank value, or NULL (usually a mix of the three).  There are meaningless column names.

In short, it might work, but IMO it doesn't work well at all, and the company is wasting time screwing around with the shit.  Nearly everything that brings in income is done manually via Excel spreadsheets and/or by pulling data into Access and running a query, then exporting it to Excel.  I admit I'm a little biased against old technology, but it's because I've seen too many companies barely chugging along because they're so reliant on obsolete/amateurish programs, and afraid to change because of the exact reason you mentioned (i.e. "if it's not broke").  If a wheel is rusted and pitted and showing its age, you replace it with a new one; you don't remain blissfully ignorant until it breaks and you fall flat on your ass.  IMO of course.  Like I said, I've seen a lot of places that could have been so much more efficient and better, but couldn't begin to see that the technology they were using wasn't enough.  This company seems to realize it; they admit the previous guy was a hack.  I'm pretty much trying to clean up his mess and turn the organization around.


Both sites run exactly the same, for the most part.  They both make use of an open-source e-commerce app, with about 90% of the data identical.  This makes updating really hard because if I change something to one site, I usually have to remember to do it to the other, or it'll break.  The only real difference between the two databases is that the Customers table is different, the prices are different and, of course, there are different amounts of orders.
Newbie IT Director
Monday, November 26, 2007
Addition:  It's not so much the aging technology, its the fact that the technology is both old and hacked together without rhyme or reason, and it makes no sense at all yet somehow manages to work.  I wouldn't mind it so much if it seemed to at least be well-designed.
Newbie IT Director
Monday, November 26, 2007
I wouldn't attempt to merge the two databases at this stage, as it may introduce more problems than it solves.

It is trivial to manage updates to more than one copy, just automate the updates, or semi-automate it with a strict procedure. Then, you won't risk being out of step.

I would always keep the databases separate, even if there were many, not just two. The risk of the data getting cross-contaminated or displaying the incorrect data is just not worth thinking about, and certainly out-weighs the minimal performance/scalability impact of running two copies of everything, imho.

Don't forget, hardware is very cheap and very quick to implement, compared to developer time. I know there is a limit to what you can do with the hardware, but it doesn't seem like you are there yet.

Rather than trying to hack away at something you are clearly uncomfortable with, why not create a plan to replace it with something better ?
Entries of Confusion Send private email
Tuesday, November 27, 2007
"Rather than trying to hack away at something you are clearly uncomfortable with, why not create a plan to replace it with something better ?"

That's what I'm trying to do.  The only reason I hack together things right now is because I haven't had time to figure out the details of the plan and experiment on the development server, and my boss will ask for a report or something.  So for now I'll play along and hack together a solution so its at least available, while I'm working on a better way that I can replace it with later.

The main reason for wanting to merge the databases (besides the obvious, i.e. I've been taught that duplication is bad) is because I want to move the back-end to a consolidated enterprise system.  Right now there are two back-ends (one for each company) with the same functionality, just like the databases.  Keeping the data separate would mean somewhere in that application I'd have to keep a constant flag on which company we're working with, so I know which connection string to use to display data.  I suppose I could move it into one logical database, but keep it in seperate tables (e.g. CompanyBCustomers, CompanyBPrices), which might be the best hybrid solution as it would differentiate between the data, but allow me to create a procedure/view/etc that does the necessary operations with it.  I need to obtain more information on that, I think.  Our prices change very often (to stay competitive), I have to find out of the other company is the same way, or if their prices are flat.

Thanks again for the advice everyone.  As I've said, I'm somewhat new to being the main technology decision maker, so I'm not trying to jump into things but instead get input from my peers and seniors (experience-wise) so I can make informed decisions.
Newbie IT Director
Tuesday, November 27, 2007
Someone said keep the databases separate, I'm going to agree with that. If there is a business split, you'll have a time splitting out all the data, and then have to carry the burden of multi-company code logic where you don't need it.

For cross-company data mining, set up a third "data warehouse" database that makes calls into the two separate ones. If large chunks need to be summarized to speed up historical summaries, you can store it there, without as strict normalization that you have in the live. Designing a data warehouse takes some experience, I have done some work in that realm, but would not call myself an expert.

User cross database links to pull from both if you need, using stored procedures, and that way if there is a difference in business rules at some point you can hide the complexity inside each database's sproc used to pull the data out.
Jason T
Tuesday, November 27, 2007
Hmm.. okay so I should look at creating a third data warehouse then, and use that as the "unified platform".  I'll look into that, funnily enough I was trying to read up on Data Mining earlier because I thought it might be what we're looking for.  I'm quite new to that concept, though, so it might be worth looking more into it because, I admit, I'm pretty clueless as to where to start with it.

Another thing I'll need to do, though, is clean up the data.  Like I said, the guy before me was very inefficient with his data stuff, so tables have tons of unused columns, some columns contain either NULLs or empty values, and the like.
Newbie IT Director
Tuesday, November 27, 2007
>> Another thing I'll need to do, though, is clean up the data.  Like I said, the guy before me was very inefficient with his data stuff, so tables have tons of unused columns, some columns contain either NULLs or empty values, and the like. <<

Unused columns can be ignored for the moment. They clutter the picture, but it looks like you've got bigger fish to fry. Also, there's nothing necessarily wrong with null values or empty values, assuming these correspond to business reality.

Perhaps you should back away from the technology arena, and start by doing the following:

• Define each business problem.
• Define a business solution to address each business problem.
• Define a set of measurable business goals to determine the success or failure of the proposed business solutions.

The most important part of this is to avoid business goals that aren't measurable. If the users and other project stakeholders don’t have an objective way to measure the success or failure of a project, it is then very easy for them to blame you for failing to meet business goals that weren’t clearly specified and understood.
Mark Pearce Send private email
Tuesday, November 27, 2007
If I were you I would run with what you have working, build a test environment with a snapshot of live data and beat it back into shape.  From there make migration tools to bring current live data to the test system. When it all goes smoothly move the new system to the live system.
Tuesday, November 27, 2007
I agree with another poster.  You've got bigger fish to fry and if you are gonna make the DB right, might as well go all the way...

Dont worry about what the data looks like now.  You need to figure out how this business operates first.

How do people handle their workflow?  Inventory?  Accounting?  Orders?  Refunds?  TPS reports?

Once you answer these, the new schema will fall out of your answers and become obvious.  You'll know about if you should partition the database or not because it will be obvious.  Then you can worry about migration, which sounds like it is going to be a pain in the ass and might have to be incremental.

That all said, I've never done something like this, so I'm just guessing at how I'd approach it.  I'm guessing the integration with what their tools are now is going to be a huge challenge as well.

Good luck!  Sounds like you got quite a project :-)
Cory R. King Send private email
Tuesday, November 27, 2007
By the way, in my mind it is easier to use one database for both companies.  It is less risky because you can easily split them into two if needed.  Conversely, if you start with them split initially, it will be hard to merge later.

My $0.02.
Cory R. King Send private email
Tuesday, November 27, 2007
Agree with Brian.

> it makes no sense at all yet somehow manages to work.

Last year I worked on a project that was very similar in its difficulty:  it was an ASP application implemented as a huge series of SQL Server stored procedures by a bunch of long-gone Access programmers acting under the direction of a subject-matter expert who had since, well, died. 

The first question that I was asked was:  "What do we need to do to improve this system so that it will function if we double our customer base?"  It took me four or five months of study to get to the point where I could say with confidence that the answer was "Replace it."

I didn't spend a whole lot of time finding out what was wrong with the system.  That was low-hanging fruit.  The thing that took me four or five months was figuring out what was *right* with it.  Figuring out what it was actually doing for the existing customers, and why.

This system, essentially, made no sense at all, but somehow it managed to work.  I wasn't ready to replace it until it made sense. 

A great deal of what "made sense" about it was recognizing the set of anti-patterns that the original developers believed in.  One thing I'll give them credit for:  they were consistent.  Once they settled on the wrong way of doing something, they did it everywhere.

(For instance - I know this post is getting ridiculously long, but I've suffered for my art, and now it's your turn - you'd find this in one SP:

  -- find limit amount
  SELECT @Limit FROM Limit WHERE ...
  SET @Limit = ISNULL(@Limit, -9999)

and then later:

  IF @Limit = -9999  -- we didn't find a limit amount

...and then you'd realize that every single SP in the system was checking output parameters of SPs it called to see if they were -9999.  I found myself grudgingly admiring the original developers for their steely-eyed determination to do the wrong thing at every turn.)

The point is, I had to first reach a point where all of the crap in the system that didn't seem to make sense actually made sense.  If I hadn't done this, I wouldn't have been able to persuade myself (or anyone else) that my redesign of the system would actually be functionally equivalent to what they already had.  A lot of what made no sense to me at all actually proved to be there for a reason.

One thing this experience taught me:  when commenting code, it's often a good idea to indicate, somehow, the level of design that went into it.  Comments like "this is the best we could do to get this done on time" (or, less frequently, "we know this looks ugly, but it's the only logical design, and here's why") are of tremendous value to the software archaeologist.
Robert Rossney Send private email
Tuesday, November 27, 2007
Corey R. King: "By the way, in my mind it is easier to use one database for both companies.  It is less risky because you can easily split them into two if needed. "

Less risky, until Company B accidentally gets access to the Company A's data and you end up getting sued by Company A for damages.

The risk is there because with all of the data in one database, security is a lot easier to circumvent. Especially since you've specifically mentioned financial data (order information), separate entities should be in separate databases.
Ken White Send private email
Wednesday, November 28, 2007

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
Powered by FogBugz