The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

Should I Use XML?

Hi all,

I have very little XML experience, so please bear with me :)

Our company hosts software that processes health insurance claims. A customer sends us their claims in a flat file, we process the claims and send them back in a flat file (the format for both flat files is the same). However, for troubleshooting and testing, sometimes we have to edit the flat file before processing. This is tedious to do with a plain text editor. Also, each customer has their own flat file format.

What I was hoping we could do with XML is to convert the flat file to XML, edit the file in an XML editor, and convert back to a flat file. Does this sound reasonable? If so, what are the best XML tools/technologies to use to accomplish this?

Some details about the flat files. I'll use the term "record" to refer to a line in the file terminated with a <CR><LF>:

1) The fields are usually delimited by pipes (|), but sometimes are fixed-width.

2) Some files have a single claim on each record, with 1 to N claim detail lines. For example, the file format is like this:
CLAIM-ID|CUSTOMER|DATE|LINE-NO|CODE|DATE-OF-SERVICE|LINE-NO|CODE|DATE-OF-SERVICE...<CR><LF>

3) Most files have a "claim header" record followed by 1 to N claim detail records. For example, the file format is like this:
CLAIM-ID|CUSTOMER|DATE<CR><LF>
CLAIM-ID|LINE-NO|CODE|DATE-OF-SERVICE<CR><LF>
CLAIM-ID|LINE-NO|CODE|DATE-OF-SERVICE<CR><LF>

Thanks in advance,
XML Know It Not
Thursday, July 12, 2007
 
 
Just depends whether you think you'd rather spend the time writing the converters or editing the files manually. (Oh, who am I kidding -- if you're a programmer, you'll always choose the first option!)

I'm not sure XML will be that much less tedious to edit, though. Maybe you're better off importing/exporting them to Excel or a database?
JW
Thursday, July 12, 2007
 
 
The format you are using is the HL7 data format standard used for sharing data in the medical industry.  I believe the latest HL7 specification is XML so you probably just need to move to it.  You can buy commercial parsers for HL7 and I'm sure someone probably sells an editor.
Kevin
Thursday, July 12, 2007
 
 
Yes, please use an established format... don't roll your own.

I was working with a customer last year who created their own syndication format because "RSS didn't work!".  Fast forward a year and now all of their content providers are feeding them RSS - they had no motivation to convert to a new structure - and now my former customer has to perform hundreds of transforms on the fly.
KC Send private email
Thursday, July 12, 2007
 
 
Guys, the OP is not even talking about changing the format of the data interchange. He is simply talking about introducing an XML conversion into his manual editing stage. How ridiculous! You're just adding extra steps to a bad situation in the first place.

Editing XML files is not so easy as to make the conversion worthwhile. It will be better to simply write a program to aid in editing all those flat formats. Convert them into a common internal structure and then back out to the flat format.

Standardizing the data interchange around XML is what you should really be trying for, but according to your post, that is not an option.
onanon
Thursday, July 12, 2007
 
 
As other posters have said, XML will not help you with that specific task. It will just be another tedious layer on top of what you are already doing.

If you by change are using .NET, there's a nice library called FileHelpers that you can use to make this sort of stuff less tedious. It's perfect for this sort of scenario.

http://filehelpers.com/
Hector Sosa, Jr - PainlessSVN Send private email
Thursday, July 12, 2007
 
 
Whenever possible, avoid using XML languages (to "use XML" is like eating a cookie cutter), except for marking up text.

While your output format may or may not be an XML language, that doesn't mean that you need to use an XML language internally. They make lousy databases and lousy data transfer formats.
Steve Hirsch Send private email
Thursday, July 12, 2007
 
 
OP said:

"What I was hoping we could do with XML is to convert the flat file to XML, edit the file in an XML editor, and convert back to a flat file. Does this sound reasonable?"

No, this is not reasonable.  And I am an XML bigot.

XML is for data interchange.  It is the lingua franca for this purpose.  But for editing, it is no easier than a text file, and as you described it, will make life worse for you.
OneMist8k
Thursday, July 12, 2007
 
 
My recommendation: load the flat file into a relational database. That's usually not very hard to do, with tools like SQLLDR, DTS, etc. Do any editing in the RDB, then ship it out using SQL+, the MSSQL equivalent, whatever.

If you're working hard to do this, you're doing something wrong.
Steve Hirsch Send private email
Thursday, July 12, 2007
 
 
I would also ask specifically what are you editing before (and after) you normally process the file?

If you're just troubleshooting and testing, it should be easy to create a series of text files and add to them over time. When there's a change in your "processor", you can have it churn through all the sample test files you've accumulated to date. Think unit testing, but with files as the test cases. Every time you find a new situation you didn't account for, create another dummy test file and add it to the stack.

If you have to manually edit actual patient records, then there's a definite problem somewhere, either in your input mechanism or in their output mechanism.
TheDavid
Thursday, July 12, 2007
 
 
Even better. Load the flat file into Excel, check it out, then spit it back out.

I agree with TheDavid, sounds like there's some more fundamentally wrong with the process.
Steve Hirsch Send private email
Thursday, July 12, 2007
 
 
Thanks everyone.

There's definitely something wrong with our process, but unfortunately for us, insurance companies are reluctant to change legacy systems that are working ;)

I support a QA group that deals with these files. Most of them are not very technical. They found editing the text files directly confusing, and would often make mistakes, so one of the more technical QA folks wrote a tool to "unfold" the text file so that each field was on a separate line. For example, a file of the following format:
CLAIM-ID|CUSTOMER|DATE|LINE-NO|CODE|DATE-OF-SERVICE|LINE-NO|CODE|DATE-OF-SERVICE...<CR><LF>

would "unfold" to look like this:
====CLAIM HEADER
CLAIM-ID: <value>
CUSTOMER: <value>
DATE: < value >
----LINE
LINE-NO: 1
CODE: <value>
DATE-OF-SERVICE: <mm/dd/yyyy>
----LINE
LINE-NO: 2
CODE: <value>
DATE-OF-SERVICE: <mm/dd/yyyy>
…. (yes, including the "====CLAIM HEADER" and "----LINE" entries).

The QA folks would edit this "unfolded" text file, since having the field names made it easier to find and change the correct values. They then run a separate "fold" program to roll this file back into the original format. As time went on, they wrote a unfold/fold program for each customer (currently just a handful, but by the end of this project, we'll have 30 or more customers using this system).

Now, we had a recent re-org and a QA Tools and Automation group was created, of which I'm now a member. It was decreed that we are now responsible for the coding and maintenance of all these unfold/fold programs. UGH! My knee-jerk reaction was to write a generic tool that takes the following arguments: a file that expresses the format of the data file, and the actual data file. This tool would display the data file in the given format, let them search/edit the fields, then save back using the original format. So I thought that rather than re-invent the wheel to transform and edit these files, I may be able to use XML.

I am very open to other suggestions. I'm keen on importing the data into a database, but due to budget limitations the only option would have to be an MS Access database (we don't use SQLServer, and are told there's no budget for QA to have an Oracle database created). Excel is out of the question since we're using Office 2003 and these files frequently have more than 64K rows.

Thanks again,
XML Know It Not
Friday, July 13, 2007
 
 
Kevin said:
  The format you are using is the HL7 data format standard used for sharing data in the medical industry.  I believe  the latest HL7 specification is XML so you probably just need to move to it.  You can buy commercial parsers for
HL7 and I'm sure someone probably sells an editor.

So what if you converted your files coming in to the HL7 format and then used teh commercial tools from that point on?  Then if you could get your customers to upgrade their systems and start sending HL7 at some point, you could discard the conversion at your end.
Kathy Wasden Send private email
Friday, July 13, 2007
 
 
"I am very open to other suggestions. I'm keen on importing the data into a database, but due to budget limitations the only option would have to be an MS Access database (we don't use SQLServer, and are told there's no budget for QA to have an Oracle database created). Excel is out of the question since we're using Office 2003 and these files frequently have more than 64K rows."

The major database vendors now have free editions of their databases. It looks like these would fit your needs. Here are the free database servers that I know of:

Oracle 10g XE
http://www.oracle.com/technology/software/products/database/xe/index.html

Microsoft SQL Server 2005 Express
http://msdn.microsoft.com/vstudio/express/sql/download/

IBM DB2 Express-C
http://www-306.ibm.com/software/data/db2/express/

PostgreSQL
http://www.postgresql.org/download/

MySQL 5.0
http://dev.mysql.com/downloads/mysql/5.0.html

FireBird 2.0
http://www.ibphoenix.com/main.nfs?page=ibp_download_20

It's always better to stick with what your shop already knows, so it looks like the Oracle XE product would be the best fit for your shop. Another idea is to create a web front end to the QA application with your favorite or known tools. Somedoby else in another thread pointed out some pretty awesome free web controls that should cut your creation down considerably. Here's their link:

Ext JavaScript Library
http://extjs.com/
Hector Sosa, Jr - PainlessSVN Send private email
Friday, July 13, 2007
 
 
The unfolded format you describe is better than XML because it is easier and more understandable to edit. If they edit XML with a text editor they would potentially need to escape special characters like < > & and quotes. However, if you write a tool o aid in editing, then you will not need an intermediate format, neither XML nor "unfolded".
onanon
Saturday, July 14, 2007
 
 
The OP says that there are a few changes that need to be done to the flat file before processing it.

If these few changes are kind of same for all the files then i would suggest u to use tokenization. Replacing the tokens by using simple ANT scripts. Trust me things can be made very simple using this tool.
Malik
Saturday, July 14, 2007
 
 
I agree with most people here in that converting your file to XML will not help you.

If you can use Access on every workstation in QA then create an Access application that loads a file and shows each record on a form with navigation buttons. This will work for files with several megabytes without any problems and it is much more user friendly that text files or XML.

No offence but I cringed several times when I read the description of your process. Some companies would rather torture people instead of spending some money to create proper software that will save them money and tons of time.
JSD Send private email
Monday, July 16, 2007
 
 
JSD,

That's OK, I cringe every time I have to go through the process ;)

I understand where it comes from, but I don't like it.

I've flopped back over to the database camp for this problem, though implementing 30+ different file formats doesn't appeal to me. I don't want to have to change the database and write new import code when we get a new customer with a proprietary format. I haven't given much thought to this, but my knee-jerk reaction is to come up with some sort of file specification "language" that describes the file structure and use that to parse the incoming data file.

Thanks,
XML Know It Not
Tuesday, July 17, 2007
 
 
I did something similar in the clinical trial world. Lots of dynamic code and a well maintained metadata repository.

You'll need two types of error checking: one for the metadata and another for the data.
Steve Hirsch Send private email
Tuesday, July 17, 2007
 
 
Unfortunately, there's no way around the root of the problem: you're dealing with 30+ custom file formats. Furthermore, as soon as a new customer comes along with another format, you're going to have to accommodate them. This is true whether you use XML, an Access database, a custom VB desktop application or the suite of fold/unfold programs or some combination thereof.

There's a couple of possible business solutions, and I emphasize that these are business solutions rather than technical solutions.

1) You can define some standards, publish those standards and give a discount to vendors that provide files compatible with your standards. That may convince them to fund their own converters.

2) You can invest in better training for the operators to the extent they no longer need fold/unfold programs. Instead, they will be smart enough to figure out how to read the format and make corrections directly to the files.

3) You can conduct formal statistical analysis (such as Six Sigma) to determine common mistakes and then write a program to correct those mistakes and free up the operators to focus on the more unique and difficult mistakes. I recommend a formal analysis because you have to be absolutely sure the computer will only modify true positives.

If you really want a technical solution...

4) think about writing a code generator that will in turn create the necessary fold/unfold program. Ideally, you should be able to provide a "blank" input file, the system creates a data structure based on that blank and then the resulting source files/executable for that format. As a result, you'll have one code base, one "master", and then a billion "macro like" programs, one for each format you support.
TheDavid
Tuesday, July 17, 2007
 
 
I would move towards an XML format for your intermediate manipulation you mentioned, for two reasons....one, to allow your users to work with the data as you descriobed using any number of tools that can read XML, and before spitting it back to the old format. And second, because you then have the data captured in a format you can slowly use to translate it later to a new format you can sell your company on. You might say this is a strategic move on your data management overall, which ends in you eventaully moving all of the data to XML or some other format. You can also use XSLT to get the XML intermediate view of your data to other formats quickly, like a database or temp tables, and still transform the xml back into some of its old format if you have to. XSLT might even work doing that.

This moves the burden on you to simply design a simple xml schema to hold the data, and write a text parser to BTW....all data thats lost in translation between formats you can just sticks between CDATA tags in XML and that way it can be later parsed and checked for errors. You can then easily export it back out as is and nothing is lost.
ranger
Saturday, July 21, 2007
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz