At my workplace, I write several batch programs. These programs have a linear flow of logic. Unlike event-driven programs, they execute a series of small tasks, one after another, until completion. Are there any design patterns/best practices for designing such programs? For example, I may want to download a file from an FTP server, unzip it, move it around on the file system, read the file, process and save data to a database, run another program for downstream processing and finally send emails with results. I often keep the control of the flow in the shell script that calls a series of Java programs/other shell scripts etc. to get the work done. I recently had an argument with a colleague who insisted that the control should be kept in the programming language (Java in this case), which should call other subroutines/classes to get things done. So his version of the batch consists of a controller Java class that uses FTP libraries to download the files, Java IO utilities for file manipulation, third-party libraries for zip/unzip/file manipulation/email etc. I can see pros and cons of both approaches. What do you guys think?
Anand Buddhiraj Wednesday, August 27, 2008
I helped write a simple workflow engine that does exactly what you described, to replace a gnarly set of Perl scripts. This allowed us to make a reusable framework, integrate it, and allow non-technical users to manage jobs. It has since been adopted for all of our batch data load processes, many of which used to be cron->bash->ant->java. We used the Command Processor pattern (see POSA) as the basic design concept. It's very similar to the recently released Spring Batch engine, though it predates it by a few years. We will eventually migrate all workflows onto one engine, either a custom one based on colored Petri nets or a free/commercial alternative. There are a lot of good solutions out there now.
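For anyone who hasn't met the Command Processor idea, here is a minimal sketch of how it could look. It is not the engine described above, just an illustration: the class names, the `context` dictionary and the two example commands are all made up for the example. Each step is an object with an `execute` method, and a processor runs them in order and keeps a history of what happened.

```python
from abc import ABC, abstractmethod


class Command(ABC):
    """One unit of batch work, packaged as an object."""

    name = "command"

    @abstractmethod
    def execute(self, context):
        """Do the work; share state with other commands through `context`."""


class CommandProcessor:
    """Runs commands in order and records the outcome of each one."""

    def __init__(self):
        self.history = []

    def run(self, commands, context):
        for command in commands:
            try:
                command.execute(context)
            except Exception as exc:
                # Stop at the first failure and remember why it failed.
                self.history.append((command.name, "failed", str(exc)))
                return False
            self.history.append((command.name, "ok", ""))
        return True


# Hypothetical commands, only here to show the shape of a job.
class LoadRows(Command):
    name = "load-rows"

    def execute(self, context):
        context["rows"] = ["a,1", "b,2"]


class CountRows(Command):
    name = "count-rows"

    def execute(self, context):
        context["count"] = len(context["rows"])
```

A job is then just a list such as `[LoadRows(), CountRows()]` handed to `CommandProcessor().run(...)`, which is the sort of thing a job-management front end could assemble, and the history gives you something to show when a run fails.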
Have you checked out ANT, btw? http://ant.apache.org/ It could be an alternative to writing everything from scratch every time. Or if nothing else, it could give you some ideas about the patterns involved.
Shell scripts are perfectly suited to what you are doing. Java is not. Use the best tool for the job.
Just another voice in the crowd Thursday, August 28, 2008
I'd say your colleague suffers from the Golden Hammer anti-pattern. Keep it as simple as possible.
Vee Thursday, August 28, 2008
I would use Perl or Python for this sort of stuff. They have the libraries to do all that stuff, are cross-platform, play well with various OS calls and you can slap a GUI on top of it if you need to. I wonder if your colleague thinks all software should be written in C, or maybe C++, since that's what the bulk of most operating systems are written in?
"Python can do every thing you mentioned in about 5 lines of code. " I love stupid comments like that. Sure, I can do a lot of things with 5 lines of Python code, but I'm probably not going to do it well. There is such a thing as "error handling" and other such requirements (configurability, scalability, etc.) that REAL WORLD applications require you know. I seriously doubt that your "5 lines of Python code" would hold up in a true enterprise situation. You amateurs really need to get a clue.
a REAL pro Thursday, August 28, 2008
Bittorrent is python. That doesn't seem to have scalability issues. Unless something has changed at Red Hat, their entire installer is written in python. I know Red Hat is chock full of 2 bit hacks that can't code their way out of a wet paper bag. Any program in python is, as with a program written in any other language, as configurable as you make it. If you are interested in what some amateurs, such as ILM, have done with python, this might clue you in a bit http://www.python.org/about/success/ You probably can't do all the error handling, etc. in 5 lines, but this thread starts by talking about batch jobs and shell scripts and python's error handling is a LOT more robust than either of those. And I've seen precious little, if any, error handling in those sorts of applications anyway. Mr. Pro, not sure if you are harping on the "5 lines" or the "python" part of the post, but your comments along both lines seem pretty ill-informed given the fact that the OP is talking about batch jobs and shell scripts and python can clearly handle a lot more than you think it is capable of. Such a lack of cluefulness seems terribly out of place from a seasoned veteran such as yourself.
I've written my share of batch files. The compatibility issues between Windows versions are ridiculous. If you're writing something that needs to run on multiple machines, I'd suggest another approach. Perl is actually rather well suited to this task -- particularly in Unix, but reasonably so in Windows as well. If you decide to stick with the batch files, you may want to consider using Microsoft's PowerShell in the future. It may make some of your tasks easier. Before you take my (or anyone else's) suggestions, I suggest you identify the pain points in your current batch file solution. Is it difficult to write complex control flow? Do you wish you could easily do some basic math or string manipulation in your batch file? Then look for something that will help you with those problems without losing the low overhead, etc. of batch files.
I have written many shell scripts. My answer to this is "it depends". Using a programming language is overkill in most cases. However, if the logic becomes more complicated, it is worth considering writing it in a programming language, because the latter is easier to maintain.
Glitch Thursday, August 28, 2008
I guess it depends on how many java components exist in your workflow. You probably don't want to start a JVM for some short jobs. Java is verbose. Maybe you should use beanshell or groovy.
Rick Tang Thursday, August 28, 2008
Design pattern: Don't execute every command directly. Instead, structure the program as a series of pure functions that each return a command string (or argument array). Then structure your main loop so that it calls the appropriate functions (based on what the user asks for) and executes the strings returned. It can handle errors, and also do some sanity checking about the order of steps. Python is ideal for this kind of structure, because it has real functions, unlike batch files, shell script, or Perl (you have to hack to even name your function arguments, wtf?) I have used this structure successfully to keep automation from becoming an unmaintainable nightmare. The key is to modularize each step as a pure function, so it can be moved around trivially and reused in many places with no hassle. It's a trivial thing to do, but most people don't do it.
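A rough sketch of that structure, assuming a Unix box with `unzip` and `mv` on the path. The function names, the `data.csv` file name and the `runner` parameter are all invented for the illustration; the point is only that the step functions build argument arrays and the loop is the one place where anything gets executed.

```python
import subprocess
import sys


# Each step is a pure function: plain values in, an argument list out.
# Nothing is executed here, so the steps are trivial to test and reuse.
def unzip_cmd(archive, dest):
    return ["unzip", "-o", archive, "-d", dest]


def move_cmd(src, dst):
    return ["mv", src, dst]


def build_plan(archive, workdir, final_path):
    """Return the ordered list of (step name, argv) pairs for one run."""
    return [
        ("unzip", unzip_cmd(archive, workdir)),
        ("move", move_cmd(workdir + "/data.csv", final_path)),
    ]


def run_plan(plan, runner=subprocess.run):
    """Execute each command in order and stop at the first failure."""
    for name, argv in plan:
        result = runner(argv)
        if result.returncode != 0:
            print(f"step {name!r} failed with exit code {result.returncode}",
                  file=sys.stderr)
            return False
    return True
```

Because `run_plan` takes the runner as a parameter, you can pass in a fake one to dry-run a plan or to check the ordering logic without touching the file system.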
You might want to check out http://discuss.joelonsoftware.com/default.asp?joel.3.668393.14 where there is a list of applications that help automate this stuff.
anonymous Thursday, August 28, 2008
Forgot to add that if you stick with batch files (assuming DOS), then check out http://www.jpsoft.com/tcmddes.htm as it provides extensions that greatly improve the language. Even if you don't use the extensions, the batch debugger saves a lot of grief.
anonymous Thursday, August 28, 2008
++ to "it depends" A lot of processes have complicated processing and logging requirements that lightweight scripting isn't well suited to. Processes with that sort of nature probably need more control to reside in actual code, not shell scripts. Shell or batch scripts need to know how to deal with the output from the compiled stuff, but in a truly complicated domain, you're going to need to use a real programming language, whether that's Python or Java or C# or Lisp or Brainfuck or whothefuckcares, somewhere in the mix.
Bart, I'm not sure why you personally took offense at my post. I was making fun of the "5 lines" part. You can't do all of the things the OP is talking about with 5 lines in ANY programming language. At no point did I say that Python wasn't scalable or lacked good error handling. But writing scalable enterprise applications with good error handling takes a hell of a lot more than just 5 lines of code in Python. I just get tired of the mindset of some of these people that you can just write five lines of code in every situation. It might be fine for your company. Maybe they don't expect much. But some of us have REAL programming to do and I can guarantee you that five lines of ANY programming language isn't going to yield anything of any real value. But hey, maybe you have yet to learn that life lesson.
a REAL pro Thursday, August 28, 2008
"Bittorrent is python. That doesn't seem to have scalability issues." 1. It does. Just not the ones you might at first think it does. The python bittorrent client isn't doing 10G transfers; and there are places that would like that level of performance. 2. It's not "five lines" of Python, anyway.
Katie Lucas Thursday, August 28, 2008
I just enjoy watching people make stupid comments about stupid comments, is all. Especially when followed by infantile things like "you amateurs" and then signed off with a name like "a REAL pro". It shows a level of arrogance I haven't seen in a while. It inspires me to pour gas on the fire. Life would get pretty boring if there were no lessons left to learn. I'm sure I have at least one or two to go :-). I know full well you can't write 5 lines of anything for most situations, but for what is being discussed here, that is probably about right. Have fun with your REAL programming.
Then you don't mind showing me the five lines of Python code to do the following: "download a file from an ftp server, unzip it, move it around on the file system, read the file, process and save data to database, run another program for downstream processing and finally send emails with results." Yeah, I thought so Mr. Smarty Pants.
a REAL pro Thursday, August 28, 2008
Alright, you're right. It's more than 5 lines. Exaggeration, hyperbole and so on are, apparently, not allowed on these boards. They are stupid. If I'm allowed to use some of the wrappers and libraries I created, ftp'ing and moving the file to its desired location is literally a one liner. Unzipping it with this http://code.activestate.com/recipes/252508/ can be a one liner, not including the import statement. Reading the file and processing is totally dependent on what is meant by processing, but if the processing is based on lines in the file, as it often is with tab or comma delimited text files, your processing function can be called with the array of lines generated by open("myfile.txt","r").readlines(). Running another program in python is usually a one liner with the popen library, assuming there isn't much involved with the program you are calling, so the downstream processing can also be a one liner. The only things in this task that require even a modest amount of effort in this example are the processing of the file in the database, which for all we know is calling another process or running a stored procedure, and sending the email, which could be running another process or perhaps 3 or 4 lines of python using smtplib. I think in the email sending I've done it's about 3 lines to open the connection, create the mail and ship it. Granted it is only attaching a file to a blank email, but it doesn't sound like anything more involved than that here. And again, I've written my own wrappers for these things because I've had to do them a lot. So I guess I can't show you 5 lines. It's more like 10-15 assuming the processing is something simple like adding or removing records from the database. Maybe 20 if you want to get really, really crazy. Since the processing isn't spelled out, it's hard to say. Anyway, I can't show you 5 lines. You win. Let's have 3 cheers for a REAL pro. Hip hip, HOOOORAAAAAAAAAAAAAAAAAAAAAY!!!!!!!!!!!
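To give a feel for what such wrappers might look like, here is a sketch using only the Python standard library. These are not the poster's actual wrappers: the function names are invented, hosts and credentials are placeholders, `zipfile` stands in for the linked recipe, and `subprocess` stands in for popen. The database step is left out because the thread never says what the processing is.

```python
import ftplib
import smtplib
import subprocess
import zipfile
from email.message import EmailMessage
from pathlib import Path


def download(host, user, password, remote_name, local_path):
    """Fetch one file over FTP (host and credentials are placeholders)."""
    with ftplib.FTP(host, user, password) as ftp, open(local_path, "wb") as out:
        ftp.retrbinary("RETR " + remote_name, out.write)


def unzip(archive, dest_dir):
    """Extract every member of the archive and return the member names."""
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest_dir)
        return zf.namelist()


def read_lines(path):
    """Return the file's lines without trailing newlines."""
    return Path(path).read_text().splitlines()


def run_downstream(argv):
    """Run another program; raises CalledProcessError on a non-zero exit."""
    done = subprocess.run(argv, check=True, capture_output=True, text=True)
    return done.stdout


def send_report(smtp_host, sender, recipient, subject, body):
    """Send a plain-text result email through an SMTP relay."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    with smtplib.SMTP(smtp_host) as smtp:
        smtp.send_message(msg)
```

With helpers like these in a module, the batch job itself does come out at roughly one line per step, plus whatever the processing needs. Most of the remaining effort goes into deciding what should happen when one of those calls raises.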
I would suggest the following:
- If you don't care about enterprise QoS like transactions, failover, scaling, HA, etc., ANT may be the easiest since (as was already mentioned) many of the functions are implemented for you. Once you become familiar with ANT, you can crank out new ANT tasks quickly. You have to manage the build.xml file, and most of your coding would shift to XML. A nice pro here is platform neutrality, since ANT is Java. A minus is that Java may not be the best technology for specific tasks that you're doing, parsing large text files for example, where Perl would be better.
- If you do care about enterprise QoS, but perhaps not today, prepare your application to run in an enterprise batch server like WebSphere XD Compute Grid. You can download the BDS framework and other dev tooling for free ( http://www-128.ibm.com/developerworks/forums/thread.jspa?threadID=210834&tstart=0 and also http://www-128.ibm.com/developerworks/forums/thread.jspa?threadID=190623&tstart=0 ), and read up on Compute Grid as the batch server ( http://www-128.ibm.com/developerworks/websphere/techjournal/0804_antani/0804_antani.html ).
- If you are a fan of the Spring programming style, check out Spring Batch ( www.springframework.org/spring-batch ) to build your apps. The plan is for Compute Grid to execute both BDS Framework style applications and Spring Batch-style applications.
- If you have a mix of batch steps that require enterprise QoS and steps that don't, you can use ANT for the non-enterprise work and use the Compute Grid ANT task (which can either run in J2SE or connect to the batch server) for the enterprise work.
Thanks, Snehal
On Windows, I usually write these sorts of things using GNU make. I create a set of phony targets, one per subtask, and maybe create a phony "main" target that calls make recursively with each of the subtasks in turn in the right order. I find this works pretty well. You get a slightly improved batch language, and make will do the exit code checking for you. For the majority of batch files I have written, this is just the ticket. It's not all plain sailing (took me a while to find a low-dependency GNU make port for Windows that understood drive letters, for example...) but with the teething problems long since out of the way I've come to quite like this approach. Everything is in one place, the programming facilities are (to my mind) an improvement over those available in batch files, and testing individual tasks is very easy. (Compared to Unix shell scripts, it is probably less compelling, but being able to divide your task so straightforwardly into pieces may still prove valuable.) Of course, the moment you start to need any sophisticated logic, it all starts to fall apart... but that was always going to be the case ;)
Friday, August 29, 2008
