The Design of Software (CLOSED)

A public forum for discussing the design of software, from the user interface to the code architecture. Now closed.

The "Design of Software" discussion group has been merged with the main Joel on Software discussion group.

The archives will remain online indefinitely.

JDBC Insert Deficiency

Rant rant rant rant rant...

Okay, so I'm just posting this message to complain about what I think is a serious deficiency in the design of the JDBC framework.

It seems to me like JDBC (and ODBC) is primarily designed for queries that return results. Performing inserts via JDBC almost seems to be an afterthought.

What I'd really like to see is something like a ResultSet but...backwards. I'd like to populate a set of Row objects with data from my application and then push those rows into the database (using some sort of buffered insert of binary data).

Suuuuure, I can build a SQL query and then use a Statement object to accomplish the same thing, but it seems so wasteful. First I have the overhead of String concatenation to worry about while I build my query, and then, after I send the query to the db server, it's just going to end up parsing my SQL String before doing the insert anyhow. It seems like there ought to be some method for bypassing the data-to-string-to-data cycle.
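
To make the complaint concrete, the pattern in question looks something like this (table, column, and variable names invented for illustration):

// each value becomes a String, gets glued into SQL, and the server re-parses it all
Statement stmt = connection.createStatement();
for (int i = 0; i < data.length; i++) {
  stmt.executeUpdate("INSERT INTO NAMES(ID, NAME) VALUES ("
      + data[i].id + ", '" + data[i].name + "')");
}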

Finally, I'm concerned about the accuracy of my floating-point data, since Java floats will get turned into String representations, and then the Strings will get converted back into floats by the database server. I'm worried that the data won't be consistent using a number of different db servers on a bunch of different platforms.

I know that MS SQL Server has a direct API for sending binary data into the server. And most servers (MS, MySQL, Oracle) have functionality that can parse CSV files. But why do you suppose there isn't a standardized way for doing these kinds of queryless inserts through JDBC?
Benji Smith
Wednesday, February 09, 2005
 
 
There is, but it requires a JDBC 2.0-compatible driver; see the example at the top of the documentation for ResultSet:

http://java.sun.com/j2se/1.4.2/docs/api/java/sql/ResultSet.html
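
Roughly, the insert-row pattern that page describes looks like this (again from memory):

Statement stmt = connection.createStatement(
    ResultSet.TYPE_SCROLL_SENSITIVE, ResultSet.CONCUR_UPDATABLE);
ResultSet rs = stmt.executeQuery("SELECT ID, NAME FROM NAMES");
rs.moveToInsertRow();         // move to the special insert row
rs.updateLong(1, 42);
rs.updateString(2, "Benji");
rs.insertRow();               // push the new row into the database
rs.moveToCurrentRow();        // return to the previous position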

Alternatively, use a PreparedStatement, for example (and this is just off the top of my head, my apologies for syntax errors):

PreparedStatement ps = connection.prepareStatement(
    "INSERT INTO NAMES(ID, NAME) VALUES (?, ?)");
for (int i = 0; i < data.length; i++) {
  ps.setLong(1, data[i].id);
  ps.setString(2, data[i].name);
  ps.execute();
}
Gerald
Wednesday, February 09, 2005
 
 
You can also use a batch, i.e. a bunch of inserts made together in one request.  See addBatch() for more details.  What more would you ask for?
Ryan
Wednesday, February 09, 2005
 
 
Yes, use a PreparedStatement.  The driver works with the database to avoid sending the SQL string over and over.

Depending on the driver (almost all work this way), you get exactly what you want - the SQL is sent once and parsed, then subsequently you send the data in the DB's native format, so no datatype parsing back and forth.  Thus the "prepared" part of PreparedStatement.

You might also try something like http://jakarta.apache.org/commons/dbutils/ to abstract your JDBC access a little bit.
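
For example, a batched insert through DbUtils might look something like this (a sketch only; it assumes a DbUtils release whose QueryRunner has the batch() method, so check the docs for your version):

import org.apache.commons.dbutils.QueryRunner;

QueryRunner runner = new QueryRunner();
Object[][] params = new Object[data.length][];
for (int i = 0; i < data.length; i++) {
  params[i] = new Object[] { new Long(data[i].id), data[i].name };
}
// batch() prepares the statement once, then adds one parameter row per record
runner.batch(connection, "INSERT INTO NAMES(ID, NAME) VALUES (?, ?)", params);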
Dave C
Thursday, February 10, 2005
 
 
Since most database access consists of queries, with inserts only occasional by comparison, it makes sense to me that inserts could be more expensive than queries.

Inserts can compromise the integrity of the database, which SELECTs can't.

However, certainly try the other solutions offered here.  Just be aware of the underlying implications of what you are doing.
AllanL5
Thursday, February 10, 2005
 
 
If you use a prepared statement, it still has to perform a round-trip data transfer for each row you write.  If you use a batch, you can write thousands of rows with only one network round-trip.
Ryan
Thursday, February 10, 2005
 
 
Yeah, I'm familiar with batch inserts. And I'm familiar with prepared statements. I've used both concepts many, many times. And they're both better than ordinary one-row-at-a-time SQL inserts.

But I *still* think that a direct, API-level method for inserting thousands of records would be ideal (a batch insert still requires me to piece together a bunch of SQL insert statements by concatenating a bunch of Strings, and a PreparedStatement doesn't provide the ability to bulk insert thousands of records).
Benji Smith
Thursday, February 10, 2005
 
 
Use the new RowSet API. It basically provides a disconnected ResultSet. See Chapter 5 of the JDBC Tutorial.
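
A rough sketch with the CachedRowSet flavor of that API, using Sun's reference implementation class (untested, so treat the details as approximate):

import javax.sql.rowset.CachedRowSet;

CachedRowSet crs = new com.sun.rowset.CachedRowSetImpl(); // Sun's reference impl
crs.setCommand("SELECT ID, NAME FROM NAMES WHERE 1 = 0"); // empty result, just for metadata
crs.execute(connection);
for (int i = 0; i < data.length; i++) {
  crs.moveToInsertRow();
  crs.updateLong(1, data[i].id);
  crs.updateString(2, data[i].name);
  crs.insertRow();
  crs.moveToCurrentRow();
}
crs.acceptChanges(connection);  // push all pending inserts back to the database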
Matt
Thursday, February 10, 2005
 
 
> a PreparedStatement doesn't provide the ability
> to bulk insert thousands of records.

Yes it does.  To build on Gerald's example:

PreparedStatement ps = connection.prepareStatement(
    "INSERT INTO NAMES(ID, NAME) VALUES (?, ?)");
for (int i = 0; i < data.length; i++) {
  ps.setLong(1, data[i].id);
  ps.setString(2, data[i].name);
  ps.addBatch();    // queue the row locally instead of executing it now
}
ps.executeBatch();  // send every queued row in a single round-trip
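
Two standard JDBC details worth adding here: executeBatch() returns an int[] of per-row update counts, and wrapping the whole batch in a single transaction usually speeds it up further. A minimal sketch, reusing the ps and connection above:

connection.setAutoCommit(false);   // one transaction for the whole batch
int[] counts = ps.executeBatch();  // one update count per queued row
connection.commit();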
Joe Blandy
Thursday, February 10, 2005
 
 
If you are really planning to do massive bulk inserts, maybe you are looking at the wrong tool. Create a CSV from Java and use a dataloader (like SQL*Loader) to load the data...
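
A rough sketch of the CSV half of that (escaping omitted, file and table names invented, and the loader command shown is Oracle's):

import java.io.FileWriter;
import java.io.PrintWriter;

PrintWriter out = new PrintWriter(new FileWriter("names.csv"));
for (int i = 0; i < data.length; i++) {
  // real code must escape commas, quotes, and newlines in the values
  out.println(data[i].id + "," + data[i].name);
}
out.close();
// then load the file outside the JVM, e.g.:
//   sqlldr userid=user/password control=names.ctl data=names.csv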
Vaidhy
Monday, February 14, 2005
 
 
> If you are really planning to do massive bulk inserts,
> maybe you are looking at the wrong tool. Create a CSV
> from Java and use a dataloader (like SQL*Loader) to
> load the data...

Doesn't it strike you as somewhat odd to suggest converting all of your data into Strings so that you can write them to a comma-delimited text format, then writing all of that data to the file-system, and then using a separate application to parse all of that data, convert it back from Strings into various data types, and then insert the records into the database, where they end up getting saved back to disk? Doesn't it seem like the real solution to the problem would be a special-purpose data conduit for pumping massive amounts of binary data directly into the db engine?
Benji Smith
Monday, February 14, 2005
 
 
If you don't like working with the low-level JDBC API, then abstract it in your own framework or use existing ones (JDO, Hibernate, or whatever).
I have no problems with JDBC insert performance. I developed a tool which could read records from one DB and insert them into another DB at 5,000 or more records per second.
ilgvars
Tuesday, February 15, 2005
 
 
5000 records each second?  May I ask what kind of network topology and hardware you are working with?
Tim Clemons
Wednesday, February 16, 2005
 
 
5000 records each second?  Sounds like the table has no indexes and there are no constraints anywhere.  I'm with Tim on this one.  Tell us how it is possible to get that kind of performance.
some other dude
Thursday, February 17, 2005
 
 
> If you don't like to work with low-level JDBC API
> then abstract it in you own framework or use
> existing ones (JDO, Hibernate or whatever).

The problem is not that JDBC is too low-level. The problem, in my particular case, is that it's too high-level.

Most of the time, high-level database access--using SQL through JDBC--is exactly what I want. But occasionally, I want a low-level, lightning-fast method for shoving hundreds of megabytes of data through a binary pipeline.
Benji Smith
Thursday, February 24, 2005
 
 
Great set of replies here. I've utilized your suggestions to increase my application's insert performance and wonder if anyone can help me further.
Tomcat 4.1.30, JDK 1.4.2_07, MySQL 4.1.8, MS SQL Server 2000, Windows XP, 3.2 GHz Intel, 2 GB RAM. I'm copying a set of tables from MS SQL Server to MySQL.
Current insert rate is 8 minutes for 18K records; indexes are present on the MySQL table.
snippet of insert code:
// sourceResultSet comes from the MS SQL Server side:
ResultSet sourceResultSet =
    sourceStatement.executeQuery("SELECT * FROM sourceTable");

// the PreparedStatement goes to the MySQL side:
PreparedStatement statement =
    connection.prepareStatement("INSERT INTO blah VALUES (?, ?, ?, ?, ?, ?)");

// (the original mapped columns by name; by index works when the tables line up)
int columnCount = sourceResultSet.getMetaData().getColumnCount();
while (sourceResultSet.next()) {
  for (int columnIndex = 1; columnIndex <= columnCount; columnIndex++) {
    statement.setObject(columnIndex, sourceResultSet.getObject(columnIndex));
  }
  statement.addBatch();
}
statement.executeBatch();

I don't know what else to jiggle to try and improve performance, besides removing the table indexes. Any ideas?

Also - sorry to subvert the discussion - this started as a rant, and here I'm asking for help - if this is woefully inappropriate, please suggest a better location.
-wc
William Crighton
Saturday, February 26, 2005
 
 
hmm...well, I tested my app with the SQL Server and MySQL databases running on separate boxes and got results like this:
18K records - 10 seconds, 14 columns
15K records - 26 seconds, 50 columns

So I'm happy now - total time for 7 tables takes just under 50 seconds, which is fantastic. Thanks again for the very helpful string of posts.
William Crighton
Monday, February 28, 2005
 
 

This topic is archived. No further replies will be accepted.
