By Ivan, PhD, a Java developer on the Murano Software team
This is the last post in our series about MongoDB. We have shown you its features and its drawbacks; now we will show you how it works with Java.
As mentioned before, MongoDB has drivers for a number of platforms; since it is mostly designed to operate in the Web's distributed environment, it pairs naturally with Java. But enough talk: let's jump straight to the code and see what it really looks like from the inside. To make it more exciting, we added some competition by comparing MongoDB on Java with something very common and widely used: MySQL. So we are about to assemble a rough comparative performance test for MongoDB and MySQL with bulk inserts, selects and updates. Before we move on, I beg you not to take this test too seriously: it is in no way a complete, rigorous or definitive benchmark, and it is not intended to deliver a firm 'yes' or 'no' on either solution. It is merely an applied example to make your journey with MongoDB interesting and useful.
There is plenty of detailed documentation on installing and setting up both MySQL and MongoDB. Both are available as binary distributions. If you are a Linux user, I encourage you to use your distribution's package repositories. Let's assume we have our environment set up and running, and we are ready to start coding.
We need a generic testing and reporting suite, plus a service that runs our insert/select/update routines for each available database provider. You can download the complete code to look into and play with using this link:
Let's take a closer look at MongoDB’s database service, MongoDbOpService.java:
<code>
...
public class MongoDbOpService implements DatabaseOpService, InitializingBean {
    ...
    @Override
    public void cleanUp() {
        // An empty query object matches every document, so this empties the collection.
        col.remove(BasicDBObjectBuilder.start().get());
        // Ascending (1) index on the field we filter by in select() and update().
        col.ensureIndex(BasicDBObjectBuilder.start().add("field1", 1).get());
    }

    @Override
    public void afterPropertiesSet() throws Exception {
        Mongo mongo = new Mongo(host, port);
        // Neither the database nor the collection has to exist beforehand.
        col = mongo.getDB("test").getCollection("test");
        cleanUp();
    }

    @Override
    public void select(int numOfOps) throws DatabaseOperationException {
        for (int i = 0; i < numOfOps; i++) {
            // Query {field1: {$gt: i}}; the returned cursor is not iterated here.
            col.find(BasicDBObjectBuilder.start()
                    .add("field1", new BasicDBObject("$gt", i)).get());
        }
    }

    @Override
    public void update(int numOfOps) throws DatabaseOperationException {
        for (int i = 0; i < numOfOps; i++) {
            // Last two arguments: upsert = false, multi = true.
            col.update(new BasicDBObject("field1", new BasicDBObject("$gt", i)),
                    new BasicDBObject("$set", new BasicDBObject("field2", 1)), false, true);
        }
    }

    @Override
    public void insert(int numOfOps) throws DatabaseOperationException {
        for (int i = 0; i < numOfOps; i++) {
            // Three random integer fields, each in [0, numOfOps).
            BasicDBObjectBuilder builder = new BasicDBObjectBuilder();
            builder
                    .add("field1", (int) (Math.random() * numOfOps))
                    .add("field2", (int) (Math.random() * numOfOps))
                    .add("field3", (int) (Math.random() * numOfOps));
            col.save(builder.get());
        }
    }
    ...
}
</code>
We start by creating a Mongo instance, whose constructor accepts the host address and port as parameters. Then we want a test database and a collection in it. The trick is that we don't have to worry about whether a database with the given name exists: getDB() will return the existing database or create a new one for us, and the same applies to getCollection(). Since we are going to run through our routines several times, we want a clean collection for each pass, which we get by calling the cleanUp() service method. It removes all documents from the collection in case this is not the first run and ensures (creates) an index on the field ('field1') that we are going to query later on. If you did the same thing from the JavaScript console, it would look something like this:
<code>
> db.test.remove({})
> db.test.ensureIndex({"field1":1})
</code>
Here 'test' is the name of the collection, remove is given an empty JSON object as its filter argument, and ensureIndex is given an object whose single key names the field to index, with '1' indicating ascending order. It is worth mentioning that almost everything in MongoDB uses JSON extensively: filtering, querying, update operations, etc. Also note that the created MongoDB collection has nothing like a data schema.
The next point of interest is the insert(int N) call, which inserts N simple objects into our test collection. In JSON, each one looks like this:
<code>
{
    "field1" : random_int,
    "field2" : random_int,
    "field3" : random_int
}
</code>
Now that we have N documents in our collection, we can start selecting them. The select(int N) call makes N queries against the test collection, each asking for all documents whose field1 value is greater than some integer in [0, N). Note that such a query returns a cursor, so it is usually a good idea to consider paging and result limiting in your live application, especially on large result sets. Otherwise you are likely to hit a cursor timeout or some other nasty distributed-environment issue. Our select() never iterates that cursor, so the select timings further down should be read with that in mind. The very same query in the JS console would look like this:
<code>
> db.test.find({"field1":{$gt:i}})
</code>
Note how we use the filter operator keyword '$gt' (greater than). There are two things to note here. First, MongoDB's query syntax has plenty of such keywords for filtering, updating and other cool stuff; you can find all the details in the manuals. Second, and more important, we actually use a JSON object as the value for "field1". So your documents can contain any number of nested documents in field values, as well as arrays and functions. This can lead to very interesting applications, including stored procedures. Just take a look at this example object:
<code>
{
    _id : ObjectId("4e0a078e69fea677c91d3742"),
    name: "John",
    address: {
        city: "Moscow",
        street: {
            name: "Lenin",
            type: "Square"
        },
        zip: 12345
    },
    sayHi : function(){print('Hello from function!');},
    some_list: [1245968390, 2859408375, 8756203941],
    some_object_list: [
        {field1: "something1", field2: 1},
        {field1: "something2", field2: -1}
    ]
}
</code>
Even though it can look confusing, it turns out to be a very convenient way of making things work. Also, since this is actually an object, you can use "dot notation" to access any nested field. So if we fetch the object from the last example, we can do the following:
<code>
> a = db.test.findOne({name:"John"})
> a.address.street.type
Square
> a.sayHi()
Hello from function!
> a.some_list[0]
1245968390
</code>
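The shell gives you dot access for free; from Java, a nested document comes back as nested map-like objects that you walk key by key. The sketch below is only an illustration of that idea: it uses plain java.util.Map as a stand-in for the driver's document type, and DotPath.resolve is a hypothetical helper of mine, not part of the MongoDB driver.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class DotPath {
    /** Follows a dotted path through nested maps; returns null if any step is missing. */
    static Object resolve(Map<String, Object> doc, String path) {
        Object current = doc;
        for (String key : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null; // ran into a scalar (or nothing) before the path ended
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    /** Builds a cut-down copy of the example document from the text. */
    static Map<String, Object> sampleDoc() {
        Map<String, Object> street = new LinkedHashMap<>();
        street.put("name", "Lenin");
        street.put("type", "Square");

        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "Moscow");
        address.put("street", street);
        address.put("zip", 12345);

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("name", "John");
        doc.put("address", address);
        return doc;
    }

    public static void main(String[] args) {
        // Same lookup as a.address.street.type in the shell session.
        System.out.println(resolve(sampleDoc(), "address.street.type"));
    }
}
```

Returning null for a missing step mirrors how a schema-less document may simply lack a field; a real application might prefer to fail loudly instead.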
Finally, the update(int N) call makes N update operations, finding objects with the same query select() uses, but this time changing the 'field2' value of the matched objects. The last two arguments to update() are the upsert and multi flags: we never insert a missing document, and we update every match rather than only the first. Note that we are using the '$set' modifier here, so only the named field is changed in place instead of the whole document being replaced.
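One thing worth keeping in mind about this routine: because each call updates every document whose field1 is greater than i, the work per call depends on how many documents match, not just on the number of calls. As a rough back-of-the-envelope sketch (pure Java, no database involved), assume the field1 values are spread perfectly evenly as 0..N-1, which the random data only approximates. Under that assumption the total number of matched documents over N calls grows quadratically. UpdateFanOut is a hypothetical name used only for this illustration.

```java
class UpdateFanOut {
    /** Documents matched by {field1: {$gt: i}} when field1 takes each value 0..n-1 exactly once. */
    static long matchedBy(int i, int n) {
        return Math.max(0, n - 1 - i);
    }

    /** Total documents matched over the n calls i = 0..n-1 made by update(n). */
    static long totalMatched(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += matchedBy(i, n);
        }
        return total;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000, 10000}) {
            System.out.println(n + " updates -> " + totalMatched(n) + " matched documents");
        }
    }
}
```

Under this idealized assumption the total is N(N-1)/2, so 10,000 calls touch roughly fifty million documents in all. That may be part of why per-operation update times climb on the largest set, though the sketch says nothing about what the database actually does internally.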
Here are the results I got on my local machine. I have to mention that even this simple, quickly made test shows how ridiculously slow bulk operations can be with the traditional relational approach. I stopped at the 10k set because larger sets take far too much time with little difference in the results. Meanwhile, with MongoDB I have generated millions of documents for a demo collection in a matter of minutes.
Test: MongoDB vs MySQL quick performance test (insert/select/update)
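Environment: Ubuntu 11.04 Linux x86_64 2.6.38, i5-2400, 8 GB RAM
Times are in milliseconds: Ins, Sel and Upd are totals for OpNum operations; the Avg columns are per operation.
| Service | OpNum | Ins | InsAvg | Sel | SelAvg | Upd | UpdAvg |
| --- | --- | --- | --- | --- | --- | --- | --- |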
| MongoDb | 10 | 1 | 0.1 | 1 | 0.1 | 1 | 0.1 |
| MySQL | 10 | 533 | 53.3 | 3 | 0.3 | 599 | 59.9 |
| MongoDb | 100 | 19 | 0.19 | 0 | 0.0 | 8 | 0.08 |
| MySQL | 100 | 5145 | 51.45 | 42 | 0.42 | 5866 | 58.66 |
| MongoDb | 1000 | 79 | 0.079 | 5 | 0.0050 | 53 | 0.053 |
| MySQL | 1000 | 49695 | 49.695 | 966 | 0.966 | 52245 | 52.245 |
| MongoDb | 10000 | 435 | 0.0435 | 18 | 0.0018 | 67182 | 6.7182 |
| MySQL | 10000 | 490376 | 49.0376 | 40992 | 4.0992 | 863177 | 86.3177 |
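
The real harness lives in the downloadable code; what follows is only a minimal sketch of how a total and a per-operation average like the ones in the table could be collected. OpTimer and its methods are hypothetical names of mine, and the workload in main is a stand-in for a call such as service.insert(n).

```java
import java.util.function.IntConsumer;

class OpTimer {
    /** Runs the bulk routine once with numOfOps and returns the elapsed wall-clock time in ms. */
    static long timeMillis(IntConsumer routine, int numOfOps) {
        long start = System.nanoTime();
        routine.accept(numOfOps);
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Per-operation average, as in the InsAvg/SelAvg/UpdAvg columns. */
    static double averageMillis(long totalMillis, int numOfOps) {
        return (double) totalMillis / numOfOps;
    }

    public static void main(String[] args) {
        int ops = 10_000;
        // Stand-in workload; a real run would wrap a database routine here.
        long total = timeMillis(n -> {
            long sum = 0;
            for (int i = 0; i < n; i++) {
                sum += i;
            }
        }, ops);
        System.out.printf("total=%d ms, avg=%.4f ms/op%n", total, averageMillis(total, ops));
    }
}
```

A single timed pass like this is sensitive to JVM warm-up and caching, which is one more reason to treat the figures as illustrative only.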

Figure 2 - Average select time, ms/record
![Average select time, ms/record](http://www.muranosoft.com/Outsourcingblog/content/binary/Windows-Live-Writer/d50d22ef1343_901E/clip_image002%5B5%5D_thumb.gif)
Figure 3 - Average update time, ms/record
![Average update time, ms/record](http://www.muranosoft.com/Outsourcingblog/content/binary/Windows-Live-Writer/d50d22ef1343_901E/clip_image002%5B9%5D_thumb.gif)
Epilogue
After a brief look at MongoDB's features and trade-offs, you may ask yourself: is this piece of technology production ready, and could I use it in my project? Well, it's up to you, but my word is that you should give it a try. The test we used as an example cannot be taken to prove that either MongoDB or MySQL is better. It just shows how these two systems handle one specific test. What is absolutely exciting about MongoDB is that it's really easy to start working with: it takes about an hour to lay out some basic configuration and client code. There are huge advantages, and yet a price to pay.