Why you should be using MongoDB/GridFS and Spring Data…

I recently delved into MongoDB for the first time, and although I was skeptical at first, I now prefer a NOSQL database over a traditional RDBMS. I rarely fall in love with a new technology, but the flexibility, ease of use, scalability and versatility of Mongo are good reasons to give it a chance. Here are some of the advantages of MongoDB.

  • NOSQL – A more object-oriented way to access your data, with no complex SQL commands to learn or remember
  • File Storage – Mongo is a master of storing flat files. Relational databases have never been good at this.
  • No DBA – The need for database administration is greatly minimized with NOSQL solutions
  • No schema, complex structures or normalization. This can be a good thing and also bad. Inevitably everyone has worked on a project that has been over normalized and hated it.
  • No complex join logic

Spring Data for Mongo

My first stop when coding against Mongo was to figure out how Spring supported it, and I was not disappointed. Spring Data provides a MongoTemplate and a GridFsTemplate for dealing with Mongo. GridFS is the Mongo file storage mechanism that allows you to store whole files in Mongo. MongoDB stores JSON-like documents in BSON (Binary JSON), and GridFS builds on that by splitting each file into chunks that are stored as BSON documents alongside a document describing the file.

As the name implies, a NOSQL database doesn’t use any SQL statements for data manipulation, but it does have a robust mechanism to accomplish the same ends. Before we start interacting with Mongo, let’s look at some of the components I used to accomplish the examples I am going to show you.

  • Spring 3.1.0.RELEASE
  • Spring Data for MongoDB 1.1.0.M2
  • Mongo Java Driver 2.8.0
  • AspectJ (Optional) 1.7.0
  • Maven (Optional) LATEST

The very first thing we need to configure is our context.xml file. I always start a project with one of these but I use Spring annotations as much as possible to keep the file clean.

<?xml version="1.0"?>
<beans xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xmlns="http://www.springframework.org/schema/beans"
	xmlns:context="http://www.springframework.org/schema/context"
	xmlns:mongo="http://www.springframework.org/schema/data/mongo"
	xmlns:tx="http://www.springframework.org/schema/tx"
	xsi:schemaLocation="http://www.springframework.org/schema/beans
		http://www.springframework.org/schema/beans/spring-beans.xsd
		http://www.springframework.org/schema/context
		http://www.springframework.org/schema/context/spring-context.xsd
		http://www.springframework.org/schema/data/mongo
		http://www.springframework.org/schema/data/mongo/spring-mongo.xsd">

	<!-- Connection to MongoDB server -->
	<mongo:db-factory host="localhost" port="27017"
		dbname="MongoSpring" />
	<mongo:mapping-converter id="converter"
		db-factory-ref="mongoDbFactory" />

	<!-- MongoDB GridFS Template -->
	<bean id="gridTemplate" class="org.springframework.data.mongodb.gridfs.GridFsTemplate">
		<constructor-arg ref="mongoDbFactory" />
		<constructor-arg ref="converter" />
	</bean>

	<mongo:mongo host="localhost" port="27017" />

	<bean id="mongoTemplate" class="org.springframework.data.mongodb.core.MongoTemplate">
		<constructor-arg ref="mongoDbFactory" />

	</bean>

	<context:annotation-config />
        <context:component-scan base-package="com.doozer" />
	<context:spring-configured />

</beans>

In short, the context file is setting up a few things.

  • The database factory that the templates will use to get a connection
  • The MongoTemplate and GridFsTemplate
  • Annotation support
  • @Configurable support if needed (Optional)

Let’s take a look at my App class that is the main entry point for this Java application.

...
@Configurable
public class App
{

    @Autowired
    public MongoOperations mongoOperation;
    @Autowired
    public StorageService storageService;

    ApplicationContext ctx;

    public App() {

        ctx = new GenericXmlApplicationContext("mongo-config.xml");
...

I am using AspectJ to weave my dependencies and inject them at compile or load time. If you are not using AspectJ, you need to look up the MongoOperations and StorageService beans from the context itself. The StorageService is a simple @Service bean that provides an abstraction on top of the GridFsTemplate.

...

@Service("storageService")
public class StorageServiceImpl implements StorageService {

    @Autowired
    private GridFsOperations gridOperation;

    // Stores the stream in GridFS and returns the generated file id
    @Override
    public String save(InputStream inputStream, String contentType, String filename) {

        // Custom metadata that is saved alongside the file
        DBObject metaData = new BasicDBObject();
        metaData.put("meta1", filename);
        metaData.put("meta2", contentType);

        GridFSFile file = gridOperation.store(inputStream, filename, metaData);

        return file.getId().toString();
    }

    // Looks up a single file by its ObjectId
    @Override
    public GridFSDBFile get(String id) {

        System.out.println("Finding by ID: " + id);
        return gridOperation.findOne(new Query(Criteria.where("_id").is(new ObjectId(id))));
    }

    // Passing a null query is intended to list every stored file
    @Override
    public List<GridFSDBFile> listFiles() {

        return gridOperation.find(null);
    }

    // Looks up a single file by its stored filename
    @Override
    public GridFSDBFile getByFilename(String filename) {
        return gridOperation.findOne(new Query(Criteria.where("filename").is(filename)));
    }
}

...

Our StorageServiceImpl merely delegates to the GridFsOperations object and simplifies the calls. This class is not strictly necessary, since you can inject the GridFsOperations object into any class, but it makes sense if you want to keep a clean separation so that you can swap Mongo/GridFS for something else later.

Mongo Template

Now we are ready to interact with Mongo. First, let's deal with creating and saving some textual data. The operations below show a few examples of interacting with data in the Mongo database by using the MongoTemplate.

User user = new User("1", "Joe", "Coffee", 30);

// save
mongoOperation.save(user);

// find
User savedUser = mongoOperation.findOne(new Query(Criteria.where("id").is("1")), User.class);
System.out.println("savedUser : " + savedUser);

// update
mongoOperation.updateFirst(new Query(Criteria.where("firstname").is("Joe")),
        Update.update("lastname", "Java"), User.class);

// find again to see the updated document
User updatedUser = mongoOperation.findOne(new Query(Criteria.where("id").is("1")), User.class);
System.out.println("updatedUser : " + updatedUser);

// delete (left commented out)
// mongoOperation.remove(new Query(Criteria.where("id").is("1")), User.class);

// list
List<User> listUser = mongoOperation.findAll(User.class);
System.out.println("Number of users = " + listUser.size());
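
The User class used above is not shown in this post. A minimal sketch that would fit these calls might look like the following. The firstname and lastname field names come from the queries above, but the name age for the fourth constructor argument and the toString format are my assumptions, not the original class.

```java
// Hypothetical sketch of the User POJO used above. The "age" field name and
// the toString() format are assumptions, not the original source.
class User {

    private String id;        // Spring Data maps a property named "id" to Mongo's _id field
    private String firstname;
    private String lastname;
    private int age;

    // No-argument constructor so that a mapping framework can instantiate the class
    public User() {
    }

    public User(String id, String firstname, String lastname, int age) {
        this.id = id;
        this.firstname = firstname;
        this.lastname = lastname;
        this.age = age;
    }

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    public String getFirstname() { return firstname; }
    public void setFirstname(String firstname) { this.firstname = firstname; }

    public String getLastname() { return lastname; }
    public void setLastname(String lastname) { this.lastname = lastname; }

    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }

    @Override
    public String toString() {
        return "User [id=" + id + ", firstname=" + firstname
                + ", lastname=" + lastname + ", age=" + age + "]";
    }
}
```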

As you can see, it is fairly easy to interact with Mongo using Spring and a simple User object. The User object is just a POJO with no special annotations. Now, let's interact with files using our StorageService abstraction over GridFS.

//StorageService storageService = (StorageService)ctx.getBean("storageService"); //if not using AspectJ Weaving
String id = storageService.save(App.class.getClassLoader().getResourceAsStream("test.doc"), "doc", "test.doc");
GridFSDBFile file1 = storageService.get(id);
System.out.println(file1.getMetaData());
GridFSDBFile file = storageService.getByFilename("test.doc");
System.out.println(file.getMetaData());
List<GridFSDBFile> files = storageService.listFiles();

for (GridFSDBFile file2: files) {
System.out.println(file2);
}

The great thing about Mongo is that you can store metadata about the file itself. Let’s look at the output of our file as printed by the code above.

{ "_id" : { "$oid" : "502a61f6c2e662074ea64e52"} , "chunkSize" : 262144 , "length" : 1627645 , "md5" : "da5cb016718d5366d29925fa6a2bd350" , "filename" : "test.doc" , "contentType" : null , "uploadDate" : { "$date" : "2012-08-14T14:34:30.071Z"} , "aliases" : null , "metadata" : { "meta1" : "test.doc" , "meta2" : "doc"}}

Using Mongo, you can associate any metadata with your file you wish and retrieve the file by that data at a later time. Spring support for GridFS is in its infancy, but I fully expect it to only grow as all Spring projects do.
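
The chunkSize and length fields in that output hint at how GridFS stores a file: the content is split into chunks of at most chunkSize bytes, each stored as its own document. As a rough sketch (the class and method names are mine, purely for illustration), the number of chunks for the file above can be worked out like this:

```java
// Illustrative only: computes how many GridFS chunks a file of a given
// length occupies. The class and method names are hypothetical.
final class GridFsChunkMath {

    // Ceiling division: every full chunk, plus one partial chunk if bytes remain.
    static long chunkCount(long length, long chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        return (length + chunkSize - 1) / chunkSize;
    }

    public static void main(String[] args) {
        // Values taken from the output above: length 1627645, chunkSize 262144
        System.out.println(chunkCount(1627645L, 262144L)); // prints 7
    }
}
```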

Query Metadata

The power of Mongo also lies in the metadata concept that I mentioned earlier, which relational databases just don't have. Mongo stores implicit metadata about each file, and it also allows me to attach any data I wish in a metadata layer. You can query this data in the same fashion you would query Mongo directly, by using dot notation.

gridOperation.findOne(new Query(Criteria.where("metadata.meta1").is("test.doc")));
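
If dot notation is new to you, it may help to picture the file document as nested maps, with each segment of the path stepping one level deeper. The sketch below is a toy model in plain Java, not how the Mongo driver resolves paths. The DotPath class and its resolve method are hypothetical names of my own.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of dot notation: walks nested maps one path segment at a time.
// This is not the Mongo driver's implementation; all names are hypothetical.
final class DotPath {

    static Object resolve(Map<String, Object> document, String path) {
        Object current = document;
        for (String segment : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return null; // the path goes deeper than the document does
            }
            current = ((Map<?, ?>) current).get(segment);
        }
        return current;
    }

    public static void main(String[] args) {
        // Mirrors the "metadata" sub-document from the output shown earlier
        Map<String, Object> metadata = new HashMap<String, Object>();
        metadata.put("meta1", "test.doc");
        metadata.put("meta2", "doc");

        Map<String, Object> file = new HashMap<String, Object>();
        file.put("filename", "test.doc");
        file.put("metadata", metadata);

        System.out.println(resolve(file, "metadata.meta1")); // prints test.doc
    }
}
```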

Map Reduce

Mongo offers MapReduce, a powerful data processing model for batch processing and aggregations that is somewhat similar to SQL's GROUP BY. MapReduce breaks a big task into two smaller steps. The map function runs over each input document and emits key/value pairs, and the reduce function then distills all the values emitted for a key into one final output. This can be quite a challenge to get your head around when you first look at it, as it requires embedding JavaScript functions in your code. I highly recommend reading the Spring Data for Mongo documentation regarding MapReduce before attempting to write any map reduce code.
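
To make the two phases a little more concrete, here is a plain-Java sketch that counts records per key. It does not call MongoDB or Spring Data at all. The class name, method name and sample data are all made up for illustration, and a real Mongo map reduce job would express the two functions in JavaScript instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the map and reduce phases. This does not touch
// MongoDB; the sample data and all names are hypothetical.
final class MapReduceSketch {

    static Map<String, Integer> countByKey(List<String> keys) {
        // Map phase: emit the pair (key, 1) for every record, grouped by key.
        Map<String, List<Integer>> emitted = new TreeMap<String, List<Integer>>();
        for (String key : keys) {
            List<Integer> values = emitted.get(key);
            if (values == null) {
                values = new ArrayList<Integer>();
                emitted.put(key, values);
            }
            values.add(1);
        }

        // Reduce phase: collapse the values emitted for each key into one result.
        Map<String, Integer> reduced = new TreeMap<String, Integer>();
        for (Map.Entry<String, List<Integer>> entry : emitted.entrySet()) {
            int sum = 0;
            for (int value : entry.getValue()) {
                sum += value;
            }
            reduced.put(entry.getKey(), sum);
        }
        return reduced;
    }

    public static void main(String[] args) {
        List<String> fileTypes = Arrays.asList("doc", "pdf", "doc", "txt", "doc");
        System.out.println(countByKey(fileTypes)); // prints {doc=3, pdf=1, txt=1}
    }
}
```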

Full-Text Search

MongoDB has no built-in mechanism for searching the text stored in GridFS files. This isn't a unique limitation, however, as most relational databases also have problems with this or require very expensive add-ons to get this functionality. If you are using Java, there are a few approaches you could use as a starting point. The first would be to simply take the text and attach it as metadata on the file object. That is a really messy solution and screams of inefficiency, but for smaller files it is a possibility. A better solution would be to use Lucene to create a searchable index of the file content and store that index along with the files.
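
To show the idea behind the index-based approach, here is a deliberately tiny inverted index in plain Java. It is a stand-in for what Lucene does far more thoroughly, not Lucene code. TinyTextIndex, its methods and the file ids are hypothetical names I made up, and it assumes you have already extracted plain text from each file.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: maps each word to the ids of the files containing it.
// A stand-in for a real search library such as Lucene; names are hypothetical.
final class TinyTextIndex {

    private final Map<String, Set<String>> postings = new HashMap<String, Set<String>>();

    // Index the already-extracted text of one file under its file id.
    void add(String fileId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty token when the text starts with a separator
            }
            Set<String> ids = postings.get(token);
            if (ids == null) {
                ids = new TreeSet<String>();
                postings.put(token, ids);
            }
            ids.add(fileId);
        }
    }

    // Return the ids of all files containing the word (case-insensitive).
    Set<String> search(String word) {
        Set<String> ids = postings.get(word.toLowerCase(Locale.ROOT));
        return ids == null ? Collections.<String>emptySet() : Collections.unmodifiableSet(ids);
    }

    public static void main(String[] args) {
        TinyTextIndex index = new TinyTextIndex();
        index.add("file-1", "Spring Data for MongoDB");
        index.add("file-2", "GridFS stores files in MongoDB");
        System.out.println(index.search("mongodb")); // prints [file-1, file-2]
    }
}
```

The ids returned by a search like this could then be passed to something like the get(id) method of the StorageService to fetch the matching files.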

Scaling with Sharding

While very difficult to say in mixed company, Sharding describes MongoDB’s ability to scale horizontally automatically. Some of the benefits of this process as described by the Mongo web site are:

  • Automatic balancing for changes in load and data distribution
  • Easy addition of new machines without down time
  • Scaling to one thousand nodes
  • No single points of failure
  • Automatic failover

Configuration

  • One to 1000 shards. Shards are partitions of data. Each shard consists of one or more mongod processes which store the data for that shard. When multiple mongod processes are in a single shard, they are each storing the same data – that is, they are replicating to each other.
  • Either one or three config server processes. For production systems use three.
  • One or more mongos routing processes.

For testing purposes, it’s possible to start all the required processes on a single server, whereas in a production situation, a number of server configurations are possible.

Once the shards (mongod processes), config servers, and mongos processes are running, configuration is simply a matter of issuing a series of commands to establish the various shards as being part of the cluster. Once the cluster has been established, you can begin sharding individual collections.
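
When a collection is sharded, MongoDB partitions it by ranges of a shard key, and the mongos process routes each request to the shard that owns the matching range. The toy model below sketches that routing idea in plain Java. It is not MongoDB code, and the class name, shard names and key ranges are all invented for the example.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of range-based routing: each entry maps the lowest shard-key value
// of a range to the shard that owns it. Names and ranges are hypothetical.
final class ShardRouterSketch {

    private final TreeMap<String, String> rangeStartToShard = new TreeMap<String, String>();

    void assignRange(String lowestKeyInclusive, String shardName) {
        rangeStartToShard.put(lowestKeyInclusive, shardName);
    }

    // Pick the range whose lower bound is the greatest one not above the key.
    String shardFor(String shardKeyValue) {
        Map.Entry<String, String> owner = rangeStartToShard.floorEntry(shardKeyValue);
        if (owner == null) {
            throw new IllegalStateException("no range covers key: " + shardKeyValue);
        }
        return owner.getValue();
    }

    public static void main(String[] args) {
        ShardRouterSketch router = new ShardRouterSketch();
        router.assignRange("", "shard0");  // keys from "" up to "h"
        router.assignRange("h", "shard1"); // keys from "h" up to "q"
        router.assignRange("q", "shard2"); // keys from "q" upwards

        System.out.println(router.shardFor("coffee")); // prints shard0
        System.out.println(router.shardFor("joe"));    // prints shard1
        System.out.println(router.shardFor("zed"));    // prints shard2
    }
}
```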

Import, Export and Backup

Getting data in and out of Mongo is simple and straightforward. Mongo has the following commands that allow you to accomplish these tasks:

  • mongoimport
  • mongoexport
  • mongodump
  • mongorestore

You can even delve into the data at hand and export parts of it by specifying collections in the commands and mixing in dot notation, or you can choose to dump data by using a query.

$ ./mongodump --db blog --collection posts --out - > blogposts.bson

$ ./mongodump --db blog --collection posts \
    -q '{"created_at" : { "$gte" : {"$date" : 1293868800000},
                          "$lt"  : {"$date" : 1296460800000}
                        }
        }'
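
The $date values in that query are milliseconds since the Unix epoch, which are not easy to read at a glance. If you want to check which dates a pair of bounds covers, a small helper like this one can convert them. It is only a sketch, the class and method names are mine, and it assumes Java 8 or later for java.time.

```java
import java.time.Instant;

// Converts the epoch-millisecond values used in the mongodump query into
// ISO-8601 UTC timestamps. Hypothetical helper; assumes Java 8 or later.
final class DumpQueryDates {

    static String toIso(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).toString();
    }

    public static void main(String[] args) {
        System.out.println(toIso(1293868800000L)); // prints 2011-01-01T08:00:00Z
        System.out.println(toIso(1296460800000L)); // prints 2011-01-31T08:00:00Z
    }
}
```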

Mongodump even takes an argument, --oplog, to get point-in-time backups. Mongo's backup and restoration utilities are as robust as those of any relational database.

Limitations of MongoDB

Mongo has a few limitations, at least at the time of writing. In some ways, a few of these limitations can be seen as benefits as well.

  • No Joining across collections
  • No transactional support
  • No referential integrity support
  • No full text search for GridFS files built in
  • Traditional SQL-driven reporting tools like Crystal Reports and business intelligence tools are useless with Mongo

Conclusions

The advantages of MongoDB as a database far outweigh the disadvantages. I would recommend a Mongo NOSQL database for any project, regardless of the programming language you are using. Mongo has drivers for most popular languages. I do think, however, that in certain scenarios where you are dealing with rapid, realtime OLTP transactions, MongoDB may fall short of a high-performance RDBMS such as Oracle. For the average IT project, I believe Mongo is well suited. If you still aren't sold on Mongo by now (I would be pretty shocked if you weren't), then feast your eyes on the high-profile sites that are using MongoDB as their backend database today.

  • FourSquare
  • Bit.ly
  • github
  • Eventbrite
  • Grooveshark
  • Craigslist
  • Intuit

The list goes on and on… There are also several other NOSQL solutions out there that enjoy popularity.

  • CouchDB
  • RavenDB
  • CouchBase

Optional Components

I used several optional components for my exercises. I wanted to address these for the folks who may not be familiar with them.

AspectJ and @Configurable

Many folks would ask why I chose to use aspect weaving instead of just looking up the objects from the context in the App object. @Configurable allows you to use the @Autowired annotation on a class that is not managed by the Spring context. This requires load-time or compile-time weaving to work. In Eclipse, I use the AJDT plugin, and for Maven, I use the AspectJ plugin to achieve this. The weaving process just looks for certain aspects and then weaves the dependencies into the byte code. It solves a lot of chicken-and-egg problems when dealing with Spring.

Maven

If you are using Maven and you want all of the dependencies I used for the examples, here is the pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<groupId>com.doozer</groupId>
	<artifactId>MongoSpring</artifactId>
	<packaging>jar</packaging>
	<version>1.0</version>
	<name>MongoSpring</name>
	<url>http://maven.apache.org</url>

	<properties>
		<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
		<spring.version>3.1.0.RELEASE</spring.version>

	</properties>

	<dependencies>
		<dependency>
			<groupId>junit</groupId>
			<artifactId>junit</artifactId>
			<version>4.8.2</version>
			<scope>test</scope>
		</dependency>
		<dependency>
			<groupId>org.slf4j</groupId>
			<artifactId>slf4j-api</artifactId>
			<version>1.6.6</version>
		</dependency>
		<dependency>
			<groupId>org.slf4j</groupId>
			<artifactId>jcl-over-slf4j</artifactId>
			<version>1.6.6</version>
			<exclusions>
				<exclusion>
					<artifactId>slf4j-api</artifactId>
					<groupId>org.slf4j</groupId>
				</exclusion>
			</exclusions>
		</dependency>
		<dependency>
			<groupId>org.slf4j</groupId>
			<artifactId>slf4j-log4j12</artifactId>
			<version>1.6.6</version>
			<exclusions>
				<exclusion>
					<artifactId>slf4j-api</artifactId>
					<groupId>org.slf4j</groupId>
				</exclusion>
			</exclusions>
		</dependency>

		<!-- Spring framework -->
		<dependency>
			<groupId>org.springframework</groupId>
			<artifactId>spring-core</artifactId>
			<version>${spring.version}</version>
		</dependency>

		<dependency>
			<groupId>org.springframework</groupId>
			<artifactId>spring-context</artifactId>
			<version>${spring.version}</version>
		</dependency>

		<dependency>
			<groupId>org.springframework</groupId>
			<artifactId>spring-aop</artifactId>
			<version>${spring.version}</version>
		</dependency>

		<dependency>
			<groupId>org.springframework</groupId>
			<artifactId>spring-aspects</artifactId>
			<version>${spring.version}</version>
		</dependency>

		<!-- mongodb java driver -->
		<dependency>
			<groupId>org.mongodb</groupId>
			<artifactId>mongo-java-driver</artifactId>
			<version>2.8.0</version>
		</dependency>

		<dependency>
			<groupId>org.aspectj</groupId>
			<artifactId>aspectjweaver</artifactId>
			<version>1.7.0</version>
		</dependency>

		<dependency>
			<groupId>org.aspectj</groupId>
			<artifactId>aspectjrt</artifactId>
			<version>1.7.0</version>
		</dependency>
		<dependency>
			<groupId>org.springframework.data</groupId>
			<artifactId>spring-data-mongodb</artifactId>
			<version>1.1.0.M2</version>
		</dependency>

		<dependency>
			<groupId>cglib</groupId>
			<artifactId>cglib</artifactId>
			<version>2.2</version>
		</dependency>

		<dependency>
			<groupId>javax.persistence</groupId>
			<artifactId>persistence-api</artifactId>
			<version>1.0</version>
			<scope>provided</scope>
		</dependency>

	</dependencies>

	<build>
		<plugins>
			<plugin>
				<artifactId>maven-compiler-plugin</artifactId>
				<configuration>
					<source>1.6</source>
					<target>1.6</target>
				</configuration>
			</plugin>

			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-dependency-plugin</artifactId>
				<executions>
					<execution>
						<id>copy-dependencies</id>
						<phase>prepare-package</phase>
						<goals>
							<goal>copy-dependencies</goal>
						</goals>
						<configuration>
							<outputDirectory>${project.build.directory}/lib</outputDirectory>
							<overWriteReleases>false</overWriteReleases>
							<overWriteSnapshots>false</overWriteSnapshots>
							<overWriteIfNewer>true</overWriteIfNewer>
						</configuration>
					</execution>
				</executions>
			</plugin>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-jar-plugin</artifactId>
				<configuration>
					<archive>
						<manifest>
							<addClasspath>true</addClasspath>
							<classpathPrefix>lib/</classpathPrefix>
							<mainClass>com.doozer.mongospring.core.App</mainClass>
						</manifest>
					</archive>
				</configuration>
			</plugin>
			<plugin>
				<groupId>org.codehaus.mojo</groupId>
				<artifactId>aspectj-maven-plugin</artifactId>
				<configuration>
					<complianceLevel>1.6</complianceLevel>
					<aspectLibraries>
						<aspectLibrary>
							<groupId>org.springframework</groupId>
							<artifactId>spring-aspects</artifactId>
						</aspectLibrary>
					</aspectLibraries>
				</configuration>
				<executions>
					<execution>
						<goals>
							<goal>compile</goal>
						</goals>
					</execution>
				</executions>
			</plugin>
		</plugins>
	</build>
</project>