15 recommendations to make you a better programmer

I often give the same advice over and over to the programmers I work with. Most of my advice covers things to do, and that works out pretty well, but the “don’ts” are just as important as the dos. Let’s look at 15 things a programmer should watch out for. Most of these are geared toward object-oriented languages, but some apply to procedural and functional languages as well.

  1. Don’t forget the tenets of object-oriented programming: inheritance, encapsulation and polymorphism.
  2. Don’t overuse inheritance. Inheritance should make sense. Ask yourself about the relationship: if it is an “is a” relationship, inherit; if it is a “has a” relationship, make it a member of the owner (composition).
  3. Watch your method/function/subroutine lengths. If a method or function grows longer than 5-10 lines, you are probably missing an opportunity to abstract or extract functionality; the longer a method is, the more complex it becomes (see the sketch after this list).
  4. Before you start to “roll your own”, spend some time looking for open-source solutions or blog articles where someone has solved the problem before. There is nothing wrong with leveraging someone else’s hard work. Chances are someone else will take over your work at some point, and it is easier for them if the solution is one they can find support for with a Google search. Also, think about the testing and maintenance time involved in rolling your own solution. Beyond that, the chance that one person can produce a better solution than a community project is slim, no matter how good you think you are.
  5. Don’t hack. There is a lot to be said for code written during a time crunch, but more often than not, programmers will use this excuse to shortcut a solution and not take time to do it the way they know it should be done. You want the next programmer to look at your code and pat you on the back for your solution, not curse you in disgust.
  6. Don’t forget about reusability. Think about every line of code you write. Ask yourself whether what you are writing will be repeated by you or someone else. If it will, abstract it into a utility class and reuse it. Never just copy code from one place to another when you could extract it to a utility or utilize polymorphism to address the need.
  7. Don’t use obscure variable names. It should be very clear what data a variable contains when another person looks at your code.
  8. Don’t forget to ask for a code review or design review. No one is perfect. You should always walk through your code with a peer, with both of you sitting side by side. Explain your rationale and the techniques you used, and ask the reviewer for recommendations. Two heads are better than one. Also, do this early and often. Don’t wait until you finish a project to ask for a review because by then, it may be too late to fix.
  9. Don’t use global or member variables when a local one would suffice. I’ve seen this a few times before. A junior programmer will scope their variables as wide as possible. This not only causes confusion when others look at the code, but it can cause unintended consequences in your applications.
  10. Don’t forget about threading and thread-safety. Threading is a difficult concept for unseasoned programmers, and it can bite you very quickly if you don’t think about it. Complex applications may have many threads accessing the same resources, and if you don’t manage this, you can get junk data, crashes and unexpected results. And DO NOT synchronize everything as a solution to thread safety, else performance will suffer.
  11. Don’t code first and ask questions later. You should understand the problem domain and the goals you want to accomplish before you write a single line of code. Ideally, you will design the application and run through sanity checks in your head well before putting code onto a screen.
  12. Don’t forget about unit testing. Unless you just enjoy spending hours and hours testing your application or sitting with your QA resource, you should unit test your code at the lowest levels and run these tests as regression tests along with your builds. Remember that a unit test is for very small bits of code. A unit test does not take the place of actual functional testing of your application, but it sure makes it easier.
  13. Don’t forget to comment, and don’t over-comment. If you want to provide yourself a hint or reminder, or give that hint to another programmer, use a comment to make the point. Don’t over-comment your code either; too many comments are an indication of complexity, and you should revisit the code for simplification or refactoring.
  14. Don’t forget to refactor your code as you go. If you see areas of your code that need revision, do so at the earliest opportunity. If you wait, the problem is compounded by other code that comes to depend on it. Never wait until the end of a project to refactor; you will never get the chance, and by then it is a daunting task.
  15. Don’t forget to layer and loosely couple. Keep your code as loosely coupled as possible. A good strategy is to layer your code, e.g. DAO layer, service layer, integration layer, controllers, UI layer, etc. For example, the UI layer should never access classes directly from the DAO layer; it should utilize the controllers to access data, which in turn call the service layer, and so on.
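
For tip 7, here is a quick sketch of what extracting functionality looks like; the order-processing names are hypothetical:

public void processOrder(Order order) {
    validate(order);        // each extracted method reads like a sentence
    applyDiscounts(order);
    save(order);
}

private void validate(Order order) { /* ... */ }
private void applyDiscounts(Order order) { /* ... */ }
private void save(Order order) { /* ... */ }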

While this is not an all-inclusive list, it does give a programmer a great advantage. Being a programmer is definitely about working smarter, not harder.




Integrating Spring 3.1 and Lucene 4

Integrating Spring 3.1 and Lucene 4 is a fairly trivial matter, but I didn’t want to use the XML configuration, so by utilizing the @Configuration annotation I was able to configure the Lucene indexer, analyzer and query parser. Here is the configuration code:



import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.index.Term;
import org.apache.lucene.queryparser.flexible.standard.StandardQueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.NativeFSLockFactory;
import org.apache.lucene.util.Version;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class AppConfig {

    private @Value("#{appProperties['index.location']}") String indexLocation;
    private @Value("#{appProperties['index.source']}") String indexSource;

    @Bean
    public Analyzer getAnalyzer() {
        return new StandardAnalyzer(Version.LUCENE_40);
    }

    @Bean
    public FSDirectory getFSDirectory() throws IOException {
        File location = new File(indexLocation);
        if (!location.exists() || !location.canRead()) {
            System.out.println("Creating directory: '" + location.getAbsolutePath() + "'");
            location.mkdirs();
        }
        return FSDirectory.open(location, new NativeFSLockFactory());
    }

    @Bean
    public IndexWriter getIndexWriter() throws IOException {
        IndexWriterConfig iwc = new IndexWriterConfig(Version.LUCENE_40, getAnalyzer());

        // Optional: for better indexing performance, if you
        // are indexing many documents, increase the RAM
        // buffer.  But if you do this, increase the max heap
        // size to the JVM (eg add -Xmx512m or -Xmx1g):
        // iwc.setRAMBufferSizeMB(256.0);

        // NOTE: if you want to maximize search performance,
        // you can optionally call forceMerge here.  This can be
        // a terribly costly operation, so generally it's only
        // worth it when your index is relatively static (ie
        // you're done adding documents to it):
        // writer.forceMerge(1);

        IndexWriter writer = null;
        try {
            writer = new IndexWriter(getFSDirectory(), iwc);
            indexDocs(writer, new File(indexSource));
        } catch (Throwable t) {
            System.out.println("Unable to create IndexWriter!: " + t.getMessage());
        }
        return writer;
    }

    @Bean
    public IndexSearcher getIndexSearcher() throws IOException {
        return new IndexSearcher(DirectoryReader.open(getFSDirectory()));
    }

    @Bean
    public StandardQueryParser getQueryParser() {
        return new StandardQueryParser(getAnalyzer());
    }

    static void indexDocs(IndexWriter writer, File file) throws IOException {
        // do not try to index files that cannot be read
        if (file.canRead()) {
            if (file.isDirectory()) {
                String[] files = file.list();
                // an IO error could occur
                if (files != null) {
                    for (int i = 0; i < files.length; i++) {
                        indexDocs(writer, new File(file, files[i]));
                    }
                }
            } else {
                FileInputStream fis;
                try {
                    fis = new FileInputStream(file);
                } catch (FileNotFoundException fnfe) {
                    // at least on windows, some temporary files raise this exception with an "access denied" message
                    // checking if the file can be read doesn't help
                    return;
                }
                try {
                    // make a new, empty document
                    Document doc = new Document();

                    // Add the path of the file as a field named "path".  Use a
                    // field that is indexed (i.e. searchable), but don't tokenize
                    // the field into separate words and don't index term frequency
                    // or positional information:
                    Field pathField = new StringField("path", file.getPath(), Field.Store.YES);
                    doc.add(pathField);

                    // Add the last modified date of the file as a field named "modified".
                    // Use a LongField that is indexed (i.e. efficiently filterable with
                    // NumericRangeFilter).  This indexes to milli-second resolution, which
                    // is often too fine.  You could instead create a number based on
                    // year/month/day/hour/minutes/seconds, down to the resolution you require.
                    // For example the long value 2011021714 would mean
                    // February 17, 2011, 2-3 PM.
                    doc.add(new LongField("modified", file.lastModified(), Field.Store.NO));

                    // Add the contents of the file to a field named "contents".  Specify a Reader,
                    // so that the text of the file is tokenized and indexed, but not stored.
                    // Note that FileReader expects the file to be in UTF-8 encoding.
                    // If that's not the case searching for special characters will fail.
                    doc.add(new TextField("contents", new BufferedReader(new InputStreamReader(fis, "UTF-8"))));

                    if (writer.getConfig().getOpenMode() == OpenMode.CREATE) {
                        // New index, so we just add the document (no old document can be there):
                        System.out.println("adding " + file);
                        writer.addDocument(doc);
                    } else {
                        // Existing index (an old copy of this document may have been indexed) so
                        // we use updateDocument instead to replace the old one matching the exact
                        // path, if present:
                        System.out.println("updating " + file);
                        writer.updateDocument(new Term("path", file.getPath()), doc);
                    }
                } finally {
                    fis.close();
                }
            }
        }
    }
}

public class App {

    public MongoOperations mongoOperation;
    public StorageService storageService;

    ApplicationContext ctx;

    public App() {
        ctx = new AnnotationConfigApplicationContext(AppConfig.class);
    }
}
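
With the beans in place, running a search is just a matter of asking the context for them. Here is a minimal sketch (the query string is made up; "contents" matches the field indexed above):

AppConfig config = ctx.getBean(AppConfig.class);
try {
    // Parse a user query against the "contents" field and take the top 10 hits
    Query query = config.getQueryParser().parse("spring lucene", "contents");
    TopDocs hits = config.getIndexSearcher().search(query, 10);
    System.out.println("Total hits: " + hits.totalHits);
} catch (Exception e) {
    System.out.println("Search failed: " + e.getMessage());
}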

Let’s get fancy with @Configuration with Spring

Spring has changed a lot over the years to make things more flexible and convenient for developers. Annotations in Spring 3 really hit home, but recently Spring has added features that almost completely eliminate the need for XML altogether. In the past, you still needed an XML configuration file if you wanted to utilize third-party code as Spring beans, though you could use annotations to demarcate your own code. With the latest Spring code, you can use a class for your configuration. Let’s see how it works.

import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.context.annotation.ImportResource;
import org.springframework.context.annotation.PropertySource;

@Configuration
@ImportResource("classpath:context.xml") // XML file name assumed; it defines the appProperties bean
public class AppConfig {

    private @Value("#{appProperties['index.location']}") String indexLocation;

    @Bean
    public String getIndexlocation() {
        return indexLocation;
    }
}

//App.class main
ApplicationContext ctx = new AnnotationConfigApplicationContext(AppConfig.class);

There is a lot going on here, but it may not be apparent by the small amount of code we have written. This code does the following:

  • Maps a class as the configuration for Spring
  • Loads an XML Property file (There are still some things I prefer to do in the XML)
  • Creates a bean of type String whose value comes from a property defined in the property file

While the property example is not particularly useful on its own, you can see the flexibility of accessing properties with Spring expressions. The first question you might ask is why I am still loading an XML file when the @Configuration annotation eliminates the need for it. If you declare a bean in the class, in most cases you need to inject properties into it, which is extra work, and on top of that you are writing code that must be maintained. Using the XML declaration, you can use property substitution as parameters to an existing class, and no code needs to be placed in your configuration class.

So how do you determine when to put a class in the XML and when to declare it as a bean? Here are my general rules:

  • If you create a class, then demarcate it with a Spring stereotype (@Component, @Service, @Repository, @Controller, @Configurable, etc.)
  • If the class is a class from a third-party jar, then place the configuration in the XML
  • If the class is from a third party but you want finer-grained control over its instantiation and lifecycle, then create the bean using the @Bean annotation in the class containing the @Configuration annotation, as sketched below

Pretty simple rules to follow…
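
As an illustration of the third rule, here is roughly what a @Bean definition for a third-party class might look like (ConnectionPool and its setters are stand-ins for any third-party type):

@Configuration
public class AppConfig {

    @Bean(destroyMethod = "shutdown") // we control the lifecycle of the third-party object
    public ConnectionPool connectionPool() {
        ConnectionPool pool = new ConnectionPool();
        pool.setMaxConnections(20);
        return pool;
    }
}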

There are several other annotations that can be used in the class containing the configuration as well, such as @DependsOn and @Value.

Java/.NET Caching with Spring

If you want a major performance boost out of your application, then caching should be a part of your strategy. No doubt you have experienced moments in coding where you needed to store sets of data so that you don’t have to go back to the source to get them every time. The simplest form of caching is lazy loading where you actually create the objects the first time in memory and from there on out, you access them from memory. In reality, caching gets a lot more difficult and has many considerations.

  • How do I cache in a distributed environment?
  • How do I expire items in the cache?
  • How do I prevent my cache from overrunning memory?
  • How do I make my cache thread-safe and segment it?

All of these are concerns that you will have if you “roll your own” solution to caching. Let’s just leave the heavy lifting to the Spring Framework and we can go back to concerning ourselves with solving the complex problems of our domain.
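
To make the point concrete, here is a minimal hand-rolled lazy cache (the Record and RecordDao types are hypothetical). It works, but it answers none of the questions above:

public class RecordCache {

    private final Map<String, Collection<Record>> cache =
            new ConcurrentHashMap<String, Collection<Record>>();
    private final RecordDao dao;

    public RecordCache(RecordDao dao) {
        this.dao = dao;
    }

    public Collection<Record> findRecords(String key) {
        Collection<Record> records = cache.get(key);
        if (records == null) {            // first access: load and remember
            records = dao.loadRecords(key);
            cache.put(key, records);      // NOTE: no expiry, no size bound, no distribution
        }
        return records;
    }
}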

Spring has a caching mechanism/abstraction for both Java and .NET, although the Java version is far more robust. Caching in Spring is accomplished through AOP or Aspect Oriented Programming. A caching annotation (Java) or attribute (.NET) can be placed on a method or a class to indicate that it should be cached, which cache should be used and how long to keep the resources before eviction.

Java Spring Cache with EHCache

In Java, caching with Spring couldn’t be easier. Spring supports several different caching implementations but EHCache is the default and by far my favorite. EHCache is robust, configurable and handles a distributed environment with ease. Let’s look at how we can add the ability to cache to a Spring project.

<beans xmlns="http://www.springframework.org/schema/beans"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xmlns:cache="http://www.springframework.org/schema/cache"
   xmlns:p="http://www.springframework.org/schema/p"
   xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
                       http://www.springframework.org/schema/cache http://www.springframework.org/schema/cache/spring-cache.xsd">

  <cache:annotation-driven />

  <!-- Plug EHCache into the Spring cache abstraction -->
  <bean id="cacheManager" class="org.springframework.cache.ehcache.EhCacheCacheManager" p:cache-manager-ref="ehcache"/>

  <!-- Ehcache library setup -->
  <bean id="ehcache" class="org.springframework.cache.ehcache.EhCacheManagerFactoryBean" p:config-location="ehcache.xml"/>

</beans>

Now that we have our cache setup, we can start to utilize it.

@Cacheable(value="records", key="'recordList'") //Cache the returned records under the key "recordList"
public Collection findRecords(RecordId recordId) {...}

@Cacheable(value="records", key="'recordList'", condition="#recordType == 2") //Only cache if record type is 2
public Collection findRecords(int recordType) {...}

@CacheEvict(value = "records", allEntries=true) //Reload the cache, evict all entries
public void loadAllRecords() {...}

In the above example, the annotation specifies that the returned collection will be stored in a cache named “records” under the key “recordList” (the key parameter is optional, and takes a Spring Expression Language value, hence the inner quotes). We also showed an example of using Spring Expression Language to cache conditionally. Remember that caches are defined either dynamically, like above, or in ehcache.xml. For most complex caching scenarios, you will want to define the cache in ehcache.xml with eviction and distribution rules, and Spring will find it by the value parameter of the annotation.

What about .NET?

In .NET, you have a very similar mechanism for managing a cache.

[CacheResult("MyCache", "'Record.RecordId=' + #id", TimeToLive = "0:1:0")]
public Collection GetRecord(long id)
{
   // implementation not shown...
}

Remember that with the .NET cache, you provide the caching implementation in the XML just as you do in the Java version. In this example, we have used the provided AspNetCache.

What about controlling the cache yourself, querying the cache, and more complex operations? Even that is simple; merely autowire the cache manager or retrieve it from the context.

@Autowired
private CacheManager cacheManager; // org.springframework.cache.CacheManager

Cache cache = cacheManager.getCache("records");
Cache.ValueWrapper wrapper = cache.get("recordList");
Collection records = wrapper != null ? (Collection) wrapper.get() : null;

Built-in .NET Caching

Fortunately for .NET users, there is also a built-in caching framework in .NET. This technique is used mostly in the MVC and ASP world, and I am not particularly fond of it since it is specifically geared toward the web side of an application. I would prefer it to be more generic like the Spring solution, but it has some advantages: you can configure the caches in the web.config, and you can create custom caching implementations that utilize different mechanisms for cache management. Here is a great example utilizing MongoDB. The .NET version of the cache works much the same way as the Spring one does. Here are some examples with configuration.

[OutputCache(CacheProfile="RecordList", VaryByParam="recordType")] // duration comes from the profile in web.config
public ActionResult GetRecordList(string recordType) {
   // implementation not shown...
}

Now the configuration in the web.config…


<system.web>
  <caching>
    <outputCacheSettings>
      <outputCacheProfiles>
        <add name="RecordList" duration="3600" varyByParam="recordType" />
      </outputCacheProfiles>
    </outputCacheSettings>
  </caching>
</system.web>

If we are deploying our application to the cloud in Azure, we can use the AppFabric cache as demonstrated here.

Hibernate, Data Nucleus, JPA Result Caching

Another thing to keep in mind is that when you are using tools such as Hibernate, caching is built into these solutions. A second-level cache implementation is provided when you configure Hibernate, and I tend to use EHCache for this as well. You must also remember to add Hibernate’s @Cache annotation at the class level on the objects you want cached (see the sketch below). A properly set up ORM solution with Spring and Hibernate, with a well-configured second-level cache, is very hard to beat in performance and maintainability.
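
A minimal sketch of what that looks like on an entity (the Record class is hypothetical):

import javax.persistence.Entity;
import javax.persistence.Id;

import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

@Entity
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE) // stored in the second-level cache
public class Record {

    @Id
    private Long id;

    // fields, getters and setters omitted
}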


We have really done a lot with very little, thanks to Spring and caching. While caching is powerful and will help improve the performance of your application, if it is overdone it can cause problems that are difficult to diagnose. If you cache, make sure you always understand the implications of the data and what happens throughout the cache’s lifecycle.

The many types of technology clouds

If you remember from school, there are many different types of clouds: stratus, cumulus and nimbus, just to name a few. In technology, the cloud is no different. The cloud has a different meaning depending on whom you talk to, and while they are all mostly right, they can be referring to totally different things.

Cloud Architecture

Let’s look at the Wikipedia definition of cloud computing:

Cloud computing is the use of computing resources (hardware and software) that are delivered as a service over a network (typically the Internet). The name comes from the use of a cloud-shaped symbol as an abstraction for the complex infrastructure it contains in system diagrams. Cloud computing entrusts remote services with a user’s data, software and computation.


In the strictest sense, cloud computing is not merely moving your software or services onto a server outside your firewall; it is enlisting a service with “cloud” properties, like flexible horizontal scalability through elasticity. Cloud architecture is not at all uniform, either. It means different things to the vendors that provide the services, and each implements it how they choose. Amazon, VMware Cloud Foundry, Stackato and Azure are some of the hottest cloud architecture services out there today.

Consumer Cloud

From a consumer’s perspective, the cloud means that your data is stored somewhere on the internet. For example, the iCloud product from Apple keeps all your data in sync between computers and iOS devices. The iTunes Match service stores your music on servers, and the same goes for the Google Music service.

Outsourced Cloud

To many IT professionals, the cloud is just a way to offload services they currently have in house to a server outside of their infrastructure, managed by someone else to varying degrees. For example, many organizations have shifted from internal mail servers to Google Apps for Business, where every aspect of the infrastructure is managed by Google. Another example might be moving your internal Microsoft Exchange mail server to a hosting provider. The end result, again to varying degrees, is greater reliability and less burden on and reliance upon internal IT resources. JFrog Artifactory for dependency management has a cloud service or a server you can download. If I purchase this service, can I say that my dependency management is in the cloud, regardless of how JFrog implements their cloud service? The answer is yes.

The True Cloud

While the cloud may mean different things to different people, preparing an application for a cloud architecture is not as simple as one might think. There are all kinds of decisions that need to be made and coding practices that should be adhered to. Typically, if you follow Java conventions, then you won’t have much of an issue with cloud deployments. Let’s look at an example Java web application and some of the concerns that need to be addressed prior to cloud deployment to a service such as Cloud Foundry.

  • Your application must be bundled as a war file
  • Your application must not access the file system (utilize the class loader to load files from within the war, but don’t write to files)
  • Resources/Services should be looked up via JNDI
  • If you cache, utilize caching solutions like EHCache that propagate automatically across instances
  • Persist data to a database such as MongoDB, MySQL or PostgreSQL to guarantee access

For the most part, if you follow the J2EE web application conventions regarding deployments, you won’t have an issue with a deployment to a true cloud environment.
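
For instance, the JNDI bullet above amounts to a lookup like the following (the resource name is hypothetical); Spring’s jee:jndi-lookup element can also hide this behind a bean:

// Plain JNDI lookup of a container-managed DataSource
Context jndiContext = new InitialContext();
DataSource dataSource = (DataSource) jndiContext.lookup("java:comp/env/jdbc/MyDataSource");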

Cloud Architecture Dynamic Scaling vs. Autoscaling

One of the advantages of a cloud architecture is the ability to scale up instances of your application and pay your provider only for what you utilize. There are two types of scaling: the first is auto-scaling, where a tool measures load and automatically increases instances of your application to accommodate need; the second is dynamic scaling, where you control the scale based on a load forecast. An example of dynamic scaling: you normally have a load of 1,000 concurrent users, and you have an event that will ramp your load up to 10,000. In a more traditional setting, you would have purchased servers and hardware to accommodate the worst-case scenario, which leaves your infrastructure largely underutilized most of the time, causing inefficiency and waste. In a cloud scenario, you might have 10 instances to service your normal load; for the 24-hour period of your event you scale to 100 instances, and when the smoke clears, you return to 10 instances. All of this can be done with a single click of a button in a cloud infrastructure.

Automatic scaling is provided by some cloud providers in the form of a tool. Amazon provides such a tool that will monitor the load and allow you to set triggers to determine how many instances to ramp up and when. Automatic scaling is very useful as it handles the unforeseen, but it should not be used in lieu of true capacity planning. It can also be dangerous: advanced intelligence needs to be built in to determine the difference between junk and legitimate traffic. For example, think of what would happen if a DoS attack were performed against your application. Remember, you pay based on the number of servers you ramp up.

Under the hood, scaling works by issuing commands to the server through an API. For example, with Cloud Foundry, the vmc command lets you monitor load and add instances, which in turn creates a JSON request that is sent to the server to give it instructions. You can use a third-party tool that interfaces with the API in the same fashion, or you can build the scaling intelligence into your own application by hitting the server API yourself. Using the latter technique, your application can control itself, making it very intelligent.


Regardless of how you see the cloud and how you utilize it, the endgame for your organization should be to offload the burden from your internal staff, decrease your expenses, provide greater uptime and flexibility, and gain the ability to scale dynamically.


If you are using Java, you should be using Spring

I spend a fair amount of time evangelizing the Spring Framework, and with good reason. Spring is not only a great lightweight container framework that provides IoC (Inversion of Control) and Dependency Injection, but it has a tool or component for nearly every task you can think of in the day-to-day ins and outs of programming.

You probably have a need for Spring even if you don’t know it yet. Most developers at one time or another have created frameworks to accomplish tasks like remoting, JMS, MVC, database interactions or batch work, so I would label Spring as “the framework” for such tasks instead of succumbing to the “roll your own” urge. I started using Spring back in 2006, and in the more than 50 Java projects since then I have never failed to utilize it to some extent. It has reduced the amount of code I have to write, allowed me to dynamically wire dependencies together at runtime, and even provided tools for tasks I thought I would have to write something custom to accomplish.

Spring was born as a solution to the heavyweight EJB and J2EE container environments. It reduces the overhead of J2EE, allows the usage of containers that are not J2EE-compliant, like Tomcat and Jetty, and provides a consistent API that most developers these days are familiar with. Here are some examples of what Spring can do:

  • Dependency Injection (e.g. create a database pool object factory in XML and inject that into objects at runtime)
  • Eliminates the need to write specific code for singleton patterns
  • Allows you to turn a POJO into a service with a mere annotation
  • With Aspects, it allows you to inject values into classes that are not managed by Spring
  • Spring has an abstraction on top of over 100 different frameworks in Java
  • Spring MVC is the most concise and robust MVC framework
  • Spring provides JPA, Hibernate and DataNucleus support and allows transaction demarcation
  • Spring provides AOP capabilities to allow method interception and point cuts
  • Exposing POJO methods as web services is as simple as adding Apache CXF to the mix
  • Annotation support is richer than any other framework
  • Spring is the most widely used Java framework
  • Property file loading and substitution in XML

Spring is not only a Java tool; in fact, Spring.NET is available for the .NET platform. It is usually a little bit behind the Java version, but it is out there.

What are these new concepts AOP, IoC and Dependency Injection?

Usually a discussion of Spring amounts to explaining the concepts at the core of the framework. Let’s take a look at each of them and what they give you. IoC and Dependency Injection go hand in hand: IoC is the concept, and Dependency Injection is the mechanism. For example, say you create a service class on your own. Now you need to manage that class by ensuring it has only one instance; you also need to get a reference to it into the other classes that use it, so you create a mechanism for that. Now you need transaction support, so you write that in; then you need to dynamically read in properties for the environment you are running in, and it goes on and on. As you can see, it not only gets complicated, but that is a lot of code you are writing and maintaining yourself. Spring provides it all. Through XML or annotations (preferably the latter), one simple Plain Old Java Object (POJO) can accomplish all of this via Spring conventions; you can inject values into your service, or inject your service into any other class, as simply as this:

//Service class Spring Bean
@Service
@Transactional
public class MyService implements IService {

    public void doThis() { ... }
}

//MVC Controller Class
@Controller
public class MyController {

    @Autowired
    private MyService myService;

    public Report doThat() {
        myService.doThis();
        ...
    }
}


In just a few lines of code, we created a singleton service that is transactional, and we created a controller to call that service. There are far more complex things we could do here. For example, using the Open Session in View pattern, we could have the controller open and close the transaction so that multiple service calls use the same transactional context. We could also change the isolation level of the transaction. The point here is that we used Dependency Injection to demonstrate what IoC can do.

AOP, or Aspect Oriented Programming, is an advanced concept in the Spring world. AOP is the separation of cross-cutting concerns; Spring’s transactional support is itself built on AOP. The ability of Spring and AspectJ to inject objects into other objects that are not managed by Spring is another great example. Transactions, security, logging and anything other than the business at hand are cross-cutting concerns, and the goal of AOP is to separate them from the business code. If you didn’t use AOP, you would have to control these elements yourself. Say, for example, you want to check that a user is valid before method calls. Without AOP, you would have to write some checkUserIsValid() method and call it at the beginning of each method. Using AOP, you can merely declare, with an annotation or aspect, that each method of a certain class is intercepted by a method on another class.
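
A minimal sketch of that user check as an aspect (package and class names are hypothetical; requires @EnableAspectJAutoProxy or <aop:aspectj-autoproxy/>):

@Aspect
@Component
public class SecurityAspect {

    // Intercept every public method of classes in the service package
    @Before("execution(public * com.example.service..*.*(..))")
    public void checkUserIsValid() {
        // validate the current user here; throw an exception if invalid
    }
}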

Spring is also for simple projects

You may be thinking Spring is too heavyweight for the task at hand… nonsense. I will guarantee that Spring, used properly, will reduce the amount of code in your project by at least 25%. That is 25% less code for you to write and maintain. Spring even provides tools to accomplish small tasks such as the following:

  • Finding resources on the classpath or file system (ResourceLoader)
  • Finding classes with a certain annotation
  • Generating JSON/XML from objects and vice versa (Jackson support)
  • Load multiple property files and substitute variables inside your Spring XML files (useful when promoting to different environments)
  • Ability to treat a JNDI resource as just another bean
  • Ability to treat a web service as just another bean
  • JdbcTemplate for issuing database queries, and a batch framework for batch operations (see the sketch after this list)
  • Spring Data for NOSQL support with Mongo
  • MVC content negotiation to convert a POJO to JSON, XML, PDF (iText), Excel (POI) and more
  • Security integration that supports Windows, Active Directory, Kerberos, Basic, Digest, etc.
  • Robust testing adapters for JUnit and TestNG
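
For example, the JdbcTemplate bullet above reduces a query to a few lines; a quick sketch (table and column names are hypothetical):

JdbcTemplate jdbcTemplate = new JdbcTemplate(dataSource);
List<String> names = jdbcTemplate.query(
        "SELECT name FROM users WHERE active = ?",
        new RowMapper<String>() {
            public String mapRow(ResultSet rs, int rowNum) throws SQLException {
                return rs.getString("name");
            }
        }, Boolean.TRUE);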

I could spend a week delivering Spring training to a group of developers and only scratch the surface of what is there. Without a doubt, though, when I have a tough problem to solve or a simple Java project, I always utilize parts of Spring. Conveniently, Spring is broken up into modules so that you can include only the ones you need, avoiding project bloat.


With Spring being the #1 Java framework, I highly recommend spending some time getting familiar with it, and I recommend getting training as well, from someone like myself who knows the framework well and can show you everything it has to offer before you start utilizing it. You can also get training directly from VMware, the company that owns SpringSource.

Why you should be using MongoDB/GridFS and Spring Data…

I recently delved into MongoDB for the first time, and although I was skeptical at first, I now believe I prefer a NOSQL database over a traditional RDBMS. I rarely just fall in love with a new technology, but the flexibility, ease of use, scalability and versatility of Mongo are good reasons to give it a chance. Here are some of the advantages of MongoDB:

  • NOSQL – A more object-oriented way to access your data, with no complex SQL commands to learn or remember
  • File Storage – Mongo is a master of storing flat files; relational databases have never been good at this
  • No DBA – The need for database administration is greatly minimized with NOSQL solutions
  • No schema, complex structures or normalization – This can be a good thing and a bad one; inevitably, everyone has worked on an over-normalized project and hated it
  • No complex join logic

Spring Data for Mongo

My first stop when coding against Mongo was to figure out how Spring supported it, and I was not disappointed. Spring Data provides a MongoTemplate and a GridFsTemplate for dealing with Mongo. GridFS is the Mongo file storage mechanism that allows you to store whole files in Mongo. The Mongo NOSQL database utilizes a JSON-like object storage technique, and GridFS uses BSON (Binary JSON) to store file data.

As the name implies, a NOSQL database doesn’t use any SQL statements for data manipulation, but it does have a robust mechanism to accomplish the same ends. Before we start interacting with Mongo, let’s look at some of the components I used to accomplish the examples I am going to show you.

  • Spring 3.1.0.RELEASE
  • Spring Data for MongoDB 1.1.0.M2
  • Mongo Java Driver 2.8.0
  • AspectJ (Optional) 1.7.0
  • Maven (Optional) LATEST

The very first thing we need to configure is our context.xml file. I always start a project with one of these but I use Spring annotations as much as possible to keep the file clean.

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xmlns:context="http://www.springframework.org/schema/context"
	xmlns:mongo="http://www.springframework.org/schema/data/mongo"
	xsi:schemaLocation="http://www.springframework.org/schema/beans http://www.springframework.org/schema/beans/spring-beans.xsd
		http://www.springframework.org/schema/context http://www.springframework.org/schema/context/spring-context.xsd
		http://www.springframework.org/schema/data/mongo http://www.springframework.org/schema/data/mongo/spring-mongo.xsd">

	<!-- Connection to MongoDB server -->
	<mongo:db-factory host="localhost" port="27017"
		dbname="MongoSpring" />
	<mongo:mapping-converter id="converter"
		db-factory-ref="mongoDbFactory" />

	<!-- MongoDB GridFS Template -->
	<bean id="gridTemplate" class="org.springframework.data.mongodb.gridfs.GridFsTemplate">
		<constructor-arg ref="mongoDbFactory" />
		<constructor-arg ref="converter" />
	</bean>

	<mongo:mongo host="localhost" port="27017" />

	<bean id="mongoTemplate" class="org.springframework.data.mongodb.core.MongoTemplate">
		<constructor-arg ref="mongoDbFactory" />
	</bean>

	<context:annotation-config />
	<context:component-scan base-package="com.doozer" />
	<context:spring-configured />

</beans>

In short, the context file is setting up a few things.

  • The database factory that the templates will use to get a connection
  • The MongoTemplate and GridFSTemplate
  • Annotation support
  • Annotation @Configuration support if needed (Optional)

Let’s take a look at my App class that is the main entry point for this Java application.

@Configurable
public class App {

    @Autowired
    public MongoOperations mongoOperation;
    @Autowired
    public StorageService storageService;

    ApplicationContext ctx;

    public App() {
        ctx = new GenericXmlApplicationContext("mongo-config.xml");
    }
}

I am using AspectJ to weave my dependencies and inject them at compile or load time. If you are not using AspectJ, you need to look up the MongoOperations and StorageService beans from the context itself. The storage service is a simple @Service bean that provides an abstraction on top of the GridFsTemplate.


@Service("storageService")
public class StorageServiceImpl implements StorageService {

    @Autowired
    private GridFsOperations gridOperation;

    public String save(InputStream inputStream, String contentType, String filename) {

        DBObject metaData = new BasicDBObject();
        metaData.put("meta1", filename);
        metaData.put("meta2", contentType);

        GridFSFile file = gridOperation.store(inputStream, filename, metaData);

        return file.getId().toString();
    }

    public GridFSDBFile get(String id) {

        System.out.println("Finding by ID: " + id);
        return gridOperation.findOne(new Query(Criteria.where("_id").is(new ObjectId(id))));
    }

    public List<GridFSDBFile> listFiles() {

        // a null query returns every file in the bucket
        return gridOperation.find(null);
    }

    public GridFSDBFile getByFilename(String filename) {
        return gridOperation.findOne(new Query(Criteria.where("filename").is(filename)));
    }
}

Our StorageServiceImpl merely delegates to the GridFsOperations object and simplifies the calls. This class is not strictly necessary, since you can inject the GridFsOperations object into any class, but if you plan on keeping a clean separation so you can swap Mongo/GridFS out for something else later, it makes sense.

Mongo Template

Now we are ready to interact with Mongo. First, let’s deal with creating and saving some textual data. The operations below show a few examples of interacting with data in the Mongo database using the MongoTemplate.

User user = new User("1", "Joe", "Coffee", 30);
mongoOperation.save(user);

User savedUser = mongoOperation.findOne(new Query(Criteria.where("id").is("1")), User.class);
System.out.println("savedUser : " + savedUser);

mongoOperation.updateFirst(new Query(Criteria.where("firstname").is("Joe")),
        Update.update("lastname", "Java"), User.class);
User updatedUser = mongoOperation.findOne(new Query(Criteria.where("id").is("1")), User.class);
System.out.println("updatedUser : " + updatedUser);

// mongoOperation.remove(
//     new Query(Criteria.where("id").is("1")),
//     User.class);

List<User> listUser = mongoOperation.findAll(User.class);
System.out.println("Number of users = " + listUser.size());

As you can see, it is fairly easy to interact with Mongo using Spring and a simple User object. The user object is just a POJO as well with no special annotations. Now, let’s interact with the files using our StorageService abstraction over GridFs.

//StorageService storageService = (StorageService) ctx.getBean("storageService"); //if not using AspectJ weaving
String id = storageService.save(App.class.getClassLoader().getResourceAsStream("test.doc"), "doc", "test.doc");
GridFSDBFile file1 = storageService.get(id);
GridFSDBFile file = storageService.getByFilename("test.doc");
List<GridFSDBFile> files = storageService.listFiles();

for (GridFSDBFile file2 : files) {
    System.out.println(file2); // prints the file descriptor shown below
}

The great thing about Mongo is that you can store metadata about the file itself. Let’s look at the output of our file as printed by the code above.

{ "_id" : { "$oid" : "502a61f6c2e662074ea64e52"} , "chunkSize" : 262144 , "length" : 1627645 , "md5" : "da5cb016718d5366d29925fa6a2bd350" , "filename" : "test.doc" , "contentType" : null , "uploadDate" : { "$date" : "2012-08-14T14:34:30.071Z"} , "aliases" : null , "metadata" : { "meta1" : "test.doc" , "meta2" : "doc"}}

Using Mongo, you can associate any metadata you wish with your file and retrieve the file by that data later. Spring support for GridFS is in its infancy, but I fully expect it to grow as all Spring projects do.

Query Metadata

The power of Mongo also lies in the metadata concepts I mentioned earlier; relational databases just don’t have this concept. Mongo stores implicit metadata about the files, and it also allows me to attach any data I wish as a metadata layer. You can query this data in the same fashion you would query Mongo directly, using dot notation.

gridOperation.findOne(new Query(Criteria.where("metadata.meta1").is("test.doc")));

Map Reduce

Mongo offers MapReduce, a powerful search-and-aggregation mechanism for batch processing that is somewhat similar to SQL’s GROUP BY. The MapReduce algorithm breaks a big task into two smaller steps. The map function takes a large input and divides it into smaller pieces, then hands the data off to a reduce function, which distills the individual answers from the map function into one final output. This can be quite a challenge to get your head around at first, as it requires embedded JavaScript. I highly recommend reading the Spring Data for Mongo documentation on MapReduce before attempting to write any map-reduce code.
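
A hedged sketch of invoking MapReduce through the MongoTemplate (the collection and field names are hypothetical, and ValueObject is a simple id/value POJO you define yourself):

// Count records per recordType using JavaScript map and reduce functions
String map = "function() { emit(this.recordType, 1); }";
String reduce = "function(key, values) { return Array.sum(values); }";

MapReduceResults<ValueObject> results =
        mongoOperation.mapReduce("records", map, reduce, ValueObject.class);
for (ValueObject valueObject : results) {
    System.out.println(valueObject.getId() + " -> " + valueObject.getValue());
}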

Full-Text Search

MongoDB has no inherent mechanism for searching the text stored in GridFS files; however, this is not a unique limitation, as most relational databases also have problems with this or require very expensive add-ons for the functionality. There are a few approaches that could serve as a start if you are using Java. The first is to simply extract the text and attach it as metadata on the file object; that is a messy, inefficient solution, but it is a possibility for smaller files. A more ideal solution is to use Lucene to create a searchable index of the file content and store that index alongside the files.

Scaling with Sharding

While very difficult to say in mixed company, sharding describes MongoDB’s ability to scale horizontally automatically. Some of the benefits of this process, as described by the Mongo web site, are:

  • Automatic balancing for changes in load and data distribution
  • Easy addition of new machines without down time
  • Scaling to one thousand nodes
  • No single points of failure
  • Automatic failover


A sharded MongoDB deployment consists of the following components:

  • One to 1,000 shards. Shards are partitions of data. Each shard consists of one or more mongod processes which store the data for that shard. When multiple mongods are in a single shard, they each store the same data – that is, they replicate to each other.
  • Either one or three config server processes. For production systems use three.
  • One or more mongos routing processes.

For testing purposes, it’s possible to start all the required processes on a single server, whereas in a production situation, a number of server configurations are possible.

Once the shards (mongods), config servers and mongos processes are running, configuration is simply a matter of issuing a series of commands to establish the various shards as part of the cluster. Once the cluster has been established, you can begin sharding individual collections.

Import, Export and Backup

Getting data in and out of Mongo is simple and straightforward. Mongo provides the following commands for these tasks:

  • mongoimport
  • mongoexport
  • mongodump
  • mongorestore

You can even delve into the data at hand to export pieces of collections by specifying them in the commands and mixing in dot notation, or you can choose to dump data using a query.

$ ./mongodump --db blog --collection posts --out - > blogposts.bson

$ ./mongodump --db blog --collection posts
    -q '{"created_at" : { "$gte" : {"$date" : 1293868800000},
                          "$lt"  : {"$date" : 1296460800000} } }'

Mongodump even takes an --oplog argument to get point-in-time backups. Mongo’s backup and restoration utilities are as robust as any relational database’s.

Limitations of MongoDB

Mongo has a few limitations, though some of them can be seen as benefits as well.

  • No Joining across collections
  • No transactional support
  • No referential integrity support
  • No full text search for GridFS files built in
  • Traditional SQL-driven reporting tools like Crystal Reports and business intelligence tools are useless with Mongo


The advantages of MongoDB as a database far outweigh the disadvantages. I would recommend a Mongo NOSQL database for any project, regardless of the programming language you are using; Mongo has drivers for everything. I do, however, think that in certain scenarios where you are dealing with rapid, realtime OLTP transactions, MongoDB may fall short of competing with a high-performance RDBMS such as Oracle. For the average IT project, I believe Mongo is well-suited. If you still aren’t sold on Mongo by now (I would be pretty shocked if you weren’t), then feast your eyes on the high-profile sites that are using MongoDB as their backend database today.

  • FourSquare
  • Bit.ly
  • github
  • Eventbrite
  • Grooveshark
  • Craigslist
  • Intuit

The list goes on and on… There are also several other NOSQL solutions out there that enjoy popularity.

  • CouchDB
  • RavenDB
  • CouchBase

Optional Components

I used several optional components for my exercises. I wanted to address these for the folks who may not be familiar with them.

AspectJ and @Configurable

Many folks would ask why I chose to use aspect weaving instead of just looking up the objects from the context in the App object. @Configurable allows you to use the @Autowired annotation in a class that is not managed by the Spring context. This process requires load-time or compile-time weaving to work. In Eclipse I use the AJDT plugin, and for Maven I use the AspectJ plugin to achieve this. The weaving process looks for certain aspects and weaves the dependencies into the byte code. It solves a lot of chicken-and-egg problems when dealing with Spring.
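
This is exactly what the App class above relies on; a minimal sketch of the pattern (the class is hypothetical, and it depends on the <context:spring-configured/> entry from the XML earlier):

@Configurable
public class ReportGenerator { // instantiated with plain "new", not by Spring

    @Autowired
    private StorageService storageService; // woven in by AspectJ at compile or load time

    public void listEverything() {
        storageService.listFiles();
    }
}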


If you are using Maven and you want all of the dependencies I used for the examples, here is the pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">

	<modelVersion>4.0.0</modelVersion>
	<!-- project coordinates assumed for the example; versions match the component list above -->
	<groupId>com.doozer</groupId>
	<artifactId>mongo-spring-example</artifactId>
	<version>1.0-SNAPSHOT</version>

	<dependencies>

		<!-- Spring framework -->
		<dependency>
			<groupId>org.springframework</groupId>
			<artifactId>spring-context</artifactId>
			<version>3.1.0.RELEASE</version>
		</dependency>

		<!-- Spring Data for MongoDB -->
		<dependency>
			<groupId>org.springframework.data</groupId>
			<artifactId>spring-data-mongodb</artifactId>
			<version>1.1.0.M2</version>
		</dependency>

		<!-- mongodb java driver -->
		<dependency>
			<groupId>org.mongodb</groupId>
			<artifactId>mongo-java-driver</artifactId>
			<version>2.8.0</version>
		</dependency>

		<!-- AspectJ runtime (optional) -->
		<dependency>
			<groupId>org.aspectj</groupId>
			<artifactId>aspectjrt</artifactId>
			<version>1.7.0</version>
		</dependency>

	</dependencies>
</project>