Wednesday, June 04, 2008

The Future of Enterprise

What is the future of enterprise software? Is it JRuby on Rails? What about Scala? Maybe a .NET stack? What is the next big shift?

In my opinion, the next big shift is to grid computing. Things like EC2, Hadoop, and HBase will become more and more popular. Commercial versions and vendors will spring up over the next few years. Enterprises will start pushing more and more of their computation into batch grid work. Column-store databases will replace traditional RDBMSs for data warehousing (this is already happening with Teradata). Transactional applications will work on cached stores that are pulled from a large grid where the data was processed and analyzed. Marketing will be on demand, but pre-computed.

In the rush to understand and market to the consumer, more and more companies have been moving to real-time analytics. However, the faster you need a decision, the less time you have to think about it. That is why grid computing will be so important. Your think time has to be done in advance. Transactional queries will be relegated to closing deals or finding a pre-computed offer.
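As a rough illustration of "think time done in advance", here is a minimal Python sketch. The function names, the purchase data, and the rule of offering each customer's most frequently bought category are all assumptions made up for the example; a real batch job would run on the grid and use a far richer model.

```python
from collections import Counter


def precompute_offers(purchases):
    """Batch step: pick each customer's most frequently bought category."""
    by_customer = {}
    for customer_id, category in purchases:
        by_customer.setdefault(customer_id, Counter())[category] += 1
    return {cid: counts.most_common(1)[0][0] for cid, counts in by_customer.items()}


def offer_for(offers, customer_id, default="generic"):
    """Transactional step: a plain lookup, with no analysis at request time."""
    return offers.get(customer_id, default)


# Hypothetical purchase history, standing in for the output of a grid job.
purchases = [
    ("alice", "books"),
    ("alice", "books"),
    ("alice", "music"),
    ("bob", "garden"),
]
offers = precompute_offers(purchases)
```

The point of the split is that all of the expensive work happens in precompute_offers, so the transactional path is reduced to a dictionary lookup with a fallback for unknown customers.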

The main challenge, as I see it, will be to find and use data structures that can be refreshed quickly. Google is in a fortunate position: it can refresh boxes asynchronously. If customer A enters query Q1 and then enters query Q2, it is OK if those two queries hit different data sets and return different results. However, if we're not dealing with a search application but instead something akin to a customer recognition or record linkage application, then the same person should always receive the same link. This is why versioning will become so important in the future. The customer will get a certain version of the data sets and will continue to use that version until their application finishes, at which point they can be upgraded. This requires more server-side storage, but gives the client a consistent data set to work with. This type of versioning and auto-update software will be necessary in a commercial form. In addition, quick nearest-neighbor searches will need to be commercialized. Clients will want to know quickly which market segment a customer falls into. Nearest-neighbor searches are the key to understanding that, and they will need to be generalized and distributed over the next few years.
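To make the versioning idea concrete, here is a small Python sketch. Everything in it is an assumption for illustration: the VersionedStore and Session classes, the segment names, and the centroid coordinates are hypothetical, and the nearest-neighbor search is a brute-force scan rather than the distributed search described above.

```python
import math


class VersionedStore:
    """Keeps every published data-set version so open sessions stay consistent."""

    def __init__(self):
        self._versions = {}  # version number -> {segment name: centroid}
        self._latest = 0

    def publish(self, centroids):
        """Publish a refreshed data set, e.g. the output of a batch grid job."""
        self._latest += 1
        self._versions[self._latest] = dict(centroids)
        return self._latest

    def open_session(self):
        """Pin a new session to whichever version is newest right now."""
        return Session(self, self._latest)

    def get(self, version):
        return self._versions[version]


class Session:
    """A client's view of the store, fixed to one version until it ends."""

    def __init__(self, store, version):
        self._store = store
        self.version = version

    def segment_for(self, point):
        """Brute-force nearest-neighbor search over the pinned version."""
        centroids = self._store.get(self.version)
        return min(centroids, key=lambda name: math.dist(point, centroids[name]))


store = VersionedStore()
store.publish({"budget": (20.0, 1.0), "premium": (80.0, 5.0)})

session_a = store.open_session()
customer = (45.0, 3.0)
before = session_a.segment_for(customer)

# The grid refreshes the data set while session_a is still open.
store.publish({"budget": (10.0, 1.0), "premium": (60.0, 4.0)})

after = session_a.segment_for(customer)  # still answered from version 1
session_b = store.open_session()         # new sessions see version 2
fresh = session_b.segment_for(customer)
```

In this sketch the open session keeps returning the same segment after the refresh, while a session opened later gets the upgraded data. The cost is the one noted above: the store has to hold old versions for as long as any session still points at them.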

So, the enterprise is changing. The relational database will move from its position of dominance to just another tool. The grid will take its place as a hammer looking for nails, and applications that can process terabytes of data quickly will be deemed must-haves for enterprise data centers across the world. The Enterprise Service Bus (ESB) will continue to shuffle transactional data around, but the backend will now be distributed data stores that are versioned and queryable by categorized nearest-neighbor searches. Who will be the vendors of these tools? I have no idea. Many, like Hadoop, will be open source. Some, like Teradata, already exist. Others will be proprietary and do not exist yet. Regardless, it will be a lot of fun and I'm looking forward to the ride.
