Thursday, January 08, 2009

Rules for running an IT organization

The last post was about how to run a research organization. In this post, I'm going to set my sights even higher and tell you how to run an IT organization. I picked up these rules while studying various companies.

1. Hire the best people - I've harped on this before, and so have many others. If you don't know by now that there is at least a 10x difference between the top programmers and everyone else, then you haven't been paying attention. I can't say much on this topic that hasn't already been written, but I will point out that you need to be proactive about hiring the right people. It's not just about the free food. It's about giving employees incentives to do great work. If you just need a warm body, then a steady paycheck will do. But if you work in IT, you don't want warm bodies; they are not in the 10x group. You want the best, so you have to make it worth their while. You have to make them feel important and wanted. You have to put effort into finding them at the top colleges, recruiting them, and retaining them. If you are a small company, you can offer stock options. A large company? You can offer free food :-) Either way, you have to provide challenging problems, a sense of ownership, and other fantastic people to work with. Remember, ten people from the 10x group are equivalent to one hundred of everyone else. If you think parallelism makes that untrue, you are wrong. Parallelism doesn't save the larger team, because the number of communication pathways grows much faster than the headcount, which lowers overall productivity. Just as most parallel algorithms don't achieve a linear speedup, neither do teams of programmers: the communication overhead gets in the way. Therefore, ten 10x programmers can be more productive than one hundred other programmers. And you don't even have to pay them 10x as much, though you will have to pay them somewhat more. With the savings, you can provide more stock options or free food :-)
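
The communication-overhead argument can be made concrete with the pairwise-channel count usually cited alongside Brooks's law: a team of n people has n(n-1)/2 possible one-to-one communication channels. The Python sketch below only counts channels; how much each channel actually costs in lost productivity is an assumption it does not try to model.

```python
def pathways(n: int) -> int:
    """Distinct pairwise communication channels among n people: n(n-1)/2."""
    return n * (n - 1) // 2


for team_size in (10, 100):
    print(f"{team_size} people -> {pathways(team_size)} channels")
# 10 people -> 45 channels
# 100 people -> 4950 channels
```

Ten times the people means about 110 times the channels, which is why adding programmers does not scale productivity linearly.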

2. Make the source available to all - Everyone in the company should have access to all the source code. At Google, the designers of a shared library can easily update it, because they can open everyone else's code in Eclipse, run a refactoring, and then save and recommit the changes. They are not blocked. In addition, an open code base implies group accountability. Too often, programmers become overly attached to their work and don't see the flaws that are inevitable in their programs. An open code base allows everyone to view and comment on everyone else's work. It forces programmers to realize that they are mortal and make mistakes. Eventually, they come to accept that the group is better than the individual, and their work improves through peer review. Enforcing an open code base allows this to happen sooner and with more acceptance.

3. Dedication to infrastructure - Who is responsible for the build system? Who writes the core infrastructure components? Who sets up the distributed computing system? Who decides which dependency injection framework to use? These are all core infrastructure decisions, and they all matter. As we'll see in a later rule, standardizing the infrastructure is vital. You want your application programmers to focus on their own application. If every application programmer has to focus on infrastructure, then you are losing 3 to 6 months out of every project. If it takes a week to set up an automated build system, another week to attach the test harness, and another week to set up the distributed key/value store, then you have lost 3 weeks of your project to work that has to happen on nearly every project! Instead, you want application programmers to think about their application. You want them to worry about delighting their customers. They need an automated build system, but they shouldn't have to think about which one to use, how to set it up, or what to do when it breaks. That is infrastructure. In the past, separate teams were responsible for hardware, networking, and OS setup; these were not the programmers' responsibility. Now we're adding more layers. The build system, test framework, and distributed computing platform are further infrastructure components that must be standardized and managed by someone other than the application teams. Your organization may have additional infrastructure components as well. For instance, Google uses dependency injection so often that it wrote its own framework for it (Guice). Notice that I didn't say every team wrote its own framework. Instead, one team wrote it and maintains it, and everyone else uses it. This consistency, and the willingness to use other people's work, makes for a successful company. As another example, Google has a Java collections library that the whole company uses. Every team can take advantage of it without having to rewrite it. In other words, the goal of the infrastructure group is to find and eliminate duplication throughout the company, whether in hardware, applications, or libraries. Duplication is the enemy of the IT organization, and it must be eliminated!
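
For readers who haven't met dependency injection: a component receives its collaborators instead of constructing them, so a shared implementation owned by an infrastructure team can be wired in everywhere. Below is a minimal hand-rolled sketch in Python rather than Java, with no framework; the `KeyValueStore`, `InMemoryStore`, `UserService`, and `build_user_service` names are hypothetical and are not Guice's API.

```python
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """Interface an infrastructure team might own (hypothetical)."""

    def get(self, key: str) -> Optional[str]: ...

    def put(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """Stand-in implementation; a real team might back this with a distributed store."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


class UserService:
    """Application code: it is handed a store rather than building one itself."""

    def __init__(self, store: KeyValueStore) -> None:
        self._store = store

    def rename(self, user_id: str, name: str) -> None:
        self._store.put(f"user:{user_id}:name", name)

    def name_of(self, user_id: str) -> Optional[str]:
        return self._store.get(f"user:{user_id}:name")


def build_user_service() -> UserService:
    """Hypothetical composition point: the one place that chooses implementations."""
    return UserService(InMemoryStore())


if __name__ == "__main__":
    svc = build_user_service()
    svc.rename("42", "Ada")
    print(svc.name_of("42"))  # Ada
```

Frameworks such as Guice or Spring automate the wiring that `build_user_service` does by hand; the application class stays the same either way, which is what lets one team own the implementation.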

4. Repeatability - Everything that is put into production must come from a repeatable process. Not only that, but it must be repeatable by someone else! This means that both the automation and the documentation showing how to repeat it must be written and available. Once again, we're trying to seek out and eliminate duplication. Future programmers shouldn't have to reverse engineer or re-create your application. If it is worth putting into production, then it is worth documenting and making repeatable.

5. Enforcing standards - You want programmers to feel empowered, but you also want productivity. You need standards to ensure the latter, but only the right standards to preserve the former. Standardizing on one build system (perhaps per language) ensures that everyone can access and build everyone else's code. Standardizing on a common base class ensures pandemonium. You want to standardize tools, not techniques. Not only does this allow programmers to move quickly from one project to the next, but it also provides continual feedback and improvement for your internal tools. When a programmer says, "Tool X doesn't provide capability Y, so I'll write my own," you are in for disaster. Now every project begins with a three-week tool-writing cycle. If, instead, the programmer fixes Tool X, then everyone else can take advantage of capability Y. Libraries work the same way. Pick a dependency injection library and use it across the organization. It doesn't matter if one team happens to prefer Spring over Guice. Both are open source, so you can alter either one to suit your needs. Just pick one and be consistent. Get everyone moving in the same direction. That way, when one team improves a shared tool and increases everyone else's velocity, the whole organization moves in that direction even faster.

6. Don't start from scratch - If nothing exists, then you have to start from scratch. However, if you have a working product, then don't rewrite it from scratch. All a rewrite does is create new bugs that you don't know about in place of the old ones you already knew about. That doesn't mean you shouldn't take a troublesome component and rewrite it. It doesn't even mean that over the course of a year you won't eventually rewrite 90% of the application. It does mean that you work piecemeal, starting from the legacy code. You create tests, if you don't have them. You write documentation, if you don't have it. You continually improve the product you have. Then, when you are ready, you begin refactoring. You improve its structure a little at a time, just enough to add your new functionality with new tests. Then a little more, and a little more. Eventually, your crufty working system blossoms into a beautiful piece of artwork. Well, you'll think so anyway, but the next developer won't understand why you did X, Y, and Z, and he'll want to rewrite it from scratch. That's the problem and the point. When we don't understand how something works, we want to rewrite it so that we do. However, the next maintainer won't understand the rewrite either, so the vicious cycle of rewrites continues. If something works, use it, clean it, modify it. Don't rewrite it.
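
One common way to make the "create tests, then refactor" step concrete is a characterization test: record what the legacy code actually does on a set of inputs, then require the refactored version to produce the same results. A minimal Python sketch, assuming a hypothetical `legacy_shipping_cost` function as the crufty-but-working code and a hypothetical `shipping_cost` as its refactored replacement:

```python
from itertools import product


def legacy_shipping_cost(weight_kg, express):
    # Hypothetical crufty-but-working legacy code.
    cost = 0
    if weight_kg <= 1:
        cost = 5
    else:
        cost = 5 + (weight_kg - 1) * 2
    if express == True:
        cost = cost * 2
    return cost


def shipping_cost(weight_kg: float, express: bool) -> float:
    """Refactored version: intended to behave exactly like the legacy code."""
    base = 5 + max(weight_kg - 1, 0) * 2
    return base * 2 if express else base


# Inputs chosen to cover both branches and the boundary at 1 kg.
CASES = list(product([0, 0.5, 1, 1.5, 2, 10], [False, True]))


def characterize(fn, cases):
    """Record fn's observed output for each input tuple."""
    return {case: fn(*case) for case in cases}


def test_refactor_preserves_behavior():
    assert characterize(shipping_cost, CASES) == characterize(legacy_shipping_cost, CASES)


if __name__ == "__main__":
    test_refactor_preserves_behavior()
    print("refactored shipping_cost matches legacy behavior")
```

The test says nothing about whether the legacy behavior is *right*; it only pins it down so each small refactoring step can be checked before the next one.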

7. Be dedicated to testing - The best organizations have dedicated testers. This is not a coincidence. Recognizing quality as a core attribute of a product is vital to having a quality product. Manufacturing companies have known this forever. Why do IT shops think they can ignore quality and still have a quality product? Google's Chrome has unit tests, tests against highly ranked web pages, automated UI testing, and random input testing. Not only that, but the team also ran other projects' test suites against it. For instance, it passes 99% of WebKit's layout tests and all but 2 of jQuery's unit tests. Quality is a core attribute. At Microsoft, there is a Software Development Engineer in Test (SDET) job role. Each team is assigned one or more SDETs to ensure it delivers a quality product. These people are top-notch programmers who love to break things. They are not random guys off the street. They can code just as well as the SDEs, and it is vital that they can. Testing today is about coding. It is about double-entry bookkeeping. It is about automation and repeatability. People expect their products to work out of the box. They expect future releases to be backward compatible. They expect a quality product. To deliver that, quality must be a feature. The organization must be dedicated to quality, and having a separate testing division is one way to make that commitment.
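
One reading of the "double-entry bookkeeping" analogy is that a test records the expected result a second, independent way. Combined with the random input testing mentioned for Chrome, that gives a simple pattern: generate random inputs and check the code under test against a slow but obviously correct reference. A minimal Python sketch; `merge_sorted` is a hypothetical function under test, and the fixed seed keeps any failure reproducible.

```python
import random
from typing import List


def merge_sorted(a: List[int], b: List[int]) -> List[int]:
    """Code under test (hypothetical): merge two already-sorted lists."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.extend(a[i:])  # at most one of these two tails is non-empty
    out.extend(b[j:])
    return out


def random_input_test(trials: int = 1000, seed: int = 0) -> None:
    """Check merge_sorted against an independent reference on random inputs."""
    rng = random.Random(seed)  # fixed seed: a failing case can be replayed
    for _ in range(trials):
        a = sorted(rng.randint(-50, 50) for _ in range(rng.randint(0, 20)))
        b = sorted(rng.randint(-50, 50) for _ in range(rng.randint(0, 20)))
        expected = sorted(a + b)  # the "second entry": computed a different way
        assert merge_sorted(a, b) == expected, (a, b)


if __name__ == "__main__":
    random_input_test()
    print("all random trials passed")
```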

8. Metrics - "In God we trust, all others bring data." Metrics are at the core of process improvement. How can you possibly know whether you are improving if you don't measure that improvement? Would you be satisfied with your water treatment plant if it told you the water quality was improving because the water looked a bit clearer? No. You'd want to know the pH levels, the amount of sediment in the water, and so on. You expect a quality product to have metrics that back up its quality. If speed is a feature, then you'd expect metrics showing how fast the product performs under specific conditions. If accuracy is a feature, then you'd expect benchmarks showing its accuracy against other products or human judges. In all cases, metrics are vital for showing improvement and the achievement of goals. The one caveat is to be sure you are measuring the right thing. Programmers will inevitably find a way to improve the metrics. If the metrics measure the right thing, that is great. If they measure the wrong thing, say lines of code, then you have a recipe for disaster. Make sure your metrics measure externally visible things, not internal ones. You want improvements in your metrics to benefit the customer, not your programmers' salaries. Everyone games the system, so you want that gaming to have a positive impact on your final product.
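
If speed is the feature, a customer-visible metric is the latency users actually experience, and percentiles expose slow tail requests that an average hides. A minimal Python sketch of the nearest-rank percentile; `latencies_ms` is made-up illustration data, not a measurement.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds.
latencies_ms = [120, 95, 101, 340, 99, 110, 105, 98, 102, 97]

if __name__ == "__main__":
    print(sum(latencies_ms) / len(latencies_ms))  # 126.7 (mean)
    print(percentile(latencies_ms, 50))  # 101
    print(percentile(latencies_ms, 95))  # 340
```

Here the mean (126.7 ms) describes no real request, while the 95th percentile shows the one user in this sample who waited 340 ms, which is the kind of externally visible number worth having programmers optimize.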

Well, there you have it. 8 rules for running an IT organization. Hopefully they will make your company the next IT powerhouse!
