Saturday, September 05, 2009

C# Timings

Recently, I was adding a bit of debug code to measure how long it took to run various parts of my program (yes, I know a profiler would be more accurate, but I just needed a rough estimate). So, I had 3 or 4 areas in my code where I did something akin to

TimeSpan timer = new TimeSpan();
DateTime startTime = DateTime.Now;
Type1 rval = DoIt(a, b);
timer += DateTime.Now - startTime;

Firstly, this code is ugly. I'm obviously doing many different things here: timing code, calling a function, storing its result, etc... It would be much better if I could wrap the timing code into its own function. So, I essentially want something like:

TimeSpan timer = TimeIt( () => DoIt(a, b) );

The code for TimeIt is simple to write

TimeSpan TimeIt(Action a)
{
    DateTime startTime = DateTime.Now;
    a.Invoke();
    return DateTime.Now - startTime;
}

However, that doesn't save the return value of DoIt, so that is useless for my purposes. I need TimeIt to return the value from DoIt. To do this, I make TimeIt a generic method and have it accept a function instead of an action. I pass the TimeSpan in as a reference argument. The calling code looks like

TimeSpan timeSpan = new TimeSpan();
Type1 rValue = TimeIt(() => DoIt(a, b), ref timeSpan);

The TimeIt function looks like

public static T TimeIt<T>(Func<T> f, ref TimeSpan timeSpan)
{
    DateTime startTime = DateTime.Now;
    T rVal = f.Invoke();
    timeSpan += DateTime.Now - startTime;
    return rVal;
}

There, now we have a simple and generic way to time a single method that returns a value. If you wanted to go further, you could create a class that held the TimeSpan value so that you wouldn't have to pass it in as a ref, but this worked for my needs and was clean enough, so I stopped here. If I need this code again, I'll refactor it into another library and perhaps into its own class. Until then, I'll enjoy the cleanliness and conciseness.
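For reference, if you did want the class-based variant, a minimal sketch might look like the following (MethodTimer and its Elapsed property are illustrative names, not anything from the code above):

using System;

// A minimal sketch of the class-based variant: the class accumulates elapsed
// time internally so callers no longer need to pass a ref TimeSpan.
public class MethodTimer
{
    public TimeSpan Elapsed { get; private set; }

    public T TimeIt<T>(Func<T> f)
    {
        DateTime startTime = DateTime.Now;
        T rVal = f();
        Elapsed += DateTime.Now - startTime;
        return rVal;
    }
}

Usage would then be something like var timer = new MethodTimer(); Type1 rValue = timer.TimeIt(() => DoIt(a, b)); with the accumulated time read from timer.Elapsed.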

Monday, July 20, 2009

A letter to my representative

I sent a letter to my representative regarding the health care bill and would like to repost it here for others to comment.

Hello Representative Inslee,

I wanted to write a quick letter in opposition to the upcoming health care reform. Believe it or not, health care is undergoing a major revolution today. The problem is that the traditional "insurance" programs of the past are becoming harder to use because people have confused "insurance" with "comprehensive coverage". Allergy medicine, maintenance meds, etc... are not part of any insurance program. Insurance is responsible for covering major and unexpected medical issues. Over the last 5 years, more insurance companies are coming to terms with this and are inventing things like Health Savings Accounts, which, though unpalatable to many consumers, are a cost-effective means of giving consumers choice over how to best spend their medical allowance. Unfortunately, the private sector has not sorted everything out yet, so now is a very inopportune time for the government to intervene. The government is not innovative, nor is it cost effective. Instead, it simply throws constituents' money at a problem with little effect until they stop complaining about it. I do recognize the importance of helping our elderly and needy, but I also recognize the precarious position of the American economy and the health care industry. Despite Obama's claim, now is NOT the time to act. Now is the time to rely on capitalism and market forces to sort out what the future should hold and then fit our current Medicare and Medicaid plans into that future. The economy, the health care industry, and your constituents need you, Representative Inslee, to be a pillar of salt in today's wax-and-wane Congress. You have been in the past and I trust you will be again.

Thanks for your time,
Tanton Gibbs
Redmond, WA

Sunday, June 28, 2009

What should Yahoo! do?

Yahoo!'s CEO, Carol Bartz, has stated firmly that Yahoo! is not a search company. In fact, Yahoo!, she states, is much closer to a portal. The goal is not search, but editorial comments and a local feel. A portal, if you will, from a major media/technology firm. Unfortunately, in the coming years, portals will be less and less necessary as people will get their news from a swath of algorithmically mined sources across the social media landscape. I'm afraid the era of Yahoo! dominance has ended and will not return.

Like IBM in the 90's, Yahoo! must redefine itself. Yahoo! must sell the mills.

Again, like IBM, which has become a major integrator, and Kimberly-Clark, which has become a consumer goods powerhouse, Yahoo! must set out on the next phase of its life - it must enter and dominate a new area or be relegated to a technological side show until it dies. But what should it pursue? To answer that question, we need to look at its strengths and products, especially those that are hard to mimic, such as those related to its core technology.

Yahoo! has world class data centers, meaning it can store and process data more cheaply than anyone else (except, perhaps, Google). Yahoo! also has a data processing platform that allows it to analyze web scale data quickly and efficiently. It's also beginning to build a dataflow language that allows developers to be productive. Finally, Yahoo! has a respected research organization which keeps it at the forefront of areas such as machine learning and information retrieval.

The question then becomes: "What business needs massive computational power, data storage and processing ability, and makes heavy use of machine learning and information retrieval?" The biggest one I can think of is data marketing. Analyzing billions of consumer and business records as quickly as possible and making decisions on the fly is what data marketing is all about.

Imagine a person walking into an electronics store. Currently, if the person buys something, he may get a coupon meant to encourage him to buy more later. However, this is suboptimal for two reasons. The simplest reason is that the coupon may be unnecessary. The patron may have been planning to return even without the coupon, so the coupon represents wasted revenue. The second reason is that the coupon may be too late. It could be that if the patron had been given the coupon on the way in, he might have bought more, or upgraded, or any number of other things that could have produced more revenue. It could even be that a customer doesn't buy anything at all, but would have purchased if a coupon had been given beforehand. The holy grail of marketing is to sell an item to each person at an individualized price. You want that person to pay as much as possible for the item. If person A will pay $100 for a camera and person B will pay $300 for the same camera, then you want to issue the $200-off coupon to person A and no coupon to person B.

Now, imagine a system where a customer is recognized by an image recognition system the moment they walk in. Their data is retrieved instantly and a machine learning algorithm is run to determine what the person is shopping for and what the person is willing to pay. From their online profile and Twitter account, the algorithm is able to determine that the person recently broke their camera and is looking to replace it. Using data from a company such as this one, the algorithm can tell that they are struggling economically and will want to buy a cheaper camera. However, it also knows from statistics that a coupon for the higher priced camera has a good probability of making the consumer stretch his budget and go for the "prestige" item. A few more checks against the data center reveal that the current store location has excess inventory of the higher priced camera, and therefore the algorithm decides to dispatch a bigger discount coupon for the better model. The sales attendant walks to the customer and presents the coupon, escorting the customer to the camera aisle. The sale is made and the computer gets an extra volt in its bedtime snack ;-)

More seriously, Yahoo! could perform those computations more efficiently and with better accuracy than an enterprise could. Enterprise data centers are not going to reach the efficiency of Yahoo!'s data centers. Moreover, enterprise IT programmers are not going to have the time or penchant to do the necessary IR and ML research. Yahoo! is uniquely positioned to do those things; furthermore, since it is dangling on the precipice of disaster, now is a good time to bite the bullet and make the change. Not that I expect them to. Change is something that is hard to accept, and Ms. Bartz doesn't seem like the type to change the company's core business...if she ever figures out what that is. More than likely, the model I describe above will be adopted by a newcomer...perhaps by these guys.

Sunday, May 24, 2009

The Perils of Fanaticism

I found this article on reddit the other day and was immediately struck by its stupidity. Now, usually I wouldn't use such a harsh word, but the ideas contained in it are so preposterous that it doesn't really deserve any better word.

I first want to say that I have no opinion about the book described or the author of the book. I have not read the book and I don't know the author. For all I know, After the Software Wars could be about bunny slippers. Instead, I want to be very clear that I'm arguing only with the points contained in the article.

The article describes how open source produces superior software to a closed source model and uses, as its first example, Wikipedia. It mentions that Wikipedia is not software, but uses it as an example anyway. Seriously. I can't deny that Wikipedia provides a quality of data that no closed source institution can match. However, Wikipedia deals with data, not code, and data entry can be done by anyone. That is very different from a program, which requires that someone know how to program before they can change it.

Next, the article uses two examples to prove that open source software is better than closed source: Firefox and the Linux kernel (over IE and Windows). Wow, this is the height of arrogance and a circular argument. There is no demonstrable way to prove their superiority; I doubt even 4 out of 5 dentists would agree that one is better than the other.

The next part of the article that leaves me incredulous is the claim that Google Docs will never catch on, while Linux and OpenOffice will. Yeah, sure. People are going to avoid using something that takes no effort to switch to, makes your data available everywhere, and is constantly updated, in favor of something that requires a complete modification of your current computing environment, has to be shared separately, and is updated using magic incantations. While this may sound like I'm a Linux newbie, I'm not. I'm just pointing out the fact that, to most people, yum is something you say before dinner, not the way to update your programs.

The MOST ignorant thing, however, was the following: "If Microsoft, 20 years ago, built Windows in an open way, Linux wouldn't exist, and millions of programmers would be improving it rather than competing with it."

Yeah, right. We don't have various flavors of Linux. We don't have emacs and Xemacs. We don't have hundreds of open source projects that do the same thing in slightly different ways. Come ON! Competition is in our blood. Doing things "our way" is what makes programmers unique. You could have the most awesome open source project on the planet and someone would find a reason to fork and change it. That's just how we work. The idea that everyone will give up their belongings, join a commune, and hum "Linux is king" is not realistic. Just look how many open source programming languages there are. Look how many open source unix-like variants there are. Get a grip on reality and stop pontificating your idiotic ideals.

Ok, next quote. "The biggest difference between Windows and Linux is that free software contains thousands of applications, installable with one click, and managed as one set." First off, I'm not sure what that means. I have never seen a Linux application that was installable with one click. I definitely prefer the Windows model of installation to the Linux model. In addition, the Windows and Linux model of uninstallation is similar, IMO, so I'm not sure what it means to be "managed as one set", nor how that differs from Windows. As for thousands of applications, that is true, but Windows also has a number of free applications and that seems to be growing as the .NET platform usage increases.

The idea that software developers around the world will just give up the idea of profits and join together in harmony is ridiculous. Furthermore, proprietary software has benefits. You can judge the financial soundness of a company to see whether their software will be around in 10, 15, or 20 years. Just look at sourceforge to see the number of free software projects that are abandoned.

I believe strongly in the value of free software and open source software (and I even know the difference). However, I also believe strongly in the value of proprietary software. For a case study in what happens when you are the champion of free software, just look at Sun. Oh, you can't, it's been bought by Oracle, a closed source shop. Yeah, Microsoft should really follow in their footsteps. Sheesh.

Monday, February 23, 2009

Yahoo! Interview

I realize this write-up is a bit late, seeing that I already work for Microsoft, but I wanted my interview collection to be complete, so I want to add this one and the interview with Google. Realize that it has been six or so months since I interviewed with Yahoo!, so my memory has faded a bit. However, I'll go over what I remember.

After my interview with Google, I didn't think I would interview with Yahoo! because it was in the same area of the country - an area I was not impressed with. However, a friend of mine, who worked for Yahoo!, talked me into interviewing. He said that he could show me around so that I would enjoy the area. I wasn't convinced of that, but at the least it was a free trip to see an old friend, so I decided to take him up on it. In hindsight, I'm very glad I did. First, because getting the Yahoo! offer allowed me to get other offers, but, more importantly, because my friend passed away in December of last year and it was to be the last time I saw him.

My friend, Nathan, submitted my resume to two groups in Yahoo! The group that contacted me was the Pig group. This was fortunate, because I had just begun using Pig and Hadoop at work. I had also begun interacting with the Pig group through their mailing list. Therefore, I was familiar with the product and could talk about it in our very first call.

I was actually impressed with the phone screenings for Pig. We talked about things relevant to the product, such as how to handle large memory footprints with Java and various join algorithms for large data sets. We also discussed some more trivial things, such as some of the differences between C++ and Java and what a virtual function is. I actually got confused at this point and described how compilers typically implement virtual functions instead of what virtual functions actually do, but we eventually sorted things out.
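For reference, a minimal C# sketch of what a virtual call actually does (as opposed to how a compiler implements it) might look like this; the Animal/Dog classes are just an illustration:

using System;

// A virtual call dispatches on the runtime type of the object;
// a non-virtual call is resolved against the compile-time type.
class Animal
{
    public virtual string Speak() { return "..."; }
    public string Describe() { return "an animal"; }
}

class Dog : Animal
{
    public override string Speak() { return "woof"; }
    public new string Describe() { return "a dog"; }
}

class VirtualDemo
{
    static void Main()
    {
        Animal a = new Dog();
        Console.WriteLine(a.Speak());    // "woof"      - virtual: runtime type wins
        Console.WriteLine(a.Describe()); // "an animal" - non-virtual: static type wins
    }
}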

After the first phone interview, I was fairly confident that things went well. This was mostly due to the fact that the interviewer told me that things went well :) She was the only phone screener to be that blunt. The next phone screener discussed process with me. We talked about TDD, documentation, program management and other things similar to that. It went well and they invited me to Silicon Valley to interview in person.

In retrospect, the best part about going to the Yahoo! interview was getting to see Nathan again. It was the last time I was to see him before the motorcycle accident that led to his death. He took me around the coast and we picked up some strawberries and cherries at a roadside stand. He also took me for a "backstage" view of the Yahoo! scene. It appeared rather normal, other than the guy who had a mini-bar in his cube.

Back to the interview. Like the Amazon interview, all of the Yahoo! interviews took place in the same room. The interviews varied in style: some were more puzzle-oriented (how do you find a loop in a linked list?), some were more practical (tell me everything you would do to design a highly available, high-throughput web server).
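For the curious, the standard answer to the linked-list question is the two-pointer (tortoise and hare) technique. A minimal C# sketch, with a made-up Node class, might look like this:

// Node is an illustrative singly linked list node, not anything from the interview.
class Node
{
    public int Value;
    public Node Next;
}

static class LoopDetector
{
    // Advance one pointer by one step and another by two; if there is a loop,
    // the fast pointer eventually catches the slow one inside the loop.
    public static bool HasLoop(Node head)
    {
        Node slow = head, fast = head;
        while (fast != null && fast.Next != null)
        {
            slow = slow.Next;
            fast = fast.Next.Next;
            if (ReferenceEquals(slow, fast))
                return true;
        }
        return false; // fast ran off the end, so the list has no loop
    }
}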

Throughout the interview, I felt I needed far too many hints. I think I did well on the design portions, but the algorithm sections were my weak point. I ended up getting the answers, but hints were necessary. Toward the end, I was just being self-deprecating. I remember one question they asked about how many people I had recommended for hiring (not many, about 3) and how many I had interviewed (lots). I made the comment that my previous company didn't get Stanford grads, and I qualified it by saying that not even I could be a Stanford grad; I just wasn't that good. This interview plus the Google interview plus Microsoft interviews from years before had beaten me down. I was now completely convinced I was useless.

In the end, though, I came away with the impression that the people were very intelligent and would have been great to work with. They talked about the "architect" path that I could take so that I didn't have to become a "people manager" (which sounded good). They seemed to have a great company and it would have been loads of fun. However, I didn't expect an offer.

Surprisingly, they called not too long after the interview with an offer. By this time, I had an interview scheduled with Amazon.com and so I asked to make my decision after that point.

As you know, I didn't choose to go to Yahoo!, but that interview and offer gave me the confidence going into the Amazon.com and Microsoft interviews. Had I not gone to the Yahoo! interview I don't know how I would have done in the other interviews. I know I would not have been as confident and that could have prevented me from getting the position I have now. For that, I'm thankful to both Yahoo! and Nathan.

Thanks Nathan, I'll miss you, prophet!

Tuesday, January 27, 2009

Short-sightedness

A friend of mine sent me this blog post on why the Google web drive won't kill Windows or anything else. To be honest, I'm surprised by the author's short-sightedness.

First, Scott mentions DropBox as a pre-existing replacement for GDrive. He then points out that Google plans to tie GDrive into Google Docs and that DropBox doesn't currently have that functionality. However, he doesn't see that as a game changer. What he doesn't comprehend is that Google has more than just Google Docs. Google has Gmail, Calendar, YouTube, Blogger, and an ever growing number of other sites. They also have an operating system (Android). So, you'll be able to turn on your netbook and have it sync your email, documents, favorite shows and blogs, etc... immediately from the cloud. Don't think that will happen? GMail is already offering an offline syncing mechanism through Gears via GMail Labs. How much longer before they expand the syncing mechanism to work with other things like Google Docs, YouTube, etc...? Google has consistently been able to deliver on big ideas, and this one is one of the biggest.

Scott also mentions the trust issue. Who wants Google to store their most personal documents? I think this will become less of an issue over time. Already people are using services such as Carbonite to back their computers up online. How much different is it to trust an encrypted Google cloud? I think this issue will stay a hot topic for a few years, but in 4 or 5 years, when everyone is using the cloud more and more, it will become a non-issue except for the most sensitive documents. Google is already heavily advertising its security features.

Another issue is downtime. What happens when the cloud goes offline? Once again, I think this will be less of an issue over time as the cloud becomes more stable. Even now, how often does GMail go down? I think my internet provider goes down more often than GMail does. Moreover, I can't get much done without an internet connection anyway, so offline availability doesn't really help me out much. The more Google convinces people that the cloud technology is stable, the more they will flock to it and use it. After all, there are many benefits to cloud technology, such as redundancy, multi-computer availability, etc...

Scott is right in saying that there are currently alternatives in DropBox and Windows Live Sync, but only Windows Live Sync has the capability of rivaling GDrive. Microsoft has comparable services in Hotmail and its online Office suite. They will have to continue to integrate those into the cloud to keep up with Google. Not only that, but if they could integrate their next Xbox platform into the cloud so that you could store your games on the cloud (or just download them directly from the cloud), that would be a big plus and something that Google can't currently rival. Having a home media system synced to Microsoft's cloud could promote using the Microsoft cloud for other things such as mail and documents.

I certainly believe in Microsoft's ability to beat back the Google threat, but I'm not narrow minded enough to think that GDrive is not a threat. It is the backbone of the internet operating system that Google is building to take on Microsoft.

Monday, January 26, 2009

The Google Threat

Disclaimer: I work for Microsoft on their Live Search product. In no way is what I'm blogging about representative of the actual views of Microsoft. I'm much too low level to have any input or insight into Microsoft's thought process.

I see a lot of people on various message boards comment on how Microsoft should stop funding Live Search and Online Services. Many people say that Microsoft should just focus on what they do best (developer tools, operating systems, and Office) and ignore this whole "Internet Thing (TM)". Microsoft loses millions of dollars a year competing with the Google behemoth with very little to show for it in terms of market share. Why not just cut your losses, spin off the division, and let it die a horrible death?

I'll tell you why. Google is Microsoft's biggest threat. Google threatens Microsoft's operating system dominance, their developer tools dominance, and their office dominance. If this were the Civil War, Google would be the North. They are the ones competing on Microsoft's home turf. They are trying to burn Atlanta, and some would say they are succeeding. As a Microsoft employee, I am not at all scared of Apple. They will continue to tinker with their iPhone and Macs and a few hardcore Apple fans will always be there to keep them going. Don't get me wrong, I think the iPhone is a huge improvement. I own one. I like it (but don't love it). However, Apple isn't making inroads into Microsoft's core. Objective C is not the Visual Studio killer and OS X won't kill Windows. Apple wants to produce the "perfect system". And that's fine. They'll end up creating something very beautiful that will be mimicked by others, but it will take them years to do so and they'll do so single-mindedly. They won't be diverse enough to kill the Microsoft behemoth. Google, on the other hand, is thinking big. They have a plan that goes after all of Microsoft's core competencies and they are taking the fight to MS.

Let's look at a few examples:

1. Office vs Google Apps for Business - Take a look at the number of businesses that are evaluating or have switched to Google Apps for Business. Now a few things have to be said. Many of these companies are using Google Apps in addition to MS Office. They are using Office for their internal communications and confidential emails and Google Apps for less sensitive material. For now, that is the best they can do. In addition, many of the companies are switching from other systems such as sendmail, so that's not a direct gain against MS. Nevertheless, how long is it before Google provides encrypted email and a guarantee of privacy? How long before they win a big MS client and other companies start looking at the cheaper Gmail system with reduced administrative costs? This is an obvious attack and one that is gaining momentum.

2. Visual Studio vs Google Web Toolkit - This is a bit of a misclassification. Really, Google Web Toolkit is more of an attack on Microsoft's Azure platform than on Visual Studio itself. However, Visual Studio is included in the assault. With Eclipse, Google Web Toolkit, and many libraries such as jUnit, Guice, etc... Google has teamed up with the Open Source community to take on MS. Google wants developers to develop on its "platform" just like Microsoft wants developers to develop on its "platform". That is why Microsoft developed Silverlight. Silverlight allows developers to take their .NET familiarity and transfer it to the browser. This keeps people in the Microsoft environment. Google takes the same approach. It wants people to use its services, so it provides the Google Web Toolkit to keep people in its environment. It's also a good testing ground for larger conquests, providing computing power and frameworks for the enterprise. Once developers become comfortable with the Google environment, it will be an easy transition to open up their cloud to businesses and allow them to develop and deploy on it.

3. Windows vs Android - Shouldn't this be Windows Mobile vs Android? No, definitely not. Android is a direct attack on Windows and the desktop. Already, many people are speculating that Android will be released on netbooks soon.

See "Android notebook coming early next year?", "Android netbooks? Wouldn't it be lovely", and "Android netbook is a possibility". Google will use their operating system to keep your information and applications in the cloud, and you will be able to access them from any computer, especially your Android netbook. You'll log in and immediately see your desktop with your applications that are stored on the cloud. When you click on an application, it will download and begin running immediately (though many applications will still work through the web, like GMail, Blogger, etc...). Netbooks are already taking a chunk out of Microsoft's sales, and having Google's name on them will only increase sales. Also see this article on how netbook sales are killing Microsoft.

I haven't tried to paint a bleak picture on purpose. Instead, I was trying to show that Google is taking the battle to Microsoft and Microsoft must respond. Search, in particular Live Search, is a key component of that. But it is not the only component. Windows 7 and Azure are other key components. In the future, people's computers will live on the cloud. Search will go beyond finding web pages. Instead, you'll perform searches for applications that will fit with your current settings. You might even rebuild your OS components in the cloud specifically to fit your library versions. You'll search for a song and not only get an mp3, but also a list of movies that you own that have that song in them. You might, if your search options are set correctly, even get a list of similar songs, or songs you played before or after that song. Search is the backbone of the next generation computer and the next generation HCI. Google, I believe, backed into this and is now expanding it to its inevitable conclusion. Microsoft realized it after the fact and is rushing to catch up. Regardless, we need competition and I believe that Microsoft has the resources and the dedication to provide that competition.

Needless to say, the things I outlined above are not going to happen overnight, but Google is taking a long term view. So is Microsoft. Google is using their advertising revenue to subsidize their desktop pursuits. Microsoft is using their desktop revenue to subsidize their advertising pursuits. Both are in it for the long haul. It will take years for Google to implement some of the things I discussed above. I expect 8-10 years will pass before all of our data and programs live in the cloud. I expect another 10 years will pass before most businesses' data and programs live in the cloud. Both companies will be here after that time has passed; which company will be the dominant one? I have no idea, but it will be exciting to find out!

Saturday, January 10, 2009

Browser Toolbars

Recently, Microsoft announced a deal with Dell to distribute the MSN toolbar with new computer purchases. This comes after previous announcements with Sun and Lenovo. While I am excited about the traffic this will bring to Live Search, I have to say that I hate browser toolbars. Seriously, what good are they? Most browsers already have the functionality that the toolbar provides: search box, term highlighting, etc... The MSN Toolbar does provide a few interesting twists, like automatically launching Live Messenger, but on the whole, who needs them? They just use up space and memory. My wife's computer has both the Yahoo! toolbar and the Google toolbar on it. I have to go take them both off now...

You know, what I'd really like is a browser I could use that prohibits toolbars. And while I'm on that topic, over Christmas I installed extra RAM in my father-in-law's computer because it was running slowly (it only had 256 MB of RAM). I also noticed that he hadn't upgraded from IE 6 and had some Yahoo! toolbar that provided virtual tabs of some sort. Why hadn't the browser updated itself to IE7? Seriously, why doesn't our software just fix itself? Why don't my drivers automatically update? ARRRGGG!!!!

Ok, I feel better.

Friday, January 09, 2009

New Layout

As you can tell, I picked a new layout from blogger. Let me know what you think.

Thursday, January 08, 2009

Rules for running an IT organization

The last post was about how to run a research organization. In this post, I'm going to set my sights even higher and tell how to run an IT organization. These rules have been picked up through my studies of various companies.

1. Hire the best people - I've harped on this before. So have others and others and still others. If you don't know by now that there is at least a 10x difference between the top programmers and everyone else, then you haven't been paying attention. I can't say much on this topic that hasn't already been written, but I will point out that you need to be proactive about hiring the right people. It's not just the free food. It's about providing employees with incentives to do great work. If you just need a warm body, then provide him with a steady paycheck. But if you work in IT, you don't want warm bodies; they are not in the 10x group. You want the best, so you have to make it worth it. You have to make them feel important and wanted. You have to put effort into finding them at the top colleges, recruiting them, and retaining them. If you are a small company, you can provide stock options. A large company? You can provide free food :-) Either way, you have to provide challenging problems, a sense of ownership, and other fantastic people to work with. Remember, if you have ten of the 10x group, that is equivalent to one hundred of everyone else. If you don't think that is true because of parallelism, etc... you are wrong. Parallelism doesn't apply because the communication pathways increase, leading to lower overall productivity. Just as you don't get a linear speedup with most parallel algorithms, the same applies to programmers. The communication overhead gets in the way. Therefore, with ten 10x programmers you can be more productive than with one hundred other programmers. And you don't even have to pay them 10x as much, though you will have to pay some extra. With the savings, you can provide more stock options or free food :-)

2. Make the source available to all - Everyone in the company should have access to all the source code. At Google, the designers of a library can easily update it, because they can open everyone else's code in Eclipse, choose Refactor, and then save and recommit it. They are not blocked. In addition, an open code base implies group accountability. Too often, programmers get too attached to their work. They don't see the flaws that are inevitable in their programs. An open code base allows everyone to view and comment on everyone else's work. It forces programmers to realize that they are mortal and make mistakes. Eventually, they will come to know that the group is better than the individual, and their work improves due to peer review. Forcing an open code base allows this to happen sooner and with more acceptance.

3. Dedication to infrastructure - Who is responsible for the build system? Who writes the core infrastructure components? Who sets up the distributed computing system? Who determines which dependency injection framework should be used? All of these are core infrastructure decisions and they all matter. As we'll see in a future bullet point, standardizing the infrastructure is vital. You want your application programmers to focus on their individual application. If every application programmer has to focus on their infrastructure, then you are losing 3 to 6 months out of every project. If it takes a week to set up an automated build system and a week to attach the test harness and another week to set up the distributed key/value store, then you have lost 3 weeks of your project to things that have to occur for nearly every project! Instead, you want application programmers to think about their application. You want them to worry about delighting their customers. They need an automated build system, but they shouldn't have to think about which one to use, how to set it up, or what happens when it isn't working. That is infrastructure. In the past, there were extra teams responsible for hardware, networking, and OS setup. These were not the responsibility of programmers. Now, we're adding additional layers. The build system, test framework, and distributed computing platform are additional infrastructure components that must be standardized and managed elsewhere. There could be additional infrastructure components for your organization as well. For instance, Google uses dependency injection so often that they wrote their own framework for it. Notice that I didn't say every team wrote their own framework. Instead, one team wrote it and maintains it and everyone else uses it. This consistency and willingness to use other people's work makes for a successful company. As another example, Google has a Java collections library that the company uses. Every team can take advantage of this without having to rewrite it. In other words, the goal of the infrastructure group is to find and eliminate duplication throughout the company. This could be with hardware, applications, or libraries. Regardless, duplication is the enemy of the IT organization and it must be eliminated!

4. Repeatability - Everything that is put into production must be a repeatable process. Not only that, but it must be repeatable by someone else! This means both applications and documentation must be written and available that show how to repeat it. Once again, we're trying to seek out and eliminate duplication. Programmers in the future shouldn't have to reverse engineer or re-create your application. If it is worth putting into production, then it is worth documenting and ensuring repeatability.

5. Enforcing standards - You want programmers to feel empowered, but you also want productivity. You need standards to ensure the latter, but you need only the right standards to ensure the former. Standardizing on one build system (perhaps per language) ensures that everyone can access and build everyone else's code. Standardizing on a common base class ensures pandemonium. You want to standardize tools, not techniques. Not only does this allow programmers to quickly move from one project to the next, but it also provides continual feedback and improvement on your internal tools. When a programmer says, "Tool X doesn't provide capability Y so I'll write my own," you are in for disaster. Now, every project begins with a 3 week tool writing cycle. Instead, if the programmer would just fix Tool X, then everyone else can take advantage of capability Y. Libraries work the same way. Pick a dependency injection library and use it across the organization (a small sketch of what that looks like in code appears after this list). It doesn't matter if one team happens to like Spring over Guice. They are both open source and you can alter them to suit your needs. Just pick one and be consistent. Get everyone moving in the same direction. That way, when someone improves a shared tool, everyone moves in that direction even faster.

6. Don't start from scratch - If nothing exists, then you have to start from scratch. However, if you have a working product, then don't start rewriting it from scratch. The only thing you will do is create new bugs that you don't know about instead of fixing the old ones you already knew about. Now, that doesn't mean you shouldn't take a troublesome component and rewrite it. It doesn't even mean that over the course of a year you won't eventually rewrite 90% of the application. It does mean that you work piecemeal, with legacy code. You create tests, if you don't have them. You write documents, if you don't have them. You continually improve the product you have. Then, when you are ready, you begin refactoring. You improve its structure a little bit at a time. Just enough to add your new functionality with new tests. Then a little more and a little more. Eventually, your crufty working system blossoms into a beautiful piece of artwork. Well, you'll think so anyway, but the next developer won't understand why you did X, Y, and Z and he'll want to rewrite it from scratch. That's the problem and the point. When we don't understand how something works, we want to rewrite it so that we do. However, the next maintainer doesn't understand it either, so the rewrite cycle continues and is vicious. If something works, use it, clean it, modify it. Don't rewrite it.

7. Be dedicated to testing - The best organizations have dedicated testers. This is not a coincidence. Recognizing quality as a core attribute of a product is vital to having a quality product. Manufacturing companies have known this forever. Why is it that IT shops think they can ignore quality and still have a quality product? Google's Chrome has unit tests for highly ranked web pages, automated UI testing, and random input testing. Not only that, but they also ran other test suites against it. For instance, it passes 99% of WebKit's layout tests and all but 2 of jQuery's unit tests. Quality is a core attribute. At Microsoft, there is a Software Development Engineer in Test (SDET) job role. Each team is assigned one or more of these resources to ensure they deliver a quality product. These people are top notch programmers who love to break things. They are not random guys off the street. They can code just as well as the SDEs, and it is vital that they can do so. Testing today is about coding. It is about double entry bookkeeping. It is about automation and repeatability. People expect their products to work, out of the box. They expect future releases to be backwards compatible. They expect a quality product. To deliver that, quality must be a feature. The organization must be dedicated to quality - having a special testing division is one way to commit to that level of dedication.

8. Metrics - "In God we trust; all others bring data." Metrics are at the core of process improvement. How can you possibly know if you are improving if you don't measure that improvement? Would you be satisfied with your water treatment plant if it told you the water quality was improving because it looked a bit clearer? No. You'd want to know the pH levels, the amount of sediment in the water, etc... You expect a quality product to have metrics that back up its quality. If speed is a feature, then you'd expect metrics around how fast the product will go in certain conditions. If accuracy is a feature, then you'd expect benchmarks showing the accuracy against other products or human judges. In all cases, metrics are vital to a product in order to show improvement and achievement of goals. The one caveat is to be sure you are measuring the right thing. It is inevitable that programmers will find a way to improve the metrics. If the metrics are measuring the right thing, then that is great. If the metrics are measuring the wrong thing, say lines of code, then you have a recipe for disaster. Make sure that your metrics are measuring externally visible things, not internal ones. You want improvements to your metrics to affect the customer, not your programmer's salary. Everyone cheats the system; you want those cheats to have a positive impact on your final product.
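As promised in rule 5, here is a deliberately tiny C# sketch of the dependency injection idea. The interface and class names are made up for illustration; the point is only that the class declares what it needs while the one standardized container (whichever your organization picked) supplies it:

// Illustrative constructor injection. No team hand-rolls its own wiring;
// the shared, standardized container maps IMessageSender to an implementation.
public interface IMessageSender
{
    void Send(string to, string body);
}

public class SmtpMessageSender : IMessageSender
{
    public void Send(string to, string body)
    {
        // ... send the message via SMTP ...
    }
}

public class InvoiceNotifier
{
    private readonly IMessageSender sender;

    // The dependency arrives through the constructor, so swapping in a fake
    // sender for tests (rule 7) is trivial and requires no changes here.
    public InvoiceNotifier(IMessageSender sender)
    {
        this.sender = sender;
    }

    public void NotifyPaid(string customerEmail)
    {
        sender.Send(customerEmail, "Your invoice has been paid.");
    }
}

Whether the container behind this is Guice, Spring, or one of their .NET counterparts matters far less than the whole organization using the same one.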

Well, there you have it. 8 rules for running an IT organization. Hopefully they will make your company the next IT powerhouse!

Tuesday, January 06, 2009

Applied Research

At one point in my career, I had the experience of being on an applied research team. In fact, I was one of its founding members. At the time, I wasn't sure what applied research was; even now, I can't say for sure I know.

For our team, applied research was new product development. Honestly, I think that had disastrous effects. We were in limbo between delivering a new product and maintaining our "research" mantra. In the end, we wanted product teams to adopt and maintain our product. This, too, led to tension. Product teams want to feel comfortable with what they will eventually own. They want to use and support things they have used and supported in the past. Research products are wild-cards and not to be used unless nothing else like them exists. Even then, most product teams will choose to rewrite them. Take, for instance, Pig. Pig is a dataflow language created by Yahoo! Research to run on Hadoop. It was created to prove a point, but it was also created to be used. However, once it transferred to a product team at Yahoo!, it was scheduled for a rewrite. I'm not sure of the cause. It could be Not Invented Here syndrome. It could be that research teams are not focused on the long term, so they deliver a short term product. It could be that research teams don't know how to code, or that product teams and research teams speak a different language, so they can't see eye to eye. Regardless, having a research team deliver to a product team is doomed to failure, if you count failure as an eventual rewrite of the product or technology.

To help guide my thoughts on this matter, I had the good fortune to speak to Hector Garcia-Molina of Stanford University. If you don't know who this guy is, you are missing out. Stop and go read the Wikipedia entry linked to above. He recounted to me his description of what a research organization should do. I'll recount it here and embellish a little.

First, a research organization should publish. The benefit of publishing is that it brings notoriety/publicity to your organization. The company's sales force can go to battle with slides that reference conference proceedings. Of course, no sales person would read the proceedings, but it does make their presentation look legit. In addition, presenting at conferences allows the researchers to make the acquaintance of people like Dr. Garcia-Molina and bring new insight and innovations back. Finally, publishing creates an attractive employment environment. New graduates from top schools want to go to a place that has a publishing history so that they can continue their research. To get a graduate of MIT or Stanford, you need a publication record. I'll add one more to his list. I think a research department that publishes shows a commitment to innovation by the company; a commitment that roots itself in the culture and makes the company a hotbed of innovation.

Second, a research organization can be used as a SWAT force, tackling a hard problem or a subset of the problem. I think this is the area where the most "good" can be done internally. Instead of creating a product, create an extension to an existing product. There are always "next version" features that never get created. These features either provide too little value or are too difficult. It is that latter segment where the research department can really shine. Since they are not constrained by time to market, etc..., the research department can think outside the box and take the time to create an "academic" solution. For example, PageRank was an academic solution to the problem of which web sites were more popular. It is a beautiful recursive algorithm that just happens to produce great results. Is it a perfect algorithm? No, of course not. Was it better than the "engineering" algorithms of the time? You betcha! It was what happened when two academics got together and had the time to think about the problem. They realized the problem was similar to that of research paper citations, so they devised an algorithm that treated it as such. Had they had to meet an arbitrary deadline so that their employer could make the trade show, they would not have come up with PageRank. That is the benefit of a research team. Not to create some fancy product for the trade show, but to take the existing product the next step. To finish out the "next version" features and do so elegantly.
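To make the PageRank example a bit more concrete, here is a minimal power-iteration sketch in C#; the damping factor, iteration count, and toy graph representation are my own illustrative choices, not anything from the original work:

// Toy power-iteration PageRank over an adjacency list.
// links[i] holds the indices of the pages that page i links to.
static double[] PageRank(int[][] links, double damping, int iterations)
{
    int n = links.Length;
    double[] rank = new double[n];
    for (int i = 0; i < n; i++) rank[i] = 1.0 / n; // start with a uniform distribution

    for (int iter = 0; iter < iterations; iter++)
    {
        double[] next = new double[n];
        for (int i = 0; i < n; i++) next[i] = (1.0 - damping) / n; // "random surfer" teleport

        for (int i = 0; i < n; i++)
        {
            if (links[i].Length == 0) continue; // dangling pages are ignored in this sketch
            double share = damping * rank[i] / links[i].Length;
            foreach (int target in links[i])
                next[target] += share; // each page passes its rank along its citations
        }
        rank = next;
    }
    return rank;
}

Calling it with a three-page cycle such as new[] { new[] { 1 }, new[] { 2 }, new[] { 0 } } and a damping factor of 0.85 converges to equal ranks for all three pages, which is exactly what you'd expect from a symmetric ring of citations.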

The final task for a research organization is to benefit the company at large. There are many ways to do this. One is to create best practices and spread them out to the other development teams. Another is to investigate various technologies and report on how they could/should be used throughout the company. An additional way could be to create the infrastructure to ensure developer productivity if no other team is responsible for it. Do developers have access to distributed key/value stores? If not, install MemcacheDB or CouchDB, and help developers connect to it by creating modules in various languages. Do developers understand and have access to technologies like Hadoop and Pig? If not, create demos and give a roadshow. All of these things fall under the "best practices" umbrella, and often organizations don't have the teams in place to create and distribute the "know-how" required to use them.

In all three cases, the applied research team stays away from the product team's core product. Instead, the research team focuses on small, manageable pieces that fit nicely into the strategy already outlined by the product team. This creates trust and will foster future collaborations between the product and research teams.

Good luck and let me know of your experiences dealing with research teams!

Thursday, January 01, 2009

Facebook lament

I have too many facebook friends. I lament it. I only have about 5 that I actually want to monitor and communicate with; however, when people request friendship I don't want to say 'no, I care nothing for you.' I really want a "preferred facebook" view where I can just track and interact with those I care to. One would think that it is not that big of a deal, but it really is. For instance, I just went to superpoke a friend of mine and it took far too long because I had to find that person in my list of friends (and I don't have *that* many). It actually reduces the amount of time I spend on facebook.

Less is more.

Infinities and Series

Those of you who read this blog (all 1 of you!) know that the number of integers is infinite and the number of irrational numbers is infinite, but the latter infinity is greater than the former infinity. I tend to think of this as mathematical shenanigans. The idea that one infinity is larger than another just doesn't meet the "beauty" test that accompanies mathematics. In the past, I've argued that infinities don't exist, so we can make contradicting remarks about them to our hearts' content. It's similar to asking the question "Can God make a rock so big even he can't lift it?" The words make sense and are in the right order, but the semantics of the sentence are off. They produce a barber paradox that belongs to the realm of meaninglessness. They show that Goedel is alive and well and relevant for today's meta-mathematical problems, if only we'd heed his words. Ok, back to infinities.

First, can we create a more intuitive reason for why one infinity might be greater than the other, a reason that doesn't fall back on establishing a one-to-one correspondence with the integers?

To start, let's establish that the integers exist in one-dimensional space. In fact, they exist on a number line, which is the definition of one-dimensional space :-) But let's consider a slightly different definition of dimensionality. In this definition, we'll look at the number of infinite dimensions. In the case of integers, the length of the integer is finite. No matter how big the integer becomes, there is always a finite number of digits. Even if the integer explodes to a googolplex of digits, there is still a finite number of them. Therefore, the only infinity is in how many integers there are; the size of the actual integer is finite. So, there is only one dimension of infinity.

Let's now look at rational numbers. In the case of rational numbers, you might say that there are two dimensions of infinity. The first dimension is the number of rational numbers. There are an infinite number of rational numbers. The second dimension is that some rational numbers have an infinite decimal expansion. For example, 1/3 has an infinite expansion of 0.333333333... Therefore, rational numbers have two dimensions of infinity and should be larger than the integers, right? Not so fast. That's just their decimal expansion. If you keep them in their functional form, then we get a different story. All rational numbers can be expressed as the division of two integers. Moreover, we know that both integers have a finite number of digits. Finally, writing down two integers that each have a finite number of digits still requires only a finite number of digits. Therefore, there is a representation that is finite in length for every rational number. That leads to the logical conclusion that all rational numbers have one dimension of infinity.

Next up, irrational numbers. Irrational numbers extend to positive and negative infinity, giving one dimension of infinity. In addition, they have an infinite expansion in every base, which gives them a second dimension of infinity. So, we can easily see why there are more irrational numbers than rational numbers, because we allow irrational numbers to have an infinite expansion!

So, if we can't represent irrational numbers with a fixed number of rational numbers, what can we represent them with? Why, an infinite number of rational numbers, of course! For example, PI can be represented by the following series (one of many): PI = 4 * (SUM[k=0 to inf] ((-1)^k)/(2k+1))
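To make that concrete, here is a minimal C# sketch that evaluates a partial sum of the series above; the number of terms is an arbitrary illustrative choice:

// Approximate PI by summing the first `terms` terms of the series above.
static double ApproximatePi(int terms)
{
    double sum = 0.0;
    for (int k = 0; k < terms; k++)
    {
        double sign = (k % 2 == 0) ? 1.0 : -1.0; // (-1)^k
        sum += sign / (2 * k + 1);
    }
    return 4.0 * sum;
}

ApproximatePi(1000) gives roughly 3.1406; adding more terms gets you closer, but no finite number of terms ever reaches PI itself.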

So, in essence, we have an infinite set of numbers each composed of an infinite set of numbers. Two dimensions of infinity!

But wait! If we can create a series to represent each irrational number, does that mean that there exists a representation that is not infinite and therefore we can count them, similar to the rational numbers above? No. With rational numbers there was a finite number of integers that created the rational number. With irrational numbers there is an infinite number of rational numbers.

So, it is true that the number of irrational numbers exceeds the number of rational numbers, right? Well, maybe. I think it is fair to say that an infinitely expanded irrational number doesn't exist, so we're back to the land of Goedel. It is worth thinking of an irrational number as a function of an infinite number of rational numbers, much the same as a rational number is a function (division) of two integers. Functions are often useful in mathematical manipulations, but that doesn't make them "real". The number 3 is a physical, real number. You can count out 3 things. The number PI is not a physical, real number. We can only approximate it. In fact, complex analysis is based around the number i, which is a function (sqrt) applied to -1. As long as we don't expand the function, we can do mathematics with it, but if we ever need to expand it, our equations blow up. The same thing is true of irrational numbers; they are mathematical niceties. Abstractions that we can manipulate as long as we don't look too closely. Similarly, questions about abstractions, such as whether one abstraction is infinitely larger than another, require you to look too closely, so you get crazy results, just as if you had really taken the sqrt of -1 or divided by infinity (another function).

So, the next time you see PI, take it for what it is, a function that can be evaluated to the necessary precision. A mathematical abstraction that can be admired from afar. Just don't get too close.