Sunday, November 23, 2008

C# Awesomeness

As you can imagine, my current employer gives me a number of opportunities to work in C#. I've really been enjoying myself, but I have only been scratching the surface of what C# is capable of. So, tonight, I tried a few more interesting ideas that use C#'s anonymous types and lambda expressions.

The first one is a pretty simple copy of Ruby's line-iteration idiom (IO#each_line, or IO.foreach when you want the file opened and closed for you). My goal is to have the C# function encapsulate the logic of opening and closing the file while I pass in a lambda that does the processing I want (just like Ruby's).

Here's the code:

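    // Needs: using System.IO; using System.Text;
    // A generic delegate: any method or lambda that takes one T and returns nothing.
    public delegate void IOProcessor<T>(T line);
    public static void ForEachLine(string fileName, IOProcessor<string> processor)
    {
        // The using block closes the file even if processor throws.
        using (StreamReader reader = new StreamReader(fileName, Encoding.UTF8))
        {
            while (!reader.EndOfStream)
            {
                processor(reader.ReadLine());
            }
        }
    }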

First, I declare a delegate. In C or C++, it would be a function pointer. Here, it is more than that because it can be generic, which is utter coolness in a box. We'll see the point of the genericity in a moment.

The next line declares my function, which takes the file name and the delegate (we will be giving it a lambda). Here, we explicitly set the generic parameter to string because we know that is what we are retrieving from the file.

The using statement opens the file, specifying a UTF-8 encoding. It will also ensure the file is closed whether or not an exception occurs.

Finally, we have a loop that reads each line and passes it to the delegate until we run out of lines. Very simple and straightforward.

If we wanted to use this code to sum up a file that holds one integer per line, we could write code like the following.

    int sum = 0;

    ForEachLine("my_integers.txt", (line) => { sum += int.Parse(line); });

After the ForEachLine call finishes, sum contains the total of the integers in the file. For instance, if the file held one integer per line and those integers added up to 100, then sum would be 100.
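To make that concrete, here is a minimal, self-contained sketch of the summing example. The class name SumDemo, the SumFile helper, and the four sample integers are my own assumptions for illustration; the loop also uses the ReadLine-until-null idiom instead of EndOfStream, which should behave the same way here.

```csharp
using System;
using System.IO;
using System.Text;

public delegate void IOProcessor<T>(T line);

public static class SumDemo
{
    // Same idea as ForEachLine in the post: open the file, hand each line
    // to the delegate, and close the file when done (or on an exception).
    public static void ForEachLine(string fileName, IOProcessor<string> processor)
    {
        using (StreamReader reader = new StreamReader(fileName, Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                processor(line);
            }
        }
    }

    // Hypothetical helper: sums a file that holds one integer per line.
    public static int SumFile(string fileName)
    {
        int sum = 0;
        // The lambda captures the local variable sum and updates it per line.
        ForEachLine(fileName, line => { sum += int.Parse(line); });
        return sum;
    }

    public static void Main()
    {
        // Hypothetical sample data: four integers that add up to 100.
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllLines(path, new[] { "10", "20", "30", "40" });
            Console.WriteLine(SumFile(path)); // prints 100
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

The point to notice is that the lambda reads and writes a local variable declared outside of it, which is what lets the caller keep state across lines without a class of its own.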

Another thing I do a lot is handle delimited files. Usually, I'll open one up, perform a line-by-line manipulation of the file, and close it again. Of course, the obvious string.Split() method comes to mind, but that is somewhat unsatisfactory because you are still dealing with integer offsets into an array. How do I know that offset 3 is the anchor text and offset 7 is the ubercool feature? It would be nice if I could specify their names at the beginning and then work with objects.

So, to handle that, I created a function that would objectify my delimited files.

Here goes:

    // Needs: using System; using System.Collections.Generic; using System.IO; using System.Text;
    public static IEnumerable<T> EachDelimitedLine<T>(
        string fileName, char delimiter, T outputObject)
    {
        // The prototype is used only for its runtime type.
        Type outputType = outputObject.GetType();
        using (StreamReader reader = new StreamReader(fileName, Encoding.UTF8))
        {
            while (!reader.EndOfStream)
            {
                // Each field becomes one constructor argument, in order.
                object[] fields = reader.ReadLine().Split(delimiter);
                yield return (T) Activator.CreateInstance(outputType, fields);
            }
        }
    }
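The yield return line leans on Activator.CreateInstance finding a constructor whose parameters match the split fields. Here is a small sketch of just that step, assuming (as the Microsoft compiler does in my experience, though I would not treat it as guaranteed) that an anonymous type gets a public constructor taking its properties in declaration order. The names PrototypeDemo and FromFields are hypothetical.

```csharp
using System;

public static class PrototypeDemo
{
    // Builds a new instance of the prototype's type from an array of fields.
    // The prototype's own values are ignored; only its type matters.
    public static T FromFields<T>(T prototype, string[] fields)
    {
        // Copy into an object[] so each field is passed as one constructor argument.
        object[] args = new object[fields.Length];
        fields.CopyTo(args, 0);
        return (T)Activator.CreateInstance(prototype.GetType(), args);
    }

    public static void Main()
    {
        var row = FromFields(
            new { Title = "", Body = "", Anchor = "" },
            new[] { "Thoughts of Me", "Some text", "home" });

        Console.WriteLine(row.Title);  // prints "Thoughts of Me"
        Console.WriteLine(row.Anchor); // prints "home"
    }
}
```

Because T is inferred from the prototype argument, the result is statically typed as the anonymous type, so row.Title compiles without any casts at the call site.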

I'll break this function down line by line in the next blog post. In short, it takes in a prototype object called outputObject and creates an object like that from each line of the delimited file, using the strings after splitting to initialize the object. I call this function like this:

    foreach (var x in
        EachDelimitedLine("delimitedFile.txt", ',',
            new { Title = "", Body = "", Anchor = "" }))
    {
        if (x.Title == "Thoughts of Me")
        {
            // do something with my blog...
        }
    }

As you can see, I specify a prototype object of an anonymous type and my code uses that prototype to create objects from the delimited file. Of course, in production code you would want checks for too few fields or too many fields so that you don't get the weird error messages you would get from this code by default. However, I think it gives you a flavor of the power of C#.
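One way to add the field-count check mentioned above is to compare the number of split fields with the number of public properties on the prototype's type before constructing anything. This is only a sketch: the FieldCheck class, the EnsureFieldCount helper, and the choice of FormatException are my own assumptions, not part of the original function.

```csharp
using System;

public static class FieldCheck
{
    // Hypothetical helper: throws a descriptive error when the number of fields
    // does not match the number of public properties on the output type.
    public static void EnsureFieldCount(Type outputType, string[] fields, int lineNumber)
    {
        int expected = outputType.GetProperties().Length;
        if (fields.Length != expected)
        {
            throw new FormatException(string.Format(
                "Line {0}: expected {1} fields but found {2}.",
                lineNumber, expected, fields.Length));
        }
    }

    public static void Main()
    {
        var prototype = new { Title = "", Body = "", Anchor = "" };
        Type outputType = prototype.GetType();

        // Three fields for three properties: no exception.
        EnsureFieldCount(outputType, "a,b,c".Split(','), 1);

        try
        {
            // Two fields for three properties: rejected with a clear message.
            EnsureFieldCount(outputType, "a,b".Split(','), 2);
        }
        catch (FormatException e)
        {
            Console.WriteLine(e.Message); // prints "Line 2: expected 3 fields but found 2."
        }
    }
}
```

Calling a check like this just before Activator.CreateInstance would replace the reflection error about a missing constructor with a message that names the offending line.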

Now, for the final piece, we combine the ideas of the ForEachLine function and the EachDelimitedLine function to get a ForEachDelimitedLine function. This also shows you the power of the generic delegate.

    public static void ForEachDelimitedLine<T>(
        string fileName, char delimiter, T outputObject, IOProcessor<T> processor)
    {
        // T is inferred from outputObject, so processor receives typed objects.
        foreach (var x in EachDelimitedLine(fileName, delimiter, outputObject))
        {
            processor(x);
        }
    }

In this case, the IOProcessor now accepts a generic argument T which is the same type as the prototype object I pass in (called outputObject).

Let's look at the above example code redone to use the ForEachDelimitedLine function:

    ForEachDelimitedLine(
        "delimitedFile.txt", ',', new { Title = "", Body = "", Anchor = "" }, (x) =>
        {
            if (x.Title == "Thoughts of Me")
            {
                // do something with my blog...
            }
        });

There, no more foreach statement. Now, the logic is part of the function itself.
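For anyone who wants to try the whole thing end to end, here is a self-contained sketch that puts the pieces in one file and runs them against a temporary file. The DelimitedDemo class, the CountTitle helper, and the three sample rows are hypothetical and exist only for illustration; the reading loop uses ReadLine-until-null rather than EndOfStream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public delegate void IOProcessor<T>(T line);

public static class DelimitedDemo
{
    // Lazily yields one object per line, shaped like the prototype outputObject.
    public static IEnumerable<T> EachDelimitedLine<T>(
        string fileName, char delimiter, T outputObject)
    {
        Type outputType = outputObject.GetType();
        using (StreamReader reader = new StreamReader(fileName, Encoding.UTF8))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                object[] fields = line.Split(delimiter);
                yield return (T)Activator.CreateInstance(outputType, fields);
            }
        }
    }

    // Pushes each typed object into the delegate instead of returning a sequence.
    public static void ForEachDelimitedLine<T>(
        string fileName, char delimiter, T outputObject, IOProcessor<T> processor)
    {
        foreach (T x in EachDelimitedLine(fileName, delimiter, outputObject))
        {
            processor(x);
        }
    }

    // Hypothetical helper: counts the rows whose Title matches the given text.
    public static int CountTitle(string fileName, string title)
    {
        int count = 0;
        ForEachDelimitedLine(
            fileName, ',', new { Title = "", Body = "", Anchor = "" }, x =>
            {
                if (x.Title == title)
                {
                    count++;
                }
            });
        return count;
    }

    public static void Main()
    {
        // Hypothetical sample rows: Title,Body,Anchor.
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllLines(path, new[]
            {
                "Thoughts of Me,First post,intro",
                "Other Blog,Hello,misc",
                "Thoughts of Me,Second post,followup"
            });
            Console.WriteLine(CountTitle(path, "Thoughts of Me")); // prints 2
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Note that the lambda parameter x never needs a type annotation: the compiler infers T from the anonymous prototype and then types the lambda accordingly.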

Next, I plan to extend my C# emacs mode to generate a class around my static functions and then compile and run it for me, so I can have a C#-script of sorts. Don't get me wrong, I still love and use Perl on a daily basis, but C# is often much faster than Perl, surprisingly enough, and seems to be able to handle memory more efficiently as well. However, I do admit that is anecdotal and not based on any tests I have performed. On a more pragmatic side, it is easier to integrate with existing C# libraries from C# than by writing a SWIG interface.

Hope you enjoyed!


Anonymous said...

this is nice - thanks for posting

John & Jennifer Savage said...

What in the world kind of C# are you using? What is
(line) => { sum += int.Parse(line); } and
new { Title = "", Body = "", Anchor = "" } ? Never have I seen such syntax...

Tanton said...

Hi John and Jennifer,

The (line) => {...} is a lambda. You can read more about them here: Lambda Expressions C# Programmers Guide or bing them: C# lambdas. The other construct is an anonymous type. You can read about them here: Anonymous Types C# Programmers Guide or bing them: C# Anonymous types.