Thursday, December 28, 2006

Similar Characters

I try to keep track of people who are similar to me: past, present, and fiction. I have three, currently, that I believe fit the bill.

1) House - I've blogged on this before, but Greg House from the TV show House is very similar to what I am like. I think he has let his pride control him more than I have and that accounts for a few differences. But, on the whole, we are very similar.
2) Isaac Newton - We have similar interests, religious beliefs, and personalities. He, of course, was far more brilliant than I ever could be, but we are very similar in our pursuits.
3) Brad Paisley - Yeah, the country singer. It's uncanny how many ideas in common we've had. In fact, before they released his "Two People Fell in Love" song, I commented to my wife that everything that has ever been was because two people had sex. Mine was more crass, but that's beside the point. When my wife heard it on the radio, she immediately came to find me and show me my idea had become a song. It was funny. I still say he has spies that watch me...I need to go now...I have to elude them!

Tuesday, November 21, 2006


I watched a History Channel special the other night on UFOs. It had some interesting interviews. One eye-opening moment was the replay of a NASA astronaut saying over the radio that he was looking at the alien spaceship out of his window. He said it very matter-of-factly and very clearly. He was neither surprised nor concerned. I have no problem believing in aliens. However, the Roswell stuff is a bit far-fetched. How is it that the aliens can fly billions of miles across the universe and then crash in New Mexico? Doesn't make a whole lot of sense to me.

However, I propose the following. I think the next candidate for president of the United States should run on a platform of revealing all government knowledge of UFOs. I'd vote for that person; I think it'd be cool to know. What about you?

Monday, November 20, 2006

Unfair Infinities

We all know that the set of all integers is countably infinite. To me, it seems unfair that the set of real numbers is uncountably infinite. It also seems odd that it is unintuitive to most (including me). However, I now understand why it was unintuitive to me. The key point to remember is that mathematicians don't allow integers to have an infinite number of digits. Instead, you can increase the integer by 1, but at any given time it has a fixed number of digits. However, with real numbers, there can, and often does, exist an infinity of digits. For instance, 1/3 is 0.333333.... So, not only do the real numbers extend countably to infinity, they also can have an infinite number of digits. In essence, they are infinite in 2 dimensions, whereas integers are only infinite in 1 dimension. I would imagine that complex numbers will one day be proven to be infinite in 3 dimensions, but who knows.

To be honest, I think it is a bit silly. We're not actually talking about different orders of infinities here, we're only talking about the existence of a function M that will map from one infinite set to another. If we can find that function, then we say the sets have an equivalent cardinality. Otherwise, the cardinality of one set is said to be greater than the cardinality of the other. If we could extract M from mathematics, and instead use an algorithm, then M could simply be, pick a real number, get next biggest integer, rinse, repeat. However, algorithms are not yet mathematical (unless you're Stephen Wolfram).
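
For what it's worth, here is what such a mapping function looks like in a case where one does exist. This is a minimal Python sketch with made-up function names: it pairs every natural number with exactly one integer, which is what "equivalent cardinality" means. Cantor's diagonal argument is the standard proof that no function like this can pair the integers with the reals.

```python
def natural_to_integer(n: int) -> int:
    """Map 0, 1, 2, 3, 4, ... onto 0, -1, 1, -2, 2, ... (a bijection)."""
    if n < 0:
        raise ValueError("n must be a natural number")
    # Even naturals go to the non-negative integers, odd ones to the negatives.
    return n // 2 if n % 2 == 0 else -(n + 1) // 2


def integer_to_natural(z: int) -> int:
    """Inverse mapping: every integer comes from exactly one natural."""
    return 2 * z if z >= 0 else -2 * z - 1
```

Because each function undoes the other, nothing is skipped and nothing is hit twice, even though the integers run off to infinity in both directions.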

Tuesday, October 10, 2006

Readable Code

I think, to structure your code readably, you have to partition your code into different levels. Perhaps the different levels are written in different languages, perhaps not. The top level is the business level. At this level, you should be speaking a mostly declarative language and should be side-effect free. Your code should be straight-line. The second level is the logical level. At this level, you can have conditions and iterations and anything in the first level. The third level allows anything in the second level plus side effects. Finally, the fourth level allows comparative and arithmetic operations plus anything in the third level. The interesting thing to note, I believe, is that side effects are more readable than comparative and arithmetic code. Of course, it could be that comparative and arithmetic code is less useful than side-effect code, so it is pushed to a lower level.

Let's look at an example:

    #include <stdio.h>

    extern int debug;

    void foo()
    {
        int i;
        if (debug == 1)
            for(i = 0; i < 100; ++i)
                printf("%d\n", i);
    }

This is a level 4 function. It has side effects, comparisons, etc... Let's try to turn it into a full 4-tiered function (yes, you wouldn't do that to something like this, but just for example.)

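    void print_numbers_to(int count);
    void print_integer(int i);
    int in_debug(void);
    void loop_to(int count, void (*func)(int));

    // level 1 - straight-line
    void foo() {
        print_numbers_to(100);
    }

    // level 2 - conditions
    void print_numbers_to(int count) {
        if (in_debug())
            loop_to(count, print_integer);
    }

    // level 3 - side effects
    void print_integer(int i) {
        printf("%d\n", i);
    }

    // level 4 - comparisons
    int in_debug(void) {
        return debug == 1;
    }

    // level 4 - comparisons and arithmetic
    void loop_to(int count, void (*func)(int)) {
        int i;
        for(i = 0; i < count; ++i)
            func(i);
    }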

See how much better that is?

Tuesday, September 12, 2006


I put some responses in with comments. However, I would like to say that I was imprecise in a previous post; it's a habit of mine. My statement should not have been that deficit spending causes war. Cos is entirely correct that war causes deficit spending, not the other way around. You can check the comments for the full scoop, but I was attempting to say that the ability to deficit spend leads to war. If you can't fund the war, you won't have the war (or you won't be successful at it). If you have easy access to funds, then you can easily create a war machine. If not, it is much harder. It is not deficit spending, but the ability to deficit spend. Check the comments ;-)

Response 3

Now, I'd like to respond to a few things I agree with from Cos's reply

1. I really like the idea of the states paying the Senators' salaries. This further reduces the federal stranglehold on our country. That is just pure brilliance!

2. My hope is that people who have a concern for our country would want to be electors. However, now that you mention it, I wouldn't be surprised if party-liners ran for elector just to be paid back at a later date by the party nominating them for an office. I really don't want to enable a lot of restrictions, because that doesn't lead to a naturally balancing system. For instance, we could say that electors can never be elected to public office, but that is just silly in the long run. My hope is that the more educated, politically motivated people would become electors...however, that is naive.

Nevertheless, what I realistically expect to happen is that entitlement spending drops dramatically, and socialistic programs begin to decline. When the voters don't elect you based on the entitlements you give them, you no longer give entitlements. Obviously, an elector could run on a platform of entitlements, but it doesn't really help him. He's not getting a huge salary and a plush office for life, so why does he care? His sole purpose is to do what is right for the country, because it is right, not because he's getting compensated. It could be a naive position, and I know it will eventually be corrupted, but I'm hoping that won't be for a number of years, when the system is stable enough to support a number of years of corruption before being reset again.

Response 2

I'd like to respond to a portion of a very well thought out comment from Cosbert Callis:

3) Frankly neither suggestion of the barter system or the gold standard, nor the idea of eliminating the Fed represents an understanding of economics 101.
(as a fiscal conservative, with a BA in Political Science and a Minor in Economics, you would have failed any econ class I was in with these ideas..)

Foreign trade (including deficit spending, which is a form of foreign trade) represents one of the most important natural forces for peace in the world. People just do not make WAR with trading partners. I don't know where your original thought, "inevitably leads to war and inflation," comes from; there is NO, ZERO, NADA in the way of empirical evidence to suggest there is a single iota of truth in that statement.

First, deficit spending is not always a form of foreign trade. Deficits can be financed in several ways. You can sell bonds within your own country. You can have the Fed print more money, thus causing inflation, but giving you the money you requested. Or you can sell bonds to other countries. I would only classify the last as foreign trade.

Second, deficit spending MOST DEFINITELY causes war. You would have failed any history class I took with your ideas. Wars have to be funded in some way. Without funding, you can't fight. There are many ways to get funding, but the easiest is through deficit spending. Do you think the war in Iraq would have happened if we had had to raise taxes to pay for it? People understand taxes: they understand the effect that taxes have on their pocketbooks. With deficit spending, they don't understand the correlations and are less likely to complain. Therefore, deficit spending is VITAL in starting and maintaining a war machine. Without easily obtainable money, there are far fewer wars.

Third, we didn't always have a national bank. The current Federal Reserve system was established in 1913. Notice that it DIDN'T provide economic stability, as was its mandate. We still went through booms and busts (including the Great Depression). Before 1913, we had a myriad of different banking styles, including no central bank from 1837 to 1862. Do I think we need banking regulations? Sure. Do we need a central bank? No.

Fourth, a return to the gold standard is independent of the Federal Reserve system. The gold standard was not revoked until 1971, almost 60 years after the establishment of the Federal Reserve. The goal of the gold standard is to reduce inflation. I would say that it did its job pretty well considering the amount of inflation that has happened between 1971 and today.

Finally, trading partners most definitely go to war. The US traded with Iraq (oil and weaponry). Iraq traded with Kuwait. The US invested in Germany before WWII. The examples go on and on. In a global economy, there is more than one way to get at any resource, so it becomes much easier to bite the hand that feeds you. But, it becomes much harder to bite if you have no teeth. By eliminating easy access to money (via deficit spending through the Fed), we can take out a few rows of teeth.

Friday, September 08, 2006


I love this quote from Ron Jeffries. I found it on a thread in his Agile Forum.

A fanatic is anyone who believes differently from us, and just as strongly.

Thursday, September 07, 2006


Mark made a very intelligent comment on my previous post and I'd like to respond to a few of his points.

1. Repeal the 17th amendment:
This sounds good on paper, but what guarantees that the state congress isn't going to simply choose their Senator solely on party lines for just the possibility of future political favors? Even the lowest of state politicians harbors hopes of being president.

I want the state congress to choose their Senator for future political favors. The point is that the people do what is in their best interest, but the state politicians will do what is in their best interest. What is in their best interest is greater state powers. The people's interest is different. If a Senator lessens the state powers, then the state will not vote him back in. Therefore, the Senator must pander to the state in order to be re-elected. Why is that important? Because states don't care about gay rights, flag burning, and national health care. States care about schools and roads and infrastructure. All of the stupidity that is pure popular pandering goes away.

2. I like the idea of taking away the presidential election from the idiot masses who decide solely on superficial traits of candidates ("He's a normal guy who I think I could have a beer with." Who the hell wants to have an average person in the most powerful position in the world??). Anyway, I digress. The problem with this is that you're assuming that the electoral college will be composed of people who are more intelligent than the masses as well as unaffiliated with any political party. That's a tough thing to find I suspect.

I do believe that the electoral college will be slightly more educated than the typical voter, but that's not the main benefit. The main benefit is that the electoral college has no constituents. They don't get paid, they shouldn't get bribes (having fewer people will allow more scrutiny on their financial affairs), and they have no real reason to want to get re-elected. Therefore, they won't pick a president because he promises more welfare or free healthcare. People pick a president because he promises entitlements or "No new taxes" or whatever. They vote in what they think is their best interest (unfortunately, they don't understand the consequences). The electoral college could safely ignore those issues and pick a president based on what is best for the country. Since they don't care about re-election, they are free to make more intelligent choices.

3. Let's go back to the barter system. Most politicians really have nothing to offer other than words, so that should sufficiently weaken the ever-growing hegemony those in power represent. :-)

I believe a return to the gold standard is very similar to going back to a barter system. By preventing the devaluation of the dollar because of deficit spending, we can force politicians to tax us explicitly instead of through inflation.

Waiting on the World to Change

I really like John Mayer. Especially his new song. However, I have to disagree with his approach. First, the song:

me and all my friends
we're all misunderstood
they say we stand for nothing and
there's no way we ever could
now we see everything that's going wrong
with the world and those who lead it
we just feel like we don't have the means
to rise above and beat it

so we keep waiting
waiting on the world to change
we keep on waiting
waiting on the world to change

it's hard to beat the system
when we're standing at a distance
so we keep waiting
waiting on the world to change
now if we had the power
to bring our neighbors home from war
they would have never missed a Christmas
no more ribbons on their door
and when you trust your television
what you get is what you got
cause when they own the information, oh
they can bend it all they want

that's why we're waiting
waiting on the world to change
we keep on waiting
waiting on the world to change

it's not that we don't care,
we just know that the fight ain't fair
so we keep on waiting
waiting on the world to change

and we're still waiting
waiting on the world to change
we keep on waiting waiting on the world to change
one day our generation
is gonna rule the population
so we keep on waiting
waiting on the world to change

we keep on waiting
waiting on the world to change

Now, the analysis
The sentiments are good, but he misunderstands a fundamental part of human nature. The elite that are corrupting our society now will hand their corruption to their chosen successors. Just because those successors are from our generation doesn't mean they will be any less corrupt. They will continue to attempt to plunge us into a feudalistic society with them at the helm. The fact that we're "waiting" just makes them smile more broadly. They know we're powerless to stop them so they're taking full advantage of it. Our freedoms have been trampled. Our civil liberties abolished. We have our own version of the SS (we call it Homeland Security - anyone notice how these guys go after child porn? No one complains because we all know child porn is bad, but how is that homeland security? It's not. It's the start of our own version of the secret police.) Why will our generation act any differently? The fringe, artistic types such as John Mayer have never supported war or abdication of civil liberties; however, their generations didn't set things right. Why does John think his will? He's completely wrong. Instead, he's promoting the attitude that suits the would-be dictators perfectly. He's sitting back and watching it all happen. Eventually, it will be too late to stop it.

What can we do to stop it? We have to take away their power base. Of course, we can't rely on them to do it, because they like their power base. Instead, we have to use the one constitutional outlet that they haven't taken away. Two-thirds of the state legislatures can call a convention to propose constitutional amendments, which three-fourths of the states must then ratify.

I propose three amendments that should erode the power base of those in charge.
1. Repeal the 17th amendment. This amendment took away states' rights and consolidated power to the federal government by moving the election of the senators to the people. Previously, the states chose the senators that represented them. This made it far less likely that a media campaign could buy an election. It also ensured that the person would actually support the state instead of what made them electable. Now, it is all too easy to support a law because it is "popular". Since you only answer to the people, that is all that matters. The Senate is no more than a balanced House of Representatives. Instead, it is supposed to represent the states. Senators would have a much harder time revoking states' rights if they had to answer to the state congress.
2. Make the party nomination of presidential candidates AFTER the election of the electoral college. Furthermore, electors should not be allowed to express allegiance to a political party. Currently, the population, not the electoral college, elects the president. This is NOT how it was designed to work. The writers of the Constitution knew that the general population is too stupid to choose the correct presidential candidate. Therefore, the general population should choose a bunch of smart people, who choose the president. This has been lost on today's generation. We have to bring back that purpose. The popular vote for a president should never be taken.
3. We must erode their monetary control of our society. The last amendment should abolish the Federal Reserve and return us to a gold standard. For the reasoning behind this, I would recommend Griffin's The Creature from Jekyll Island. It is a very informative (and very large) book about the history of the Federal Reserve, fiat money, and fractional reserve banking. It shows how control of the money system directly leads to a controlling, manipulative, feudalistic government.

We have to stop waiting on the world to change and use what power we have as a people to ensure our future.

Saturday, September 02, 2006

Cultural Progression

How do we, as a culture, define our differences with previous cultures? Do we define it by our use of technology? I don't think we can. There are cultures that have advanced technology, yet have similar cultures to their ancestors who didn't have such technology. I think we have to find something that shows motion, not advancement. To that end, I propose we use music. It appears to me that music is the definition of culture. In other words, by sharing music, you share your culture. By blending music, you blend your culture. Furthermore, music is always changing and adapting. It blends the past into something completely new and unexpected. Music seems to define our progression. Music defines our culture. Explore that concept and let me know what you think.

Saturday, August 12, 2006


Boo doesn't have generics. This makes me sad. Since I want a strongly typed matching solution, I _really_ want generics. I also want macros. ARRRRRGGGGG! Why won't people give me EVERYTHING? One option is to use arrays instead of lists. Another option is to add generics to the language. I think I'll start with the first and then migrate to the second. That should be sufficient for my needs.

Ok, now on to the O/R mapping.

First, since this is all about types, we're going to add the type information for (most) things in the database. Why, you ask? Shouldn't we know if it is a FirstName or a LastName because we have knowledge of the table? Yes, that is correct, but you're thinking too simplistically. In true OO fashion, FirstName and LastName will be nothing more than base classes. We will end up with an AsianFirstName, AngloFirstName, HispanicFirstName, etc... The same is true for last names. Then, when we consult our statistics, we'll be able to use statistics based on the frequency of the name within its ethnic culture (and also within the geographic location). Therefore, we want to be able to generate and store this extra type information. In addition, we'll want to use the type information when comparing names with the matching engine. We might use a completely different function when comparing a HispanicFirstName to an AngloFirstName than we would a HispanicFirstName to an AsianFirstName (auto-reject, anyone?). By employing multi-method dispatch on the type information, we can quickly choose the right matching logic.
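
To give a feel for the dispatch idea, here is a rough Python sketch (Python rather than Boo, and not actual STARS code). The matcher functions, the scores, and the dispatch table are all assumptions made up for illustration; only the class names come from the paragraph above.

```python
class FirstName:
    def __init__(self, value: str):
        self.value = value


class AngloFirstName(FirstName):
    pass


class HispanicFirstName(FirstName):
    pass


class AsianFirstName(FirstName):
    pass


def exact_match(a: FirstName, b: FirstName) -> float:
    """Hypothetical default matcher: case-insensitive equality."""
    return 1.0 if a.value.upper() == b.value.upper() else 0.0


def auto_reject(a: FirstName, b: FirstName) -> float:
    """Hypothetical matcher that never links the two names."""
    return 0.0


# Dispatch table keyed on the runtime types of BOTH arguments.
MATCHERS = {
    (HispanicFirstName, AsianFirstName): auto_reject,
    (AsianFirstName, HispanicFirstName): auto_reject,
}


def match(a: FirstName, b: FirstName) -> float:
    """Pick the matching logic from the pair of types, then apply it."""
    matcher = MATCHERS.get((type(a), type(b)), exact_match)
    return matcher(a, b)
```

The point is only that the choice of function depends on both types at once, which is why the type information has to be stored alongside the value.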

But enough skipping ahead, back to the O/R mapping.

Obviously, each Element will be stored in its own field. It will have an associated type information field. It may also have a pointer into a metadata table. I'm not sure on that one yet, we'll have to see what sort of metadata we will keep that is outside of the type system.

An entity becomes a little more complicated. Remember that an Entity is a collection of other entities, groups, and elements.

Let's look at two different Entities:

    class Name(Entity):
        first_name as FirstName
        last_name as LastName
        middle_initial as Nullable(MiddleInitial)
        name_suffix as Nullable(NameSuffix)

Wow, what's that Nullable thing? Well, the type system should include whether or not the field can be blank, and Nullable is just as good a choice as any. I'm really starting to want to do this project in O'Caml. I'm getting very close to breaking open the docs on F#. Of course, now that I think about it, C# might be a good choice. I wouldn't need to modify the parser if I had introspection, which C# gives you. Plus it is strongly typed and has generics...hmmm...I had forgotten introspection...drat!

Ok, back to the task at hand.

In most cases, the Name entity will have a Name table. Each element within the entity will correspond to a field in the table. There will also be the associated type field (and perhaps metadata fields). Each record in the table will also have a unique, primary key.

That was easy...

Now, for a more complicated example:

    class Person(Entity):
        name as Name
        address as Address
        ssn as Nullable(SSN)
        birthday as Nullable(Date)

For a person we will have a unique primary key, but we will also store the primary key of both the name and address information. The ssn and birthday elements will be stored "in-line" like the name elements were in the previous example.
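
Here is a rough Python sketch of that mapping, just to pin the rules down. The table layout, the column naming (`_type`, `_id`), and the SQL types are assumptions for illustration, not a settled STARS design: elements are stored in-line with a companion type column, and nested entities are stored as a key into their own table.

```python
from typing import List

# Hypothetical description of the two entities: (field, type name, nullable).
ELEMENTS = {"FirstName", "LastName", "MiddleInitial", "NameSuffix", "SSN", "Date"}

ENTITIES = {
    "Name": [("first_name", "FirstName", False),
             ("last_name", "LastName", False),
             ("middle_initial", "MiddleInitial", True),
             ("name_suffix", "NameSuffix", True)],
    "Person": [("name", "Name", False),
               ("address", "Address", False),
               ("ssn", "SSN", True),
               ("birthday", "Date", True)],
}


def columns_for(entity: str) -> List[str]:
    """Return the column definitions for one entity's table."""
    cols = ["id INTEGER PRIMARY KEY"]
    for field, type_name, nullable in ENTITIES[entity]:
        null = "" if nullable else " NOT NULL"
        if type_name in ELEMENTS:
            # Elements live in-line, next to a column recording their subtype.
            cols.append(f"{field} TEXT{null}")
            cols.append(f"{field}_type TEXT{null}")
        else:
            # Nested entities are referenced by their primary key.
            cols.append(f"{field}_id INTEGER{null}")
    return cols


def create_table(entity: str) -> str:
    return f"CREATE TABLE {entity} ({', '.join(columns_for(entity))});"
```

Calling `create_table("Person")` yields a table with `name_id` and `address_id` keys plus in-line `ssn` and `birthday` columns.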

Of course, we might want to force denormalization of the table...we could try something like

    class Person(Entity):
        [inline]
        name as Name
        address as Address

Now, name has an inline attribute and will not go in a separate table. Instead, the name fields will be placed in the Person table. However, when we extract the Person object from the table, we'll extract a Name object as well, so you can't tell the difference from the user side.

I'm hesitant to allow an [inline] attribute, because you get to the same point as with C++ and its inline modifier. The compiler can't inline without you telling it, so you're forced to make decisions that the compiler should be able to make. Therefore, if we have an [inline] modifier it will be more like "auto" in C++, a hint, but the compiler can do what it wishes with regards to inlining. Hopefully, its usage will vanish just like auto's.

Ok, next time we'll look at the O/R mapping of groups. I'm still not using the mailing list from SF because I sent things to it that I never got back, so I'm waiting until I have a successful test run before I move there for good.

Sourceforge Site Available

You can now go to The STARS Sourceforge Site and sign up on the see-stars-devl mailing list. I hope to make most of the technical discussions through that list so that they will be archived and open for discussion. I will also post previous blog entries to the mailing list for historical purposes.

Friday, August 11, 2006

SourceForge site will be the sourceforge site. It has been created and I am in the process of getting someone to set it up ;-)

Secondly, I'd like to discuss how the type system will interact with the database. First, I said that we're going to be strongly typed. This is not just fancy terminology; it will affect how the database is structured. For instance, we might want to subdivide names for ethnicities. John Doe might get the standard US ethnicity, but Wing Fe might get an Asian ethnicity. This can be represented by using subtypes. Therefore, this type information must be stored in the database for fast access. In addition, we might have types for strongly cohesive groups, loosely cohesive groups, etc... (perhaps even a composite group that composes multiple strongly cohesive groups so you can see the hierarchy).

But wait! You say. If we're only doing updates then it won't take too long to create all this information, but our initial database population will take FOREVER! People will get tired of waiting! To that, I say, you're absolutely right. That is another beauty of callbacks. We're not going to do it for the initial population. Instead, we're going to use reasonable defaults, but we're going to allow for processes to continually improve the data. I think consumers expect this. They want their data fast, but they also want it right. They hope that over time their data gets better. To ensure that, we'll have reasonable defaults (so, they won't get the Asian/American matching function, they'll get the statistically based one), but when we correct that decision later, we'll let them know through the callbacks. So, they can make their business decisions quickly and then revise them when the situation merits it.

Of course, if they want to wait around, they definitely can, but we need to be adaptive and fast and continuously improving!

Thursday, August 10, 2006

Two quick things

1. All operations must be idempotent. I'm not sure (yet) how to enforce this; it may just be a really strong suggestion.

2. Versioning will be a big part of the system. We need to be able to add fields to an entity and remove fields from an entity, inline an entity (more on that later) and extract an entity. I'm sure there are a number of "refactoring" tools that come from this, but I want them FIRST, not "when there is time." For instance, if I want to add a Title to a Name, then that needs to be as easy as adding title as Title to the Name class and running an upgrade program with an optional map function to create the title given the name (the map could set them all to a default (blank) title or it could try to guess a title of Mr or Mrs based on a derived gender).

Right now, I know I want automated upgrades when I
a) add a field
b) remove a field
c) inline an entity
d) extract an entity
e) add/change the validation method
f) add/change the normalization method
g) add an implied attribute

Also, we better darn well be able to downgrade!!!!!
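
As a sketch of what "add a field with an optional map function" and "downgrade" could look like, here is some illustrative Python. The record layout (plain dicts), the function names, and the `gender` key are assumptions invented for the example, not part of any existing design.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]


def add_field(records: List[Record], field: str,
              derive: Callable[[Record], object] = lambda record: "") -> List[Record]:
    """Upgrade: return new records with `field` filled in by `derive`.

    The default map function sets every record to a blank value.
    """
    return [{**record, field: derive(record)} for record in records]


def remove_field(records: List[Record], field: str) -> List[Record]:
    """Downgrade: return new records without `field`."""
    return [{k: v for k, v in record.items() if k != field} for record in records]


def guess_title(record: Record) -> str:
    # Hypothetical rule: guess Mr or Mrs from a previously derived gender.
    return {"M": "Mr", "F": "Mrs"}.get(record.get("gender"), "")
```

Running the same upgrade twice gives the same records as running it once, which is the idempotence asked for in point 1, and `remove_field` undoes `add_field`, which is the downgrade.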


Now that we can define the objects (though not rigorously, that's coming later), we need to know how to create them. Obviously, object creation could possibly mean invoking the database for lookups and the like, so it needs to be well abstracted. In this case, we'll use the abstract factory pattern to create factory objects which will create our entities. For example, if you define the following entity:

    class PersonAtAddress(Entity):
        name as Name
        ssn as SSN
        birthday as Date
        address as Address

Then you will end up with the following abstract factory

    class AbstractFactory:
        def getNameFactory():
            pass
        def getAddressFactory():
            pass
        def getPersonAtAddressFactory():
            pass

    class PersonAtAddressFactory:
        def Create(name_key as Name.ID, ssn as SSN, birthday as Date, addr_key as Address.ID):
            pass

Now, we have a consistent, programmatic way to create entities. We can wrap these calls in CORBA or SOAP or whatever, but the foundation is solid.
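
For anyone who hasn't used the pattern, here is a small runnable sketch in Python. It only models the PersonAtAddress factory, and the `InMemory...` classes, the field types, and the snake_case method names are assumptions for illustration; a real implementation would presumably talk to the database.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PersonAtAddress:
    name_key: int
    ssn: str
    birthday: date
    addr_key: int


class PersonAtAddressFactory(ABC):
    @abstractmethod
    def create(self, name_key: int, ssn: str, birthday: date,
               addr_key: int) -> PersonAtAddress:
        ...


class AbstractFactory(ABC):
    @abstractmethod
    def get_person_at_address_factory(self) -> PersonAtAddressFactory:
        ...


class InMemoryPersonAtAddressFactory(PersonAtAddressFactory):
    """Hypothetical concrete factory that builds entities without a database."""

    def create(self, name_key, ssn, birthday, addr_key):
        return PersonAtAddress(name_key, ssn, birthday, addr_key)


class InMemoryFactory(AbstractFactory):
    def get_person_at_address_factory(self):
        return InMemoryPersonAtAddressFactory()
```

Callers only ever see `AbstractFactory`, so swapping the in-memory version for a database-backed or remote one would not change their code.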

Next time, we'll start looking at how the objects map to a relational database.

Wednesday, August 09, 2006

Two Things

I'm pretty sure of two things at this point. First, the name of the system will be the Statically Typed Advanced Recognition System (STARS). I hope to get a SourceForge site for it up soon.

Second, the system will be written primarily in Boo and run on Mono. Boo has all the right things for this project.

1. Strongly Typed
2. Type Inference
3. Macros
4. User defined compile steps (this will be very useful when creating the database schema from the source files).
5. .NET/Mono compatible - at some point people will want to run their own code, there is a likelihood that the code will be .NET.
6. Clean syntax - based around python (I prefer Ruby, but...)
7. Duck typing
8. Multi-thread capable
9. Functional + Object Oriented

As soon as I get the SF project up, I'll post a link to it here. I'd like to write the specifications in the project's mailing list.


Callbacks are the key to a good recognition system. A typical batch recognition system forgets about the importance of letting consumers know as soon as information is available. However, this is the lynchpin to a good recognition system.

Let's assume that a typical use case is the following:

1. Run a large file through the system to create a repository
2. Run the files through the repository to ensure correct linkage
3. Rinse and repeat monthly

You have to run the file through twice because you don't know what might happen later in the system to change one of your records. This is because the system is not set up to tell you about events.

If, instead, we allowed the system to tell you about important things that are happening, you would be able to complete your run in one pass. So, what needs events? Well, first let's say that we'll use a publish/subscribe mechanism so that only those events that we're interested in will be delivered to us. Second, let's make the rule that anything that could have an impact on the end result should have an event fired. That means an event fires any time an Entity or Group is created or deleted, as well as any time an Entity is moved from one Group to another. I would say that Element updates should be allowed to have events, but not forced to. It could be that updating the salary field doesn't affect anything and you don't need that information to be disseminated.
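
A bare-bones publish/subscribe mechanism might look like the following Python sketch. The `EventBus` class, the event type strings, and the keyword-argument payload are all assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[dict], None]


class EventBus:
    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, event_type: str, callback: Callback) -> None:
        """Register interest in one kind of event."""
        self._subscribers[event_type].append(callback)

    def publish(self, event_type: str, **details) -> None:
        """Deliver the event only to callbacks subscribed to this type."""
        for callback in self._subscribers.get(event_type, []):
            callback(details)
```

A consumer that cares about regrouping would subscribe to something like `"entity_moved"` and never hear about a salary update published under a different event type.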

There are lots of optimizations you can do to make this fast and I don't want to get into those right now, but suffice it to say that the event/callback mechanism can make for an extremely flexible (and efficient!) system.

Obviously, the code for the callback won't be in the same file as the code defining the entities. However, we may want to augment the event with some information at event generation time; therefore, we allow the override of the OnX methods (where X is something like Consolidation).

For example:

    class Consumer < Group
        def Consolidation(Group other):
            if ...:
                consolidation_reason = ...
            elseif ...:
                consolidation_reason = ...

        def OnConsolidation(ConsolidationEvent event):
            event.reason = consolidation_reason

I'm not sure, but you might even be able to suppress events...I don't necessarily like that, but it could come in handy.

Adding operations

First, for the sake of this post, let's change the syntax a bit. Instead of saying

Entity X

We're going to say

class X < Entity

Basic types, like FirstName, will become less basic and will inherit from Element (or in this case, a derivative: StringElement)

class FirstName < StringElement

Groups will go from the generic Group(PersonAtAddress) to

class PersonAtAddress < Group

Now, this is not going to be the final syntax, but I want you to think of it as inheritance, because we're going to overload functions.

For instance, an Element should know how to validate itself. A simple example might be

class FirstName < StringElement
  def validate:
    return representation =~ /\A[a-zA-Z-]+\z/

A more complicated example might make a SOAP call to the validation server defined for that type (we'll see how to do that in a later post).
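
For the simple case, a runnable Ruby version might look like the sketch below. The Element and StringElement base classes are stand-ins I'm assuming here; only the validate override mirrors the example above. Note the \A and \z anchors: without them the pattern accepts any string that merely contains a letter.

```ruby
# Hypothetical base classes; only the validate override mirrors the post.
class Element
  attr_reader :representation

  def initialize(value)
    @representation = value
  end
end

class StringElement < Element; end

class FirstName < StringElement
  # \A and \z anchor the match to the whole string.
  def validate
    representation.match?(/\A[a-zA-Z-]+\z/)
  end
end

good = FirstName.new("Mary-Ann").validate
bad = FirstName.new("R2D2").validate
```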

Elements also need to know how to normalize themselves. For instance, a name might wish to be represented in all upper case:

class FirstName < StringElement
  def normalize:
    representation = representation.upcase

Another operation you're probably screaming for by now is creation. In this case, the StringElement does the right thing for you: it sets the internal representation to a passed-in value. However, you might want to do more. For instance, you might want to keep a map of how often you see each first name to handle statistics-based matching. Therefore, you want

class FirstName < StringElement
  static Hash seen # would be @@seen in ruby
  def initialize(String value):
    seen ||= new Hash(0)
    super(value)
    seen[value] += 1
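
In actual Ruby, the counting idea could be sketched like this. The base classes and the seen_count helper are assumptions added so the example runs on its own.

```ruby
# Hypothetical base classes so the sketch is self-contained.
class Element
  attr_reader :representation

  def initialize(value)
    @representation = value
  end
end

class StringElement < Element; end

class FirstName < StringElement
  @@seen = Hash.new(0) # names not seen yet report a count of zero

  def initialize(value)
    super            # let the base class store the representation
    @@seen[value] += 1
  end

  # Hypothetical helper for reading the tally.
  def self.seen_count(value)
    @@seen[value]
  end
end

%w[Ann Bob Ann].each { |name| FirstName.new(name) }
```

After those three creations, FirstName.seen_count("Ann") is 2 and FirstName.seen_count("Bob") is 1.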

Other methods might include update and clear.

For Entities, we need the following operations:

initialize - pass in a value for each of the "member variables" and assign them if they are consistent. Otherwise, throw an exception.
validate - validate the state of an entity
update - update one of the fields of an entity

Update is the most interesting because we could update one or many of the fields. I think it might be interesting to use nil for fields we don't want to update, but I'd rather not. Perhaps a hash? But I don't want to miss a field by accident, and I'd like it to be "compile-time" checked. So, we're back to nils.

class Name < Entity
  FirstName first_name
  MiddleName middle_name
  LastName last_name
  NameSuffix name_suffix

  def update(FirstName fn,
             MiddleName mn,
             LastName ln,
             NameSuffix ns):
    first_name.update(fn) unless fn.nil?
    middle_name.update(mn) unless mn.nil?
    last_name.update(ln) unless ln.nil?
    name_suffix.update(ns) unless ns.nil?
  rescue => { rollback } # undo all operations in the transaction (maybe a validate fails?)

I'm fairly convinced that a transaction is the right way to handle this situation, but I'm not convinced of the syntax; there are definitely other ways to handle it. For instance, you could have an updater that decides whether to commit based on how the function exits:
  FirstName::Updater updater(first_name, fn)
It could also handle the nil? case. If the function exits normally, the updater commits; otherwise it rolls back. Regardless, a transaction is needed for exception safety.
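
Here is one way the nil-skipping, roll-back-on-failure update could be sketched in plain Ruby. It is only an illustration under stated assumptions: a Struct stands in for the Name entity, the fields are bare strings rather than Element objects, and the letters-only validation rule is invented just to have something that can fail.

```ruby
# Hypothetical sketch: a Struct stands in for the Name entity.
Name = Struct.new(:first_name, :middle_name, :last_name, :name_suffix) do
  # Update only the non-nil arguments; restore every field if anything fails.
  def update(fn, mn, ln, ns)
    snapshot = to_a # copy of the field values before the "transaction"
    members.zip([fn, mn, ln, ns]).each do |field, value|
      next if value.nil?
      # Invented validation rule, standing in for Element#validate.
      raise ArgumentError, "invalid #{field}: #{value.inspect}" unless value.match?(/\A[a-zA-Z.-]+\z/)
      self[field] = value
    end
    self
  rescue ArgumentError
    # Roll back: put each field back to its pre-update value, then re-raise.
    members.zip(snapshot).each { |field, old| self[field] = old }
    raise
  end
end

name = Name.new("Ada", nil, "Lovelace", nil)
name.update(nil, "King", nil, nil) # only middle_name changes

begin
  # first_name is assigned before last_name fails validation...
  name.update("Augusta", nil, "L0velace", nil)
  failed = false
rescue ArgumentError
  failed = true
end
# ...but the rollback leaves first_name as "Ada".
```

The snapshot-and-restore approach is the simplest thing that gives exception safety here; a real system would more likely delegate to whatever transaction mechanism the data store provides.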

You may also delete an entity. I imagine that this should return a boolean indicating whether or not the delete succeeded, but deletes probably shouldn't fail.

A Group has some additional operations. Since a group is just a collection of entities, it may have an entity added to it or removed from it. In addition, it may also have a list merged into it or part of another list spliced into it.

Let's look at each operation in the abstract:

Add: original list -> [A,B,C] value -> D new list -> [A,B,C,D]

Remove: original list -> [A,B,C] value -> B new list -> [A,C]

Merge: original list -> [A,B,C] value -> [D,E,F] new list -> [A,B,C,D,E,F] new value -> []

The merge adds all of the elements of the value into the list and deletes them from the value.

Splice: original list -> [A,B,C] value -> [D,E,F], 1, 1 new list -> [A,B,C,E] new value -> [D,F]

The splice takes an array and a begin and end offset and adds those elements to the new list while removing them from the old.

The merge we will call a consolidation and the splice we will refer to as a split.
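
The four operations map fairly directly onto Ruby arrays. The helper names below are hypothetical, and the offsets are treated as an inclusive range, which is what the [D,E,F], 1, 1 example suggests.

```ruby
# Hypothetical helpers over plain arrays.
def add(list, value)
  list << value
end

def remove(list, value)
  list.delete(value)
  list
end

# Consolidation: move every element of value into list, leaving value empty.
def merge(list, value)
  list.concat(value)
  value.clear
  list
end

# Split: move value[first..last] (inclusive) into list, removing it from value.
def splice(list, value, first, last)
  list.concat(value.slice!(first..last))
end

added = add(%w[A B C], "D")
removed = remove(%w[A B C], "B")

merge_source = %w[D E F]
merged = merge(%w[A B C], merge_source)

splice_source = %w[D E F]
spliced = splice(%w[A B C], splice_source, 1, 1)
```

Both merge and splice mutate the source array as well as the target, which is why each of them has to run inside a transaction: a failure halfway through would otherwise lose entities.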

An important property of a Group involves its MetaGroup. The MetaGroup is the Set that consists of all the instances of the Group.

So, let's say we have a Consumer Group. We want to say that a consumer cannot appear in more than one group. That means that the order of the union of all groups is equivalent to the sum of the order of all groups. To do this, we say that for every group, its MetaGroup must be a true Set, and cannot include duplicate entities.
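
That size condition is easy to check mechanically. A minimal sketch, with groups represented as plain arrays of entity identifiers (an assumption made for illustration):

```ruby
require "set"

# True when no entity appears in more than one group, i.e. when the size of
# the union equals the sum of the individual group sizes.
def disjoint_groups?(groups)
  union = groups.reduce(Set.new) { |acc, group| acc | group }
  union.size == groups.sum(&:size)
end

ok = disjoint_groups?([%w[alice bob], %w[carol]])
clash = disjoint_groups?([%w[alice bob], %w[bob carol]]) # bob is in two groups
```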

Now, back to the operations. Each one of these operations must take place in a transactional environment. For example, an update can fail because of an implied field mismatch.

What's an implied field? A field in a group may be declared strongly or weakly implied. I'm not sure how this declaration will take place (I hate the .NET attribute syntax). However, once it is so declared, it will enforce conformance for all the elements in a group. A weakly implied field will ensure a no-conflict match for a field across all the entities in a group. So, if we say that NameSuffix is weakly implied for the Consumer Group, that means all Consumers in the same Consumer Group must have the same NameSuffix (or a blank one). A strongly implied field removes the ability for blanks to match.
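
A small sketch of the two checks, taking the list of one field's values across a group. The function names are hypothetical, and one detail is my own assumption: under the strong rule a blank conflicts with any non-blank value, while a group whose values are all blank still passes.

```ruby
# nil, empty, and whitespace-only strings all count as blank.
def blank?(value)
  value.nil? || value.strip.empty?
end

# Weakly implied: blanks match anything, so only non-blank values must agree.
def weakly_implied?(values)
  values.reject { |v| blank?(v) }.uniq.size <= 1
end

# Strongly implied: a blank is just another value (assumption: blanks still
# match each other), so every value must be identical.
def strongly_implied?(values)
  values.map { |v| blank?(v) ? nil : v }.uniq.size <= 1
end

weak_ok = weakly_implied?(["Jr", nil, "Jr", ""])
weak_conflict = weakly_implied?(%w[Jr Sr])
strong_blank = strongly_implied?(["Jr", nil])
strong_ok = strongly_implied?(%w[Jr Jr])
```

A failed check like this is exactly the kind of thing that should make a group update roll back.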

Ok, this post has gone on long enough. In future posts, we'll return to the formal specification of these operations.

Type safe recognition

I believe that recognition (part of Customer Data Integration) can (and should!) be made strongly type safe. Basic types (in the US) might include FirstName, LastName, SSN, Street, CompositeStreet. More advanced types would include an Entity (a collection of basic types) and a Group (a collection of entities).

Notice how this mirrors a programming language. A basic type would be something like an int or float. An entity is like an object, and a Group would be a collection of objects (think an array).

This leads to a few interesting questions. Let's consider the following:

Entity Name:
  FirstName first_name
  MiddleName middle_name
  LastName last_name
  NameSuffix name_suffix

Entity Address:
  CompositeStreet street_line_1
  CompositeStreet street_line_2
  City city
  State state
  Zip zip

Entity Person:
  Name name
  SSN ssn
  Date birthday

Entity PersonAtAddress:
  Person person
  Address address

Entity Consumer:
  Group(PersonAtAddress) occupancies
  SSN preferred_ssn
  Address preferred_address

At this point, we have types for our recognition system that can be manipulated and understood by both humans and computers. We can further augment them. I'll work on describing the augmentation/annotation of the types in the next blog.
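
To give a feel for what typed entities could mean in a real language, here is a rough Ruby sketch. Ruby has no compile-time field types, so the check happens when an entity is constructed; the field macro and the Entity base class are hypothetical helpers of my own, and only a few of the types above are included to keep it short.

```ruby
# Hypothetical runtime-checked sketch; Ruby cannot check these at compile time.
class Element
  attr_reader :representation

  def initialize(value)
    @representation = value
  end
end

class FirstName < Element; end
class LastName < Element; end
class SSN < Element; end

class Entity
  # Declare a typed field: records the expected class and defines a reader.
  def self.field(name, type)
    fields[name] = type
    attr_reader name
  end

  def self.fields
    @fields ||= {}
  end

  # Accept a value per declared field, rejecting values of the wrong type.
  def initialize(**values)
    self.class.fields.each do |name, type|
      value = values[name]
      unless value.nil? || value.is_a?(type)
        raise TypeError, "#{name} must be a #{type}, got #{value.class}"
      end
      instance_variable_set("@#{name}", value)
    end
  end
end

class Name < Entity
  field :first_name, FirstName
  field :last_name, LastName
end

class Person < Entity
  field :name, Name
  field :ssn, SSN
end

name = Name.new(first_name: FirstName.new("Ada"), last_name: LastName.new("Lovelace"))
person = Person.new(name: name, ssn: SSN.new("000-00-0000"))
```

Passing a LastName where an SSN is expected raises a TypeError, which is the runtime stand-in for the type error a recognition-specific language could report before anything runs.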


Should an entity be allowed to contain more than one group?
Should an entity be allowed to contain another entity that contains a group?
For example:

Entity Household:
Group(Consumer) members
Address preferred_address

More to come!

Saturday, June 10, 2006

What to do

My old job gave me many ideas on how a company should be run. The second job I took, at the beginning of this year, changed some of those. Basically, this blog was a running commentary on things I wanted changed at work. With my current job, I have no complaints. That makes it hard to find anything to write about. We do code reviews, unit testing and integration testing. We do some pair programming, when it makes sense. We have continuous integration and we hire smart people. We try to do more with less: less quantity, more quality. It is, in many respects, an enviable situation. So, I'm going to start posting more at Thoughts of He and less here, because I don't have a lot to gripe about anymore. What a great feeling that is!

Tuesday, June 06, 2006

I'm Back!

I'm back with all new material! How cool is that!

My last employer blocked Blogger, so I stopped posting, since it was a pain to get around the firewall. However, my new employer is much more lenient in what they allow.

What have I learned? What haven't I learned is a better question. My previous job, which lasted all of 5 months, taught me TIBCO and Java/J2EE. However, what I really feel like I took from there is the value of logging and some new techniques on how to retain information. I also learned what management should really be like (and what it shouldn't!). I learned how to do more with less, and how to prioritize based on ROI instead of a manager's whim. All in all, it was a very positive experience and I hope we cross paths again someday.

In my current role, I'm learning how productive one can be when all distractions are taken away. I'm also learning how to test properly and ensure the software works correctly. This is also more "mission critical" software, so I'm getting my first exposure to that side of things.

This employer is also away from my home state, Arkansas, so my family is going through a transition period. I'm not expecting it to be easy, but I'm sure we'll pull through.

Until next time!

Wednesday, March 15, 2006


Last night on American Idol, Stevie Wonder referred to "Chicken Little"'s voice as "interesting". This was the kiss of death. In other words, Stevie was implying that his voice was horrible and he had nothing better to say. For me, in school, the worst thing to hear was not that I was interesting, but that I was smart.

When you're smart, it often happens that it is your only attribute. No one notices anything else. You're labeled as "the smart kid". Sometimes, you get worse monikers, but we won't discuss those :-) I spent most of my childhood and adult life hiding from that term. In school, I never wanted to tell others what I made on my tests. I never announced my grades like others did. My goal was to blend in completely. In my adult life, I never bragged on my degrees or how fast I got them. At my previous workplace, the only way most people knew I had a Ph.D. was that someone else told them. I never announced it or put it in my email signature or anything else. When I see people with a Ph.D. at the end of their name, I laugh inwardly. "Why would someone want to draw that attention to themselves?" I would ask. When I taught at the college level, I had my students call me by my first name.

Deep down, this was only to avoid the "smart" label. I hated it so much from my childhood that I would have done anything to avoid being labeled with it. But my new year's resolution is acceptance. Last week, I dug my Ph.D. degree out of a box in my office and put it up in my cube. It's a first step.

Sunday, March 05, 2006

ALAR Conference

Long time, no post. However, I'm back from the ALAR Conference and I have a little time, so I'll post. Hopefully, this won't be the last for a while. The ALAR Conference is a data engineering and grid computing conference put on by the Acxiom Laboratory for Applied Research. There were a number of great presentations there. I had a paper accepted into the conference, so I got to speak as well.

I brought two important things away from the conference. The first is that I really miss research, especially applied research. I'm going to have to seriously consider how to inject research into my current job, or move on to something new. Research has been a part of my life for so long. In many ways, it defines who I am.

The second thing I took away from the conference was the need to examine distributed game programming. It appears to me that they have some insight on how to handle a massive number of incoming transactions. Sure, they cheat a little, but it might teach us how to cheat in similar or different ways. For instance, they use "realms" to limit the number of players on a single server. Data engineers can also mimic that by the use of a good, pre-defined partition key. They also allow lossy transactions. Since you're going to be sending another transaction in the next millisecond, it's ok if we drop one. I don't know how to replicate that ability with data, but I'm willing to investigate the possibility.

Lastly, it was a great chance to meet up with old friends both from the Corporate world and from the Academic world. I was amazed at how many people I actually knew there.

Well, until next time...