So, Part Two: Introduction to the GenoType. "GenoType" is a corruption of the words "Genetic Archetypes", so it's different from the standard genotype and phenotype thing, but it's kind of a cool word. I kind of like it, and I like the idea of the Genetic Archetypes; "Epigenotype" is another way of putting it. In essence, really, what it is, is that there are so many processes (how many ways can you come down a mountain, and how many valleys can there be on a mountain?) that theoretically, like I say in the book, you can make a strong case that there should be 7.5 billion GenoTypes. But that would be silly, because think how many things in this world are done redundantly.
We do many things redundantly, and it turns out that by the time you do some statistical analysis in terms of how many things you can group under the same function, it doesn't take too many. Ultimately, it's like saying, well, tractors can do lots of different things. They can pull stumps out of the ground, they can dig holes, they can drill holes, they can pull rocks, they can chop down trees, but you don't need too many different types of tractors to do all that. You do need a certain type of tractor; maybe the tractor needs to be high so that it can go over the tree trunks and tree stumps.
The problem with that is, maybe it tips over, because its center of gravity is too high. Well, maybe we make the center of gravity lower, but now it can't go over rough terrain. So there are certain constraints that do tell you how many different jobs you need to have done. Physical numbers that tell you, okay, I can't do this with this. And that's again part of this algorithm-like approach to these characterizations, these epigenetic characterizations. But basically, past a certain point, you don't need 7.5 billion to tell you what is involved in making a good tractor. You just maybe need two or three variations.
How do you wind up with GenoTypes? You start off by looking at gene frequencies. The frequencies are well known for just about all of the classic serological genes. By that I mean blood types, testatrix, haplotypes, HLA antigens, any of those things: hemoglobin types, haptoglobin types. They are all published, and we have not only the published frequencies, but we also have demographics with regard to key populations and frequencies. These are extracted from a work of great brilliance called The History and Geography of Human Genes by Cavalli-Sforza.
That's generally where you could start. I mean, you can also go looking in another key work, which is Cummins's Finger Prints, Palms and Soles, and you wind up with, okay, here are fingerprint patterns that correlate with blood groups, here are fingerprint patterns that correlate with nationality, here are fingerprint occurrences on particular fingers that correlate with something else. So what you are trying to do is weave a holistic approach to the data that allows you to understand, based upon those structures, that I don't want to do the same thing twice.
Now again, this is not a lecture in statistics, but if you can understand this key premise, you will understand how we got six, which is that you look at what's called multivariate analysis. Multivariate analysis is a statistical tool that lets you take multidimensional data and crunch it down, and you crunch it down by essentially trying to find what are called eigenvectors, which are directions through the data, where a particular direction encapsulates the greatest amount of variance; the eigenvalue that goes with it tells you how much. So if I drew the line through the data any other way, I would have less variance along that line.
Then you find a direction that's called orthogonal to that, which means that, by mathematical rules, if this is my vector that has maximum variance, then at 90 degrees to it will be the next component of that data, and that's called principal component analysis. That's the basis of most of these characterizations.
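As a rough illustration of the idea just described, here is a minimal sketch in plain Python. It is not the statistical package used for the GenoType work; the function names, the sample data, and the choice of power iteration with deflation are all assumptions made for this example. It finds the direction of maximum variance, then the next direction at 90 degrees to it.

```python
import math

def center(rows):
    """Subtract each column's mean so the data is centered on the origin."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def mat_vec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def top_component(cov, iters=500):
    """Power iteration: (variance, unit direction) for the largest eigenvalue."""
    d = len(cov)
    # Slightly lopsided start vector, so it is unlikely to be exactly
    # perpendicular to the direction we are looking for.
    v = [1.0 + 0.1 * i for i in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = mat_vec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:          # no variance left to find
            break
        v = [x / norm for x in w]
    variance = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
    return variance, v

def principal_components(rows, k=2):
    """Return k (variance, direction) pairs, largest variance first."""
    cov = covariance(center(rows))
    d = len(cov)
    components = []
    for _ in range(k):
        variance, vec = top_component(cov)
        components.append((variance, vec))
        # Deflation: remove the variance already explained, so the next
        # pass finds the best direction orthogonal to this one.
        cov = [[cov[i][j] - variance * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    return components

# Made-up two-column measurements, purely for illustration.
data = [[1, 2], [2, 4.1], [3, 5.9], [4, 8.2], [5, 9.8]]
for variance, direction in principal_components(data):
    print(round(variance, 4), [round(x, 3) for x in direction])
```

In practice a statistics package or a linear-algebra library would do the eigen-decomposition directly; the loop above just makes the "maximum variance, then 90 degrees to that" logic visible.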
Now of course, I won't tell you I did this by hand; they make statistical packages that do this, but I will give you a good example. You might be familiar with this, but basically we have been collecting data on people now for four years through a lot of software that I have written, so we go in the office, we measure fingers, we measure this, chief complaints, blah, blah, blah. We are doing it essentially to give people information directly, but on the other side that data is going into a series of tables that allows me to analyze the occurrences independently. So in addition to all the published genetic data, or the published dermatoglyphic data, we have a core database of about a thousand people that we have followed for the last five years with regard to the occurrences of these tendencies in and of themselves.
And there is a cute little drawing here. Basically, what you are doing with these types of multivariate tools is compressing data from something that you can't deal with to something that you can. So for instance, take this example of a Christmas tree. Let's say I have a Christmas tree here and I want to tell you where this Christmas ornament is. It's going to be very hard for me to tell you that it's beside that decoration, a little bit underneath the other one. I mean, I could describe it to you, but it would be a description that would be useless.
So what can I do? I could take a slide projector and project it, so that I reduce the Christmas tree to two dimensions, and now I can say to you, oh, it's this one over here we are talking about. See what I mean? We lose a little something, but we gain a little something. We lose a little bit of our take on reality, but we gain the ability to have a take on reality, because the other thing is too complicated for us as mere humans. We can't think in three dimensions, four dimensions, five dimensions. Mathematicians can think in 20 dimensions. I write computer programs that routinely have 30 dimensions in them, but I can't think in 30 dimensions.
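The Christmas tree picture can be sketched in a few lines of Python. Everything here is hypothetical: the ornament names, their coordinates, and the assumption that the projector simply drops the depth coordinate. The point is only to show what is gained and what is lost when three dimensions become two.

```python
def project_to_wall(point):
    """Flatten a 3-D position (x, y, depth) to its 2-D shadow (x, y)."""
    x, y, _depth = point
    return (x, y)

# Hypothetical ornament positions on the tree: (left-right, height, depth).
ornaments = {
    "red": (0.2, 1.5, 0.3),
    "gold": (0.2, 1.5, -0.4),   # same spot as red, but on the far side
    "blue": (-0.5, 0.8, 0.1),
}

shadows = {name: project_to_wall(pos) for name, pos in ornaments.items()}

for name, shadow in shadows.items():
    print(name, shadow)

# What we lose: red and gold land on the same point of the wall.
print("red and gold overlap:", shadows["red"] == shadows["gold"])
```

Blue is now easy to point at, which is the gain; red and gold can no longer be told apart, which is the loss. Principal component analysis tries to pick the projection direction that keeps such collisions to a minimum.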
So as mere humans, we have to go down to two dimensions, to tables and graphs and things, and we want to do that. When we do that, we want those things to be as pure and as close to the original as we can get, and really the best tools for that are principal component analysis, multivariate data analysis, cluster analysis and factor analysis. With these things you take all those numbers, you throw them into the tables, and you decide for yourself, okay, what's the thing I am looking for? And after that it's just, okay, if I am looking for that, it means I am looking for this, and if I am looking for this, it means I am looking for this, and this, and this. Simple from the complex.
Here is a simple one. Again, I just want to take you through this, because it gives you an idea of how these things come into formation. Here are three things. RhD is the ability of a person to be blood group Rh negative. ABO O is the ability of a person to be blood type O, and Duffy A is FY A. So what you are looking at is: the percentage of Africans who are Rh negative is 20, the percentage of Africans who are blood type O is 69, and the percentage of Africans who are Duffy A is 11, and then blah, blah, blah, Asia, Europe, America, blah, blah, blah.
You generate a series of what are called eigenvalues and eigenvectors, which tell you the axes of the variation, and you generate a matrix that tells you the numbers relative to each other. Now of course, what does this mean to you? It means nothing to us, right? These are numbers that have no relevance, but if we plot them, we can see that if I take just those three genes, I can plot out on a map how distant those people are. So by looking at those three genes, you can tell that Australians are not too similar to Europeans. Africans are in their own world, and Asians are sort of more in the center, but we have got to remember this is math, so it looks like it's in the middle, but it turns out it's kind of coming at you, because you are looking at two dimensions and it's really three. So
Transcription by:
Scribe4you Transcription Services