Welcome to the playlist on statistics, something I’ve been meaning to do for some time. So anyway, I just want to get right into the meat of it, and I’ll try to do as many examples as possible. Hopefully, it gives you a feel for what statistics is all about. And really, just to kind of start off, in case you’re not familiar with it, although I think a lot of people have an intuitive feel for what statistics is about.
Well, in very general terms, it’s kind of getting your head around data, and it can broadly be classified into maybe three categories. You have descriptive. So, say you have a lot of data and you wanted to tell someone about it without giving them all of the data. Maybe you can kind of find indicative numbers that somehow represent all of that data without having to go over all of the data. So, that would be descriptive statistics. There’s also predictive. Well, I’ll kind of group them together.
There’s inferential statistics. This is when you use data to essentially make conclusions about things. So, let’s say you’ve sampled some data from a population. We’ll talk a lot about samples versus populations, but I think you have just the basic sense of what that is. If I survey three people who are going to vote for president, I clearly haven’t surveyed the entire population. I’ve surveyed a sample. But what inferential statistics is all about is that if we can do some math on the samples, maybe we can make inferences or conclusions about the population as a whole. Well anyway, that is just the big picture of what statistics is all about. So, let’s just get into the meat of it, and we’ll start with the descriptive.
So, the first thing that I would want to do, or I think most people would want to do, when they’re given a whole set of numbers and they’re told to describe it, is to say, “Well, maybe I can come up with some number that is most indicative of all of the numbers in that set, or some number that represents kind of the central tendency.” This is a term you’ll see a lot in statistics books: the central tendency of a set of numbers. And this is also called the average. And I’ll be a little bit more exact here than I normally am with the word average.
When I talk about it in this context, it just means that the average is a number that somehow is giving us a sense of the central tendency, or maybe a number that is most representative of a set. I know that all sounds very abstract, but let’s do a couple of examples.
So, there’s a bunch of ways that you can actually measure the central tendency or the average of a set of numbers, and you’ve probably seen these before. They are the mean, and actually, there are types of means, but we’ll stick with the arithmetic mean. Later, when we talk about stock returns and things, we’ll do geometric means, and maybe we’ll cover the harmonic mean one day. There’s the mean, the median and the mode. And in statistics speak, these all can kind of be representative of a data set’s or a population’s central tendency, or a sample’s central tendency, and they all collectively can be forms of an average. And I think when we see examples, it will make a little bit more sense.
In everyday speak, when people talk about an average, they’re usually talking about the arithmetic mean, and I think you’ve already computed averages in your life. Normally, when someone says, “Let’s take the average of these numbers,” they want you to figure out the arithmetic mean. They don’t want you to figure out the median or the mode. But before we go any further, let’s figure out what these things are. So, let me make up a set of numbers.
So, let’s say I have a number 1. Let’s say I have another 1. Let’s have a 2, a 3. Let’s say I have a 4. That’s good enough. We just want a simple example. So, the mean, or the arithmetic mean, is probably what you’re most familiar with when people talk about average. That’s essentially adding up all the numbers and dividing by how many numbers there are. So, in this case, it would be 1 + 1 + 2 + 3 + 4, and you’re going to divide it by 5 numbers. And this is what? 1 + 1 is 2. 2 + 2 is 4. 4 + 3 is 7. 7 + 4 is 11, so this is equal to 11/5. That is 2 1/5, so that is equal to 2.2. And so someone could say, “Hey! You know, that is a pretty good representative number of this set. That’s the number that all of these numbers, you can kind of say, are closest to. Or 2.2 represents the central tendency of this set.” And in common speak, that would be the average, but if we’re being a little bit more particular, this is the arithmetic mean of this set of numbers. And you see, it kind of represents them. If I didn’t want to give you the list of five numbers, I could say, “Well, you know, I have a set of five numbers and their mean is 2.2.” And it kind of tells you a little bit of at least where the numbers are. We’ll talk a little bit more about how you know how far the numbers are from that mean in probably the next video.
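As a rough sketch, the arithmetic above can be checked in a few lines of Python. The variable names are made up for illustration and are not part of the video.

```python
from fractions import Fraction

# The example set from the video.
numbers = [1, 1, 2, 3, 4]

# Add up all the numbers, then divide by how many there are.
total = sum(numbers)                  # 1 + 1 + 2 + 3 + 4 = 11
count = len(numbers)                  # 5 numbers
mean_exact = Fraction(total, count)   # 11/5 as an exact fraction
mean_decimal = total / count          # the same value as a decimal

print(total, count, mean_exact, mean_decimal)   # prints: 11 5 11/5 2.2
```

Keeping the exact fraction next to the decimal simply mirrors the 11/5, 2 1/5, 2.2 steps in the spoken example.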
So, that’s one measure. Another measure: instead of averaging it this way, you can average it by putting the numbers in order, which I actually already did. So, let’s just write them down in order again: 1, 1, 2, 3, 4. And you just take the middle number. So, let’s see, there are 5 numbers. So, the middle number is going to be right here. The middle number is 2. There are two numbers greater than 2 and there are two numbers less than 2, and this is called the median. It’s actually very little computation. You just have to essentially sort the numbers, and then you find the number that has an equal number of values greater than it and less than it. So, the median of this set is 2. And you see, that’s actually fairly close to the mean, and there is no right answer. One of these isn’t a better answer for the average. They are just different ways of measuring the average. So, here it’s the median.
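A minimal sketch of that sort-and-take-the-middle idea, assuming an odd number of values. The function name `middle_of_odd_set` is hypothetical, chosen only for this illustration.

```python
def middle_of_odd_set(values):
    """Return the middle value of a list with an odd number of entries."""
    ordered = sorted(values)              # put the numbers in order first
    if len(ordered) % 2 == 0:
        raise ValueError("this sketch only handles an odd number of values")
    return ordered[len(ordered) // 2]     # the entry sitting in the middle


numbers = [3, 1, 4, 1, 2]                 # the same five numbers, shuffled
median = middle_of_odd_set(numbers)

# Check that as many numbers are less than the median as are greater than it.
smaller = sum(1 for x in numbers if x < median)
larger = sum(1 for x in numbers if x > median)
print(median, smaller, larger)            # prints: 2 2 2
```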
I know what you might be thinking: “Well, that was easy enough when we had five numbers. What if we had six numbers? What if it was like this? What if this was our set of numbers: 1, 1, 2, 3, 4, 4?” So now, there is no middle number. Two is not the middle number because there are two numbers less than it and three larger than it. And three is not the middle number because there are two larger and three smaller than it, so there is no middle number. So, when you have a set with an even number of numbers and someone tells you to figure out the median, what you do is you take the middle two numbers, which here are 2 and 3. Then you take the arithmetic mean of those two numbers. So in the case of this set, the median would be (2 + 3)/2, which is 2.5. Fair enough, but we’ll just put this aside because I want to compare the median and the mean and the mode for the same set of numbers. But that is a good thing to know because sometimes it can be a little confusing.
And these are all definitions. These are kind of mathematical tools for getting our heads around numbers. It’s not like someone saw one of these formulas on, like, the face of the sun and said, “Oh, that’s part of the universe, that this is how the average should be calculated.” These are human constructs to kind of just get our heads around large sets of data. This isn’t a large set of data, just a set of five numbers. But if we had 5 million numbers, you could imagine that we wouldn’t like thinking about every number individually.
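The rule for odd and even counts can be sketched as one small Python function. The name `median_of` is an assumption made for this example, not something from the video.

```python
def median_of(values):
    """Median by sorting: the middle value, or the mean of the middle two."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: there is a single middle number.
        return ordered[mid]
    # Even count: take the arithmetic mean of the two middle numbers.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([1, 1, 2, 3, 4]))      # five numbers, prints: 2
print(median_of([1, 1, 2, 3, 4, 4]))   # six numbers, prints: 2.5
```

Python's standard library also ships `statistics.median`, which follows the same convention for even-sized data.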
Anyway, before I talk more about that, let me tell you what the mode is. The mode, to some degree, is the one that I think most people probably forget or never learn. When they see it on an exam, it confuses them because they’re like, “Oh, that sounds very advanced.” But in some ways, it is the easiest of all of the measures of central tendency or of average. The mode is essentially whichever number is most common in a set.
So in this example, there are two 1’s and there is one of everything else. So, the mode here is 1. So the mode, you can kind of say, is the most common number. And then you could kind of say, “Hey Sal! What if this was our set: 1, 1, 2, 3, 4, 4?” Here, I have two 1’s and I have two 4’s. This is where the mode gets a little bit tricky because either of these would have been a decent answer for the mode. You could actually say the mode of this is 1 or the mode of this is 4, and it gets a little bit ambiguous, and you probably want a little clarity from the person asking you. Most times on a test, when they ask you, there’s not going to be this ambiguity. There will be a single most common number in the set.
So now, you’re saying, “Oh, why wasn’t just one of these good enough? Why did we learn all these averages? Why don’t we just use the arithmetic mean all the time? What are the median and mode good for?” Well, I’ll try to do one example of that and see if it rings true with you, and then you could think about it a little bit more.
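One way to make the tie case concrete is to count how often each number appears and return every number that shares the highest count. This is only a sketch; the helper name `modes_of` is hypothetical.

```python
from collections import Counter


def modes_of(values):
    """Return every value that appears the most often, in sorted order."""
    counts = Counter(values)              # how many times each number appears
    highest = max(counts.values())        # the largest of those counts
    return sorted(v for v, c in counts.items() if c == highest)


print(modes_of([1, 1, 2, 3, 4]))      # one clear winner, prints: [1]
print(modes_of([1, 1, 2, 3, 4, 4]))   # a tie, prints: [1, 4]
```

Returning a list makes the ambiguity visible instead of silently picking 1 or 4.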
Let’s say I had this set of numbers: 3, 3, 3, 3, 3 and 100. So, what is the arithmetic mean here? I have five 3’s and 100, so it would be 115 ÷ 6 because I have six numbers. 115 is just the sum of all of these. How many times does 6 go into 115? It goes 19 times with 1 left over, so it’s equal to 19 1/6. Fair enough. I just added all the numbers and divided by how many there are, but my question is, “Is this really representative of this set?” I mean, I have a ton of 3’s and then I have 100 all of a sudden, and we’re saying that the central tendency is 19 1/6. 19 1/6 doesn’t really seem indicative of this set. Maybe it does, depending on your application, but it does seem a little bit off. I mean, my intuition would be that the central tendency is something close to 3 because there are a lot of 3’s here. So, what would the median tell us?
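The division can be double-checked with a short Python sketch. The variable names are illustrative assumptions only.

```python
from fractions import Fraction

numbers = [3, 3, 3, 3, 3, 100]

total = sum(numbers)                      # 3 * 5 + 100 = 115
count = len(numbers)                      # 6 numbers
whole, remainder = divmod(total, count)   # 6 goes into 115 nineteen times, remainder 1
mean_exact = Fraction(total, count)       # 115/6, the same value as 19 1/6

print(f"{total} / {count} = {whole} {remainder}/{count}")   # prints: 115 / 6 = 19 1/6
print(round(float(mean_exact), 3))                          # prints: 19.167
```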
We already put these numbers in order. If I’d given them to you out of order, you’d want to put them in order and then say, “What’s the middle number?” Well, let’s see. The middle two numbers, since I have an even number of numbers, are 3 and 3. So, if I take the arithmetic mean of 3 and 3, I get 3. And this is maybe a better measurement of the central tendency or of the average of this set of numbers. Essentially, by taking the median, I wasn’t so much affected by this really large number that is very different from the others. In statistics, they call that an outlier.
If you talked about average home prices, maybe every house in the city is $100,000 and then there is one house that costs a trillion dollars. And then if someone tells you the average house price was a million dollars, you might have a very wrong perception of that city. But the median house price would be $100,000, and you’d get a better sense of what the houses in that city are like. So similarly, this median maybe gives you a better sense of what the numbers in this set are like, because the arithmetic mean was skewed by what they’d call an outlier. And being able to tell what an outlier is, it’s kind of one of those things where statisticians will say, “Well, I know it when I see it.” There isn’t really a formal definition for it, but it tends to be a number that really kind of sticks out, and sometimes it’s due to a measurement error or whatever.
And then finally, the mode. What is the most common number in this set? Well, there are five 3’s and there is one 100, so the most common number is once again 3. So in this case, when you had this outlier, the median and the mode tend to be maybe a little bit better at giving an indication of what these numbers represent. Maybe this was just a measurement error, but I don’t know. We don’t actually know what these numbers represent. If these are house prices, I would argue that the median and the mode are probably more indicative measures of what the houses in an area cost. But if this is something else, if these are scores on a test, maybe there is one kid in the class, one out of six kids, who did really, really well and everyone else didn’t study, and then maybe the arithmetic mean is more indicative of kind of how students at that level do on average.
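To see all three measures side by side on the outlier set, here is a small sketch using Python's standard `statistics` module. The variable names are chosen just for this example.

```python
import statistics

numbers = [3, 3, 3, 3, 3, 100]

mean_value = statistics.mean(numbers)       # pulled upward by the 100
median_value = statistics.median(numbers)   # mean of the middle two, 3 and 3
mode_value = statistics.mode(numbers)       # the most common number

print(round(mean_value, 3), median_value, mode_value)   # prints: 19.167 3.0 3

# Dropping the outlier barely changes the median or mode, but the mean moves a lot.
without_outlier = numbers[:-1]
print(statistics.mean(without_outlier))     # prints: 3
```

Only the mean shifts when the 100 is removed, which is the sense in which the median and mode are less affected by an outlier.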
Anyway, I am done talking about all of this. I encourage you to play with a lot of numbers and deal with the concepts yourself. In the next video, we’ll explore more descriptive statistics, and instead of talking about the central tendency, we’ll talk about how spread apart things are and how far away they are from the central tendency. See you in the next video.
Transcription by:
Scribe4you Transcription Services