Tuesday, February 18, 2003

Improving the ranking: Discounting scoring

As I mentioned below, I’m not happy with the overall scoring system as I think it is overvaluing scoring points, but I wasn’t sure how to adjust for this. I could make up a fudge factor to lower the value of points scored, but that would have been an ad hoc, arbitrary solution. Thinking it over, I’ve come up with something better, although it’s still not perfect.

I’m happy with the way I’ve valued everything else besides points. I think those numbers are good both in themselves and in relation to each other. The question is how to scale the value of points scored to the value of all the other stats. To do this, I went back and looked at the total value of the stats for an average team. Adding up all their rebounds, assists, etc., I found a total non-scoring value for the team of 38.4 points. Since they scored 94.8 points per game, that means the remaining value from points should add up to 56.4. Now, the efficiency-adjusted scoring value for the team was 106.5 points. (This value is higher than the actual points scored because of the way I calculated it, by excluding TO’s from consideration. This makes sense to me, as there is some value in simply chucking up a shot, since it avoids the possibility of a turnover on that possession. You can’t score unless you shoot, so even a bad shot has a slight advantage over a pass, in a certain sense.)

Anyway, this gives a normalization factor of 0.53 (56.4 divided by 106.5) for the points scored, discounting my previous value by almost a half. I’m still not entirely happy with this, since it’s scaling everything onto offensive production, when properly speaking blocks and steals are defensive contributions, as are defensive rebounds. But removing them from the calculation above still leaves me with a system that, in my opinion, greatly overvalues points scored, while including them gives numbers that I find more reasonable. So there’s a philosophical issue there, but practically it seems to work. And since I’m still not sure how to factor in defense, I’m left with lumping everything into an offensive contribution.
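The arithmetic behind the discount can be sketched in a few lines of Python. The function names are made up for illustration, and the re-rating step (keep the non-scoring part of the old rating, scale only the scoring part) is an assumption about how the factor gets applied, not something spelled out above. The Stackhouse and Jordan inputs are the old ratings and adjusted scoring values quoted in the February 17 post.

```python
def scoring_discount(team_points, non_scoring_value, adjusted_scoring_value):
    """Fraction of the efficiency-adjusted scoring value that is kept."""
    remaining_value = team_points - non_scoring_value  # 94.8 - 38.4 = 56.4
    return remaining_value / adjusted_scoring_value


def discounted_rating(old_rating, adjusted_scoring, factor):
    """Assumed re-rating: leave non-scoring value alone, scale the scoring part."""
    non_scoring = old_rating - adjusted_scoring
    return non_scoring + factor * adjusted_scoring


factor = scoring_discount(94.8, 38.4, 106.5)
print(round(factor, 2))  # 0.53

# Old rating and adjusted scoring value from the February 17 post
stackhouse = discounted_rating(30.4, 26.7, factor)
jordan = discounted_rating(25.8, 20.4, factor)
print(round(stackhouse, 1), round(jordan, 1))  # 17.8 16.2
```

Under that assumption the two results line up with the Stackhouse and Jordan entries in the adjusted list.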

Anyway, with the new, adjusted ranking system, the top players in the league grade out as follows:

McGrady: 26.6
Shaq: 26.0
Kobe: 24.6
Garnett: 24.2
Pierce: 23.3
Nowitzki: 22.4
Webber: 22.1
Jermaine O’Neal: 20.8
Steve Francis: 19.8
Ben Wallace: 19.5
Payton: 18.6
Iverson: 18.34
Kidd: 18.29
Stackhouse: 17.8
Antoine Walker: 17.2
Jordan: 16.2

So, Kidd still grades out lower than I think he probably should, but it’s better than before. And I think the problems that remain are simply intrinsic to the process of boiling everything down to a single number. Some thoughts on that in the next installment.

Monday, February 17, 2003

Should MJ have been an all star?

There have been some debates on this point, and a common argument was that MJ isn't even the best player on his own team--that Stackhouse deserved a spot on the all star squad more than Jordan did. I decided to look at them, and found:

MJ: 25.8

Stackhouse: 30.4

So Stack measures out as significantly better than MJ. The difference is entirely attributable to scoring. Stack scores 3 more points per game, and does it in more efficient fashion, so his adjusted scoring value comes out to 26.7 ppg, while MJ's is only 20.4. While Jordan shoots a slightly higher percentage, he gets to the line a meager 4 times per game, while Stack gets there 7.6 times per game, a significant advantage which pushes him ahead of MJ. In other areas, MJ comes out slightly ahead, with twice as many steals (1.6 to 0.8) and 0.7 fewer turnovers per game.

On a side note, I'm starting to think the grading system I'm using is over-valuing scoring. In the other stats (rebounding, assists), I was discounting their value somewhat on the idea that, even without that particular rebound or assist, the team still might get the ball or a basket. But I'm not doing the same thing with points, instead giving the entire value for points scored, and further adjusting for efficiency. But a system which scores Jerry Stackhouse as better than Jason Kidd strikes me as being questionable. Either I'm over-rating scoring, or else the intangibles that are not being accounted for have a very large effect. I need to think about this some more.

Sunday, February 16, 2003

A few more players, for comparison

Paul Pierce: 38.5

Antoine Walker: 27.9

Ben Wallace: 22.9

Steve Francis: 32.8

Yao Ming: 24.3

Jermaine O'Neal: 31.3

Gary Payton: 29.3

Putting it all together

So, I've now determined valuations for points, rebounds, assists, blocks, steals, and turnovers. Defense I'm going to leave aside for now, until and unless I figure out a good way to measure it. (However, I think for most players defensive differences are not that big of a factor. Defense is a learned skill rather than an innate one. Certainly, some players are better at it than others, but the overall FG% allowed by a team is first and foremost determined by the team defense that is taught and implemented by the coaches. Individual effects are, I think, secondary. Dan Majerle is an example. For most of his career, he was a gunner on offense who never paid much attention to defense. Then, when he went to Miami and had Riley coaching him, he suddenly became an excellent defender.)

The only remaining factor is fouls, which will also be a small effect. Their biggest impact is in limiting the number of minutes a guy can play, and that will show up in his other averages. A secondary effect is getting the other team into the bonus more, but my gut feeling is that, on average, this will be a small effect. Similarly, drawing lots of fouls will show up primarily in the number of FTA you get, and hence in the points scored numbers. Getting more FTA for the team via the bonus will be a second order effect and so can be ignored for now.

The only other wrinkle is normalization factors. As mentioned earlier, if your team gets more possessions, you're naturally going to score more, get more steals, etc. Similarly, if the opponents miss more shots, you will get more defensive rebounds. If your team misses more shots, you'll get more offensive rebounds. So, for all these stats, we want to normalize numbers to the average team. We'll normalize steals and turnovers to the number of possessions the team averages. We'll normalize blocks to the number of FGA the opponents had. Points will be normalized to the number of offensive attempts (FGA + FTA/1.9). Similarly, for the calculation of the adjusted point value, we'll normalize each player's FGA and FTA to the league averages. Assists will be normalized to the number of made FG's the team has.
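As a rough sketch of what normalizing to the average team could look like, the Python below rescales a per-game stat by the ratio of the league-average base to the team's own base. The function name and the simple ratio form are assumptions, and the sample numbers are hypothetical, not real team data.

```python
# Which base each stat is normalized to, per the paragraph above:
#   steals, turnovers -> team possessions
#   blocks            -> opponents' FGA
#   points            -> offensive attempts (FGA + FTA/1.9)
#   assists           -> team made FG's


def normalize_to_league(player_stat, team_base, league_base):
    """Rescale a stat as if the player's team had the league-average base."""
    return player_stat * league_base / team_base


# Hypothetical example: 2.0 steals per game on a team averaging
# 95 possessions, in a league that averages 92 possessions
normalized_steals = normalize_to_league(2.0, 95.0, 92.0)
print(round(normalized_steals, 2))  # 1.94
```

A player on a fast-paced team gets marked down slightly, and a player on a slow-paced team gets marked up.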

So, let's take a look at some players in the NBA this year. The usual debate is who the MVP should be. The main candidates are Shaq, Kobe, Tracy McGrady, Kevin Garnett, Tim Duncan, Jason Kidd, and Allen Iverson. Let's see how they come out. I'm just going to quote the bottom line numbers, with some comments.

McGrady: (30.7 PPG, 6.7 RPG, 5 Assists): 43.9
McGrady grades out as the best all around player in the NBA. He does everything well, and scores in bunches for a team that actually averages fewer possessions than is the norm.

Duncan: (23.7 PPG, 12.9 RPG, 2.9 BPG) : 40.4
Duncan squeaks into second place here. Despite his relatively low scoring average, he scores very efficiently, which makes up ground. His excellent rebound and block numbers finish closing the gap, while his very respectable 4 assists per game also keep him among the elite.

Kobe: (29.6, 7.1 RPG, 6.5 Assists): 40.3
Kobe's all around game adds up here. His rebounding numbers are excellent and he gets plenty of assists, too, to go along with his scoring. Basically the entire difference between him and McGrady is due to Tracy's higher scoring. And while he gets more assists and slightly more rebounds than McGrady, he also averages 1.2 more turnovers per game.

Shaq (25.9, 10.6 RPG, 2.3 Blk): 40.2
Despite shooting 10% better than most of the other candidates, Shaq's efficiency-adjusted scoring does not increase much more than that of the other players under consideration, because of his poor FT percentage. That keeps him out of the top spot, despite excellent offensive rebounding and block numbers.

Those four are clearly the class of the group, with a significant drop off to the next bunch.

Garnett (22.2, 12.8 RPG, 5.7 assists): 36.2
The difference between Garnett and the players above him is entirely attributable to his lower scoring. If he averaged 25 points per game, he would also be in the low 40's. He suffers in comparison to Duncan because he isn't as efficient a scorer and because Duncan blocks more shots.

Nowitzki (23.5, 10 RPG): 35.3
I was surprised Dirk was this high up. Although his rebounding numbers are good, of those 10 rebounds only 1 per game is an offensive rebound, which means those rebounds aren't as valuable as they look. But he picks up points here and there, with 1.5 steals per game, 1.2 blocks, and obviously his scoring. He also only turns it over 1.7 times per game.

Chris Webber (23 ppg, 10.5 RPG, 5.5 APG) 35.3
Webber isn't higher because his offensive efficiency is terrible for an elite player. He barely averages any more points per offensive attempt than the league average (even Iverson beats the league average by a fair amount), largely because he only gets to the line 6 times per game. The top tier players get there 8-10 times per game.

Iverson (26.8, 4.8 Assists, 2.53 steals) 31.2
Iverson, despite only shooting 40% from the field, is still more efficient than the league average, in large part because he gets to the line 8+ times per game and shoots well from there. His steals help his bottom line score as well, making a case for him to be among the elite despite not averaging a lot of assists or rebounds. But his relatively low offensive efficiency compared to the others on the list, combined with his not piling up big numbers elsewhere, keeps him well below the very best in the league.

Kidd (19.6, 8.4 assists, 5.8 RPG) 28.7
I was surprised Kidd was so far down the list, but his lack of scoring killed him, compounded by a low shooting percentage. And although his other numbers are good, 8 assists and 6 rebounds don't stack up against the rest of the list. His 3.5 turnovers per game also hurt him.

The last easily measured statistic is the block. (The value of defense is, just as with baseball, hard to get a handle on.) The exact value of blocks is also difficult to quantify. At the first level, it’s straightforward: a block converts a shot attempt into a miss. Since, on average, there is 1 point scored per shot attempt (which is not the same thing as the previously calculated points per possession; they just happened to come out to the same value), it seems like a block should be worth 1 point. But there’s more to it than that. First, many (most?) blocks occur around the basket, so the shots being blocked might actually be higher percentage shots than the average shot. In addition, a good shot blocker can alter shots, making players miss without actually blocking the shot. And finally, a great shot blocker can go even farther, intimidating opponents and discouraging them from coming into the lane at all, resulting in more outside shots and a lower overall shooting percentage.

To try and tease these effects out, I looked at the data in the same way as with assists. The question is, what is the effect of blocks per FG attempt on the opponents’ FG%? What I expected was that, as with assists, the noise in the signal would overwhelm the underlying trend in the data. But to my surprise, the data came out very cleanly, as can be seen in the following figure.

As can be seen, there is a clear trend that can be approximated with a linear fit, with a slope of almost exactly -1. (The linear fit won’t be valid over the entire range of possibilities, but because actual teams all have blocks per FGA in the region between 0.04 and 0.09, a fit that works locally is good enough.) What this means can be seen if we look at what we’re plotting. A linear fit with a slope of negative one means that:

FG% = FGM/FGA = Base% – Blocks/FGA

Multiply by FGA and you see that this means every block reduces the number of shots the opponents made by 1. Assuming that almost all blocks occur on two point shots, this means that each block is worth 2 points, which is a remarkable result. Getting 5 blocks is thus, according to my calculations, worth as much as 10 assists and 8.5 rebounds combined.
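The step from a slope of -1 to "one block removes one made shot" can be checked numerically. This is only a sketch: the base percentage and shot counts are hypothetical, chosen so that blocks per FGA stays inside the 0.04 to 0.09 range where the fit applies, and the function names are made up for illustration.

```python
POINTS_PER_BLOCK = 2  # assumes nearly all blocked shots are two-point attempts


def opponent_fgm(base_pct, fga, blocks):
    """Made shots implied by the fit FG% = Base% - Blocks/FGA."""
    fg_pct = base_pct - blocks / fga
    return fg_pct * fga


def block_value(blocks):
    """Point value of a number of blocks under the two-point assumption."""
    return POINTS_PER_BLOCK * blocks


# Hypothetical opponent: 80 FGA, 50% base shooting, 4 versus 5 blocks
with_4_blocks = opponent_fgm(0.50, 80, 4)
with_5_blocks = opponent_fgm(0.50, 80, 5)
print(round(with_4_blocks - with_5_blocks, 6))  # 1.0 fewer made shot

print(block_value(5))  # 10
```

The base percentage drops out of the difference, so the result does not depend on the hypothetical 50% figure.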

Steals and Turnovers

Steals and turnovers are very easy to evaluate. Each steal takes away a possession from the opposing team. Similarly, a turnover costs your own team a possession. Thus, each steal is worth 1 point and each turnover costs 1 point, the value of a single possession.

The value of points

So, I’ve gone through and developed a valuation system for rebounds and assists. The next step is to determine the value of scoring. At first, it seems simple. Each point somebody scores ought to be worth a point, right? Almost. Yes, each point is worth a point, but there’s also some opportunity cost, since every shot you take is a shot that someone else didn’t take. As an extreme example, you can imagine a player who shot it every single time they touched the ball. They might score 40 points a game, but if they were taking 50 shots to get there, they’d also be hurting the team. So we need to determine a way of factoring in the offensive efficiency of a player.

To do this is, I think, fairly straightforward. The value of a player’s points to the team is equal to the number of points scored, adjusted by the difference between the points scored and the expected number of points the team would have scored in the same number of offensive opportunities. That is,

Points value = number of points scored – (expected number of points scored – number of points scored)

Where the expected number of points scored is calculated by looking at the number of offensive opportunities the player had. (This is slightly different from the number of possessions, since it doesn’t factor in offensive rebounds or turnovers. The turnover penalty will be factored in separately.) That number is equal to FGA + FTA/1.9, that is, the shots attempted plus the FT’s attempted divided by 1.9. The expected number of points scored per opportunity is simply the team’s total points scored, divided by team opportunities (FGA + FTA/1.9 + turnovers). (Turnovers are included here since, if the player passes up his shot to pass the ball, there is a chance for a subsequent turnover.)

Simplifying, you get the value of points scored as:

Value of points = (Number of points scored * 2) – Expected number of points scored.

So, for example, if a player scored 17 points per game, but the average team would have scored 20 points in the same number of opportunities, then the player’s value from scoring is only 14 points (17 *2 –20 = 14.)
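The same calculation as a short Python sketch. The function and constant names are made up for illustration, and the points-per-opportunity rate is left as a parameter rather than fixed.

```python
FTA_PER_OPPORTUNITY = 1.9  # free throw attempts counted as one offensive opportunity


def offensive_opportunities(fga, fta):
    """Shots attempted plus FT's attempted divided by 1.9."""
    return fga + fta / FTA_PER_OPPORTUNITY


def expected_points(fga, fta, points_per_opportunity):
    """Points an average offense would score in the same opportunities."""
    return offensive_opportunities(fga, fta) * points_per_opportunity


def points_value(points_scored, expected):
    """Points scored, adjusted by the gap to the expected points."""
    return points_scored - (expected - points_scored)


# The example from the text: 17 points scored where 20 were expected
print(points_value(17, 20))  # 14
```

A player who scores exactly the expected number of points gets full credit for them, and every point above or below expectation counts double.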

The league average for expected number of points scored per opportunity (and to compare players on different teams you want to use the league average, to avoid penalizing a player for having good teammates) comes out to be 0.876 points per opportunity. That's the number I will use for future calculations, at least for the 2000-2001 season.

The illusion of rebounding differential

I mentioned earlier the error that you can fall into by trying to evaluate the ability of a team’s offense or defense just by looking at the number of points they score (or give up.) I’d also like to point out that another stat beloved by announcers and commentators—rebounding differential—can also mislead you if you’re not careful.

The basic point, which is simple, is that because the defensive team gets the great majority of rebounds (that is, defensive rebounds are much more common than offensive ones), the number of rebounds a team gets will be determined by the number of shots the opponent misses. A team that plays good defense, limiting the opponent’s FG%, will have more defensive rebounding opportunities, and hence will get more rebounds. Similarly, if a team shoots a high percentage, the opposing team will have fewer defensive rebounding chances.

Let’s do some numbers here. As mentioned previously, the average team gets about 72% of possible defensive rebounds. The team with the biggest rebounding differential was Utah, with a +5.4 difference. But how good of a rebounding team are they? They averaged 36.1 made shots per game out of 76.7 taken, for a 47% shooting percentage. If they had been an average (43.8%) shooting team, they would have only made 33.6 shots, for a difference of 2.5 missed shots per game. They were actually a very good offensive rebounding team, getting 34% of the available offensive rebounds, but even at that rate, their excellent shooting still accounted for 1.6 of their rebounding differential. A similar calculation shows their good defense led to another third of a rebound. So, while they are an above average rebounding team, almost 2 of their rebounding differential of 5.4 is actually attributable to their defensive and offensive proficiency.
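One reading of the Utah arithmetic, sketched in Python. The assumption here is that the 1.6 figure comes from taking the shots Utah made beyond what an average-shooting team would have made, and counting the share of those that the opponents would have rebounded had they been misses. The variable names are made up for illustration.

```python
utah_fgm = 36.1          # made shots per game
utah_fga = 76.7          # shot attempts per game
league_fg_pct = 0.438    # average shooting percentage
utah_oreb_rate = 0.34    # share of available offensive rebounds Utah gets

# Shots an average-shooting team would have made on the same attempts
average_makes = league_fg_pct * utah_fga
extra_makes = utah_fgm - average_makes  # misses that never happened

# Opponents would have rebounded the misses Utah did not get back
opponent_rebounds_avoided = extra_makes * (1 - utah_oreb_rate)

print(round(average_makes, 1))              # 33.6
print(round(extra_makes, 1))                # 2.5
print(round(opponent_rebounds_avoided, 2))  # 1.65
```

The last figure is the roughly 1.6 rebounds per game credited to shooting rather than rebounding.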

So looking at rebounding differential alone mixes in other factors in addition to actual rebounding ability. If you want a stat that solely measures rebounding ability, the one to look at is the rebounding rate. What percentage of available offensive and defensive rebounds does the team actually get? To take another example, San Antonio was actually a below average rebounding team, having rebounding rates on both offensive and defensive rebounding below the league averages. Yet they still ended up with a rebounding differential of +1.2.

Short and handwavey argument

Last item I promised an argument for why the total value of offensive and defensive rebounds should be the same. I thought I had one, but thinking it through a little more I realized my argument didn’t hold up. But, let me still give a very brief one. Imagine a graph plotting the ratio of the total value of offensive and defensive rebounds on the y-axis, against the frequency of offensive rebounds on the x-axis. So the x-axis goes from zero (no offensive rebounds) to one (no defensive rebounds). Now, at the midpoint of 0.5, the ratio has to equal one, since there are the same number of each, and since they have the same frequency, they also have the same value.

Further, you can make a symmetry argument that the value on one side of the midpoint has to be the inverse of the value on the other side of the midpoint.

F(0.5 + x) = 1/F(0.5 – x)

There’s nothing magical about an offensive or defensive rebound—their value only depends on the frequency, so if you flip the frequencies (say, from 80-20 to 20-80) then the total values should also flip. Now, unless I’m missing something obvious, it seems like the only simple function that fits these criteria is for the ratio to be a constant, 1, over the whole range, which was the result I got below. So, at the least, it’s a plausible result, and has the advantage of being simple and at least somewhat intuitive.