Tuesday, March 04, 2003

Strange Message


Sorry to all for the long hiatus, and also apologies for the continuing appearance of a message I wrote for my other blog and inadvertently pasted in here, where it doesn't belong. I deleted it on my edit page and it no longer shows up there, but it is still visible on the actual web page.

Anyway, I have a few additional pieces that I'd like to work up but have been too busy lately. Keep checking back occasionally and there should be some new material here in the next week or two.

Tuesday, February 18, 2003

Improving the ranking: Discounting scoring


As I mentioned below, I’m not happy with the overall scoring system as I think it is overvaluing scoring points, but I wasn’t sure how to adjust for this. I could make up a fudge factor to lower the value of points scored, but that would have been an ad hoc, arbitrary solution. Thinking it over, I’ve come up with something better, although it’s still not perfect.

I’m happy with the way I’ve valued everything else besides points. I think those numbers are good both in themselves and in relation to each other. The question is how to scale the value of points scored to the value of all the other stats. To do this, I went back and looked at the total value of the stats for an average team. Adding up all their rebounds, assists, etc., I found a total non-scoring value for the team of 38.4 points. Since they scored 94.8 points per game, that means the remaining value from points should add up to 56.4. Now, the efficiency-adjusted scoring value for the team was 106.5 points. (This value is higher than the actual points scored because of the way I calculated it, by excluding TO’s from consideration. This makes sense to me, as there is some value in simply chucking up a shot, since it avoids the possibility of a turnover on that possession. You can’t score unless you shoot, so even a bad shot has a slight advantage over a pass, in a certain sense.)

Anyway, this gives a normalization factor of 0.53 (56.4 divided by 106.5) for the points scored, discounting my previous value by almost a half. I’m still not entirely happy with this, since it’s scaling everything onto offensive production, when properly speaking blocks and steals are defensive contributions, as are defensive rebounds. But removing them from the calculation above still leaves me with a system that, in my opinion, greatly overvalues points scored, while including them gives numbers that I find more reasonable. So there’s a philosophical issue there, but practically it seems to work. And since I’m still not sure how to factor in defense, I’m left with lumping everything into an offensive contribution.
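As a quick check of the arithmetic behind the 0.53 factor, here is a minimal Python sketch. The three input figures are the ones quoted above; the variable names are only illustrative labels.

```python
# Per-game figures for an average team, as quoted in the post.
non_scoring_value = 38.4        # rebounds, assists, blocks, steals, turnovers combined
points_per_game = 94.8          # actual points scored
adjusted_scoring_value = 106.5  # efficiency-adjusted scoring value

# Value that scoring has to account for once everything else is counted.
remaining_value = points_per_game - non_scoring_value

# Discount applied to the efficiency-adjusted scoring value.
scoring_discount = remaining_value / adjusted_scoring_value

print(f"remaining value from points: {remaining_value:.1f}")
print(f"scoring discount factor:     {scoring_discount:.2f}")
```

Under this reading, each player's efficiency-adjusted scoring value would simply be multiplied by the discount before being added to his other stats.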

Anyway, with the new, adjusted ranking systems, the top players in the league grade out as follows:

Duncan: 27.2
McGrady: 26.6
Shaq: 26.0
Kobe: 24.6
Garnett: 24.2
Pierce: 23.3
Nowitzki: 22.4
Webber: 22.1
Jermaine O’Neal: 20.8
Steve Francis: 19.8
Ben Wallace: 19.5
Payton: 18.6
Iverson: 18.34
Kidd: 18.29
Stackhouse: 17.8
Antoine Walker: 17.2
Jordan: 16.2

So, Kidd still grades out lower than I think he probably should, but it’s better than before. And I think the problems that remain are simply intrinsic to the process of boiling everything down to a single number. Some thoughts on that in the next installment.

Monday, February 17, 2003

Should MJ have been an all star?


There have been some debates on this point, and a common argument was that MJ isn't even the best player on his own team--that Stackhouse deserved a spot on the all star squad more than Jordan did. I decided to look at them, and found:

MJ: 25.8

Stackhouse: 30.4

So Stack measures out as significantly better than MJ. The difference is entirely attributable to scoring. Stack scores 3 more points per game, and does it in more efficient fashion, so his adjusted scoring value comes out to 26.7 ppg, while MJ's is only 20.4. While Jordan shoots a slightly higher percentage, he gets to the line a meager 4 times per game, while Stack gets there 7.6 times per game, a significant advantage which pushes him ahead of MJ. In other areas, MJ comes out slightly ahead with twice as many steals (1.6 to 0.8) and 0.7 fewer turnovers per game.

On a side note, I'm starting to think the grading system I'm using is over-valuing scoring. In the other stats (rebounding, assists), I was discounting their value somewhat on the idea that, even without that particular rebound or assist, the team still might get the ball or a basket. But I'm not doing the same thing with points, instead giving the entire value for points scored, and further adjusting for efficiency. But a system which scores Jerry Stackhouse as better than Jason Kidd strikes me as being questionable. Either I'm over-rating scoring, or else the intangibles that are not being accounted for have a very large effect. I need to think about this some more.

Sunday, February 16, 2003

A few more players, for comparison

Paul Pierce: 38.5

Antoine Walker: 27.9

Ben Wallace: 22.9

Steve Francis: 32.8

Yao Ming: 24.3

Jermaine O'Neal: 31.3

Gary Payton: 29.3


Putting it all together


So, I've now determined valuations for points, rebounds, assists, blocks, steals, and turnovers. Defense I'm going to leave aside for now, until and unless I figure out a good way to measure it. (However, I think for most players defensive differences are not that big of a factor. Defense is a learned skill rather than an innate one. Certainly, some players are better at it than others, but the overall FG% allowed by a team is first and foremost determined by the team defense that is taught and implemented by the coaches. Individual effects are, I think, secondary. Dan Majerle is an example. For most of his career, he was a gunner on offense who never paid much attention to defense. Then, when he went to Miami and had Riley coaching him, he suddenly became an excellent defender.)

The only remaining factor is fouls, which will also be a small effect. Their biggest impact is in limiting the number of minutes a guy can play, and that will show up in his other averages. A secondary effect is getting the other team into the bonus more, but my gut feeling is that, on average, this will be a small effect. Similarly, drawing lots of fouls will show up primarily in the number of FTA you get, and hence in the points scored numbers. Getting more FTA for the team via the bonus will be a second order effect and so can be ignored for now.

The only other wrinkle is normalization factors. As mentioned earlier, if your team gets more possessions, you're naturally going to score more, get more steals, etc. Similarly, if the opponents miss more shots, you will get more defensive rebounds. If your team misses more shots, you'll get more offensive rebounds. So, for all these stats, we want to normalize numbers to the average team. We'll normalize steals and turnovers to the number of possessions the team averages. We'll normalize blocks to the number of FGA the opponents had. Points will be normalized to (FGA + FTA/1.9), the number of offensive attempts. Similarly, for the calculation of the adjusted point value, we'll normalize each player's FGA and FTA to the league averages. Assists will be normalized to the number of made FG's the team has.
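To make the bookkeeping concrete, here is a rough Python sketch of how the per-stat values from the other posts could be summed into a single number. It is only a sketch under stated assumptions: the weights are the ones derived in those posts (1 point per offensive rebound, steal, and turnover, 0.4 per defensive rebound, 0.5 per assist, 2 per block, and the efficiency-adjusted scoring value at 0.876 points per opportunity), the team normalizations described above are left out, and the sample stat line is invented for illustration rather than belonging to any real player.

```python
LEAGUE_PTS_PER_OPPORTUNITY = 0.876  # 2000-2001 league average quoted in the scoring post


def scoring_value(points, fga, fta):
    """Efficiency-adjusted scoring value: 2 * points - expected points."""
    opportunities = fga + fta / 1.9
    expected_points = LEAGUE_PTS_PER_OPPORTUNITY * opportunities
    return 2 * points - expected_points


def player_rating(points, fga, fta, oreb, dreb, assists, blocks, steals, turnovers):
    """Sum of all valued stats, before any team normalization."""
    return (
        scoring_value(points, fga, fta)
        + 1.0 * oreb       # offensive rebound = one possession
        + 0.4 * dreb       # defensive rebound = 40% of a possession
        + 0.5 * assists    # assist = half a point
        + 2.0 * blocks     # block = one made two-pointer prevented
        + 1.0 * steals     # steal = one possession gained
        - 1.0 * turnovers  # turnover = one possession lost
    )


# Hypothetical per-game stat line, invented purely for illustration.
example = player_rating(points=20, fga=16, fta=5.7, oreb=2, dreb=6,
                        assists=4, blocks=1, steals=1, turnovers=2)
print(f"example rating: {example:.1f}")
```

The ratings quoted below also include the normalizations to the average team, so this sketch would not be expected to reproduce them exactly.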

So, let's take a look at some players in the NBA this year. The usual debate is who the MVP should be. The main candidates are Shaq, Kobe, Tracy McGrady, Kevin Garnett, Tim Duncan, Jason Kidd, and Allen Iverson. Let's see how they come out. I'm just going to quote the bottom line numbers, with some comments.

McGrady: (30.7 PPG, 6.7 RPG, 5 Assists): 43.9
McGrady grades out as the best all around player in the NBA. He does everything well, and scores in bunches for a team that actually averages fewer possessions than is the norm.

Duncan: (23.7 PPG, 12.9 RPG, 2.9 BPG) : 40.4
Duncan squeaks into second place here. Despite his relatively low scoring average, he scores very efficiently, which makes up ground. His excellent rebound and block numbers finish closing the gap, while his very respectable 4 assists per game also keep him among the elite.

Kobe: (29.6, 7.1 RPG, 6.5 Assists): 40.3
Kobe's all around game adds up here. His rebounding numbers are excellent and he gets plenty of assists, too, to go along with his scoring. Basically the entire difference between him and McGrady is due to Tracy's higher scoring. And while he gets more assists and slightly more rebounds than McGrady, he also averages 1.2 more turnovers per game.

Shaq (25.9, 10.6 RPG, 2.3 Blk): 40.2
Despite shooting 10% better than most of the other candidates, Shaq's efficiency-adjusted scoring does not increase much more than that of the other players under consideration, because of his poor FT percentage. That keeps him out of the top spot, despite excellent offensive rebounding and block numbers.

Those four are clearly the class of the group, with a significant drop off to the next bunch.

Garnett (22.2, 12.8 RPG, 5.7 assists): 36.2
The difference between Garnett and the players above him is entirely attributed to his lower scoring. If he averaged 25 points per game, he would also be in the low 40's. He suffers in comparison to Duncan because he isn't as efficient a scorer and because Duncan blocks more shots.

Nowitzki (23.5, 10 RPG): 35.3
I was surprised Dirk was this high up. Although his rebounding numbers are good, of those 10 rebounds only 1 per game is an offensive rebound, which means those rebounds aren't as valuable as they look. But he picks up points here and there, with 1.5 steals per game, 1.2 blocks, and obviously his scoring. He also only turns it over 1.7 times per game.

Chris Webber (23 ppg, 10.5 RPG, 5.5 APG) 35.3
Webber isn't higher because his offensive efficiency is terrible for an elite player. He barely averages any more points per offensive attempt than the league average (even Iverson beats the league average by a fair amount), largely because he only gets to the line 6 times per game. The top tier players get there 8-10 times per game.

Iverson (26.8, 4.8 Assists, 2.53 steals) 31.2
Iverson, despite only shooting 40% from the field, is still more efficient than the league average, in large part because he gets to the line 8+ times per game and shoots well from there. His steals help his bottom line score as well, making a case for him to be among the elite despite not averaging a lot of assists or rebounds. But his relatively low offensive efficiency compared to the others on the list, combined with his not piling up big numbers elsewhere, keeps him well below the very best in the league.

Kidd (19.6, 8.4 assists, 5.8 RPG) 28.7
I was surprised Kidd was so far down the list, but his lack of scoring killed him, compounded by a low shooting percentage. And although his other numbers are good, 8 assists and 6 rebounds don't stack up against the rest of the list. His 3.5 turnovers per game also hurt him.


Blocks


The last easily measured statistic is the block. (The value of defense is, just as with baseball, hard to get a handle on.) The exact value of blocks is also difficult to quantify. At the first level, it’s straightforward: a block converts a shot attempt into a miss. Since, on average, there is 1 point scored per shot attempt (which is not the same thing as the previously calculated points per possession; they just happened to come out to the same value), it seems like a block should be worth 1 point. But there’s more to it than that. First, many (most?) blocks occur around the basket, so the shots being blocked might actually be higher percentage shots than the average shot. And in addition, a good shot blocker can alter shots, making players miss without actually blocking the shot. And finally, a great shot blocker can go even farther, intimidating opponents and discouraging them from coming into the lane at all, resulting in more outside shots and a lower overall shooting percentage.

To try and tease these effects out, I looked at the data in the same way as with assists. The question is, what is the effect of blocks per FG attempt on the opponents’ FG%? What I expected was that, as with assists, the noise in the signal would overwhelm the underlying trend in the data. But to my surprise, the data came out very cleanly, as can be seen in the following figure.



As can be seen, there is a clear trend that can be approximated with a linear fit, with a slope of nearly exactly -1. (The linear fit won’t be valid over the entire range of possibilities, but because actual teams all have blocks per FGA in the region between 0.04 and 0.09, a fit that works locally is good enough.) What this means can be seen if we look at what we’re plotting. A linear fit with a slope of negative one means that:

FG% = FGM/FGA = Base% – Blocks/FGA

Multiply by FGA and you see that this means every block reduces the number of shots the opponents made by 1. Assuming that almost all blocks occur on two point shots, this means that each block is worth 2 points, which is a remarkable result. Getting 5 blocks is thus, according to my calculations, worth as much as 10 assists and 8.5 rebounds combined.
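A small Python sketch of what a slope of -1 implies, assuming the linear relation above holds exactly. The base percentage and the opponent FGA used here are made-up illustrative inputs, not values read off the fit.

```python
def opponent_makes(base_pct, fga, blocks):
    """Opponent made shots under FG% = base_pct - blocks / fga."""
    fg_pct = base_pct - blocks / fga
    return fg_pct * fga


# Hypothetical inputs, chosen so blocks/FGA stays inside the 0.04-0.09 range.
base_pct = 0.50
opp_fga = 80.0

makes_with_5_blocks = opponent_makes(base_pct, opp_fga, 5)
makes_with_6_blocks = opponent_makes(base_pct, opp_fga, 6)

# One extra block removes one made shot, whatever base_pct and opp_fga are.
makes_removed_per_block = makes_with_5_blocks - makes_with_6_blocks
points_per_block = 2 * makes_removed_per_block  # assuming two-point shots

print(f"made shots removed per block: {makes_removed_per_block:.2f}")
print(f"points per block:             {points_per_block:.2f}")
```

The base percentage cancels out of the difference, which is why only the slope matters for the value of a block.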


Steals and Turnovers


Steals and turnovers are very easy to evaluate. Each steal takes away a possession from the opposing team. Similarly, a turnover costs your own team a possession. Thus, each steal or turnover is valued at 1 point, the value of a single possession.


The value of points


So, I’ve gone through and developed a valuation system for rebounds and assists. The next step is to determine the value of scoring. At first, it seems simple. Each point somebody scores ought to be worth a point, right? Almost. Yes, each point is worth a point, but there’s also some opportunity cost, since every shot you take is a shot that someone else didn’t take. At an extreme example, you can imagine a player who shot it every single time they touched the ball. They might score 40 points a game, but if they were taking 50 shots to get there, they’d also be hurting the team. So we need to determine a way of factoring in the offensive efficiency of a player.

To do this is, I think, fairly straightforward. The value of a player’s points to the team is equal to the number of points scored, adjusted by the difference between the points scored and the expected number of points the team would have scored in the same number of offensive opportunities. That is,

Points value = number of points scored – (expected number of points scored – number of points scored)

Where the expected number of points scored is calculated by looking at the number of offensive opportunities the player had (which is slightly different from the number of possessions, since it doesn’t factor in offensive rebounds or turnovers; the turnover penalty will be factored in separately). That number is equal to the shots attempted plus the FT’s attempted divided by 1.9. The expected number of points scored per opportunity is simply the team’s total points scored, divided by team opportunities (FGA + FTA/1.9 + turnovers). Turnovers are included here since, if the player passes up his shot to pass the ball, there is a chance for a subsequent turnover.

Simplifying, you get the value of points scored as:

Value of points = (Number of points scored * 2) – Expected number of points scored.

So, for example, if a player scored 17 points per game, but the average team would have scored 20 points in the same number of opportunities, then the player’s value from scoring is only 14 points (17 × 2 – 20 = 14).
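The same calculation as a minimal Python sketch. The function names and the `pts_per_opportunity` parameter are illustrative labels rather than anything defined in the post.

```python
def expected_points(fga, fta, pts_per_opportunity):
    """Points an average offense would score in the player's opportunities."""
    opportunities = fga + fta / 1.9
    return pts_per_opportunity * opportunities


def points_value(points_scored, expected):
    """Value of points = points - (expected - points) = 2 * points - expected."""
    return points_scored - (expected - points_scored)


# The worked example from the text: 17 points scored, 20 expected.
print(points_value(17, 20))  # 14
```

A player who scores exactly the expected number of points gets full credit for them, and every point above or below that expectation counts double.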

The league average for expected number of points scored per opportunity (and to compare players on different teams you want to use the league average, to avoid penalizing a player for having good teammates) comes out to be 0.876 points per opportunity. That's the number I will use for future calculations, at least for the 2000-2001 season.


The illusion of rebounding differential


I mentioned earlier the error that you can fall into by trying to evaluate the ability of a team’s offense or defense just by looking at the number of points they score (or give up.) I’d also like to point out that another stat beloved by announcers and commentators—rebounding differential—can also mislead you if you’re not careful.

The basic point, which is simple, is that because the defensive team gets the great majority of rebounds (that is, defensive rebounds are much more common than offensive ones), the number of rebounds a team gets will be determined by the number of shots the opponent misses. A team that plays good defense, limiting the opponents’ FG%, will have more defensive rebounding opportunities, and hence will get more rebounds. Similarly, if a team shoots a high percentage, the opposing team will have fewer defensive rebounding chances.

Let’s do some numbers here. As mentioned previously, the average team gets about 72% of possible defensive rebounds. The team with the biggest rebounding differential was Utah, with a +5.4 difference. But how good of a rebounding team are they? They averaged 36.1 made shots per game out of 76.7 taken, for a 47% shooting percentage. If they had been an average (43.8%) shooting team, they would have only made 33.6 shots, for a difference of 2.5 missed shots per game. They were actually a very good offensive rebounding team, getting 34% of the available offensive rebounds, but even at that rate, their excellent shooting still accounted for 1.6 of their rebounding differential. A similar calculation shows their good defense led to another third of a rebound. So, while they are an above average rebounding team, almost 2 of their rebounding differential of 5.4 is actually attributable to their defensive and offensive proficiency.
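Here is the Utah arithmetic as a short Python sketch. The shooting numbers are the ones quoted above. The last step is an assumption about how the 1.6 figure was reached: it treats each extra made shot as a miss the opponents would otherwise have rebounded 66% of the time (100% minus Utah's 34% offensive rebounding rate). That reading lands near 1.65, so the quoted figure may have been computed or rounded slightly differently.

```python
# Utah per-game shooting figures quoted in the post.
utah_fgm = 36.1
utah_fga = 76.7
league_fg_pct = 0.438
utah_off_reb_rate = 0.34

utah_fg_pct = utah_fgm / utah_fga

# Makes an average-shooting team would have had on the same attempts.
average_team_makes = league_fg_pct * utah_fga
extra_makes = utah_fgm - average_team_makes  # misses Utah avoided

# ASSUMPTION: each avoided miss denies the opponents a defensive rebound
# with probability (1 - Utah's offensive rebounding rate).
opp_def_rebounds_denied = extra_makes * (1 - utah_off_reb_rate)

print(f"Utah FG%:                   {utah_fg_pct:.3f}")
print(f"average-team makes:         {average_team_makes:.1f}")
print(f"extra makes (fewer misses): {extra_makes:.1f}")
print(f"opp. def. rebounds denied:  {opp_def_rebounds_denied:.2f}")
```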

So looking at rebounding differential alone mixes in other factors in addition to actual rebounding ability. If you want a stat that solely measures rebounding ability, the one to look at is the rebounding rate. What percentage of available offensive and defensive rebounds does the team actually get? To take another example, San Antonio was actually a below average rebounding team, having rebounding rates on both offensive and defensive rebounding below the league averages. Yet they still ended up with a rebounding differential of +1.2.


Short and handwavey argument


Last item I promised an argument for why the total value of offensive and defensive rebounds should be the same. I thought I had one, but thinking it through a little more I realized my argument didn’t hold up. But, let me still give a very brief one. Imagine a graph plotting the ratio of the total value of offensive and defensive rebounds on the y-axis, against the frequency of offensive rebounds on the x-axis. So the x-axis goes from zero (no offensive rebounds) to one (no defensive rebounds). Now, at the midpoint of 0.5, the ratio has to equal one, since there are the same number of each, and since they have the same frequency, they also have the same value.

Further, you can make a symmetry argument that the value on one side of the midpoint has to be the inverse of the value on the other side of the midpoint.

F(0.5 + x) = 1/F(0.5 – x)

There’s nothing magical about an offensive or defensive rebound—their value only depends on the frequency, so if you flip the frequencies (say, from 80-20 to 20-80) then the total values should also flip. Now, unless I’m missing something obvious, it seems like the only simple function that fits these criteria is for the ratio to be a constant, 1, over the whole range, which was the result I got below. So, at the least, it’s a plausible result, and has the advantage of being simple and at least somewhat intuitive.

Thursday, February 13, 2003

Rebounding, Part II: Defensive Rebounds


Part I is here. While each offensive rebound is worth a full possession, the situation is different on defensive rebounds. Anyone who’s watched basketball can remember many situations where the defensive team surrounded the ball with everyone boxed out. Then, which player happened to get the rebound is irrelevant, as discussed above. So a defensive rebound is not as valuable as an offensive rebound, since it’s more common, and because if a particular player didn’t get the defensive rebound, there’s still a good chance someone else on his team would.

So, I’m going to use the idea of removing the player who got the defensive rebound from the play. The value of the rebound is determined by how likely it would be for the opposing team to get the offensive rebound if he didn’t. But how to quantify this? I’m going to consider the rebound as a loose ball, with each player in the region having an equal chance of getting the rebound. Assume one offensive rebounder, with a chance (from the league average) of 28.5% of getting the rebound. Now, if each rebounder has an equal chance of getting the ball, that means there have to be 2.5 defensive rebounders (on average, obviously) in the vicinity, each with a 28.5% chance of getting the board, in order to give the total defensive rebounding probability of 71.5%.

Now, what happens when you take away one of the defensive rebounders, when you remove his chance of getting the rebound? Then, it’s only 1.5 to 1 in favor of the defense, and the chance of the offensive rebound is 40%. So, if you remove the defensive player who actually got the rebound from the play, the offense then has a 40% chance of getting the board. So the value of the defensive rebound is 40% of a possession, or 0.4 points.
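The loose-ball model above can be sketched in a few lines of Python. The only input is the 28.5% league offensive rebounding share; the variable names are illustrative.

```python
off_share = 0.285          # league share of rebounds that are offensive
def_share = 1 - off_share  # 0.715

# One offensive rebounder; every player nearby has an equal chance, so the
# number of defenders needed to produce a 71.5% defensive share is:
defenders = def_share / off_share

# Remove the defender who actually got the ball: the offense now competes
# against (defenders - 1) players, all with equal chances.
offense_chance_without_him = 1 / (1 + (defenders - 1))

# With a possession worth about 1 point, that chance is the rebound's value.
defensive_rebound_value = offense_chance_without_him * 1.0

print(f"defenders in the vicinity: {defenders:.2f}")
print(f"defensive rebound value:   {defensive_rebound_value:.2f}")
```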

This might seem a little arbitrary, but it ends up giving a result which is interesting and, on further consideration, very plausible for a completely independent reason. Let’s use these numbers to figure out what the value is of a player who gets a typical 10 rebounds per game. On average, he would be getting 7.15 defensive rebounds and 2.85 offensive rebounds. The value of the offensive rebounds is easy, 2.85 points. The value of the defensive rebounds is 7.15 * 0.4, which comes out to, surprise, 2.85 points. This means that, with this formula, the average team gets the exact same value from its offensive rebounds as it does from its defensive rebounds.

And it wasn’t just a lucky coincidence, either. If you go back and do the algebra (which is left as an exercise for the reader), it’s straightforward to show that the methodology I used above to calculate the value of defensive rebounds will, no matter what the base probabilities for offensive and defensive rebounds, produce the result that the average team (and the average player) gets the exact same value from their offensive as from their defensive rebounds.
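For readers who would rather check than do the algebra, here is a small Python sketch of the same model for several offensive rebounding shares. The shares other than 0.285 are arbitrary test values, kept below 0.5 so that there is at least one defender in the model.

```python
def rebound_values(off_share):
    """Total value per rebound opportunity of offensive and defensive boards.

    Algebra: with offensive share p, the model needs (1 - p) / p defenders,
    so a defensive rebound is worth p / (1 - p), and the defensive total is
    (1 - p) * p / (1 - p) = p, the same as the offensive total p * 1.
    """
    def_share = 1 - off_share
    defenders = def_share / off_share
    value_per_def_rebound = 1 / defenders
    offensive_total = off_share * 1.0
    defensive_total = def_share * value_per_def_rebound
    return offensive_total, defensive_total


for p in (0.10, 0.20, 0.285, 0.40):
    off_total, def_total = rebound_values(p)
    print(f"p = {p:.3f}: offensive {off_total:.3f}, defensive {def_total:.3f}")
```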

Now, in physics, these sorts of strange coincidences come up all the time—you work through lots of seemingly aimless math and end up with a very simple or symmetric result. And when you do that, it’s usually a sign that your theory is correct. Now, I don’t know if God plays dice with basketball or not, but this nice symmetric result gives me more confidence in my methodology. And in the next post, I’ll explain why I think this result makes sense.


Rebounding, Part I: Offensive Rebounds


Next up is considering how valuable rebounds are. Rebounds are valuable because they give you possession—they convert a loose ball, specifically a missed shot, into a possession for your team. For offensive rebounds, this can be analyzed fairly easily. Offensive rebounds are unusual—in the 2000-2001 NBA, they amounted to only 28.5% of all rebounds, just a little over a quarter.

At this point, since our goal is to evaluate individual players, we need to drop a conceptual level from the team to the individual. At the level of the team, all rebounds are equally valuable, since if the team got one less rebound, that means they got one less possession. But looking at what the actual contributions of individual rebounders were, this is not the case, as I hope will become clear in the following discussion.

Since they are rare, we can start out by giving each offensive rebound the value of a possession, 1 point. Now, you might object that in my previous definition, I included in a possession all the subsequent offensive rebounds, until the other team got the ball back. Now, they’ve gotten an offensive rebound, and I’m giving that the value of the entire possession?

Yes, since, once you get the offensive rebound, you’re back in the same situation you were when you first brought the ball over the timeline. You have the ball and a full shot clock, so the expected number of points you will score will be identical in the two cases, as long as the probability of future offensive rebounds is not affected by the fact that you’ve already gotten one. And it shouldn’t be.

A second question would be why the rarity of offensive rebounds should affect their value. Isn’t a rebound a rebound? Yes and no. The thing is, the more common an offensive rebound is, the more likely it is that some other player on your team will get it if you don’t. The value of the rebound is proportional to its scarcity, which is why our intuition is that offensive rebounds are more valuable than defensive ones. To understand this, consider the extreme case where the defensive team gets 100% of the rebounds. If this is the case, it doesn’t matter at all which player on the team actually gets the rebound. If there’s no chance the opposing team will get the ball, then the individual value of a rebound is nothing—the team would not be hurt at all if a specific player didn’t get the rebound.

Most offensive rebounds involve one offensive player who’s managed to sneak in to get position, or who simply beats the defenders to the ball. So it’s not unreasonable to assume that, if the player who got the offensive rebound didn’t get it, one of the defenders would. So the value of the offensive rebound is an entire possession, 1 point.

Wednesday, February 12, 2003

What’s a possession worth?


Before tackling the worth of rebounds, I first need to look at the value of possessions. The best and simplest way to analyze the value of rebounds (and steals and turnovers) is by considering them as adding or taking away possessions from your team. And on each possession, you can expect to score a certain number of points.

So, there are two jobs here. The first is to define exactly what a possession is, and the second is then to determine what the average number of points you can expect to score on each one. First, the way I’ll define a possession will be the time your team gets the ball, all the way up until the other team gets it back. Pretty standard and straightforward, but this means that, at least for the purpose of this number, offensive rebounds don’t add possessions. This makes sense, since to determine the cost of a turnover, for instance, we want to know what the total chance there was that you’d score some points before the other team got the ball back. That includes getting offensive rebounds, since the turnover eliminates the possibility of getting second chance points just as it did first chance points.

So, the total number of possessions a team has can be calculated as

(FGM + Opp Def. Rebounds + Turnovers + FTA/1.9)

The 1.9 factor on free throws is a rough estimate. Since there is no 1-and-1 in the NBA, the estimate is easier to make: the only time you’ll shoot a single FT will be on a 3-point play attempt. I’m estimating that happens about 10% as often as a regular shooting foul or bonus attempt. This could be off a bit, and ideally I’d have the data giving me this number. But I don’t, and regardless the effect of a small error here won’t be great.
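One way to read the 1.9 factor is as the average number of free throws per trip to the line. A minimal Python sketch under that reading, using the 10% guess from the text and ignoring three-shot fouls and technicals:

```python
# Relative frequencies of trips to the line (the 10% figure is the post's guess).
two_shot_trips = 1.0
one_shot_trips = 0.10 * two_shot_trips  # 3-point play attempts

total_free_throws = 2 * two_shot_trips + 1 * one_shot_trips
total_trips = two_shot_trips + one_shot_trips

free_throws_per_trip = total_free_throws / total_trips
print(f"free throws per trip: {free_throws_per_trip:.2f}")
```

Dividing FTA by this number then converts free throw attempts into an estimated count of trips to the line.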

Then, you just take the total points scored by the team, divide by this number of possessions, and you get the points scored per possession. In 2000-2001, this number was 0.986 points per possession. Obviously, it will vary somewhat from year to year. To keep the formulas simple, I’ll approximate this as 1 point per possession.
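The possession count and the points per possession can be sketched in Python as follows. The function and parameter names are illustrative, and the sample team line is invented to show the mechanics; it is not the 2000-2001 league data.

```python
def possessions(fgm, opp_def_rebounds, turnovers, fta, ft_factor=1.9):
    """Possessions = FGM + opponent defensive rebounds + turnovers + FTA / 1.9."""
    return fgm + opp_def_rebounds + turnovers + fta / ft_factor


def points_per_possession(points, fgm, opp_def_rebounds, turnovers, fta):
    return points / possessions(fgm, opp_def_rebounds, turnovers, fta)


# Hypothetical per-game team line, for illustration only.
team_possessions = possessions(fgm=36, opp_def_rebounds=30, turnovers=15, fta=24.7)
team_ppp = points_per_possession(93.0, 36, 30, 15, 24.7)

print(f"possessions:           {team_possessions:.1f}")
print(f"points per possession: {team_ppp:.3f}")
```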


What is the value of an assist? Conclusion (for now)


Scroll down to see the previous two installments. So, I’ve started on the effort I mentioned in the previous post, to try and plot team stats game by game, to see what effect, if any, a high assist rate has on shooting percentages. Unfortunately, this was incredibly tedious. ESPN.com has splits for the performance of each team against each other team, at least for the current season. While not ideal, this at least gives me 28 data points. But it took a long time to cut and paste all the relevant data over into Excel to try and use it. I looked at two teams, Atlanta and Dallas, and it probably took me an hour or more. This wouldn’t be too bad, if the data was worthwhile. But it was just as useless as the data presented in the previous posts. I didn’t feel like doing a lot of work for nothing, so I stopped on that.

However, I believe the underlying concept is still good. The effect of a good pass is to increase the probability of making a basket, so the value of an assist is increasing the points your team will score by increasing their FG%. But by how much? Well, there’s another way to attack the problem.

I have the complete season stats from 2000-2001, which gives me the FGM, FGA, and assists for the “average” team in the NBA that year. Per game, the numbers are 35.7, 80.6, and 21.08. Now, assume a simple model, where all shots with a good pass have the same FG%, and all shots without a good pass have the same, but lower, FG%. We know the number of shots taken, and the number made with (and hence without) good passes. Since everything has to be self-consistent, if you guess what the FG% ought to be without a good pass, then you can calculate the FG% with a good pass.

As it turns out, one such possible pair is 39% FG% without a good pass, and 50% with one, which matches up pretty well both with my gut feeling (players playing one-on-one probably shoot a little below 40%) and with my guess about a data fit from below. So, in the absence of better data, I’ll take this as a starting point for further analysis. I’ll keep looking to see if I can find new data, and will let you know if I do.
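A minimal Python sketch of that self-consistency calculation, assuming every made shot off a good pass is recorded as an assist. With a guess of 39% for unassisted shots it gives roughly 49% for assisted ones, in line with the approximate 39/50 pair above.

```python
# Per-game averages for the "average" 2000-2001 team, as quoted in the post.
FGM, FGA, AST = 35.7, 80.6, 21.08


def assisted_fg_pct(unassisted_fg_pct):
    """FG% on shots off a good pass, given a guess for FG% without one."""
    unassisted_makes = FGM - AST
    unassisted_attempts = unassisted_makes / unassisted_fg_pct
    assisted_attempts = FGA - unassisted_attempts
    return AST / assisted_attempts


for guess in (0.37, 0.38, 0.39, 0.40):
    print(f"unassisted {guess:.0%} -> assisted {assisted_fg_pct(guess):.1%}")
```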

But for now, let’s use these numbers. What these mean is that approximately 1/4 of the shots that were made with an assist would not have been made without the assist. Thus, each assist is worth 1/4 of 2 points, or half a point. Including three pointers would modify this result slightly, and I know the 1/4 number is slightly off, but the half a point value is nice and convenient. Given the inherent sloppiness of these calculations, I don't think it's worth worrying about.
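For reference, the unrounded version of this step as a Python sketch, using the 39% and 50% figures from above and counting every assisted basket as a two-pointer:

```python
fg_with_pass = 0.50
fg_without_pass = 0.39

# Share of assisted makes that would have been misses without the pass.
share_due_to_pass = (fg_with_pass - fg_without_pass) / fg_with_pass

# Each such make is worth 2 points (three-pointers ignored).
assist_value = share_due_to_pass * 2

print(f"share of assisted makes owed to the pass: {share_due_to_pass:.2f}")
print(f"value of an assist:                       {assist_value:.2f} points")
```

The exact share is 0.22 rather than 1/4, which puts the value at 0.44 points, so rounding to half a point slightly favors assists.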

While this result is not all that surprising to me, it may surprise some, given how much more highly assists are usually valued. Next up, we'll tackle rebounding, and we can see whether the numbers say 10 rebounds per game is more valuable than 10 assists per game. I'm guessing it is, but we'll see.

Tuesday, February 11, 2003

What is the value of an assist? Part II


So, in the previous piece, here, I argued that an assist is worth less, maybe significantly less than 2 points. What it actually does is increase the chances of making a shot. So, how can we figure out from the stats what the actual effect is? A first idea is to simply look at the number of assists a team has and their shooting percentage. More assists should produce a higher shooting percentage. So you plot those, look for correlation, and voila!

However, it’s not that simple, because there’s also correlation the other way. That is, a high shooting percentage will result in more assists. Since your team is making more shots, there will be more opportunities for assists, and all else being equal, there will then be more assists. So just plotting those two variables, it’s impossible to tell how much correlation is from assists leading to high shooting %, and how much is high shooting % leading to assists.

So, we need some way to isolate the effect that the assists are having on the shooting percentage. To do this, I plotted the shooting percentage by team versus the number of assists per made basket. So, the x axis goes from a low of zero (no assists) to a possible maximum of 1 (an assist on every single made basket.) If the data is clean enough, this plot should give us exactly what we want. The intercept at x=0 is the baseline shooting percentage without any assists. The intercept at x=1 is the average shooting percentage with an assist. The difference between these two values is then the value added by the assist—how much it increases the probability of making a shot.
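Here is a sketch of that fit in plain Python, using an ordinary least-squares line. The four data points are invented and placed exactly on a line so the mechanics are easy to follow; they are not the real team data, which, as described below, is far noisier.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# HYPOTHETICAL points: assists per made basket (x) and team FG% (y).
assists_per_make = [0.55, 0.60, 0.65, 0.70]
team_fg_pct = [0.39 + 0.11 * x for x in assists_per_make]

slope, intercept = fit_line(assists_per_make, team_fg_pct)
fg_pct_no_assists = intercept           # value of the line at x = 0
fg_pct_all_assists = intercept + slope  # value of the line at x = 1

print(f"baseline FG% (x = 0): {fg_pct_no_assists:.3f}")
print(f"assisted FG% (x = 1): {fg_pct_all_assists:.3f}")
print(f"added by the assist:  {slope:.3f}")
```

Both intercepts are extrapolations well outside the range where teams actually fall, so the fit is only as good as the trend in the data.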

While this is a good idea in theory, unfortunately the data don’t look so good. If this works, below you should see a plot of the data as I describe above, where each point represents one team’s results for the 2000-2001 season. And it basically looks like noise; there’s not a clear trend that could be fitted. The biggest reason is that the data is too tightly confined. Almost all the teams shot between 42 and 47% from the field, and they mostly fell between 0.55 and 0.7 assists per made basket. (The outlier in the top right is, as you might have guessed, Utah, with a FG% of 47% and 0.71 assists per made basket.)



What this plot tells you is that the effect of assists on the FG% is swamped by the team-to-team differences in base shooting percentage. That is, if you have a bunch of bad shooters on your team, then getting a good point guard isn’t going to magically turn you into a good shooting team. The spread in the data can also give a rough upper bound on the effect of assists. If we ask how big an effect would be noticeable on this plot, even given the underlying noise, then we’ll know the actual effect of assists has to be less than that, since no such effect is seen.

To do this, let’s plot it out on a wider axis. Looking at this, you could maybe convince me that the data should be plotted on a line from 38% to 48%. Now, if I were getting paid for this, I’d actually sit down and do the real work to figure out how big the effect would have to be to be really noticeable, but I’m lazy, so I’m just going to guess that the data shown wouldn’t be consistent with a jump from 30 to 60; an assist is not doubling your FG%.



So, this idea didn’t turn out so well, although it wasn’t completely uninteresting. The next step is to try and remove the intrinsic spread due to differing teams shooting a different percentage. To do this, what I’d like to do is to plot out, just as in the above figures, FG% vs. assists/made basket, but do it for each team individually, plotting each game. This removes biases that might be caused both by individual team shooting as well as by offensive styles. (Different types of offenses might tend to produce more or less assists.) It will also hopefully give points spread across a larger range on the x-axis, reducing the errors caused by trying to fit a line to several points that are very close together.

Incidentally, it occurred to me that the above data might be skewed by the three pointer, since teams that shoot more three pointers would have lower shooting percentages. So I pulled out the 3-point data and looked just at 2-pointers, and it looked pretty much the same.