- Read Tutorial
- Watch Guide Video
OK, I hope you've been paying attention for this section on joins because this is going to be a pretty comprehensive look at how we can put all of our knowledge of joins together and to be able to work with multiple tables and also be able to summarize the data. You remember back when I was talking about inner joins when we combined all three tables and I talked about how some of the results from those were kind of confusing.
Just to give us a frame of reference I'm going to quickly pull that in so as I say select all from users and then we'll say join and also let me alias this. So I'm going to join addresses alias to a and then we have to say on so I'll say a addresses users id is equal to you users id and then we're also going to join our guides so guides g and then we're going to say on g dot guides users id is equal to you dot users id.
SELECT * FROM users u JOIN addresses a ON a.addresses_users_id = u.users_id JOIN guides g ON g.guides_users_id = u.users_id;
Now if we run all of this you can see that we get our full results set back
but there's a little bit of a problem here especially if you're doing this for reporting purposes and that is that the data can look kind of confusing. You notice for one thing they're missing a bunch of our users which may or may not be a good thing except that there is one little problem here and it depends on what type of data that you want.
So, for example, we only have two users that were shown and that is because our other sample users either only have addresses associated with their names or they only have guides but they don't have both and the only users that will show up in a multiple join inner join type table query are going to be the users that have data and a reference in all of the tables.
When that is a case that's perfectly fine and you can use this. I will also say there are some other parts of this that are a little bit confusing such as having all these duplicates all over the place. This might be kind of hard to tell which is which value belongs to which person
and definitely hard to summarize.
So that is why I want to walk through this very very much more complex query because this is probably going to just be met with a lot of confusion if you're doing this and giving it to an executive not only because you have duplicate users but you also have duplicate addresses and duplicate revenue numbers which really could be confusing if they're trying to do this for financial purposes.
So let's look at how we could create a summary report with all of these tables and so this is going to join all three. My goal is that I can have all of our users so every user that is going to be in the database and I want to show totals for how many guides they've written and how many addresses they have in the database and that may seem like it would be a relatively simple thing to do.
Until you actually start getting into it you'll see it's pretty complex but we'll take it one step at a time and I think this would be this will be a really good exercise for learning how everything regarding joins really fits together. So I am gonna say select and do indention because this is going to be a very long query and so I'll say select you dot users e-mail and then I'm going give an alias and say that this is going to be e-mail.
Now, this next line is going to use a function calledCOALESCE
. This is going to allow us to count but it's so this is very similar to the count function except it allows for us to not count values that are null. I'll explain in a little bit more detail what that actually means so I am gonna say g dot guides count and then 0. So this is a function that takes two arguments and I'm going to say guide count and actually, I'm going to change this to guide count.
So coalesce is a way of being able to add rows and be able to ignore null values. So what would happen in a situation where you create multiple joins, you would create joins and you try to use the count function. The problem with that is every one of those counts is going to count even if a null record is there.
So this is something I think that really requires a look at more than just words. So let's try to do this and I'm going to use our example up here and just copy it down. So if I were to do one of these types of joins and let's say that I would make a left join for both of these I believe this would be the right way to do it. Now if I hit return.
Yes, so this returns all the values. Now if I said instead of select if I said that I wanted to count the g dot guides id and I wanted to count the g dot or let's see for this one the addresses a dot addresses id if I wanted to do this and then I want to also group by the user. So I can say u dot users e-mail. So if I wanted to do all of this and then at the very end group all of this is going to get us what seems like working results. So I can say, group by u dot users e-mail.
Let's see if this works. I know it's not going to work from actually being an accurate function but let's see if it works and giving us a result set and yes it does. So you may think if you wrote your query like this would work. We're selecting the user's email and then we're counting the number of guides and then we're counting the addresses.
As you can see it seems to be accurate. So we have Dave who has a count of one guide and John Snow has a count of one guide and no addresses so far so good. That's accurate. Myself I have a count of zero guides and I have a count of one address also so far so good. Everything seems to be working but then we run into a little bit of a problem.
So, Christine, it shows that she has six guides and six addresses and that is where we hit the snag. It's because the count function when it is in a join is going to count every single time that that user showed up in this case. So in other words, if the user had three addresses or three guides and two addresses but null value it's still going to count all of them.
So if it had a bunch of null values lets actually take a little bit more and dive into a little bit more of an introspection on what this returns back. We have a users e-mail the count and we're joining each one of these let's not actually group it. Then you're going to see what date what the actual data is. Now if I run this, we have a little bit of an error aggregated query, I need to remove these counts. Let me run and just return everything.
The reason why that wouldn't work is the when you return a count you need to group it by something. When I remove the group that didn't work. So now if I run this.
OK. Now, this brings our full results set and this is what you'd expect. But now this is where the problem lies when using count. So count right here for Kristine is going to count each one of the times that she shows up in guides and in the addresses. The problem with that is she doesn't represent all of those values.
So she shows up six times or I should say each one of those has six but she doesn't actually own all six of those. Let's sort it by if you want to get a little bit more clear idea. So say order by a dot addresses city and run the query again. So now let's actually take a look at this.
Ok so now we have our results set here and you can see that Kristine shows up in the city count here and so we have one two and then three, four, five, six. She shows up six times. The problem is a bunch of those are duplicate values because whenever you're joining three tables together remember it's joining them and there's going to be duplicates all over the place, count isn't smart enough to know how to process that and count just counts up every time they showed up regardless if it was a duplicate or not.
So you know that was kind of a lot of in-depth discussion on why count doesn't work. Hopefully it kind of makes a little bit more sense now and if not definitely feel free to play around with it a little bit more.
But essentially what my main goal for this is to show you why Coalesce is important, because Coalesce is going to ignore those duplicate values and it's instead where it would have been a duplicate. It's just going to give a zero to that value. So the guide count is actually going to be accurate, so we did that for guides.
Now we also have to do this for addresses so I'm going to create one called address count and we're going to give it an alias here. Just to also make sure that we stay we stay with our capitals we're going to do that and that's all we want to return is the user's e-mail the total number of guides and then the total number of addresses like you saw when we went with our naïve example that did not work when we had users that had guides and addresses.
So now that we have that now let's set our root tables so I'm going to say users and have an alias of u and now we're going to have multiple subqueries. They're going to be joined together with a left outer join and I'm going to put parentheses because remember that's what we do with subqueries.
Now inside of this subquery, I'm going to select count and we're just going to count all as guide underscore count and then we're also going to grab the guides users id. So that's what we're going to be selecting here, is the count and then the user's id. Next we just tell it the table which is guides and then we need to group by. So we're going to group by the guides users id and that's it. We're also going to then give this an alias of g.
Remember this is how coalesce is going to know what we're talking about with the join, we're saying this is going to be g and then we can grab guides count. So we're able to grab this value because we created this alias here. So this is just going to grab the count value.
Now with that in place now we have to say on g dot guides users id and then u users id just like normal. Now with that in place now we can do our next join. So you say left join and inside of this say select count all and you may have guessed what we're going to do here it's pretty much identical to what we did above except now with addresses.
So we say address count addresses users id from addresses and then group by addresses users id and then just close off the parentheses and then give an alias of a for address and once again this is going to be on a dot of addresses users id equals u dot users id.
Last but not least we're going to order by u dot users email and then close it off with a semi colon. So wow. Yes this is pretty intense. Let's I'm going to clear off what we have there and let's take a look at this with a little bit more of a view. So this is a giant query. There's no way around that. We are selecting the e-mail, we're using the coalesce function to grab the guide's count and the address count but we're ignoring the duplicate.
And then from there we're creating to joins a left join to grab the count of guides per user and then a left join that's going to grab the count of addresses per user and then we're going to order it by the users e-mail. So if everything I typed in worked then we should have our summary report running that. We have a little bit of an error unknown column addresses user's id.
Ok it looks like we have a typo would be my guess and see where with that. And I see it's right here Id not d. Ok let's try it again. And now that worked right here.
You can see we now have an accurate summary report. So now we have Dave at test dot com with one guide. We have John with one guy that's accurate. We have myself with no guides but one address. And then Kristine now it shows that she has the correct amount she has three guides and she has two addresses.
All of this is accurate. If you go all the way down right here updated test dot com it has three guides and one address and if you check in your database you'll see that all of those counts are actually accurate. Now we know that was a ton of code and that may seem incredibly complex and confusing and if so that's perfectly fine, if you've never built a giant query like this then this is going to look very weird.
But I want to also kind of reinforce the importance of understanding and working through this kind of thing because this is a relatively common task that you're going to want to be able to build. So if you're building a SQL reporting engine being able to grab the counts on multiple tables and being able to do it via a join table is something you are going to have to become familiar with. There are a lot of kind of hidden gotchas.
When I initially wrote up the solution for this I use the more naive example but it worked and then forgot that the it was going to duplicate the data with the count. Once I saw that then I had to work through it research and find exactly what's the best way. When you have three tables and you need to count up the values how would you implement that and that's where I ran into the coalesce function.
So this is something that is very helpful. I can promise you this is in the non-trivial category when it comes to building SQL queries. One of my best recommendations if this feels very intimidating or confusing is copy this code right from the show notes and work through it. Copy it in to SQL. Test it out and change things so you know. There we have all kinds of different table elements here. Change them see what happens see the difference when you change it from a left join to a right join.
See what happens when you use count instead of coalesce all of those kind of things because that's really how you're going to be able to work through and truly learn the concept. It's just being able to perform this query is great but there's a very small chance you're going to do something identical to this.
It's more important to understand the underlying concepts on what's happening here and how you can do things such as joining three tables together and treating them all independent. The reason why I chose this example is because it's really a combination of everything we've learned in this course.
We're using select statements we have multiple sub-queries we have different types of joins we're using multiple functions we're using aliases we're using group by we're using order by. I mean there's all kinds of different components that are all happening here and I really wanted to give a good example on how those could all be worked together to build a reporting engine and that's exactly what you did. So very nice working going through that.
Code
USE devcamp_sql_course_schema; SELECT * FROM users u JOIN addresses a ON a.addresses_users_id = u.users_id JOIN guides g ON g.guides_users_id = u.users_id; SELECT * FROM users u LEFT JOIN addresses a ON a.addresses_users_id = u.users_id LEFT JOIN guides g ON g.guides_users_id = u.users_id ORDER BY a.addresses_city; SELECT u.users_email AS 'Email', COALESCE(g.guide_count, 0) AS guide_count, COALESCE(a.address_count, 0) AS address_count FROM users u LEFT JOIN ( SELECT COUNT(*) AS guide_count, guides_users_id FROM guides GROUP BY guides_users_id ) AS g ON g.guides_users_id = u.users_id LEFT JOIN ( SELECT COUNT(*) AS address_count, addresses_users_id FROM addresses GROUP BY addresses_users_id ) AS a ON a.addresses_users_id = u.users_id ORDER BY u.users_email;