SQL UNION vs UNION ALL – Combining Datasets

We can combine similar datasets using the UNION or UNION ALL keywords between select statements. These are known as set operators. The key differences between the sql UNION vs UNION ALL operators are:

  • UNION – Combines like queries and returns only unique records
  • UNION ALL – Combines like queries and will return duplicate records if they exist.

As an example, let’s say that we need a list of all names of customers and employees. This data currently sits in two separate tables so we need to combine these sets of data with the UNION operator.

SQL Union Example

Using the UNION operator means that our dataset will return a list of unique names between the two select statements. As mentioned, using UNION ALL would allow duplicate records to return. If the Customer table had a ‘John Smith’ and the Employee table had a ‘John Smith’ then we would get both records returning had we used UNION ALL. Using the UNION operator will eliminate these sorts of duplicates.

sql venn diagram union
Mistakes to Avoid

The union operators only work if we have the same number of columns between select statements. Larger union queries may be harder to spot, but executing a query with a mismatched number of columns should produce an error.

sql union error number of columns mismatch

We want to also be sure that the datatypes are matching, ex. INTEGER to INTEGER, VARCHAR() to VARCHAR(). Not all databases will care about datatype mismatches such as SQLite, but others will such as SQL Server. If the data types are mismatched, but the columns we are selecting are what we need then we can use the CAST() function to change the necessary fields so data types line up.

Don’t forget these UNION requirements:

  1. Same number of columns in each SELECT statement.
  2. ALL datatypes must match in alignment to the SELECT column order.
  3. Only use the ORDER BY clause after all unions.

One last note, we can combine as many UNION or UNION ALL operators as we want. As long as we follow the above rules with each additional select statement, there are no limits. Since we can combine data-sets with these operators, they’re known as set operators. These are not the only two. We also have MINUS and INTERSECT, which we’ll cover in the next sections.

Next Section: MINUS Operator – Subtracting Datasets