Unions in Postgres

This post was originally posted as “Fun with SQL: Unions in Postgres” on the Citus Data Blog on September 27, 2018.

I am reposting here because unions are a fundamental operator in the data analytics world.


Before joining the Citus Data team to work on the Postgres extension that transforms Postgres into a distributed database, I checked out the Citus Data blog to learn a bit more. And found all sorts of material for Postgres beginners as well as power users, with one of my favorites being the “Fun with SQL” series. After I joined, I jumped at the chance to write a Fun with SQL post, and the only question was what to write about. I chose unions.

In SQL, the UNION and UNION ALL operators help take multiple tables and combine the results into a single table of all matching columns. This operator is extremely useful if you want the results to come back as a single set of records.

You can use UNION to enhance your SQL queries. In past posts we’ve also covered topics like common functions in Postgres, window functions in Postgres and CTEs (common table expressions) in Postgres.

When do you Union in Postgres?

The UNION operator is often used when multiple tables have the same structure but have been split for some reason (usually for performance considerations or archival purposes).

Before we begin, some basic requirements you should know are:

  • All individual queries must have the same number of columns and compatible data types.
  • Mismatched queries will not execute.
  • Validating consistency when building your query is important, so there are no surprises after.
  • By default, UNION only returns distinct values.

One example might be movies where you have a different table for each year of releases. Or if you have different types of users and email address, such as within a CRM like Copper where you have leads and contacts. Your leads table may look like:

leads

|   FirstName   |   LastName    |          Email        |
----------------+---------------+------------------------
| Jennifer      | Matthews      | jennifer@acmecorp.com |
| Tom           | Jones         | tom@acmecorp.com      |
| Daniel        | Farina        | farina@citusdata.com  |

contacts

|   FirstName   |   LastName    |          Email        |
----------------+---------------+------------------------
| Craig         | Kerstiens     | craig@citusdata.com   |
| Lisa          | Wu            | lisa@acmecorp.com     |

To merge these two tables together, you would need to run this query:

SELECT * FROM leads
UNION
SELECT * FROM contacts;

|   FirstName   |   LastName    |          Email        |
----------------+---------------+------------------------
| Jennifer      | Matthews      | jennifer@acmecorp.com |
| Tom           | Jones         | tom@acmecorp.com      |
| Daniel        | Farina        | farina@citusdata.com  |
| Craig         | Kerstiens     | craig@citusdata.com   |
| Lisa          | Wu            | lisa@acmecorp.com     |

Pro tip: once this query runs, the data will not be show in in any particular order. Of course, you can always use ORDER BY to ensure the data is properly sorted.

When do you UNION ALL in Postgres?

As I mentioned earlier, the UNION operator only returns distinct values by default. If you want your result to include all values—including duplicate values—thenUNION ALL is definitely the way to go.

However, UNION ALL also doesn’t guarantee that the resulting rows will be in any particular order. It’s pretty common for the results of subsequent operands to be mixed with results from the first operand. So, just like with UNION, you need to use ORDER BY to sort.

Let’s look at the data below:

pasta

| Ingredient    | Need to Buy   |
----------------+----------------
| Dry Pasta     | Yes           |
| Pasta Sauce   | No            |
| Olive Oil     | Yes           |

pizza

| Ingredient    | Need to Buy   |
----------------+----------------
| Dough         | No            |
| Pizza Sauce   | Yes           |
| Cheese        | No            |
| Olive Oil     | Yes           |

If I simply performed a UNION, the results would be as follows:

SELECT * FROM pasta
UNION
SELECT * FROM pizza;

| Ingredient    | Need to Buy   |
----------------+----------------
| Dough         | No            |
| Pizza Sauce   | Yes           |
| Olive Oil     | Yes           |
| Cheese        | No            |
| Dry Pasta     | Yes           |
| Pasta Sauce   | No            |

However, I am missing the extra Olive Oil record. To display that record, I need to swap UNION for UNION ALL.

SELECT * FROM pasta
UNION ALL
SELECT * FROM pizza;

| Ingredient    | Need to Buy   |
----------------+----------------
| Dough         | No            |
| Pizza Sauce   | Yes           |
| Pasta Sauce   | No            |
| Olive Oil     | Yes           |
| Dry Pasta     | Yes           |
| Olive Oil     | Yes           |
| Cheese        | No            |

Now we see all records in both tables.

Pro tip: at scale, UNION ALL can also be faster to execute because it doesn’t have to filter out duplicates.

How does ORDER BY work?

The ORDER BY operator, as the name suggests, can ensure that your results are always displayed in the order that you specify. Let’s say you wanted to sort the movie data from earlier in an ascending fashion first by release year and then by movie name.

Here’s what the query would look like:

SELECT * FROM movies1972
UNION
SELECT * FROM movies1993
ORDER BY Released, Movie ASC;

| Movie                 | Genre         | Released  |
------------------------+---------------+------------
| Cabaret               | Comedy        | 1972      |
| The Godfather         | Crime Drama   | 1972      |
| Treasure Island       | Adventure     | 1972      |
| Dennis the Menace     | Comedy        | 1993      |
| Groundhog Day         | Comedy        | 1993      |
| Jurassic Park         | Adventure     | 1993      |

Are UNION and UNION ALL even useful for Postgres users?

UNION and UNION ALL are good ways to unify results from multiple queries. Use UNION to see the full set of distinct records, and use UNION ALL to see a full set that includes duplicate records. Just make sure to also use ORDER BY to ensure the results are organized the way you want.

Many of us take the power of UNION and UNION ALL for granted. Maybe because of the simplicity in its execution. For those of you just starting out with Postgres, I predict these operators will soon become part of your trusted and powerful SQL toolkit.


If you have more questions about the Postgres or data analytics, drop me a note.

My Blog | Citus Data Blog