Querying BitFields with SQL
March 15th, 2018
tl;dr SELECT * FROM user WHERE subscriptions & (1 << FLAG_POSITION)
BitField is a popular field type, found in both Django and Peewee. They are perfect for efficiently storing user options as Bitfield flags1. Querying them in SQL isn’t as easy as querying boolean fields. Luckily, most database engines provide bitwise operations functions, which can query these fields easily.
Let’s imagine a Bitfield column storing user preferences for marketing emails, where there are many types of emails a user can opt in or out of:
Email Preferences
- Events
- Newsletter
- Promotions
- Billing
Abstractly, these email preference are just a sequence of binary values: 1 or 0. Let’s say our user Eric has opted into our newsletter, but nothing else. That could be represented as False, True, False, False
or more succinctly, 0010 (because even though we read from left to right, our number system goes right to left). 00102 in binary is just 2 in decimals, which can then be stored in the database.
Note: for brevity, I use a subscript is used to denote base.
Quick Refresher on Binary
Events | Newsletter | Promotions | Billing | As decimal |
---|---|---|---|---|
20 = 1 | 21 = 2 | 22 = 4 | 23 = 8 | |
0 | 1 | 0 | 0 | 2 |
1 | 1 | 1 | 1 | 15 |
1 | 0 | 1 | 0 | 5 |
Let’s find more Eric’s: users who are subscribed to newsletter AND not anything else. You’d query for…
SELECT * FROM user WHERE subscriptions = (0 + 2 + 0 + 0)
What if you wanted to find out who was subscribed to newsletter (regardless of other subscriptions)? You’ll need bitwise AND
. Bitwise AND compares each flag (e.g. user’s email preferences) against the value you provide and if both are not 1
, it returns 0
.
Example bitwise AND operation
0110 (decimal 6)
AND 0001 (decimal 1)
= 0000 (decimal 0)
So in our case, looking for anyone that opted into newsletter, we’re looking for any value where the second binary place is 1: xx1x
(I use x
’s to represent arbitrary values.0010
, 1010
and 1111
would all be matches). To find everyone who’s opted into newsletter, we can use the bitwise AND operation against 00102 (or 210).
Opted-In Case
In the opted-in case, we will always get 210.
xx1x
AND 0010
= 0010 (0's cancel out any 1's)
For example, Emily has opted into newsletter and promotions, her value in the database is 610 or 01102. Running the bitwise AND operation produces 210.
0110
AND 0010 (decimal 2)
= 0010
Opted-Out Case
In the opted-out case, we always get 010
xx0x
AND 0010 (decimal 2)
= 0000 (decimal 0)
For example, Jenn has opted into Events, Promotions, and Billing (everything BUT newsletter). Running bitwise AND still results in 0.
1101
AND 0010 (decimal 2)
= 0000
Postgres uses &
for bitwise ANDs, so to find all users who have opt-ed into the newsletter, you run WHERE subscriptions & 2 = 2
. Now, it’s tedious to think how a position might be represented in decimals, so I wrote a more plug-and-play version:
SELECT * FROM user WHERE subscriptions & (1 << FLAG_POSITION)
Note: the bitwise shift left expression (1 << FLAG_POSITION)
will generate the integer for you based on the flag’s position (zero indexed) in the BitField. So in the case of our example, newsletter opt-in is the second flag, so FLAG_POSITION
would be 1. Definely check out the Wikpedia article on bitwise operations if you’re curious how this works.
Also, if you wondered during this article, “What about bitwise OR?”, it can be used to force select values. Postgres uses |
to represent bitwise OR, so 1010 | 0001 = 1011
. The 0s allow the original values to passthrough, unchanged, while forcing the last digit to be 1.
-
Why not simply store flags as separate boolean columns? It takes less space. Each boolean value takes 1 byte, but represents a single true/false. BitFields lets you represent up to 32 flags in the space of 4 bytes (because it’s held in the database as an
int
). If you really need to space savings, you can represent them asbit
s, which “Occupies at least 5 bytes (or 8 for very long strings) plus 1 byte for each group of 8 bits.” As you can imagine, these savings only matter if you know a table will have hundreds of millions of rows and more than a dozen flags. ↩