SQL for Active Cohorts
October 24th, 2018
Typical query for tracing out what percent of organizations are still active (members are still using the product) for X day. Key here is the generate_series
, which generates separate rows for each value, with all other columns duplicated. This query is written for Postgres, but should be do-able in most databases.
Assumes you have an organizations
, users
, organization_members
and events
table, and any event with a user_id is an “active” event. Find and replace ‘week’ with whatever interval you want.
-- first 4 weeks
with org_weeks as (
select id, created,
generate_series(created,
least(now()::timestamp,
created + interval '4 weeks'),
interval '1 week') as cohort_date
from organizations
where created = :daterange
),
org_events as (
select distinct org_id, created
from events e
join organization_members om on e.user_id=om.user_id
),
active_weeks as (
select ow.id,
extract(epoch from ow.cohort_date - ow.created) / 3600 / 24 / 7 as week,
count(oe.org_id) as active
from org_weeks ow
left join org_events oe
on ow.id=oe.org_id
and oe.created between ow.cohort_date and ow.cohort_date + interval '1 week'
group by 1, 2)
select week,
sum(case when active > 0 then 1 end) as actives,
count(1) as total,
sum(case when active > 0 then 1 end)::float / count(1) as percent
from active_weeks
group by 1
;