Reply Times with SQL by passing ARRAY to UDFs
September 11th, 2018
How long is a customer waiting for reply?
Supposing you had this simplified table for tickets donating when the message was sent and if the sender was an employee or a customer…
messages
timestamp | sender |
---|---|
1 | customer |
2 | employee |
3 | employee |
4 | customer |
5 | customer |
6 | employee |
What is the total time difference from the first in a sequence of customer messages to the first in the sequence of employee replies? In this case, the customer has waited a total of 3 seconds from time 1 to 2 and again from time 4 to 6.
The difficulty here is that SQL is not very good at conditionally looking backwards. You might be tempted to use windowing or navigation functions here, but the problem is you don’t know what to partition by…because you’re not partitioning by a column.
Instead, I took the approach of using UDFs (which are available in every major database) and using aggregate functions to pass the chunk of the table you care about. In our simplified case, it’s the whole table, but you can always split it by group by
.
I hope others find this useful!
Total Wait Times
create temp function totalWaitTime(x ARRAY<STRUCT<t float64, c BOOL>>)
returns int64
language js as """
var sum = 0
var timer_start = null
for (var row of x) {
if (row.c && timer_start === null) {
timer_start = row.t
} else if (!row.c && timer_start != null) {
sum += row.t - timer_start
timer_start = null
}
}
return sum
""";
select totalWaitTime(array_agg(struct(cast(timestamp as float64) as t, sender = 'customer' as c) ORDER BY timestamp asc)) as total from messages;
Average Wait Times
create temp function avgWaitTime(x ARRAY<STRUCT<t float64, c BOOL>>)
returns float64
language js as """
var sum = 0
var count = 0
var timer_start = null
for (var row of x) {
if (row.c && timer_start === null) {
timer_start = row.t
} else if (!row.c && timer_start != null) {
sum += row.t - timer_start
count += 1
timer_start = null
}
}
return sum / count
""";
select avgWaitTime(array_agg(struct(cast(timestamp as float64) as t, sender = 'customer' as c) ORDER BY timestamp asc)) as total from messages;