Redash Python functions
November 18th, 2016
We’ve been using Redash at Sentry lately. I’d never heard of it before (Mode being the frontrunner in this category), but it’s damned good so far. And before you ask me about Metabase, it’s a totally different use case.
One of the best features is the Python query runner, allowing you to manipulate data from your SQL queries. You can then output to visualizations, csv, etc. The global python functions aren’t obvious (I looked through their code for them), documenting here for others. I also make liberal use of print
, which the re:dash guys have thoughtfully made print to the page.
result
At the end of the query, re:dash looks at the variable result
. If it’s in the right shape, it’ll be show in the table below. The result
is a dictionary and is the same that is returned by execute_query
and get_query_result
. The format should be like this:
result = {
"rows": [
{"user_id": 1, "anonymous_id": "s2f1"},
],
"columns": [
{
"name": "user_id",
"friendly_name": "user_id",
"type": TYPE_INTEGER,
},
{
"name": "anonymous_id",
"friendly_name": "anonymous_id",
"type": TYPE_STRING,
}
]
}
The available value types are:
- TYPE_INTEGER
- TYPE_STRING
- TYPE_INTEGER
- TYPE_FLOAT
- TYPE_BOOLEAN
- TYPE_DATE
- TYPE_DATETIME
TYPE_DATE and TYPE_DATETIME are both actually strings, not python datetime
objects. Use strptime
to convert them. This will vary by database, but for Postgres, I made a helper function:
def convert_to_native_datetime(result):
import datetime
datetime_columns = [column['name'] for column in result['columns'] if column['type'] == TYPE_DATETIME]
date_columns = [column['name'] for column in result['columns'] if column['type'] == TYPE_DATE]
for row in result['rows']:
for column in row:
if column in datetime_columns:
try:
row[column] = datetime.datetime.strptime(row[column], '%Y-%m-%dT%H:%M:%S')
except ValueError:
# in case format includes milliseconds and timezones
# no currently handling for this (all our data is stored as the same timezone)
row[column] = datetime.datetime.strptime(row[column].split('+')[0].split('.')[0], '%Y-%m-%dT%H:%M:%S')
elif column in date_columns:
row[column] = datetime.date.strptime(row[column], '%Y-%m-%d')
return result
execute_query
Let’s you…well, execute a query within Python.
result = execute_query("<data source name>", "your SQL query")
get_query_result
Grabs the results on a previous query. I prefer this, as it means while I’m debugging a python script, I’m not hitting the database over and over again.
get_query_result(<query_id>)
add_result_column
If you want to create a new result
from scratch (and just append data to it), the base result dictionary is just {"columns": [], "rows": []}
.
add_result_column(result, "<new column name>", "<new friendly name for column>", "<column type>")
add_result_row
values
should be a dictionary with the keys being column names.
add_result_row(result, "<values>")
The current result format isn’t vey friendly to data analysts. At some point, I’ll probably create a helper function that allows us to work in something closer to data frames (like R) or maybe figure out how re:dash and pandas works together.
Importing the standard library
TYPE_DATE and TYPE_DATETIME still return strings, so you’ll need to import datetime
. If you’re doing this within a Python function, because of how RestrictedPython works, you’ll need to import it from within the function. Top-of-query imports don’t work, as it seems like every function has its own namespace (as if it was its own file).
Does NOT work
import datetime
def hello():
print(datetime.datetime.now())
Fixed!
def hello():
import datetime
print(datetime.datetime.now())