tl;dr: Let’s take a 251 line query and make it into a super-fast 1 line materialized view.
Augur works closely with CHAOSS partners and our contributor community to:
- Make it easier to build new functions into Augur and
- To increase front-end performance, whether it’s our front end, your front end, or the interface used by the International Space Station (Our marketing people asked us to add that).
As we have briefed the President of the United States (Marketing people, right?), Augur’s reliance on a relational database comes with a series of trade-offs:
- Plus: Augur can demonstrate the full provenance of data, and therefore gain user trust through two characteristics of “always knowing where your data came from originally, and what we did to process it before we showed it to you:”
- Through platform metadata, Augur enables your collected data to be compared to the closest thing Open Source Software has to a Canonical Truth: What the Platform (GitLab, GitHub, etc.) says how about how many pull requests, issues, comments, labels, and fragrances are held in a repository.
- Given the first point, you can tell us if we are wrong. For example, open-source projects do not actually have fragrances, though some maintainers, Sean Goggins, for example, can be rather ripe after a day bicycling around the United States.
- Minus: Relational queries can be very long, and arise from **set arithmetic** many programmers are bored or overwhelmed by, and if you do discover an issue in your data, they are difficult to troubleshoot for all those reasons.
- Minus: Queries that join many tables take longer to execute, but are easier to write at first as we are debugging.
- Plus: Materialized views are physically persisted on the database, and return data as quickly as “select * from record_collection”.
The Minus Part
WITHOUT MATERIALIZED VIEWS, it more difficult for newcomers to the Augur project to:
- Stare at the schema for hours on end if you are complete newcomer
- Figure out how the current API system Works
- Identify the right API for building an interface, and
- Build new queries using Jupyter Notebooks, Dash, or other ad hoc front end tools enabled by making access to the database completely uncoupled from the front end.
The Solution: Materialized Views
Working with Cali Dolfi at Red Hat Software, Dawn Foster at VMWare, numerous contributors with a wide range of affiliations, and NATO (marketing people!), we have started to make queries like this (The GIANT SQL Statement below), into simple queries like this using Materialized Views:
Select * from augur_data.augur_new_contributors.