Under the Hood with Augur’s Data

Augur is a CHAOSS (Community Health Analytics for Open Source Software) software tool that collects, organizes, and validates the completeness of open source software trace data from GitLab, GitHub, and any standalone Git repository. Four parts of Augur make its rich collection of data available:

  1. Augur itself, which has two parts:
    1. data collection, and
    2. a standard, rudimentary dashboard.
  2. Augur Community Reports, which allow open source program offices, community managers, and contributors to query their copy of Augur in complex ways, for example to examine the stickiness of new contributors or to understand how change requests are responded to, merged, or declined.
  3. Augur’s License Scanner, which inventories all the licenses declared in an open source project and flags licenses that are not OSI-approved (that is, not really open source).
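The flagging step of the License Scanner can be illustrated with a minimal sketch. This is not Augur's implementation: the `OSI_APPROVED` set is a small hand-picked subset of SPDX identifiers, and the function name `flag_non_osi` is hypothetical.

```python
# Assumption: a tiny, hand-picked subset of SPDX identifiers for
# OSI-approved licenses. A real scanner would use the full OSI list.
OSI_APPROVED = {"MIT", "Apache-2.0", "GPL-3.0-only", "BSD-3-Clause", "MPL-2.0"}


def flag_non_osi(declared_licenses):
    """Return the distinct declared licenses missing from the allow-list, sorted."""
    return sorted({lic for lic in declared_licenses if lic not in OSI_APPROVED})


# Hypothetical inventory of licenses declared across a project's files.
declared = ["MIT", "Apache-2.0", "SSPL-1.0", "MIT", "CC-BY-NC-4.0"]
flagged = flag_non_osi(declared)
print(flagged)  # ['CC-BY-NC-4.0', 'SSPL-1.0']
```

The inventory keeps every declaration, while the flagged list is de-duplicated so each problematic license is reported once.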

Data from all parts of Augur is stored in a relational database composed of 60+ tables, grouped into ten sections:

  1. Repositories
  2. Repository Groups
  3. Commits
  4. Change Requests
  5. Issues
  6. Licensing
  7. Messaging
  8. Dependencies
  9. Artificial Intelligence Insights
  10. Operations (utility)
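Because the data is relational, a question such as how change requests are merged or declined comes down to a join across sections. The sketch below uses an in-memory SQLite database with a deliberately simplified, hypothetical schema (`repo`, `change_request`, and their columns are assumptions for illustration, not Augur's actual table definitions).

```python
import sqlite3

# Assumption: a toy two-table schema standing in for the Repositories
# and Change Requests sections. Augur's real tables differ.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE repo (repo_id INTEGER PRIMARY KEY, repo_name TEXT);
CREATE TABLE change_request (
    cr_id   INTEGER PRIMARY KEY,
    repo_id INTEGER REFERENCES repo(repo_id),
    state   TEXT,     -- 'open' or 'closed'
    merged  INTEGER   -- 1 if merged, 0 otherwise
);
INSERT INTO repo VALUES (1, 'alpha'), (2, 'beta');
INSERT INTO change_request VALUES
    (1, 1, 'closed', 1),
    (2, 1, 'closed', 0),
    (3, 1, 'open',   0),
    (4, 2, 'closed', 1);
""")


def change_request_outcomes(connection):
    """Count merged, declined (closed without merge), and open change requests per repo."""
    return connection.execute("""
        SELECT r.repo_name,
               SUM(cr.merged = 1)                         AS merged,
               SUM(cr.state = 'closed' AND cr.merged = 0) AS declined,
               SUM(cr.state = 'open')                     AS still_open
        FROM repo AS r
        JOIN change_request AS cr ON cr.repo_id = r.repo_id
        GROUP BY r.repo_name
        ORDER BY r.repo_name
    """).fetchall()


outcomes = change_request_outcomes(conn)
print(outcomes)  # [('alpha', 1, 1, 1), ('beta', 1, 0, 0)]
```

Against a real Augur database the same idea would apply, but the table and column names would need to be taken from the conceptual schema rather than from this sketch.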

Data accuracy is validated against platform metadata stored in the repo_info table. Below is a brief visualization of each section; a full copy of the conceptual schema is available here:

  1. 20200924-augur-conceptual-Schema.png
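Validating collected data against platform metadata can be pictured as a comparison of counts. The sketch below is only an illustration of that idea: the function `completeness` and the keys `issues` and `change_requests` are hypothetical, and only the repo_info table name comes from the description above.

```python
def completeness(collected, expected):
    """Return, per data type, the fraction of platform-reported items that were collected."""
    report = {}
    for key, expected_count in expected.items():
        collected_count = collected.get(key, 0)
        # Nothing expected means nothing can be missing.
        report[key] = 1.0 if expected_count == 0 else collected_count / expected_count
    return report


# Assumption: counts the platform reports for a repository,
# as they might be recorded in repo_info.
platform_counts = {"issues": 200, "change_requests": 50}
# Assumption: row counts actually present in the collected tables.
collected_counts = {"issues": 190, "change_requests": 50}

report = completeness(collected_counts, platform_counts)
print(report)  # {'issues': 0.95, 'change_requests': 1.0}
```

A ratio below 1.0 would suggest that collection for that data type is incomplete and may need to be rerun.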
