1 Introduction
Community managers take a variety of perspectives, depending on where their communities are in the lifecycle of growth, maturity, and decline. This is an evolving report of what we are learning from community managers, some of whom we are working with on live experiments with Augur ( http://www.github.com/CHAOSS/augur ), a prototype software tool from the CHAOSS project. At this point we are paying particular attention to how community managers consume metrics, and to how the presentation of open source software health and sustainability metrics can make those metrics more useful, and in some cases less useful, for doing their jobs.
As our engagement with the CHAOSS project ( https://chaoss.community ) matures, we have the following observations, which will inform our work both in the evolution ( https://github.com/chaoss/wg-evolution ), risk ( https://github.com/chaoss/wg-risk ), and value ( https://github.com/chaoss/wg-value ) working groups and in Augur development. We have learned a few things from prototyping Augur with community managers. Some of these lessons are captured in Jupyter notebooks you can try out against sample open source data ( https://github.com/chaoss/augur-community-reports ). If you would like us to gather data for your project so you can explore it on an Augur dashboard or in Jupyter notebooks connected to a database, you can fill out our request form here: http://www.augurlabs.io/
In our research to date, these features in Augur are particularly valued by community managers:
- The ability to compare projects within a defined universe of repositories; this is essential.
- The ability to periodically add and remove the repositories they monitor.
- Downloadable graphics
- Downloadable data (.csv or .json)
- Availability of a “Metrics API”, which limits the amount of software infrastructure the community manager needs to maintain. Right now this is valued most by program managers overseeing larger portfolios, but we think its appeal will grow as the relatively light weight of this approach becomes more apparent. By “apparent” we really mean “easy to use and understand”: today the API is approachable for a programmer, but less so for a community manager without that background or current interest.
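To illustrate what consuming such a Metrics API looks like in practice, here is a minimal sketch using plain HTTP. The host and route below are assumptions for illustration only; consult your Augur instance’s documentation for the endpoints it actually exposes.

```python
# Minimal sketch of consuming a hosted "Metrics API" over plain HTTP.
# NOTE: the host and route here are assumptions for illustration only;
# check your Augur instance's documentation for its actual endpoints.
import requests

BASE_URL = "https://augur.example.org/api/unstable"  # hypothetical host

def fetch_metric(route, params=None):
    """Fetch one metric endpoint and return its decoded JSON payload."""
    response = requests.get(f"{BASE_URL}/{route}", params=params, timeout=30)
    response.raise_for_status()
    return response.json()

# Example: list the repositories this instance knows about.
for repo in fetch_metric("repos"):
    print(repo)
```

The point of this shape is that the community manager runs no servers and maintains no databases; a short script, a notebook, or a dashboard can all sit on top of the same endpoints.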
2 Date Summarized Comparison Metrics
With these advantages in mind, making the most of this opportunity to help community managers will include the availability of date summarized comparison metrics. These metrics take two “filters” or “parameters”, which are defined more abstractly in the Growth, Maturity, and Decline metrics on the CHAOSS project:
- Given a pool of repositories of interest to a community manager, rank them in ascending or descending order by a metric, either:
  - over a specified time period, or
  - over a specified periodicity (e.g., month) for a length of time (e.g., a year).
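A minimal sketch of this query shape, using toy data. The names and schema below are ours, purely for illustration, and do not reflect Augur’s actual schema or API.

```python
# Minimal sketch of a date summarized comparison metric. The names and
# toy data are illustrative only, not Augur's actual schema or API.
from collections import defaultdict

# Toy commit counts keyed by (repository, "YYYY-MM") period.
COMMITS = {
    ("repo-a", "2019-01"): 40, ("repo-a", "2019-02"): 35,
    ("repo-b", "2019-01"): 12, ("repo-b", "2019-02"): 60,
    ("repo-c", "2019-01"): 3,  ("repo-c", "2019-02"): 9,
}

def rank_by_commits(repos, months, descending=True):
    """Filter the pool to the given months, sum per repository, and rank."""
    totals = defaultdict(int)
    for (repo, month), n in COMMITS.items():
        if repo in repos and month in months:
            totals[repo] += n
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=descending)

# Rank the pool over a specified time period (January-February 2019):
print(rank_by_commits({"repo-a", "repo-b", "repo-c"}, {"2019-01", "2019-02"}))
# -> [('repo-a', 75), ('repo-b', 72), ('repo-c', 12)]
```

The second variant would group by the periodicity (one ranking per month, say) instead of summing across the whole window.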
For example, one open source program office we talked with is interested in the following date summarized comparison metrics, given a pool of repositories of interest to the program office (dozens to hundreds of repositories):
- Which ten repositories have the most commits this year (straight commit counts, and lines of code)?
- How many new projects were launched this year?
- Which are the top ten new repositories in terms of commits this year (straight commit counts, and lines of code)?
- How many commits and lines of code were contributed by outside contributors this calendar year? Organizationally sponsored contributors?
- Which organizations are the top five external contributors of commits, comments, and merges?
- What is the total number of repository watchers across all of our projects?
- Which repositories have the most stars? Of the ones new this year? Of all the projects? Which projects have the most new stars this year?
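A minimal pandas sketch for two of these questions, assuming a flat commits export with repo, committed_at, lines_added, and affiliation columns; all of these names are our own assumptions, not a real Augur schema.

```python
# Sketch of answering two of the questions above from a flat commits
# table. The file name and column names are assumptions; adapt them
# to your own export or database schema.
import pandas as pd

commits = pd.read_csv("commits.csv", parse_dates=["committed_at"])
# Assumed columns: repo, committed_at, lines_added, affiliation

this_year = commits[commits["committed_at"].dt.year == 2019]

# Which ten repositories have the most commits this year?
top10_by_commits = this_year.groupby("repo").size().nlargest(10)

# ...and which ten lead by lines of code?
top10_by_loc = this_year.groupby("repo")["lines_added"].sum().nlargest(10)

# How many commits came from outside contributors this calendar year?
outside = this_year[this_year["affiliation"] == "external"]

print(top10_by_commits)
print(top10_by_loc)
print(len(outside), "commits by outside contributors")
```

Each of the program office’s questions above is a small variation on this pattern: filter by date, group by repository or organization, aggregate, and rank.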
3 Open-Ended Community Manager Questions to Support with Metrics
There are other, more open-ended questions that may be useful to open source community managers:
- Is a repository active?
  - Visual differentiation that examines issue and commit data.
  - Activity in the past 30 days.
  - Across all repositories, present the 50th percentile as a baseline and show repositories above and below that line (see the sketch after this list).
- Should we archive this repository?
  - Enable an input from the manager after reviewing statistics.
  - Activity level, inactivity level, and dependencies.
  - Mean/median/mode histogram for commits per repository.
- Should we feature this repository in our top 10? (Probably a subjective decision based on some kind of composite scoring system that is likely specific to the needs of every community manager or program office.)
- Who are our top authors? (Some kind of aggregated contribution ranking by time period [year, month, week, day?]. Nominally, I have a concern about these kinds of metrics being “gameable”, but if they are not visible to contributors themselves, there is less “gaming” opportunity.)
- What are our top repositories? (As with the top-ten question above, probably a subjective decision based on a composite scoring system specific to each community manager or program office.)
- Most active repositories by time period [week? month? year?]. Activity would be revealed through a mix of retention and maintainer activity, focusing primarily on the latter: the number of issues and commits, the frequency of pull requests, and the number of closed issues.
- Least active repositories by time period [week? month? year?]: the bottom of the scores calculated as above.
- Who is our most active contributor? (The same aggregated ranking, and the same “gameability” concern, as for top authors above.)
- What new contributors submitted their first patches/issues this week? (Visualization note: new contributors can be colored distinctly, and a graph can then be made of the number of new contributors over time.)
- Which contributors became inactive? (Will need a mechanism for setting “inactive” thresholds.)
- A baseline level for the “average” repository in an organization, and for each individual repository in the organization.
- What projects outside of a community manager’s general view (a GitHub organization or other boundary) do my repositories depend on, or do my contributors also significantly contribute to?
- Build a summary report in 140 characters or less. For example, “Your total commits in this time period [week? month?] across the organization increased 12% over the last period. Your most active repositories remained the same. You have 8 new contributors, which is 1 below your mean for the past year. For more information, click here.”
- Once a metrics baseline is established, what can be done to move it?
- Are there optimal measures for some metrics?
  - Pull request size?
  - Ratio of maintainers to contributors?
  - New contributor to consistent contributor ratio?
  - New contributor to maintainer ratio?
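Several of the items above, for example the 50th-percentile baseline and the inactivity questions, reduce to the same pattern: summarize recent activity per repository, compute a cut-off across the pool, and compare each repository against it. A minimal sketch with toy inputs; real counts would come from a metrics database or API.

```python
# Sketch of the 50th-percentile activity baseline described above.
# The activity counts are toy inputs standing in for real data, e.g.
# issues + commits per repository over the past 30 days.
import statistics

activity_30d = {
    "repo-a": 120, "repo-b": 45, "repo-c": 4, "repo-d": 0, "repo-e": 67,
}

baseline = statistics.median(activity_30d.values())  # the 50th percentile

for repo, n in sorted(activity_30d.items(), key=lambda kv: kv[1], reverse=True):
    position = "above" if n > baseline else "at or below"
    print(f"{repo}: {n} events, {position} the baseline of {baseline}")

# Repositories that sit well below the baseline for several consecutive
# periods are candidates for the "should we archive this?" review,
# pending an explicit decision from the community manager.
```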