Merge pull request #3334 from justincormack/roadmap

Update roadmap
2021-01-26 11:39:23 +00:00 · 2021-01-26 11:39:23 +00:00 · 5b27a7fc8b
parent 35f1369d37 67c504de8b
commit 5b27a7fc8b
3 changed files with 49 additions and 264 deletions
--- a/CODE-OF-CONDUCT.md
+++ b/CODE-OF-CONDUCT.md
@ -0,0 +1,36 @@
+This project has adopted the [CNCF Community Code of Conduct](https://github.com/cncf/foundation/blob/master/code-of-conduct.md)
+
+### Contributor Code of Conduct
+
+As contributors and maintainers of this project, and in the interest of fostering
+an open and welcoming community, we pledge to respect all people who contribute
+through reporting issues, posting feature requests, updating documentation,
+submitting pull requests or patches, and other activities.
+
+We are committed to making participation in this project a harassment-free experience for
+everyone, regardless of level of experience, gender, gender identity and expression,
+sexual orientation, disability, personal appearance, body size, race, ethnicity, age,
+religion, or nationality.
+
+Examples of unacceptable behavior by participants include:
+
+* The use of sexualized language or imagery
+* Personal attacks
+* Trolling or insulting/derogatory comments
+* Public or private harassment
+* Publishing others' private information, such as physical or electronic addresses,
+ without explicit permission
+* Other unethical or unprofessional conduct.
+
+Project maintainers have the right and responsibility to remove, edit, or reject
+comments, commits, code, wiki edits, issues, and other contributions that are not
+aligned to this Code of Conduct. By adopting this Code of Conduct, project maintainers
+commit themselves to fairly and consistently applying these principles to every aspect
+of managing this project. Project maintainers who do not follow or enforce the Code of
+Conduct may be permanently removed from the project team.
+
+This code of conduct applies both within project spaces and in public spaces
+when an individual is representing the project or its community.
+
+Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by
+contacting a CNCF project maintainer or our mediator, Mishi Choudhary <mishi@linux.com>.
--- a/README.md
+++ b/README.md
@ -1,6 +1,6 @@
 # Distribution

-The Docker toolset to pack, ship, store, and deliver content.
+The toolset to pack, ship, store, and deliver content.

 This repository's main product is the Open Source Docker Registry implementation
 for storing and distributing Docker and OCI images using the
--- a/ROADMAP.md
+++ b/ROADMAP.md
@ -1,267 +1,16 @@
 # Roadmap

-The Distribution Project consists of several components, some of which are
-still being defined. This document defines the high-level goals of the
-project, identifies the current components, and defines the release-
-relationship to the Docker Platform.
+The Distribution project aims to support the following use cases

-* [Distribution Goals](#distribution-goals)
-* [Distribution Components](#distribution-components)
-* [Project Planning](#project-planning): release-relationship to the Docker Platform.
-
-This road map is a living document, providing an overview of the goals and
-considerations made in respect of the future of the project.
-
-## Distribution Goals
-
- Replace the existing [docker registry](github.com/docker/docker-registry)
-  implementation as the primary implementation.
- Replace the existing push and pull code in the docker engine with the
-  distribution package.
- Define a strong data model for distributing docker images
- Provide a flexible distribution tool kit for use in the docker platform
- Unlock new distribution models
-
-## Distribution Components
-
-Components of the Distribution Project are managed via github [milestones](https://github.com/docker/distribution/milestones). Upcoming
-features and bugfixes for a component will be added to the relevant milestone. If a feature or
-bugfix is not part of a milestone, it is currently unscheduled for
-implementation. 
-
-* [Registry](#registry)
-* [Distribution Package](#distribution-package)
-
-***
-
-### Registry
-
-The new Docker registry is the main portion of the distribution repository.
-Registry 2.0 is the first release of the next-generation registry. This was
-primarily focused on implementing the [new registry
-API](https://github.com/docker/distribution/blob/master/docs/spec/api.md),
-with a focus on security and performance. 
-
-Following from the Distribution project goals above, we have a set of goals
-for registry v2 that we would like to follow in the design. New features
-should be compared against these goals.
-
-#### Data Storage and Distribution First
-
-The registry's first goal is to provide a reliable, consistent storage
-location for Docker images. The registry should only provide the minimal
-amount of indexing required to fetch image data and no more.
-
-This means we should be selective in new features and API additions, including
-those that may require expensive, ever growing indexes. Requests should be
-servable in "constant time".
-
-#### Content Addressability
-
-All data objects used in the registry API should be content addressable.
-Content identifiers should be secure and verifiable. This provides a secure,
-reliable base from which to build more advanced content distribution systems.
-
-#### Content Agnostic
-
-In the past, changes to the image format would require large changes in Docker
-and the Registry. By decoupling the distribution and image format, we can
-allow the formats to progress without having to coordinate between the two.
-This means that we should be focused on decoupling Docker from the registry
-just as much as decoupling the registry from Docker. Such an approach will
-allow us to unlock new distribution models that haven't been possible before.
-
-We can take this further by saying that the new registry should be content
-agnostic. The registry provides a model of names, tags, manifests and content
-addresses and that model can be used to work with content.
-
-#### Simplicity
-
-The new registry should be closer to a microservice component than its
-predecessor. This means it should have a narrower API and a low number of
-service dependencies. It should be easy to deploy.
-
-This means that other solutions should be explored before changing the API or
-adding extra dependencies. If functionality is required, can it be added as an
-extension or companion service.
-
-#### Extensibility
-
-The registry should provide extension points to add functionality. By keeping
-the scope narrow, but providing the ability to add functionality.
-
-Features like search, indexing, synchronization and registry explorers fall
-into this category. No such feature should be added unless we've found it
-impossible to do through an extension.
-
-#### Active Feature Discussions
-
-The following are feature discussions that are currently active.
-
-If you don't see your favorite, unimplemented feature, feel free to contact us
-via IRC or the mailing list and we can talk about adding it. The goal here is
-to make sure that new features go through a rigid design process before
-landing in the registry.
-
-##### Proxying to other Registries
-
-A _pull-through caching_ mode exists for the registry, but is restricted from 
-within the docker client to only mirror the official Docker Hub.  This functionality
-can be expanded when image provenance has been specified and implemented in the 
-distribution project.
-
-##### Metadata storage
-
-Metadata for the registry is currently stored with the manifest and layer data on
-the storage backend.  While this is a big win for simplicity and reliably maintaining
-state, it comes with the cost of consistency and high latency.  The mutable registry
-metadata operations should be abstracted behind an API which will allow ACID compliant
-storage systems to handle metadata.
-
-##### Peer to Peer transfer
-
-Discussion has started here: https://docs.google.com/document/d/1rYDpSpJiQWmCQy8Cuiaa3NH-Co33oK_SC9HeXYo87QA/edit
-
-##### Indexing, Search and Discovery
-
-The original registry provided some implementation of search for use with
-private registries. Support has been elided from V2 since we'd like to both
-decouple search functionality from the registry. The makes the registry
-simpler to deploy, especially in use cases where search is not needed, and
-let's us decouple the image format from the registry.
-
-There are explorations into using the catalog API and notification system to
-build external indexes. The current line of thought is that we will define a
-common search API to index and query docker images. Such a system could be run
-as a companion to a registry or set of registries to power discovery.
-
-The main issue with search and discovery is that there are so many ways to
-accomplish it. There are two aspects to this project. The first is deciding on
-how it will be done, including an API definition that can work with changing
-data formats. The second is the process of integrating with `docker search`.
-We expect that someone attempts to address the problem with the existing tools
-and propose it as a standard search API or uses it to inform a standardization
-process. Once this has been explored, we integrate with the docker client.
-
-Please see the following for more detail:
-
- https://github.com/docker/distribution/issues/206
-
-##### Deletes
-
-> __NOTE:__ Deletes are a much asked for feature. Before requesting this
-feature or participating in discussion, we ask that you read this section in
-full and understand the problems behind deletes.
-
-While, at first glance, implementing deleting seems simple, there are a number
-mitigating factors that make many solutions not ideal or even pathological in
-the context of a registry. The following paragraph discuss the background and
-approaches that could be applied to arrive at a solution.
-
-The goal of deletes in any system is to remove unused or unneeded data. Only
-data requested for deletion should be removed and no other data. Removing
-unintended data is worse than _not_ removing data that was requested for
-removal but ideally, both are supported. Generally, according to this rule, we
-err on holding data longer than needed, ensuring that it is only removed when
-we can be certain that it can be removed. With the current behavior, we opt to
-hold onto the data forever, ensuring that data cannot be incorrectly removed.
-
-To understand the problems with implementing deletes, one must understand the
-data model. All registry data is stored in a filesystem layout, implemented on
-a "storage driver", effectively a _virtual file system_ (VFS). The storage
-system must assume that this VFS layer will be eventually consistent and has
-poor read- after-write consistency, since this is the lower common denominator
-among the storage drivers. This is mitigated by writing values in reverse-
-dependent order, but makes wider transactional operations unsafe.
-
-Layered on the VFS model is a content-addressable _directed, acyclic graph_
-(DAG) made up of blobs. Manifests reference layers. Tags reference manifests.
-Since the same data can be referenced by multiple manifests, we only store
-data once, even if it is in different repositories. Thus, we have a set of
-blobs, referenced by tags and manifests. If we want to delete a blob we need
-to be certain that it is no longer referenced by another manifest or tag. When
-we delete a manifest, we also can try to delete the referenced blobs. Deciding
-whether or not a blob has an active reference is the crux of the problem.
-
-Conceptually, deleting a manifest and its resources is quite simple. Just find
-all the manifests, enumerate the referenced blobs and delete the blobs not in
-that set. An astute observer will recognize this as a garbage collection
-problem. As with garbage collection in programming languages, this is very
-simple when one always has a consistent view. When one adds parallelism and an
-inconsistent view of data, it becomes very challenging.
-
-A simple example can demonstrate this. Let's say we are deleting a manifest
-_A_ in one process. We scan the manifest and decide that all the blobs are
-ready for deletion. Concurrently, we have another process accepting a new
-manifest _B_ referencing one or more blobs from the manifest _A_. Manifest _B_
-is accepted and all the blobs are considered present, so the operation
-proceeds. The original process then deletes the referenced blobs, assuming
-they were unreferenced. The manifest _B_, which we thought had all of its data
-present, can no longer be served by the registry, since the dependent data has
-been deleted.
-
-Deleting data from the registry safely requires some way to coordinate this
-operation. The following approaches are being considered:
-
- _Reference Counting_ - Maintain a count of references to each blob. This is
-  challenging for a number of reasons: 1. maintaining a consistent consensus
-  of reference counts across a set of Registries and 2. Building the initial
-  list of reference counts for an existing registry. These challenges can be
-  met with a consensus protocol like Paxos or Raft in the first case and a
-  necessary but simple scan in the second..
- _Lock the World GC_ - Halt all writes to the data store. Walk the data store
-  and find all blob references. Delete all unreferenced blobs. This approach
-  is very simple but requires disabling writes for a period of time while the
-  service reads all data. This is slow and expensive but very accurate and
-  effective.
- _Generational GC_ - Do something similar to above but instead of blocking
-  writes, writes are sent to another storage backend while reads are broadcast
-  to the new and old backends. GC is then performed on the read-only portion.
-  Because writes land in the new backend, the data in the read-only section
-  can be safely deleted. The main drawbacks of this approach are complexity
-  and coordination.
- _Centralized Oracle_ - Using a centralized, transactional database, we can
-  know exactly which data is referenced at any given time. This avoids
-  coordination problem by managing this data in a single location. We trade
-  off metadata scalability for simplicity and performance. This is a very good
-  option for most registry deployments. This would create a bottleneck for
-  registry metadata. However, metadata is generally not the main bottleneck
-  when serving images.
-
-Please let us know if other solutions exist that we have yet to enumerate.
-Note that for any approach, implementation is a massive consideration. For
-example, a mark-sweep based solution may seem simple but the amount of work in
-coordination offset the extra work it might take to build a _Centralized
-Oracle_. We'll accept proposals for any solution but please coordinate with us
-before dropping code.
-
-At this time, we have traded off simplicity and ease of deployment for disk
-space. Simplicity and ease of deployment tend to reduce developer involvement,
-which is currently the most expensive resource in software engineering. Taking
-on any solution for deletes will greatly effect these factors, trading off
-very cheap disk space for a complex deployment and operational story.
-
-Please see the following issues for more detail:
-
- https://github.com/docker/distribution/issues/422
- https://github.com/docker/distribution/issues/461
- https://github.com/docker/distribution/issues/462
-
-### Distribution Package 
-
-At its core, the Distribution Project is a set of Go packages that make up
-Distribution Components. At this time, most of these packages make up the
-Registry implementation. 
-
-The package itself is considered unstable. If you're using it, please take care to vendor the dependent version. 
-
-For feature additions, please see the Registry section. In the future, we may break out a
-separate Roadmap for distribution-specific features that apply to more than
-just the registry.
-
-***
-
-### Project Planning
-
-An [Open-Source Planning Process](https://github.com/docker/distribution/wiki/Open-Source-Planning-Process) is used to define the Roadmap. [Project Pages](https://github.com/docker/distribution/wiki) define the goals for each Milestone and identify current progress.
+1. A library to support building highly scalable and reliable container registries,
+that can be customised for different backends and use cases. This is used by many
+of the largest registry operators, including Docker Hub, GitHub, GitLab, Harbor
+and Digital Ocean.
+2. A reference implementation of the OCI registry standards, and an easy way to
+experiment with new propsals in the registry space as these standards change.
+3. Distributed registry tools, such as caching registries and local registries
+that can be used within clusters for performance and locality use cases.

+As every container application needs at least one registry as part of its infrastructure,
+and more cloud native artifacts are using registries as the basis of their distribution,
+having a widely used and supported open source registry is important for innovation.