commit
						5b27a7fc8b
					
				|  | @ -0,0 +1,36 @@ | ||||||
|  | This project has adopted the [CNCF Community Code of Conduct](https://github.com/cncf/foundation/blob/master/code-of-conduct.md) | ||||||
|  | 
 | ||||||
|  | ### Contributor Code of Conduct | ||||||
|  | 
 | ||||||
|  | As contributors and maintainers of this project, and in the interest of fostering | ||||||
|  | an open and welcoming community, we pledge to respect all people who contribute | ||||||
|  | through reporting issues, posting feature requests, updating documentation, | ||||||
|  | submitting pull requests or patches, and other activities. | ||||||
|  | 
 | ||||||
|  | We are committed to making participation in this project a harassment-free experience for | ||||||
|  | everyone, regardless of level of experience, gender, gender identity and expression, | ||||||
|  | sexual orientation, disability, personal appearance, body size, race, ethnicity, age, | ||||||
|  | religion, or nationality. | ||||||
|  | 
 | ||||||
|  | Examples of unacceptable behavior by participants include: | ||||||
|  | 
 | ||||||
|  | * The use of sexualized language or imagery | ||||||
|  | * Personal attacks | ||||||
|  | * Trolling or insulting/derogatory comments | ||||||
|  | * Public or private harassment | ||||||
|  | * Publishing others' private information, such as physical or electronic addresses, | ||||||
|  |  without explicit permission | ||||||
|  | * Other unethical or unprofessional conduct. | ||||||
|  | 
 | ||||||
|  | Project maintainers have the right and responsibility to remove, edit, or reject | ||||||
|  | comments, commits, code, wiki edits, issues, and other contributions that are not | ||||||
|  | aligned to this Code of Conduct. By adopting this Code of Conduct, project maintainers | ||||||
|  | commit themselves to fairly and consistently applying these principles to every aspect | ||||||
|  | of managing this project. Project maintainers who do not follow or enforce the Code of | ||||||
|  | Conduct may be permanently removed from the project team. | ||||||
|  | 
 | ||||||
|  | This code of conduct applies both within project spaces and in public spaces | ||||||
|  | when an individual is representing the project or its community. | ||||||
|  | 
 | ||||||
|  | Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by | ||||||
|  | contacting a CNCF project maintainer or our mediator, Mishi Choudhary <mishi@linux.com>. | ||||||
|  | @ -1,6 +1,6 @@ | ||||||
| # Distribution | # Distribution | ||||||
| 
 | 
 | ||||||
| The Docker toolset to pack, ship, store, and deliver content. | The toolset to pack, ship, store, and deliver content. | ||||||
| 
 | 
 | ||||||
| This repository's main product is the Open Source Docker Registry implementation | This repository's main product is the Open Source Docker Registry implementation | ||||||
| for storing and distributing Docker and OCI images using the | for storing and distributing Docker and OCI images using the | ||||||
|  |  | ||||||
							
								
								
									
										275
									
								
								ROADMAP.md
								
								
								
								
							
							
						
						
									
										275
									
								
								ROADMAP.md
								
								
								
								
							|  | @ -1,267 +1,16 @@ | ||||||
| # Roadmap | # Roadmap | ||||||
| 
 | 
 | ||||||
| The Distribution Project consists of several components, some of which are | The Distribution project aims to support the following use cases | ||||||
| still being defined. This document defines the high-level goals of the |  | ||||||
| project, identifies the current components, and defines the release- |  | ||||||
| relationship to the Docker Platform. |  | ||||||
| 
 | 
 | ||||||
| * [Distribution Goals](#distribution-goals) | 1. A library to support building highly scalable and reliable container registries, | ||||||
| * [Distribution Components](#distribution-components) | that can be customised for different backends and use cases. This is used by many | ||||||
| * [Project Planning](#project-planning): release-relationship to the Docker Platform. | of the largest registry operators, including Docker Hub, GitHub, GitLab, Harbor | ||||||
| 
 | and Digital Ocean. | ||||||
| This road map is a living document, providing an overview of the goals and | 2. A reference implementation of the OCI registry standards, and an easy way to | ||||||
| considerations made in respect of the future of the project. | experiment with new propsals in the registry space as these standards change. | ||||||
| 
 | 3. Distributed registry tools, such as caching registries and local registries | ||||||
| ## Distribution Goals | that can be used within clusters for performance and locality use cases. | ||||||
| 
 |  | ||||||
| - Replace the existing [docker registry](github.com/docker/docker-registry) |  | ||||||
|   implementation as the primary implementation. |  | ||||||
| - Replace the existing push and pull code in the docker engine with the |  | ||||||
|   distribution package. |  | ||||||
| - Define a strong data model for distributing docker images |  | ||||||
| - Provide a flexible distribution tool kit for use in the docker platform |  | ||||||
| - Unlock new distribution models |  | ||||||
| 
 |  | ||||||
| ## Distribution Components |  | ||||||
| 
 |  | ||||||
| Components of the Distribution Project are managed via github [milestones](https://github.com/docker/distribution/milestones). Upcoming |  | ||||||
| features and bugfixes for a component will be added to the relevant milestone. If a feature or |  | ||||||
| bugfix is not part of a milestone, it is currently unscheduled for |  | ||||||
| implementation.  |  | ||||||
| 
 |  | ||||||
| * [Registry](#registry) |  | ||||||
| * [Distribution Package](#distribution-package) |  | ||||||
| 
 |  | ||||||
| *** |  | ||||||
| 
 |  | ||||||
| ### Registry |  | ||||||
| 
 |  | ||||||
| The new Docker registry is the main portion of the distribution repository. |  | ||||||
| Registry 2.0 is the first release of the next-generation registry. This was |  | ||||||
| primarily focused on implementing the [new registry |  | ||||||
| API](https://github.com/docker/distribution/blob/master/docs/spec/api.md), |  | ||||||
| with a focus on security and performance.  |  | ||||||
| 
 |  | ||||||
| Following from the Distribution project goals above, we have a set of goals |  | ||||||
| for registry v2 that we would like to follow in the design. New features |  | ||||||
| should be compared against these goals. |  | ||||||
| 
 |  | ||||||
| #### Data Storage and Distribution First |  | ||||||
| 
 |  | ||||||
| The registry's first goal is to provide a reliable, consistent storage |  | ||||||
| location for Docker images. The registry should only provide the minimal |  | ||||||
| amount of indexing required to fetch image data and no more. |  | ||||||
| 
 |  | ||||||
| This means we should be selective in new features and API additions, including |  | ||||||
| those that may require expensive, ever growing indexes. Requests should be |  | ||||||
| servable in "constant time". |  | ||||||
| 
 |  | ||||||
| #### Content Addressability |  | ||||||
| 
 |  | ||||||
| All data objects used in the registry API should be content addressable. |  | ||||||
| Content identifiers should be secure and verifiable. This provides a secure, |  | ||||||
| reliable base from which to build more advanced content distribution systems. |  | ||||||
| 
 |  | ||||||
| #### Content Agnostic |  | ||||||
| 
 |  | ||||||
| In the past, changes to the image format would require large changes in Docker |  | ||||||
| and the Registry. By decoupling the distribution and image format, we can |  | ||||||
| allow the formats to progress without having to coordinate between the two. |  | ||||||
| This means that we should be focused on decoupling Docker from the registry |  | ||||||
| just as much as decoupling the registry from Docker. Such an approach will |  | ||||||
| allow us to unlock new distribution models that haven't been possible before. |  | ||||||
| 
 |  | ||||||
| We can take this further by saying that the new registry should be content |  | ||||||
| agnostic. The registry provides a model of names, tags, manifests and content |  | ||||||
| addresses and that model can be used to work with content. |  | ||||||
| 
 |  | ||||||
| #### Simplicity |  | ||||||
| 
 |  | ||||||
| The new registry should be closer to a microservice component than its |  | ||||||
| predecessor. This means it should have a narrower API and a low number of |  | ||||||
| service dependencies. It should be easy to deploy. |  | ||||||
| 
 |  | ||||||
| This means that other solutions should be explored before changing the API or |  | ||||||
| adding extra dependencies. If functionality is required, can it be added as an |  | ||||||
| extension or companion service. |  | ||||||
| 
 |  | ||||||
| #### Extensibility |  | ||||||
| 
 |  | ||||||
| The registry should provide extension points to add functionality. By keeping |  | ||||||
| the scope narrow, but providing the ability to add functionality. |  | ||||||
| 
 |  | ||||||
| Features like search, indexing, synchronization and registry explorers fall |  | ||||||
| into this category. No such feature should be added unless we've found it |  | ||||||
| impossible to do through an extension. |  | ||||||
| 
 |  | ||||||
| #### Active Feature Discussions |  | ||||||
| 
 |  | ||||||
| The following are feature discussions that are currently active. |  | ||||||
| 
 |  | ||||||
| If you don't see your favorite, unimplemented feature, feel free to contact us |  | ||||||
| via IRC or the mailing list and we can talk about adding it. The goal here is |  | ||||||
| to make sure that new features go through a rigid design process before |  | ||||||
| landing in the registry. |  | ||||||
| 
 |  | ||||||
| ##### Proxying to other Registries |  | ||||||
| 
 |  | ||||||
| A _pull-through caching_ mode exists for the registry, but is restricted from  |  | ||||||
| within the docker client to only mirror the official Docker Hub.  This functionality |  | ||||||
| can be expanded when image provenance has been specified and implemented in the  |  | ||||||
| distribution project. |  | ||||||
| 
 |  | ||||||
| ##### Metadata storage |  | ||||||
| 
 |  | ||||||
| Metadata for the registry is currently stored with the manifest and layer data on |  | ||||||
| the storage backend.  While this is a big win for simplicity and reliably maintaining |  | ||||||
| state, it comes with the cost of consistency and high latency.  The mutable registry |  | ||||||
| metadata operations should be abstracted behind an API which will allow ACID compliant |  | ||||||
| storage systems to handle metadata. |  | ||||||
| 
 |  | ||||||
| ##### Peer to Peer transfer |  | ||||||
| 
 |  | ||||||
| Discussion has started here: https://docs.google.com/document/d/1rYDpSpJiQWmCQy8Cuiaa3NH-Co33oK_SC9HeXYo87QA/edit |  | ||||||
| 
 |  | ||||||
| ##### Indexing, Search and Discovery |  | ||||||
| 
 |  | ||||||
| The original registry provided some implementation of search for use with |  | ||||||
| private registries. Support has been elided from V2 since we'd like to both |  | ||||||
| decouple search functionality from the registry. The makes the registry |  | ||||||
| simpler to deploy, especially in use cases where search is not needed, and |  | ||||||
| let's us decouple the image format from the registry. |  | ||||||
| 
 |  | ||||||
| There are explorations into using the catalog API and notification system to |  | ||||||
| build external indexes. The current line of thought is that we will define a |  | ||||||
| common search API to index and query docker images. Such a system could be run |  | ||||||
| as a companion to a registry or set of registries to power discovery. |  | ||||||
| 
 |  | ||||||
| The main issue with search and discovery is that there are so many ways to |  | ||||||
| accomplish it. There are two aspects to this project. The first is deciding on |  | ||||||
| how it will be done, including an API definition that can work with changing |  | ||||||
| data formats. The second is the process of integrating with `docker search`. |  | ||||||
| We expect that someone attempts to address the problem with the existing tools |  | ||||||
| and propose it as a standard search API or uses it to inform a standardization |  | ||||||
| process. Once this has been explored, we integrate with the docker client. |  | ||||||
| 
 |  | ||||||
| Please see the following for more detail: |  | ||||||
| 
 |  | ||||||
| - https://github.com/docker/distribution/issues/206 |  | ||||||
| 
 |  | ||||||
| ##### Deletes |  | ||||||
| 
 |  | ||||||
| > __NOTE:__ Deletes are a much asked for feature. Before requesting this |  | ||||||
| feature or participating in discussion, we ask that you read this section in |  | ||||||
| full and understand the problems behind deletes. |  | ||||||
| 
 |  | ||||||
| While, at first glance, implementing deleting seems simple, there are a number |  | ||||||
| mitigating factors that make many solutions not ideal or even pathological in |  | ||||||
| the context of a registry. The following paragraph discuss the background and |  | ||||||
| approaches that could be applied to arrive at a solution. |  | ||||||
| 
 |  | ||||||
| The goal of deletes in any system is to remove unused or unneeded data. Only |  | ||||||
| data requested for deletion should be removed and no other data. Removing |  | ||||||
| unintended data is worse than _not_ removing data that was requested for |  | ||||||
| removal but ideally, both are supported. Generally, according to this rule, we |  | ||||||
| err on holding data longer than needed, ensuring that it is only removed when |  | ||||||
| we can be certain that it can be removed. With the current behavior, we opt to |  | ||||||
| hold onto the data forever, ensuring that data cannot be incorrectly removed. |  | ||||||
| 
 |  | ||||||
| To understand the problems with implementing deletes, one must understand the |  | ||||||
| data model. All registry data is stored in a filesystem layout, implemented on |  | ||||||
| a "storage driver", effectively a _virtual file system_ (VFS). The storage |  | ||||||
| system must assume that this VFS layer will be eventually consistent and has |  | ||||||
| poor read- after-write consistency, since this is the lower common denominator |  | ||||||
| among the storage drivers. This is mitigated by writing values in reverse- |  | ||||||
| dependent order, but makes wider transactional operations unsafe. |  | ||||||
| 
 |  | ||||||
| Layered on the VFS model is a content-addressable _directed, acyclic graph_ |  | ||||||
| (DAG) made up of blobs. Manifests reference layers. Tags reference manifests. |  | ||||||
| Since the same data can be referenced by multiple manifests, we only store |  | ||||||
| data once, even if it is in different repositories. Thus, we have a set of |  | ||||||
| blobs, referenced by tags and manifests. If we want to delete a blob we need |  | ||||||
| to be certain that it is no longer referenced by another manifest or tag. When |  | ||||||
| we delete a manifest, we also can try to delete the referenced blobs. Deciding |  | ||||||
| whether or not a blob has an active reference is the crux of the problem. |  | ||||||
| 
 |  | ||||||
| Conceptually, deleting a manifest and its resources is quite simple. Just find |  | ||||||
| all the manifests, enumerate the referenced blobs and delete the blobs not in |  | ||||||
| that set. An astute observer will recognize this as a garbage collection |  | ||||||
| problem. As with garbage collection in programming languages, this is very |  | ||||||
| simple when one always has a consistent view. When one adds parallelism and an |  | ||||||
| inconsistent view of data, it becomes very challenging. |  | ||||||
| 
 |  | ||||||
| A simple example can demonstrate this. Let's say we are deleting a manifest |  | ||||||
| _A_ in one process. We scan the manifest and decide that all the blobs are |  | ||||||
| ready for deletion. Concurrently, we have another process accepting a new |  | ||||||
| manifest _B_ referencing one or more blobs from the manifest _A_. Manifest _B_ |  | ||||||
| is accepted and all the blobs are considered present, so the operation |  | ||||||
| proceeds. The original process then deletes the referenced blobs, assuming |  | ||||||
| they were unreferenced. The manifest _B_, which we thought had all of its data |  | ||||||
| present, can no longer be served by the registry, since the dependent data has |  | ||||||
| been deleted. |  | ||||||
| 
 |  | ||||||
| Deleting data from the registry safely requires some way to coordinate this |  | ||||||
| operation. The following approaches are being considered: |  | ||||||
| 
 |  | ||||||
| - _Reference Counting_ - Maintain a count of references to each blob. This is |  | ||||||
|   challenging for a number of reasons: 1. maintaining a consistent consensus |  | ||||||
|   of reference counts across a set of Registries and 2. Building the initial |  | ||||||
|   list of reference counts for an existing registry. These challenges can be |  | ||||||
|   met with a consensus protocol like Paxos or Raft in the first case and a |  | ||||||
|   necessary but simple scan in the second.. |  | ||||||
| - _Lock the World GC_ - Halt all writes to the data store. Walk the data store |  | ||||||
|   and find all blob references. Delete all unreferenced blobs. This approach |  | ||||||
|   is very simple but requires disabling writes for a period of time while the |  | ||||||
|   service reads all data. This is slow and expensive but very accurate and |  | ||||||
|   effective. |  | ||||||
| - _Generational GC_ - Do something similar to above but instead of blocking |  | ||||||
|   writes, writes are sent to another storage backend while reads are broadcast |  | ||||||
|   to the new and old backends. GC is then performed on the read-only portion. |  | ||||||
|   Because writes land in the new backend, the data in the read-only section |  | ||||||
|   can be safely deleted. The main drawbacks of this approach are complexity |  | ||||||
|   and coordination. |  | ||||||
| - _Centralized Oracle_ - Using a centralized, transactional database, we can |  | ||||||
|   know exactly which data is referenced at any given time. This avoids |  | ||||||
|   coordination problem by managing this data in a single location. We trade |  | ||||||
|   off metadata scalability for simplicity and performance. This is a very good |  | ||||||
|   option for most registry deployments. This would create a bottleneck for |  | ||||||
|   registry metadata. However, metadata is generally not the main bottleneck |  | ||||||
|   when serving images. |  | ||||||
| 
 |  | ||||||
| Please let us know if other solutions exist that we have yet to enumerate. |  | ||||||
| Note that for any approach, implementation is a massive consideration. For |  | ||||||
| example, a mark-sweep based solution may seem simple but the amount of work in |  | ||||||
| coordination offset the extra work it might take to build a _Centralized |  | ||||||
| Oracle_. We'll accept proposals for any solution but please coordinate with us |  | ||||||
| before dropping code. |  | ||||||
| 
 |  | ||||||
| At this time, we have traded off simplicity and ease of deployment for disk |  | ||||||
| space. Simplicity and ease of deployment tend to reduce developer involvement, |  | ||||||
| which is currently the most expensive resource in software engineering. Taking |  | ||||||
| on any solution for deletes will greatly effect these factors, trading off |  | ||||||
| very cheap disk space for a complex deployment and operational story. |  | ||||||
| 
 |  | ||||||
| Please see the following issues for more detail: |  | ||||||
| 
 |  | ||||||
| - https://github.com/docker/distribution/issues/422 |  | ||||||
| - https://github.com/docker/distribution/issues/461 |  | ||||||
| - https://github.com/docker/distribution/issues/462 |  | ||||||
| 
 |  | ||||||
| ### Distribution Package  |  | ||||||
| 
 |  | ||||||
| At its core, the Distribution Project is a set of Go packages that make up |  | ||||||
| Distribution Components. At this time, most of these packages make up the |  | ||||||
| Registry implementation.  |  | ||||||
| 
 |  | ||||||
| The package itself is considered unstable. If you're using it, please take care to vendor the dependent version.  |  | ||||||
| 
 |  | ||||||
| For feature additions, please see the Registry section. In the future, we may break out a |  | ||||||
| separate Roadmap for distribution-specific features that apply to more than |  | ||||||
| just the registry. |  | ||||||
| 
 |  | ||||||
| *** |  | ||||||
| 
 |  | ||||||
| ### Project Planning |  | ||||||
| 
 |  | ||||||
| An [Open-Source Planning Process](https://github.com/docker/distribution/wiki/Open-Source-Planning-Process) is used to define the Roadmap. [Project Pages](https://github.com/docker/distribution/wiki) define the goals for each Milestone and identify current progress. |  | ||||||
| 
 | 
 | ||||||
|  | As every container application needs at least one registry as part of its infrastructure, | ||||||
|  | and more cloud native artifacts are using registries as the basis of their distribution, | ||||||
|  | having a widely used and supported open source registry is important for innovation. | ||||||
|  |  | ||||||
		Loading…
	
		Reference in New Issue