Intro

Josh combines the advantages of monorepos with those of multirepos by leveraging a blazingly-fast, incremental, and reversible implementation of git history filtering.
Concept
Traditionally, history filtering has been viewed as an expensive operation that should only be performed to fix issues with a repository, such as purging big binary files or removing accidentally-committed secrets, or as part of a migration to a different repository structure, like switching from multirepo to monorepo (or vice versa).
The implementation shipped with git (git filter-branch) is only usable as a once-in-a-lifetime last resort for anything but tiny repositories.
Faster implementations of history filtering exist, such as git-filter-repo or the BFG Repo-Cleaner. While much faster, they are designed for occasional, destructive maintenance tasks, usually with the expectation that the old history will be discarded once filtering is complete.
The idea behind josh started with two questions:
- What if history filtering could be so fast that it can be part of a normal, everyday workflow, running on every single push and fetch without the user even noticing?
- What if history filtering was a non-destructive, reversible operation?
Under those two premises a filter operation stops being a maintenance task. It seamlessly relates histories between repos, which can be used by developers and CI systems interchangeably in whatever way is most suitable to the task at hand.
How is this possible?
Filtering history is a highly predictable task: the set of filters that tend to be used for any given repository is limited, and the input to a filter (a git branch) only gets modified incrementally. Thus, by keeping a persistent cache between filter runs, the work needed to re-run a filter on a new commit (and its history) becomes proportional to the number of changes since the last run; the work to filter no longer depends on the total length of the history. Additionally, most filters do not depend on the size of the trees either.
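The caching idea can be sketched abstractly. In the following hypothetical Python model (not josh's actual implementation, which operates on real git objects), trees are immutable content-addressed values and the filter result is memoized per tree, so re-filtering after a small change only does work proportional to what changed:

```python
# Hypothetical sketch of a persistent filter cache: trees are immutable
# content-addressed values, and the filter result is memoized per tree.
cache = {}
calls = 0

def filter_tree(tree):
    """Keep only '.h' files. A tree is a tuple of (name, entry) pairs,
    where an entry is a blob (str) or a nested tree (tuple)."""
    global calls
    if tree in cache:          # unchanged subtrees are free
        return cache[tree]
    calls += 1
    result = tuple(
        (name, filter_tree(entry) if isinstance(entry, tuple) else entry)
        for name, entry in tree
        if isinstance(entry, tuple) or name.endswith(".h")
    )
    cache[tree] = result
    return result

v1 = (("library1", (("lib1.h", "int f();"), ("notes.txt", "wip"))),
      ("doc", (("guide.md", "v1"),)))
filter_tree(v1)   # filters root, library1 and doc: 3 calls
v2 = (("library1", (("lib1.h", "int f();"), ("notes.txt", "wip"))),
      ("doc", (("guide.md", "v2"),)))
filter_tree(v2)   # library1 is unchanged and hits the cache
```

Filtering v2, which differs from v1 in a single file, re-visits only the root and the changed subtree; the untouched library1 subtree is served from the cache.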
What has long been known to be true for performing merges also applies to history filtering: The more often it is done the less work it takes each time.
To guarantee filters are reversible, we have to restrict the kinds of filters that can be used; it is not possible to write arbitrary filters in a scripting language, as other tools allow. To still cover a wide range of use cases, we have introduced a domain-specific language to express more complex filters as combinations of simpler ones. Apart from guaranteeing reversibility, the use of a DSL also enables pre-optimization of filter expressions to minimize both the amount of work needed to execute the filter and the on-disk size of the persistent cache.
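As an illustration of such pre-optimization (a sketch of the idea; the exact rewrite rules are an internal detail of josh), a chain of subdirectory filters can be collapsed into a single equivalent filter before execution:

```
:/modules:/library1        select modules/, then library1/ inside it
:/modules/library1         an equivalent, pre-optimized single filter
```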
From Linus Torvalds 2007 talk at Google about git:
Audience:
Can you have just a part of files pulled out of a repository, not the entire repository?
Linus:
You can export things as tarballs, you can export things as individual files, you can rewrite the whole history to say “I want a new version of that repository that only contains that part”. You can do that; it is a fairly expensive operation. It’s something you would do for example when you import an old repository into one huge git repository and then split it later on into multiple smaller ones. You can do it. What I am trying to say is that you should generally try to avoid it. It’s not that git cannot handle huge projects, but git would not perform as well as it would otherwise. And you will have issues that you wish you didn’t have.
So I am skipping this issue and going back to the performance issue. One of the things I want to say about performance is that a lot of people seem to think that performance is about doing the same thing, just doing it faster, and that is not true.
That is not what performance is all about. If you can do something really fast, really well, people will start using it differently.
Use cases
Partial cloning
Reduce scope and size of clones by treating subdirectories of the monorepo as individual repositories.
$ git clone http://josh/monorepo.git:/path/to/library.git
The partial repo will act as a normal git repository but only contain the files found in the subdirectory and only the commits affecting those files. The partial repo supports both fetch and push operations.
This not only improves performance on the client by having fewer files in the tree; it also enables collaboration on parts of the monorepo with other parties using git’s normal distributed development features. For example, this makes it easy to mirror just selected parts of your repo to public GitHub repositories or to specific customers.
Project composition / Workspaces
Simplify code sharing and dependency management. Beyond just subdirectories, Josh supports filtering, re-mapping and composition of arbitrary virtual repositories from the content found in the monorepo.
The mapping itself is also stored in the repository and therefore versioned alongside the code.
Multiple projects, depending on a shared set of libraries, can thus live together in a single repository. This approach is commonly referred to as a “monorepo”, and was popularized by Google, Facebook, and Twitter, to name a few.
In this example, two projects (project1 and project2) coexist in the central monorepo.
Each project defines a workspace.josh file in the central monorepo that maps content into its project workspace. For example, a workspace can pull in a set of shared modules:

    dependencies = :/modules:[
        ::tools/
        ::library1/
    ]

or map a single library to a path of its choosing:

    libs/library1 = :/modules/library1
Frequently Asked Questions
How is Josh different from git sparse-checkout?
Josh operates on the git object graph and is unrelated to checking out files into the working tree on the filesystem, which is the only thing sparse-checkout is concerned with. A sparse checkout influences neither the contents of the object database nor what gets downloaded over the network. Both can certainly be used together if needed.
How is Josh different from partial clone?
A partial clone will cause git to download only parts of an object database according to some predicate. It is still the same object database, with the history having the same commits and sha1s, and skipped parts of the object database can still be loaded at a later point. Josh creates an alternate history that has no reference to the skipped parts. As such, it is very similar to git filter-branch, just faster, with added features and a different user interface.
How is it different from submodules?
Where git submodules are multiple, independent repos referencing each other with SHAs, Josh supports the monorepo approach: all of the code is in one single repo which can easily be kept in sync, and Josh provides any subfolder (or, in the case of workspaces, a more complicated recombination of folders) as its own git repository. These repos are transparently synchronised both ways with the main monorepo. Josh can thus do more than submodules can, and is easier and faster to use.
How is it different from git subtree?
The basic idea behind Josh is quite similar to git subtree. However, git subtree, just like git filter-branch, is far too slow for everyday use, even on medium-sized repos.
git subtree can only achieve acceptable performance when squashing commits, thereby losing history. One core part of Josh is essentially a much faster implementation of git subtree split, specifically optimized for being run frequently inside the same repository.
How is Josh different from git filter-repo?
Both josh-filter and git filter-repo enable very fast rewriting of git history and can thus, in simple cases, be used for the same purpose.
Which one is right in more advanced use cases depends on your goals: git filter-repo offers more flexibility and options
on what kind of filtering it supports, like rewriting commit messages or even plugging arbitrary scripts into the filtering.
Josh uses a DSL instead of arbitrary scripts for complex filters and is much more restrictive in the kinds of filtering possible, but in exchange for those limitations it offers incremental filtering as well as bidirectional operation, i.e. converting changes in both directions between the original and the filtered repos.
How is Josh different from all of the above alternatives?
Josh includes josh-proxy which offers repo filtering as a service, mainly intended to support monorepo workflows.
Getting Started
The josh command-line tool lets you clone, fetch, push, and manage filtered views of
git repositories directly from your terminal.
Installation
Install josh using Cargo (requires Rust):
cargo install josh-cli --locked --git https://github.com/josh-project/josh.git
Cloning a repository
josh clone is similar to git clone but takes two required arguments after the URL:
a filter and a local destination path. Unlike git clone,
the destination path is always required and cannot be inferred from the URL.
For example, let’s clone just the documentation folder of the Josh repository:
josh clone https://github.com/josh-project/josh.git :/docs ./josh-docs
The filter :/docs tells Josh to check out only the contents of the docs/ subdirectory.
The resulting repository will contain only the files from that folder and only the commits
that touch them — as if that subdirectory had always been its own repository.
To clone a repository without any filter (equivalent to a plain git clone):
josh clone https://github.com/josh-project/josh.git :/ ./josh
Making and pushing changes
The cloned repository is a normal git repository. Edit files, commit as usual, then use
josh push to send your changes back upstream:
cd josh-docs
# ... edit files, git add, git commit ...
josh push
Josh transparently reverses the filter and applies your commits to the correct location in the upstream repository. From the perspective of the rest of the team, the changes appear exactly as if they had been pushed directly to the monorepo.
Pulling changes
Use josh pull to fetch and integrate updates from upstream:
josh pull
Cloning a part of a repository
Josh becomes particularly useful when you want to work on a filtered view of a larger
repository — for example, a single subdirectory or a composed workspace. The josh CLI
applies the filter client-side, which means the full repository object database is still
downloaded from the upstream host. The filter determines which commits and files are
visible in your working tree and which refs you can push to, but it does not reduce
transfer size.
Note: If a true partial download is important — for example to avoid transferring a large monorepo over a slow connection — you need server-side filtering via a josh-proxy. With the proxy in place, only the filtered objects are ever sent over the network.
Beyond simple subdirectory extraction, Josh’s filter language supports composition, remapping, and exclusions, making it possible to carve out any virtual slice of a repository.
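A few illustrative filter specs give a feel for this (the paths are hypothetical, and exact syntax details should be checked against the filter syntax reference):

```
:/services/backend                    extract a subdirectory as the new root
:/services/backend:prefix=backend     extract it, then remount it under backend/
:exclude[::**/*.lock]                 exclude files matching a pattern
```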
Next steps
- Workspaces — Compose a virtual repository from multiple parts of a monorepo and keep them in sync bidirectionally.
- Stacked changes — Push a series of commits as individual pull requests with automatic PR management.
- Filter syntax — Learn all the available filter operations.
- josh CLI reference — Full reference for all josh subcommands and options.
- Proxy setup — Running a shared josh-proxy for your team or CI/CD infrastructure, so that ordinary git clone works without any special client tooling.
Working with workspaces
NOTE
All the commands are included from the file workspaces.t, which can be run with cram.
Josh really starts to shine when using workspaces.
Simply put, they are a list of files and folders, remapped from the central repository to a new repository. For example, a shared library could be used by various workspaces, each mapping it to its appropriate subdirectory.
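Two workspace.josh files mapping the same shared library to different paths, as this chapter will build up for its two example applications, could look like this:

```
application1/workspace.josh:   modules/lib1 = :/library1
application2/workspace.josh:   libs/lib1 = :/library1
```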
In this chapter, we’re going to set up a new git repository with a couple of libraries, and then use it to demonstrate the use of workspaces.
Test set-up
NOTE
The following section describes how to set up a local git server with made-up content for the sake of this tutorial. You’re free to follow it, or to use your own existing repository, in which case you can skip to the next section.
To host the repository for this test, we need a git server. We’re going to run git as a cgi program using its provided http backend, served with the test server included in the hyper_cgi crate.
Serving the git repo
First, we create a bare repository, which will be served by hyper_cgi. We enable
the option http.receivepack to allow the use of git push from the clients.
$ git init --bare ./remote/real_repo.git/
Initialized empty Git repository in */real_repo.git/ (glob)
$ git config -f ./remote/real_repo.git/config http.receivepack true
Then we start the server which will allow clients to access the repository through http.
$ GIT_DIR=./remote/ GIT_PROJECT_ROOT=${TESTTMP}/remote/ GIT_HTTP_EXPORT_ALL=1 hyper-cgi-test-server\
> --port=8001\
> --dir=./remote/\
> --cmd=git\
> --args=http-backend\
> > ./hyper-cgi-test-server.out 2>&1 &
$ echo $! > ./server_pid
Our server is ready, serving all the repos in the remote folder on port 8001.
$ git clone http://localhost:8001/real_repo.git
Cloning into 'real_repo'...
warning: You appear to have cloned an empty repository.
Adding some content
Of course, the repository is for now empty, and we need to populate it. The populate.sh script creates a couple of libraries, as well as two applications that use them.
$ cd real_repo
$ sh ${TESTDIR}/populate.sh > ../populate.out
$ git push origin HEAD
To http://localhost:8001/real_repo.git
* [new branch] HEAD -> master
$ tree
.
|-- application1
| `-- app.c
|-- application2
| `-- guide.c
|-- doc
| |-- guide.md
| |-- library1.md
| `-- library2.md
|-- library1
| `-- lib1.h
`-- library2
`-- lib2.h
5 directories, 7 files
$ git log --oneline --graph
* f65e94b Add documentation
* f240612 Add application2
* 0a7f473 Add library2
* 1079ef1 Add application1
* 6476861 Add library1
Creating our first workspace
Now that we have a git repo populated with content, let’s serve it through josh:
$ docker run -d --network="host" -e JOSH_REMOTE=http://127.0.0.1:8001 -v josh-vol:$(pwd)/git_data joshproject/josh-proxy:latest > josh.out
NOTE
For the sake of this example, we run docker with --network="host" instead of publishing the port. This is so that docker can access localhost, where our ad-hoc git repository is served.
To facilitate development on applications 1 and 2, we want to create workspaces for them. Creating a new workspace looks very similar to checking out a subfolder through josh, as explained in “Getting Started”.
Instead of just the name of the subfolder, though, we also use the :workspace= filter:
$ git clone http://127.0.0.1:8000/real_repo.git:workspace=application1.git application1
Cloning into 'application1'...
$ cd application1
$ tree
.
`-- app.c
0 directories, 1 file
$ git log -2
commit 50cd6112e173df4cac1aca9cb88b5c2a180bc526
Author: Josh <josh@example.com>
Date: Thu Apr 7 22:13:13 2005 +0000
Add application1
Looking into the newly cloned workspace, we see our expected files and the history containing the only relevant commit.
NOTE
Josh allows us to create a workspace out of any directory, even one that doesn’t exist yet.
Adding workspace.josh
The workspace.josh file describes how folders from the central repository (real_repo.git) should be mapped to the workspace repository.
Since we depend on library1, let’s add it to the workspace file.
$ echo "modules/lib1 = :/library1" >> workspace.josh
$ git add workspace.josh
$ git commit -m "Map library1 to the application1 workspace"
[master 06361ee] Map library1 to the application1 workspace
1 file changed, 1 insertion(+)
create mode 100644 workspace.josh
We decided to map library1 to modules/lib1 in the workspace. We can now sync up with the server:
$ git sync origin HEAD
HEAD -> refs/heads/master
From http://127.0.0.1:8000/real_repo.git:workspace=application1
* branch 753d62ca1af960a3d071bb3b40722471228abbf6 -> FETCH_HEAD
HEAD is now at 753d62c Map library1 to the application1 workspace
Pushing to http://127.0.0.1:8000/real_repo.git:workspace=application1.git
POST git-receive-pack (477 bytes)
remote: josh-proxy
remote: response from upstream:
remote: To http://localhost:8001/real_repo.git
remote: f65e94b..37184cc JOSH_PUSH -> master
remote: REWRITE(06361eedf6d6f6d7ada6000481a47363b0f0c3de -> 753d62ca1af960a3d071bb3b40722471228abbf6)
remote:
remote:
updating local tracking ref 'refs/remotes/origin/master'
Let’s observe the result:
$ tree
.
|-- app.c
|-- modules
| `-- lib1
| `-- lib1.h
`-- workspace.josh
2 directories, 3 files
$ git log --graph --oneline
* 753d62c Map library1 to the application1 workspace
|\
| * 366adba Add library1
* 50cd611 Add application1
After pushing and fetching the result, we see that it has been successfully mapped by josh.
One surprising thing is the history: our “mapping” commit became a merge commit! This is because josh needs to merge the history of the module we want to map into the repository of the workspace. After this is done, all commits will be present in both histories.
NOTE
git sync is a utility provided with josh which will push contents and, if josh tells it to, fetch the transformed result. Otherwise, it works like git push.
By the way, what does the history look like on the real_repo?
$ cd ../real_repo
$ git pull origin master
From http://localhost:8001/real_repo
* branch master -> FETCH_HEAD
f65e94b..37184cc master -> origin/master
Updating f65e94b..37184cc
Fast-forward
application1/workspace.josh | 1 +
1 file changed, 1 insertion(+)
create mode 100644 application1/workspace.josh
Current branch master is up to date.
$ tree
.
|-- application1
| |-- app.c
| `-- workspace.josh
|-- application2
| `-- guide.c
|-- doc
| |-- guide.md
| |-- library1.md
| `-- library2.md
|-- library1
| `-- lib1.h
`-- library2
`-- lib2.h
5 directories, 8 files
$ git log --graph --oneline
* 37184cc Map library1 to the application1 workspace
* f65e94b Add documentation
* f240612 Add application2
* 0a7f473 Add library2
* 1079ef1 Add application1
* 6476861 Add library1
We can see the newly added commit for workspace.josh in application1, and as expected, no merge here.
Interacting with workspaces
Let’s now create a second workspace, this time for application2. It depends on library1 and library2.
$ git clone http://127.0.0.1:8000/real_repo.git:workspace=application2.git application2
Cloning into 'application2'...
$ cd application2
$ echo "libs/lib1 = :/library1" >> workspace.josh
$ echo "libs/lib2 = :/library2" >> workspace.josh
$ git add workspace.josh && git commit -m "Create workspace for application2"
[master 566a489] Create workspace for application2
1 file changed, 2 insertions(+)
create mode 100644 workspace.josh
Syncing as before:
$ git sync origin HEAD
HEAD -> refs/heads/master
From http://127.0.0.1:8000/real_repo.git:workspace=application2
* branch 5115fd2a5374cbc799da61a228f7fece3039250b -> FETCH_HEAD
HEAD is now at 5115fd2 Create workspace for application2
Pushing to http://127.0.0.1:8000/real_repo.git:workspace=application2.git
POST git-receive-pack (478 bytes)
remote: josh-proxy
remote: response from upstream:
remote: To http://localhost:8001/real_repo.git
remote: 37184cc..feb3a5b JOSH_PUSH -> master
remote: REWRITE(566a4899f0697d0bde1ba064ed81f0654a316332 -> 5115fd2a5374cbc799da61a228f7fece3039250b)
remote:
remote:
updating local tracking ref 'refs/remotes/origin/master'
And our local folder now contains all the files requested:
$ tree
.
|-- guide.c
|-- libs
| |-- lib1
| | `-- lib1.h
| `-- lib2
| `-- lib2.h
`-- workspace.josh
3 directories, 4 files
And the history includes the history of both of the libraries:
$ git log --oneline --graph
* 5115fd2 Create workspace for application2
|\
| * ffaf58d Add library2
| * f4e4e40 Add library1
* ee8a5d7 Add application2
Note that since we created the workspace and added the dependencies in one single commit, the history just contains this one single merge commit.
Pushing a change from a workspace
While testing application2, we noticed a typo in the library1 dependency.
Let’s go ahead and fix it!
$ sed -i 's/41/42/' libs/lib1/lib1.h
$ git commit -a -m "fix lib1 typo"
[master 82238bf] fix lib1 typo
1 file changed, 1 insertion(+), 1 deletion(-)
We can push this change like any normal git change:
$ git push origin master
remote: josh-proxy
remote: response from upstream:
remote: To http://localhost:8001/real_repo.git
remote: feb3a5b..31e8fab JOSH_PUSH -> master
remote:
remote:
To http://127.0.0.1:8000/real_repo.git:workspace=application2.git
5115fd2..82238bf master -> master
Since the change was merged in the central repository, a developer can now pull from the application1 workspace.
$ cd ../application1
$ git pull
From http://127.0.0.1:8000/real_repo.git:workspace=application1
+ 06361ee...c64b765 master -> origin/master (forced update)
Updating 753d62c..c64b765
Fast-forward
modules/lib1/lib1.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Current branch master is up to date.
The change has been propagated!
$ git log --oneline --graph
* c64b765 fix lib1 typo
* 753d62c Map library1 to the application1 workspace
|\
| * 366adba Add library1
* 50cd611 Add application1
Stacked Changes
Josh supports a stacked-changes workflow where a series of commits on a local branch can each be pushed as a separate, independently-reviewable unit. This is useful when working on a larger feature that is best reviewed in smaller, logical steps.
This feature is separate from Josh’s filtering functionality. It works with any
repository accessible via the josh CLI, regardless of whether you are working with a
filtered view of a monorepo or a plain repository.
Concepts
In a stacked changes workflow, each commit on your local branch represents one
self-contained change. When you use josh publish, Josh creates a
separate git ref for each qualifying commit.
A commit qualifies for a separate ref — and an automatic PR, when forge integration is configured — only if both of the following are true:
- It has a change ID in the commit message footer (see below).
- Its author email matches the email configured in
user.emailin your git config.
Commits without a change ID, or authored by someone else, are silently skipped and are not pushed as individual changes.
Change IDs
A change ID is a short, stable identifier that you add manually to the footer of a commit message, using either of these footers:
Change: my-feature-part-1
or the Gerrit-compatible form:
Change-Id: I1234abcd...
The change ID must not contain @. It must be unique within the stack. It is what
allows josh push to match a commit to an existing PR across rebases and amends —
so once you have assigned an ID to a change, keep it stable.
Example commit message:
Add input validation to the login form
Validates that the email field is non-empty and well-formed before
submission. Returns an error message inline without clearing the form.
Change: login-form-validation
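The footer matching can be sketched as a tiny parser. This is a hypothetical helper, not josh's actual code, but it captures the rules above: either footer form is accepted, and an ID containing @ is rejected:

```python
import re

def change_id(message):
    """Return the Change/Change-Id footer value of a commit message,
    or None if it is absent or contains a forbidden '@'."""
    m = re.search(r"^Change(?:-Id)?:[ \t]*(\S+)[ \t]*$", message, re.MULTILINE)
    if m is None:
        return None
    value = m.group(1)
    return None if "@" in value else value
```

For example, change_id("Add validation\n\nChange: login-form-validation") yields "login-form-validation", while a message without a footer yields None.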
Workflow
1. Write your commits
Work on your feature normally, writing one commit per logical step. Add a Change:
footer to each commit you want to submit for review:
$ git commit -m "Add validation for input fields
Change: input-validation"
$ git commit -m "Wire validation into the form component
Change: form-wiring"
$ git commit -m "Add tests for form validation
Change: validation-tests"
Commits without a Change: footer are included in the push to the base branch but
do not get their own ref or PR.
2. Publish
josh publish
For each qualifying commit Josh pushes a ref under
refs/heads/@changes/<base>/<author>/<change-id>. With GitHub forge integration
enabled, a pull request is created (or updated) for each of these refs automatically.
The first change in the stack targets the repository’s default branch. Each subsequent PR targets the branch of the change before it. Intermediate PRs are automatically marked as draft until the changes before them are merged.
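Taking the three example commits from step 1, pushed against a default branch master by an author whose ref component renders as alice (both hypothetical), the pushed refs would follow this pattern:

```
refs/heads/@changes/master/alice/input-validation
refs/heads/@changes/master/alice/form-wiring
refs/heads/@changes/master/alice/validation-tests
```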
3. Iterate
After receiving review feedback, amend or rebase your commits as needed, keeping the
Change: footers intact:
git rebase -i HEAD~3 # edit commits, preserve Change: footers
josh publish # re-publish; existing PRs are updated, not recreated
As long as the change ID in the footer is preserved through your edits, josh publish
updates the correct existing PRs rather than creating new ones.
4. Merge
Once a PR is approved and its required checks pass, merge it through the forge’s normal UI. Then sync your local branch to account for the merged commit:
josh pull --rebase --autostash
This rebases your remaining local commits on top of the updated upstream state.
--autostash ensures any uncommitted changes are preserved across the operation. After
pulling, the next josh publish will retarget and promote the next PR in the stack
from draft to ready for review.
Without forge integration
josh publish works without forge integration. Josh still
pushes the individual @changes/… refs to the upstream repository; you can then create
pull requests from them manually, or use them as part of a custom review workflow.
Importing projects
When moving to a monorepo setup, especially in existing organisations, the need to consolidate existing project repositories commonly arises.
The simplest possible case is one where the previous history of a project does not need to be retained. In this case, the project's files can simply be copied into the monorepo at the appropriate location and committed.
If history should be retained, josh can be used for importing a
project as an alternative to built-in git commands like git subtree.
Josh’s filter capability lets you perform transformations on the history of a git repository to arbitrarily (re-)compose paths inside of a repository.
A key aspect of this functionality is that all transformations are
reversible. This means that if you apply a transformation moving
files from the root of a repository to, say, tools/project-b,
followed by an inverse transformation moving files from
tools/project-b back to the root, you receive the same commit hashes
you put in.
We can use this feature to import a project into our monorepo while allowing external users to keep pulling on top of the same git history they already have, just with a new git remote.
There are multiple ways of doing this, with the most common ones
outlined below. You can look at josh#596 for a
discussion of several other methods.
Import with josh-filter
Currently, the easiest way to do this is by using the josh-filter
binary which is a command-line frontend to josh’s filter capabilities.
Inside of our target repository, it would work like this:
- Fetch the repository we want to import (say, “Project B”, from $REPO_URL).

  $ git fetch $REPO_URL master

  This will set the FETCH_HEAD reference to the fetched repository.

- Rewrite the history of that repository through josh to make it look as if the project had always been developed at our target path (say, tools/project-b).

  $ josh-filter ':prefix=tools/project-b' FETCH_HEAD

  This will set the FILTERED_HEAD reference to the rewritten history.

- Merge the rewritten history into our target repository.

  $ git merge --allow-unrelated FILTERED_HEAD

  After this merge commit, the previously external project now lives at tools/project-b as expected.

- Any external users can now use the :/tools/project-b josh filter to retrieve changes made in the new project location, without the git hashes of their existing commits changing (that is to say, without conflicting).
Import by pushing to josh
If your monorepo is already running a josh-proxy in front of it, you
can also import a project by pushing a project merge to josh.
This has the benefit of not needing to clone the entire monorepo
locally to do a merge, but the drawback of using a different, slightly
slower filter mechanism when exporting the tree back out. For projects
with very large history, consider using the josh-filter mechanism
outlined above.
Pushing a project merge to josh works like this:
- Assume we have a local checkout of “Project B”, and we want to merge this into our monorepo. There is a josh-proxy running at https://git.company.name/monorepo.git. We want to merge this project into /tools/project-b in the monorepo.

- In the checkout of “Project B”, add the josh remote:

  git remote add josh https://git.company.name/monorepo.git:/tools/project-b.git

  Note the use of the :/tools/project-b.git josh filter, which points to a path that should not yet exist in the monorepo.

- Push the repository to josh with the -o merge option, creating a merge commit introducing the project history at that location, while retaining its history:

  git push josh $ref -o merge
Note for Gerrit users
With either method, when merging a set of new commits into a Gerrit repository and going through the standard code-review process, Gerrit might complain about missing Change-IDs in the imported commits.
To work around this, the commits need to first be made “known” to Gerrit. This can be achieved by pushing the new parent of the merge commit to a separate branch in Gerrit directly (without going through the review mechanism). After this Gerrit will accept merge commits referencing that parent, as long as the merge commit itself has a Change-ID.
Some monorepo setups on Gerrit use a special unrestricted branch like
merge-staging for this, to which users with permission to import
projects can force-push unrelated histories.
josh CLI
The josh command is the primary tool for working with filtered git repositories.
It provides projection-aware equivalents of the common git operations.
Installation
cargo install josh-cli --locked --git https://github.com/josh-project/josh.git
josh clone
Clone a repository, optionally applying a filter projection.
josh clone <url> <filter> <out> [options]
| Argument | Description |
|---|---|
| <url> | Remote repository URL (HTTPS, SSH, or local path) |
| <filter> | Filter spec to apply (e.g. :/docs, :workspace=workspaces/myproject) |
| <out> | Local directory to clone into |
Options:
| Flag | Description |
|---|---|
| -b, --branch <ref> | Branch or ref to clone (default: HEAD) |
| --forge <name> | Forge integration to use (e.g. github) |
| --no-forge | Disable forge integration |
Examples:
# Clone only the docs/ subdirectory
josh clone https://github.com/josh-project/josh.git :/docs ./josh-docs
# Clone a workspace projection
josh clone https://github.com/myorg/monorepo.git :workspace=workspaces/frontend ./frontend
# Clone the full repository (no filter)
josh clone https://github.com/josh-project/josh.git :/ ./josh
josh fetch
Fetch from a remote and update the filtered local refs. Equivalent to git fetch but
filter-aware.
josh fetch [options]
Options:
| Flag | Description |
|---|---|
| -r, --remote <name> | Remote name or URL to fetch from (default: origin) |
| -R, --ref <ref> | Ref to fetch (default: HEAD) |
josh pull
Fetch and integrate changes from a remote. Equivalent to git pull but filter-aware.
josh pull [options]
Options:
| Flag | Description |
|---|---|
| -r, --remote <name> | Remote name or URL to pull from (default: origin) |
| -R, --ref <ref> | Ref to pull (default: HEAD) |
| --rebase | Rebase the current branch on top of the upstream branch |
| --autostash | Automatically stash local changes before rebasing |
josh push
Push commits back to the upstream repository. Josh reverses the filter and reconstructs the correct upstream commits, so your changes land in the right place in the monorepo.
josh push [<remote>] [<refspecs>...] [options]
| Argument | Description |
|---|---|
| <remote> | Remote name to push to (default: origin) |
| <refspecs> | Refs to push (default: current branch) |
Options:
| Flag | Description |
|---|---|
| -f, --force | Force-push (non-fast-forward) |
| --atomic | Atomic push (all-or-nothing) |
| --dry-run | Show what would be pushed without actually pushing |
josh publish
Push each commit as an independent, minimal diff (stacked changes workflow). Each commit with a Change ID is pushed to its own ref and, when forge integration is configured, gets its own pull request.
josh publish [<remote>] [<refspecs>...] [options]
| Argument | Description |
|---|---|
| <remote> | Remote name to push to (default: origin) |
| <refspecs> | Refs to push (default: current branch) |
Options:
| Flag | Description |
|---|---|
| -f, --force | Force-push (non-fast-forward) |
| --atomic | Atomic push (all-or-nothing) |
| --dry-run | Show what would be pushed without actually pushing |
josh remote
Manage josh-aware remotes.
Note:
`josh remote add` can be used in any existing git repository, not only ones originally cloned with `josh clone`. This is the standard way to add josh filtering to a repository you already have checked out.
josh remote add
Add a remote with an associated filter projection.
josh remote add <name> <url> <filter> [options]
| Argument | Description |
|---|---|
| <name> | Remote name |
| <url> | Remote repository URL |
| <filter> | Filter spec to associate with this remote |
Options:
| Flag | Description |
|---|---|
| --forge <name> | Forge integration (e.g. github) |
| --no-forge | Disable forge integration |
Example:
# Add a second remote scoped to the backend/ subdirectory
josh remote add backend https://github.com/myorg/monorepo.git :/services/backend
josh filter
Re-apply the filter for an existing remote to update the local filtered refs. Useful after manually modifying the filter configuration without fetching.
josh filter <remote>
| Argument | Description |
|---|---|
| <remote> | Remote name whose filter should be re-applied |
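For example, re-applying the filter of the remote added in the `josh remote add` example above (the remote name backend is hypothetical):

```shell
# Recompute the filtered refs for the "backend" remote
josh filter backend
```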
josh auth
Manage authentication credentials for forge integrations. Forge integration is optional and used for automatic pull request management — see Forge integration for details.
josh auth login <forge>
josh auth logout <forge>
The only currently supported forge is github. See
Forge integration for full documentation.
josh-filter (standalone binary)
josh-filter is a lower-level command that rewrites git history using Josh filter specs.
It is intended for scripting and one-off history rewriting tasks rather than day-to-day
development workflows.
By default it reads from HEAD and writes the filtered result to FILTERED_HEAD.
Basic usage:
# Filter HEAD through :/docs and write result to FILTERED_HEAD
josh-filter :/docs
Options:
| Flag | Description |
|---|---|
| --file <path> | Read filter spec from a file |
| --squash-pattern <pattern> | Squash commits matching the pattern |
| --squash-file <path> | Read squash patterns from a file |
| --single | Produce a single squashed commit |
| -d | Discovery mode: populate cache with probable filters |
| -t | Output Chrome tracing data |
| -p | Print the filter spec |
| -i | Print the filter ID |
| --cache-stats | Print cache statistics |
| --reverse | Swap input and output (unapply filter) |
History filtering
Josh transforms commits by applying filters to them. As any commit in git represents not just a single state of the file system but also its entire history, applying a filter to a commit produces an entirely new history. The result of a filter is a normal git commit and therefore can be filtered again, making filters chainable.
Syntax
Filters always begin with a colon and can be chained:
:filter1:filter2
When used as part of a URL, filters cannot contain whitespace or newlines. When read from a file,
however, whitespace can be inserted between filters (but not after the leading colon).
Additionally, newlines can be used instead of `,` inside composition filters.
Some filters take arguments. Arguments can optionally be quoted using double quotes
when they contain characters that are special to the filter language (like `:` or space):
:filter=argument1,"argument2"
Filter options :~(key1="value1",key2="value2")[:filter]
The :~(...)[] syntax allows you to provide filter options (metadata) that affect how the filter
is applied. The options are specified as key-value pairs in parentheses, followed by the actual
filter in square brackets.
Syntax:
:~(option1="value1",option2="value2")[:filter]
Multiple options can be specified by separating them with commas. Option values must be quoted using double quotes.
Example:
:~(key1="value1",key2="value2")[:/sub1]
This applies the :/sub1 filter with the specified options attached as metadata.
History option
The history option controls how the commit history is processed during filtering.
It affects both how commits are walked and how merge commits are handled in the output history.
Available values:
- `history="linear"`: Produces a linear history by converting merge commits into regular commits, creating a linear chain of commits in the output history.
  Example: `:~(history="linear")[:/sub1]`
- `history="keep-trivial-merges"`: Prevents dropping trivial merge commits that would normally be pruned from the output history.
  Normally, Josh will drop merge commits from the filtered history if their filtered tree is identical to the first parent’s tree. Setting this option to `"keep-trivial-merges"` preserves these commits in the output history.
  Example: `:~(history="keep-trivial-merges")[::file1]`
  Note for users of older versions: In older versions of Josh, `"keep-trivial-merges"` was the default behavior. If you’re upgrading from an older version and need to recreate the same history structure, you should explicitly set `history="keep-trivial-merges"` in your filter options.
Gpgsig option
The gpgsig option controls how PGP/GPG signature headers (gpgsig) in commit objects are
handled during filtering.
By default Josh preserves the gpgsig header byte-for-byte. This keeps the commit hash stable
across round-trips but makes the signature invalid (since the tree and parent references change).
Available values:
- `gpgsig="remove"`: Strips the `gpgsig` header from every filtered commit. Equivalent to the `:unsign` shorthand filter.
  Example: `:~(gpgsig="remove")[:/sub1]`
- `gpgsig="norm-lf"`: Normalizes `\r\n` line endings to `\n` inside the `gpgsig` header before writing the filtered commit.
  The standard git commit object format uses `\n` line endings throughout, including inside `gpgsig` headers. Some signing tools or forges write `\r\n` instead, which is technically non-standard but valid as far as git is concerned — git treats the header value as opaque bytes. Josh preserves whichever line endings are present in the original commit.
  An older version of Josh accidentally normalized `\r\n` to `\n` during filtering. This option restores that behavior and is intended only for deployments that need to reproduce a history produced by the old version — both variants represent the same logical content but produce different commit hashes, causing history to diverge.
  Example: `:~(gpgsig="norm-lf")[:/sub1]`
Available filters
Subdirectory :/a
Take only the selected subdirectory from the input and make it the root
of the filtered tree.
Note that :/a/b and :/a:/b are equivalent ways to get the same result.
Directory ::a/
A shorthand for the commonly occurring filter combination :/a:prefix=a.
File ::a or ::destination=source
Produces a tree with only the specified file.
When using a single argument (::a), the file is placed at the same full path as in the source tree.
When using the destination=source syntax (::destination=source), the file is renamed from source to destination in the filtered tree.
Examples:
- `::file.txt`: Selects `file.txt` and places it at `file.txt`
- `::src/file.txt`: Selects `src/file.txt` and places it at `src/file.txt`
- `::renamed.txt=src/original.txt`: Selects `src/original.txt` and places it at `renamed.txt`
- `::subdir/file.txt=src/file.txt`: Selects `src/file.txt` and places it at `subdir/file.txt`
Note that ::a/b is equivalent to ::a/::b.
Pattern filters (with *) cannot be combined with the destination=source syntax.
Prefix :prefix=a
Take the input tree and place it into subdirectory a.
Note that :prefix=a/b and :prefix=b:prefix=a are equivalent.
Composition :[:filter1,:filter2,...,:filterN]
Compose a tree by overlaying the outputs of :filter1 … :filterN on top of each other.
It is guaranteed that each file will appear at most once in the output. The first filter
that consumes a file is the one deciding its mapped location. Therefore the order in which
filters are composed matters.
Inside of a composition x=:filter can be used as an alternative spelling for
:filter:prefix=x.
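As a sketch (the paths are hypothetical), a composition that overlays a top-level docs directory with two relocated modules, using the `x=:filter` spelling:

```
:[
    ::docs/,
    mod/a=:/libs/a,
    mod/b=:/libs/b
]
```

The multi-line form is only valid when the filter is read from a file; in a URL the same spec must be written without whitespace: `:[::docs/,mod/a=:/libs/a,mod/b=:/libs/b]`.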
Exclusion :exclude[:filter]
Remove all paths present in the output of :filter from the input tree.
Inside `:exclude`, filters that change paths should generally be avoided; instead use only
filters that select paths without altering them.
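For example (the pattern is illustrative), dropping every lock file anywhere in the tree while keeping everything else:

```
:exclude[::**/*.lock]
```

The inner pattern filter only selects paths without altering them, which keeps the exclusion well-defined.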
Invert :invert[:filter]
A shorthand syntax that applies the inverse of the composed filter. The inverse of a filter is
a filter that undoes the transformation. For example, the inverse of :/sub1 (subdirectory)
is :prefix=sub1 (prefix), and vice versa.
Example:
:invert[:/sub1]
This is equivalent to :prefix=sub1, which takes the input tree and places it into
the sub1 subdirectory.
Multiple filters can be provided in the compose:
:invert[:/sub1,:/sub2]
This inverts the composition of :/sub1 and :/sub2.
Scope :<X>[..]
A shorthand syntax that expands to :X:[..]:invert[:X], where:
- `:X` is a filter (without built-in compose)
- `:[..]` is a compose filter (like in `:exclude`)
This filter first applies :X to scope the input, then applies the compose filter :[..],
and finally inverts :X to restore the original scope. This is useful when you want to
apply a composition filter within a specific scope and then restore the original structure.
Example:
:<:/sub1>[::file1,::file2]
This is equivalent to :/sub1:[::file1,::file2]:invert[:/sub1], which:
- Selects the `sub1` subdirectory (applies `:/sub1`)
- Applies the composition filter to select `file1` and `file2` (applies `:[::file1,::file2]`)
- Restores the original scope by inverting the subdirectory selection (applies `:invert[:/sub1]`)
Stored :+path/to/file
Looks for a file with a .josh extension at the specified path and applies the filter defined in that file.
The path argument should be provided without the .josh extension, as it will be automatically appended.
For example, :+st/config will look for a file at st/config.josh and apply the filter defined in that file.
The resulting tree will contain the contents specified by the filter in the .josh file.
Stored filters apply from the root of the repository, making them useful for configuration files that define filters to be applied at the repository root level.
Workspace :workspace=a
Like stored filters, but with an additional “base” component that first selects the specified directory
(called the “workspace root”) before applying the filter from the workspace.josh file inside it.
The workspace filter is equivalent to :/a combined with applying a stored filter, where the filter is read
from workspace.josh within the selected directory. The resulting tree will contain the contents of the
workspace root as well as additional files specified in the workspace.josh file.
(see Workspaces)
Text replacement :replace("regex_0":"replacement_0",...,"regex_N":"replacement_N")
Applies the supplied regular expressions to every file in the input tree.
Signature removal :unsign
By default, Josh copies the signature of the original commit, if present, into the filtered commit. This makes the signature invalid, but allows a perfect round-trip: josh will be able to recreate the original commit from the filtered one.
This behaviour is not always desirable, and this filter drops the signatures from the history.
It is a shorthand for :~(gpgsig="remove")[:/]. See the gpgsig option for
additional gpgsig-related options.
Pattern filters
The following filters accept a glob-like pattern X that can contain `*` to
match any number of characters. Note that two or more consecutive wildcards (`**`) are not
allowed within a pattern.
Match directories ::X/
All matching subdirectories in the input root
Match files or directories ::X
All matching files or directories in the input root
Match nested directories ::**/X/
All subdirectories matching the pattern in arbitrarily deep subdirectories of the input
Match nested files ::**/X
All files matching the pattern in arbitrarily deep subdirectories of the input
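A few sketches of the pattern forms (the names are illustrative):

```
::*.md
::test*/
::**/Makefile
```

The first selects all markdown files in the input root, the second selects all top-level directories whose name starts with test, and the third selects every Makefile at any depth.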
History filters
These filters do not modify git trees, but instead only operate on the commit graph.
Linearise history :linear[:filter]
Produce a filtered history that does not contain any merge commits. This is done by simply dropping all parents except the first on every commit.
Filter specific parts of the history :rev(…)
The :rev(...) filter allows you to apply different filters to different parts of the commit history based on commit relationships. Each entry in the filter specifies a condition and a filter to apply when that condition matches.
Syntax:
:rev(
<operator><sha>:filter
_:filter
...
)
Commit references must be full 40-character SHA-1 hashes; short/abbreviated SHAs are not accepted.
Operators:
- `<` - Strict ancestor match: matches if the commit is an ancestor of `<sha>` AND the commit is not equal to `<sha>`
- `<=` - Inclusive ancestor match: matches if the commit is an ancestor of `<sha>` OR the commit equals `<sha>`
- `==` - Exact match: matches only if the commit equals `<sha>`
- `_` - Default filter: matches any commit that doesn’t match any previous condition (no SHA needed)
Matching behavior:
- Rules are evaluated in the order they are specified
- First match wins: once a condition matches, that filter is applied and no further rules are checked
- The default filter (`_`) will match any commit that hasn’t matched a previous rule, making any rules after it unreachable
Examples:
:rev(==def4567890123456789012345678901234567890:prefix=new,<=abc1234501234567890123456789012345678901:prefix=old)
This applies :prefix=old to commit abc12345... and all its ancestors, and :prefix=new only to commit def45678....
:rev(<abc1234501234567890123456789012345678901:prefix=old,_:prefix=default)
This applies :prefix=old to all ancestors of that commit (but not the commit itself), and :prefix=default to all other commits (including that commit and any commits after it).
Prune trivial merge commits :prune=trivial-merge
Produce a history that skips all merge commits whose tree is identical to the first parent’s tree. Normally Josh will keep all commits in the filtered history whose tree differs from any of its parents.
Commit message rewriting :"template" or :"template";"regex"
Rewrite commit messages using a template string. The template can use regex capture groups to extract and reformat parts of the original commit message, as well as special template variables for commit metadata.
Simple message replacement:
:"New message"
This replaces all commit messages with “New message”.
Using regex with named capture groups:
:"[{type}] {message}";"(?s)^(?P<type>fix|feat|docs): (?P<message>.+)$"
This uses a regex to match the original commit message and extract named capture groups ({type} and {message})
which are then used in the template. The regex (?s)^(?P<type>fix|feat|docs): (?P<message>.+)$ matches
commit messages starting with “fix:”, “feat:”, or “docs:” followed by a message, and the template
reformats them as [type] message.
Using template variables: The template supports special variables that provide access to commit metadata:
- `{#}`: The tree object ID (SHA-1 hash) of the commit
- `{@}`: The commit object ID (SHA-1 hash)
- `{/path}`: The content of the file at the specified path in the commit tree
- `{#path}`: The object ID (SHA-1 hash) of the tree entry at the specified path
Regex capture groups take priority over template variables. If a regex capture group has the same name as a template variable, the capture group value will be used.
Example:
:"Message: {#} {@}"
This replaces commit messages with “Message: “ followed by the tree ID and commit ID.
Combining regex capture groups and template variables:
:"[{type}] {message} (commit: {@})";"(?s)^(?P<type>Original) (?P<message>.+)$"
This combines regex capture groups ({type} and {message}) with template variables ({@} for the commit ID).
Removing text from messages:
:"";"TODO"
This removes all occurrences of “TODO” from commit messages by matching “TODO” and replacing it with an empty string.
The regex pattern can use (?s) to enable dot-all mode (so . matches newlines), allowing it to work with
multi-line commit messages that include both a subject line and a body.
Pin tree contents
Pins the revision of a subtree to the revision of the parent commit.
In practical terms, it means that file and folder updates are “held off”, and revisions are “pinned”. If a tree entry already existed in the parent revision, that version will be chosen. Otherwise, the tree entry will not appear in the filtered commit.
The source of the parent revision is always the first commit parent.
Note that this filter is only practical when used with :hook or workspace.josh,
as it should apply per-revision only. Applying :pin for the whole history
will result in the subtree being excluded from all revisions.
Refer to pin_filter_workspace.t and pin_filter_hook.t for reference.
Filter order matters
Filters are applied in the left-to-right order they are given in the filter specification,
and they are not commutative.
For example, this command will filter out just the josh documentation, and store it in a
ref named FILTERED_HEAD:
$ josh-filter :/docs:prefix=josh-docs
However, this command will produce an empty branch:
$ josh-filter :prefix=josh-docs:/docs
What’s happening in the latter command is that because the prefix filter is applied first, the
entire josh history already lives within the josh-docs directory, as it was just
transformed to exist there. Thus, to still get the docs, the command would need to be:
$ josh-filter :prefix=josh-docs:/josh-docs/docs
which will contain the josh documentation at the base of the tree. We’ve lost the prefix, what
gives? Because the original git tree was already transformed, and then the subdirectory filter
was applied to pull documentation from josh-docs/docs, the prefix is gone: it was filtered out
again by the subdirectory filter. Thus, the order in which filters are provided is crucial, as each
filter further transforms the latest transformation of the tree.
Experimental features
Experimental features are opt-in and must be enabled at runtime by setting the
environment variable JOSH_EXPERIMENTAL_FEATURES=1. Their behaviour or syntax
may change in future releases.
Filters
Object reference :&path
Reads the git object at path (a file or directory) and replaces its content with a text blob
containing the object’s SHA-1 hash. This turns a real file or tree into a lightweight pointer.
If path does not exist in the input tree, the filter is a no-op.
Example: :&sub1 on a commit where sub1 is a directory produces a file sub1 whose content
is the 40-character SHA of that directory tree.
Object dereference :*path
Reads the SHA-1 hash stored as text in the file at path and replaces that file with the git
object the hash points to (a file or a directory tree). This follows the pointer written by
:&path.
If path does not exist the filter is a no-op. If the content is not a valid SHA or the object
is not present in the repository, an error is returned.
Example: given a file sub1 whose content is the SHA of a directory tree, :*sub1 replaces
that file with the actual directory tree at sub1.
Object dereference into subdirectory :*/path
Dereferences the pointer stored at path and then extracts the resulting object directly at the
repository root, discarding the path prefix. This is the typical way to restore content that
was previously stored with :&path.
Expands to :*path:/path. The canonical printed form is the expanded syntax.
Example: :*/sub1 on a tree where sub1 holds a SHA of a directory returns that directory’s
contents at the root, as if sub1 never existed.
Tree ID capture :#path[filter]
Applies filter to the current tree and writes the SHA-1 hash of the resulting tree as a text
file at path. The filter itself does not appear in the output — only the hash it produces.
This lets you record a stable, content-addressed reference to a subtree alongside other files.
Example: :#version.txt[:/sub1] writes the SHA of the sub1 directory tree into version.txt.
Starlark filter :!path/to/script[context filter]
Evaluates a Starlark script stored in the repository
and uses the filter it produces. The script file is loaded from path with a .star extension
appended automatically.
The optional [context filter] scopes the tree that is visible to the script: the context
filter is applied to the input tree first, and the result is what the script sees as tree. The
context filter does not affect the filter that the script returns — it only controls what the
script can read.
The script file itself is always included in the output tree alongside whatever the script’s filter selects.
Script contract
The script must assign a Filter value to the variable named filter. At the start of
execution filter is pre-set to a no-op filter, so a minimal script that selects nothing can
simply leave it unchanged, or assign a new value:
filter = filter.subdir("src")
Global variables available in the script
| Variable | Type | Description |
|---|---|---|
| filter | Filter | Starts as a no-op filter. Assign your result here. |
| tree | Tree | The commit tree (or the context-filtered tree if a context filter was given). |
Global functions
| Function | Description |
|---|---|
| compose([f1, f2, ...]) | Overlay multiple filters, same semantics as :[f1,f2,...]. |
Filter methods
All methods return a new Filter and can be chained.
| Method | Description |
|---|---|
| filter.subdir(path) | Select a subdirectory and make it the root. |
| filter.prefix(path) | Place the tree under a subdirectory prefix. |
| filter.file(path) | Select a single file, keeping its path. |
| filter.rename(dst, src) | Select src and place it at dst. |
| filter.pattern(pattern) | Select files/directories matching a glob pattern (* allowed). |
| filter.chain(other) | Apply other after this filter. |
| filter.nop() | No-op; passes the tree through unchanged. |
| filter.empty() | Produce an empty tree. |
| filter.linear() | Linearise history (drop merge parents). |
| filter.workspace(path) | Apply the workspace filter rooted at path. |
| filter.stored(path) | Apply the stored filter at path.josh. |
| filter.starlark(path, context_filter) | Apply another Starlark filter with an optional context filter. |
| filter.author(name, email) | Override the commit author. |
| filter.committer(name, email) | Override the committer. |
| filter.message(template) | Rewrite commit messages using a template. |
| filter.unsign() | Strip GPG signatures from commits. |
| filter.prune_trivial_merge() | Remove merge commits whose tree equals their first parent. |
| filter.hook(hook) | Apply a hook filter. |
| filter.with_meta(key, value) | Attach metadata to the filter. |
| filter.is_nop() | Returns True if the filter is a no-op. |
| filter.peel() | Strip metadata from the filter. |
Tree methods
The tree object provides read-only access to the commit tree visible to the script.
| Method | Description |
|---|---|
| tree.file(path) | Returns the text content of the file at path, or an empty string if absent or binary. |
| tree.files(path) | Returns a list of file paths that are direct children of path. |
| tree.dirs(path) | Returns a list of directory paths that are direct children of path. |
| tree.tree(path) | Returns a Tree object rooted at path. |
Example
A script that dynamically includes every top-level subdirectory as a prefixed subtree:
# st/config.star
parts = [filter.subdir(d).prefix(d) for d in tree.dirs("")]
filter = compose(parts)
Applied with :!st/config.
Working with workspaces
For the sake of this example we will assume a josh-proxy instance is running and serving a
repo on http://josh/world.git with some shared code in shared.
Create a new workspace
To create a new workspace in the path ws/hello simply clone it as if it already exists:
$ git clone http://josh/world.git:workspace=ws/hello.git
git will report that you appear to have cloned an empty repository if that path does not
yet exist.
If you don’t get this message, the path already exists in the repo but may not yet
have any path mappings configured.
The next step is to add some path mapping to the workspace.josh file in the root of the
workspace:
$ cd hello
$ echo "mod/a = :/shared/a" > workspace.josh
And commit the changes:
$ git add workspace.josh
$ git commit -m "add workspace"
If the path did not exist previously, the resulting commit will be a root commit that does not share
any history with the world.git repo.
This means a normal git push will be rejected at this point.
To get correct history, the
resulting commit needs to be based on the history that already exists in world.git.
There is, however, no way to do this locally, because we don’t have the data required for this.
Also, the resulting tree should contain the contents of shared/a mapped to mod/a, which
means it needs to be produced on the server side, because we don’t have the files to put there.
To accomplish that push with the create option:
$ git push -o create origin master
Note: While it is perfectly possible to use Josh without a code review system, it is strongly recommended to use some form of code review to be able to inspect commits created by Josh before they get into the immutable history of your main repository.
As the resulting commit is created on the server side we need to get it from the server:
$ git pull --rebase
Now you should see mod/a populated with the content of the shared code.
Map a shared path into a workspace
To add a shared path to a location in the workspace that does not exist yet, first add an
entry to the workspace.josh file and commit that.
You can add the mapping at the end of the file using a simple syntax, and rely on josh to rewrite it for you in a canonical way.
...
new/mapping/location/in/workspace = :/new/mapping/location/in/monorepo
At this point the path is of course empty, so the commit needs to be pushed to the server. When the same commit is then fetched back it will have the mapped path populated with the shared content.
When the commit is pushed, josh will notify you about the rewrite. You can fetch the rewritten commit using the advertised SHA. Alternatively, you can use git sync which will do it for you.
Publish a non-shared path into a shared location
The steps here are exactly the same as for the mapping example above. The only difference being that the path already exists in the workspace but not in the shared location.
Remove a mapping
To remove a mapping remove the corresponding entry from the workspace.josh file.
The content of the previously shared path will stay in the workspace. That means the main
repo will have two copies of that path from that point on, effectively creating a fork of that code.
Remove a mapped path
To remove a mapped path as well as its contents, remove the entry from the
workspace.josh file and also remove the path inside the workspace using git rm.
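The steps above can be sketched as follows (the mapping path mod/a is a placeholder carried over from the earlier example):

```shell
# Remove both the mapped content ...
git rm -r mod/a
# ... and the corresponding "mod/a = ..." line from workspace.josh,
# then stage and commit both changes together
git add workspace.josh
git commit -m "remove mod/a mapping and its contents"
```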
Forge Integration
Forge integration is an optional feature that connects josh to a code hosting
platform (a “forge”) such as GitHub. It is not required for normal git operations —
cloning, pushing, and pulling all work without it, even with private repositories.
Forge integration is specifically used for automatic pull request management during
stacked changes workflows. When you push a stack of
commits with josh publish, josh can automatically
create or update one pull request per commit on the forge.
GitHub
GitHub is currently the only supported forge.
Authentication
josh uses GitHub’s device flow
for authentication — the same flow used by the official GitHub CLI.
Log in:
josh auth login github
This prints a URL and a one-time code. If clipboard access is available the code is also copied automatically, otherwise it is only printed to the terminal. Open the URL in your browser, enter the code, and authorize the application.
The token is stored in ~/.config/josh-cli/credentials.json with 0600 permissions.
Log out:
josh auth logout github
Alternatively, set the GH_TOKEN environment variable to a
personal access token.
This takes precedence over any stored token and is useful in CI environments:
export GH_TOKEN=ghp_...
What forge integration enables
Once authenticated, josh publish will, in addition to pushing the git refs:
- Create a pull request for each commit that does not yet have one.
- Update existing pull requests (title, body, base branch) when commits are amended or rebased.
- Manage draft status automatically: pull requests whose base branch is not the repository’s default branch are marked as drafts, and promoted to “ready for review” once they target the default branch directly.
See the Stacked changes guide for a full walkthrough.
GraphQL API
Josh implements a GraphQL API to query the content of repositories without needing to clone them with a git client.
The API is exposed at:
http://hostname/~/graphql/name_of_repo.git
To explore the API and generated documentation, an interactive GraphQL shell can be found at:
http://hostname/~/graphiql/name_of_repo.git
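The schema is best explored interactively via the GraphiQL shell above. Purely as an illustrative sketch (the field names below are assumptions for illustration, not taken from this document), a query fetching a file at a given ref might look like:

```graphql
{
  rev(at: "refs/heads/master") {
    file(path: "README.md") {
      text
    }
  }
}
```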
josh-proxy
Note:
`josh-proxy` is primarily intended for infrastructure and DevOps use cases — for example, to deploy a shared caching proxy for your team or CI/CD pipelines so that ordinary `git clone` works without any special client tooling. Individual developers can use the `josh` CLI instead, which requires no server setup.
Josh provides an HTTP proxy server that can be used with any git hosting service which communicates via HTTP.
It needs the URL of the upstream server and a local directory to store its data.
Optionally, a port to listen on can be specified. For example, running a local josh-proxy
instance for github.com on port 8000:
$ docker run -p 8000:8000 -e JOSH_REMOTE=https://github.com -v josh-vol:/data/git joshproject/josh-proxy:latest
Note: While `josh-proxy` is intended to be used with an HTTP upstream, it can also proxy for an SSH upstream when `ssh` is used instead of `http` in the URL. In that case it will use the SSH private key of the current user (just like git would) and take the username from the downstream HTTP request. This mode of operation can be useful for evaluation or local use by individual developers but should never be used in a normal server deployment.
For a first example of how to make use of josh, just the josh documentation can be checked out as its own repository via this command:
$ git clone http://localhost:8000/josh-project/josh.git:/docs.git
Note: This URL needs to contain the `.git` suffix twice: once after the original path and once more after the filter spec.
josh-proxy supports read and write access to the repository, so when making changes
to any files in the filtered repository, you can just commit and push them
like you are used to.
Note: The proxy is semantically stateless. The data inside the docker volume is only persisted across runs for performance reasons. This has two important implications for deployment:
1. The data does not need to be backed up unless working with very large repos where rebuilding would be very expensive.
2. Multiple instances of josh-proxy can be used interchangeably for availability or load balancing purposes.
URL syntax and breakdown
This is the URL of a josh-proxy instance:
http://localhost:8000
This is the repository location on the upstream host on which to perform the filter operations:
/josh-project/josh.git
This is the set of filter operations to perform:
:/docs.git
Much more information on the available filters and the syntax of all filters is covered in detail in the filters section.
Repository naming
By default, a git URL is used to point to the remote repository to download and also to dictate how the local repository shall be named. It’s important to learn that the last name in the URL is what the local git client will name the new, local repository. For example:
$ git clone http://localhost:8000/josh-project/josh.git:/docs.git
will create the new repository at directory docs, as docs.git is the last name in the URL.
By default, this leads to rather odd-looking repositories when the prefix filter is the final
filter of a URL:
$ git clone http://localhost:8000/josh-project/josh.git:/docs:prefix=josh-docs.git
This will still clone just the josh documentation, but the final directory structure will look like this:
- prefix=josh-docs
  - josh-docs
    - <docs>
Having the root repository directory name be the fully-specified filter is most likely not what was
intended. This results from git’s reuse and repurposing of the remote URL, as prefix=josh-docs
is the final name in the URL. With no other alternatives, this gets used for the repository name.
To explicitly specify a repository name, provide the desired name after the URL when cloning a new repository:
$ git clone http://localhost:8000/josh-project/josh.git:/docs:prefix=josh-docs.git my-repo
Serving a github repo
To prompt for authentication, Josh relies on the server requesting it on fetch. When the upstream server does not require authentication for fetching, Josh will never prompt for credentials, which makes it impossible to authenticate a push.
To solve this, you need to pass the --require-auth option to josh-proxy.
This can be done with JOSH_EXTRA_OPTS when using the docker image like so:
docker run -d -p 8000:8000 -e JOSH_EXTRA_OPTS="--require-auth" -e JOSH_REMOTE=https://github.com/josh-project -v josh-vol:/data/git joshproject/josh-proxy:latest
In this example, we serve only the josh-project repositories. Be aware that if you don’t add the organisation or repo URL, your instance will be able to serve all of GitHub. You can (and should) restrict it to your repository or organisation by making it part of the URL.
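For instance, the instance can be locked down to a single repository rather than a whole organisation by including the repository path in JOSH_REMOTE. The sketch below only builds and echoes the command rather than running it, and the repository path is illustrative.

```shell
# Sketch: restrict the proxy to one repository by making its path part
# of JOSH_REMOTE. The command is built as a string and echoed, not run.
JOSH_REMOTE="https://github.com/josh-project/josh.git"
CMD="docker run -d -p 8000:8000 \
  -e JOSH_EXTRA_OPTS=--require-auth \
  -e JOSH_REMOTE=${JOSH_REMOTE} \
  -v josh-vol:/data/git \
  joshproject/josh-proxy:latest"
echo "$CMD"
```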
Container configuration
Container options
| Variable | Meaning |
|---|---|
| JOSH_REMOTE | HTTP remote, including protocol. Example: https://github.com |
| JOSH_REMOTE_SSH | SSH remote, including protocol. Example: ssh://git@github.com |
| JOSH_HTTP_PORT | HTTP port to listen on. Default: 8000 |
| JOSH_SSH_PORT | SSH port to listen on. Default: 8022 |
| JOSH_SSH_TIMEOUT | Timeout, in seconds, for a single request when serving repos over SSH. This should cover fetching from the upstream repo, filtering, and serving the repo to the client. Default: 300 |
| JOSH_EXTRA_OPTS | Extra options passed directly to the josh-proxy process |
Container volumes
| Volume | Purpose |
|---|---|
| /data/git | Git cache volume. If this volume is not mounted, the cache will be lost every time the container is shut down. |
| /data/keys | SSH server keys. If this volume is not mounted, a new key will be generated on each container startup. |
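To persist both the git cache and the SSH host keys across restarts, a named volume can be mounted at each of the two paths above. As a sketch, the command below is only built and echoed, and the volume names are illustrative.

```shell
# Sketch: mount named volumes at the two container paths so the git
# cache and the SSH host keys survive container restarts.
CMD="docker run -d -p 8000:8000 -p 8022:8022 \
  -e JOSH_REMOTE=https://github.com \
  -v josh-git:/data/git \
  -v josh-keys:/data/keys \
  joshproject/josh-proxy:latest"
echo "$CMD"
```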
Container services
The Josh container uses s6-overlay as a process supervisor to manage multiple services:
Long-running services
- josh-proxy - The main HTTP proxy service that handles git requests, applies filters, and communicates with upstream repositories.
- sshd - OpenSSH server that provides SSH access to the proxy.
One-shot initialization services
These services run once during container startup to prepare the environment:
- josh-generate-keys - Generates SSH server keys at /data/keys/.ssh/id_ed25519 if they don’t already exist; SSH server keys are therefore persisted across container restarts when the volume is mounted.
- sshd-generate-config - Generates the sshd configuration file from a template, applying configuration environment variables.
SSH access
Josh supports SSH access.
To use SSH, you need to add the following lines to your ~/.ssh/config:
Host your-josh-instance.com
ForwardAgent yes
PreferredAuthentications publickey
Alternatively, you can pass those options via GIT_SSH_COMMAND:
GIT_SSH_COMMAND="ssh -o PreferredAuthentications=publickey -o ForwardAgent=yes" git clone ssh://git@your-josh-instance.internal/...
In other words, you need to ensure SSH agent forwarding is enabled.
How SSH access works
sequenceDiagram
participant User as Git CLI<br/>(User Machine)
participant Agent as SSH Agent<br/>(User Machine)
participant SSHD as sshd<br/>(Josh Container)
participant Shell as josh-ssh-shell<br/>(Josh Container)
participant Proxy as josh-proxy<br/>(Josh Container)
participant Upstream as Upstream Repo<br/>(e.g., GitHub)
User->>Agent: Load SSH keys
Note over User,Agent: User's SSH keys remain<br/>on local machine
User->>SSHD: git clone ssh://git@josh:8022/repo.git:/filter.git
Note over User,SSHD: SSH connection with<br/>ForwardAgent=yes
SSHD->>Agent: Forward SSH_AUTH_SOCK
Note over SSHD,Agent: Agent socket forwarded<br/>into container
SSHD->>Shell: Execute git-upload-pack
Note over Shell: Validates SSH_AUTH_SOCK<br/>is a valid socket
Shell->>Proxy: HTTP POST /serve_namespace
Note over Shell,Proxy: Sends command, repo path,<br/>and SSH_AUTH_SOCK path
Proxy->>Upstream: git fetch (via SSH)
Note over Proxy,Upstream: Uses forwarded SSH_AUTH_SOCK<br/>for authentication
Upstream->>Agent: Authenticate via SSH_AUTH_SOCK
Agent-->>Upstream: Sign challenge with user's key
Upstream-->>Proxy: Repository objects
Proxy->>Proxy: Apply josh filter
Note over Proxy: Transform repository<br/>based on filter spec
Proxy->>Shell: Serve filtered repo via Unix sockets
Shell->>SSHD: Proxy git protocol
SSHD->>User: Filtered repository
Note over User,Upstream: User's SSH keys never leave<br/>their machine. Authentication<br/>is delegated through the<br/>forwarded agent socket.
- SSH Agent Forwarding: The user’s SSH agent socket is forwarded into the container, allowing josh-proxy to authenticate to upstream repositories without accessing the user’s private keys directly.
- Two-stage architecture: sshd handles the SSH connection and launches josh-ssh-shell, which then communicates with josh-proxy via HTTP.
- Security: Private SSH keys never enter the container. All authentication is performed by delegating to the user’s local SSH agent through the forwarded socket.
Testing
Currently the Josh project mainly uses integration tests for its verification, so make sure you are able to run and check them.
The following sections describe how to run the different kinds of tests used to verify the Josh project.
UnitTests & DocTests
cargo test --all
Integration Tests
1. Setup the test environment
The integration tests need additional tools and a more complex environment, and they are written using cram, so you will need to create a separate environment to run them. To simplify the setup, a Nix shell environment is provided; if you have Nix installed, you can start it with the following command.
Attention: Currently it is still necessary to install the following tools on your host system:
- curl
- hyper_cgi
cargo install hyper_cgi --features=test-server
Setup the Nix Shell
Attention: When running this command for the first time, it will take quite a while to finish, and you will need internet access while it runs. How long it takes depends on the speed of your connection.
nix-shell shell.nix
Once the command finishes, you will be dropped into the nix-shell, which provides the environment needed to run the integration tests.
2. Verify you have built all necessary binaries
cargo build
cargo build --bin josh-filter
cargo build --manifest-path josh-proxy/Cargo.toml
cargo build --manifest-path josh-ui/Cargo.toml
3. Setup static files for the josh-ui
cd josh-ui
trunk build
cd ..
4. Run the integration tests
Attention: Be aware that all tests except the ones in experimental should be green.
sh run-tests.sh -v tests/
UI Tests
TBD: Currently disabled, stabilize, enable and document process.