History filtering

Josh transforms commits by applying filters to them. Because every commit in git represents not just a single state of the file system but also its entire history, applying a filter to a commit produces an entirely new history. The result of a filter is a normal git commit and can therefore be filtered again, which makes filters chainable.

Syntax

Filters always begin with a colon and can be chained:

:filter1:filter2

When used as part of a URL, filters cannot contain whitespace or newlines. When read from a file, however, whitespace can be inserted between filters (but not after the leading colon). Additionally, newlines can be used instead of , inside composition filters.

Some filters take arguments. Arguments can optionally be enclosed in double quotes when they need to contain characters that are special in the filter language (such as : or a space):

:filter=argument1,"argument2"

Available filters

Subdirectory :/a

Take only the selected subdirectory from the input and make it the root of the filtered tree. Note that :/a/b and :/a:/b are equivalent ways to get the same result.
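
The equivalence above can be illustrated with a minimal Python sketch. It models a tree as a dict from path to content, which is an assumption made for illustration only and not how Josh represents trees; the helper name subdir is hypothetical.

```python
def subdir(tree, path):
    """Toy model of :/path: keep entries below path and strip that prefix."""
    head = path.strip("/") + "/"
    return {p[len(head):]: c for p, c in tree.items() if p.startswith(head)}

tree = {"a/b/x.txt": "x", "a/y.txt": "y", "z.txt": "z"}

one_step = subdir(tree, "a/b")               # models :/a/b
two_steps = subdir(subdir(tree, "a"), "b")   # models :/a:/b
print(one_step == two_steps)
```

In this model both spellings leave only x.txt, now located at the root of the filtered tree.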

Directory ::a/

A shorthand for the commonly occurring filter combination :/a:prefix=a.
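
As a rough sketch of this shorthand, the following Python snippet models a tree as a dict from path to content (an illustrative assumption, not Josh's implementation) and builds the directory filter from hypothetical subdir and prefix helpers.

```python
def subdir(tree, path):
    """Toy model of :/path."""
    head = path.strip("/") + "/"
    return {p[len(head):]: c for p, c in tree.items() if p.startswith(head)}

def prefix(tree, path):
    """Toy model of :prefix=path."""
    head = path.strip("/") + "/"
    return {head + p: c for p, c in tree.items()}

def directory(tree, path):
    """Toy model of ::path/ expressed as :/path followed by :prefix=path."""
    return prefix(subdir(tree, path), path)

tree = {"a/x.txt": "x", "b/y.txt": "y"}
kept = directory(tree, "a")
print(kept)
```

The directory a keeps its location in the output, while everything outside it is dropped.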

File ::a or ::destination=source

Produces a tree with only the specified file.

When using a single argument (::a), the file is placed at the same full path as in the source tree. When using the destination=source syntax (::destination=source), the file is renamed from source to destination in the filtered tree.

Examples:

  • ::file.txt - Selects file.txt and places it at file.txt
  • ::src/file.txt - Selects src/file.txt and places it at src/file.txt
  • ::renamed.txt=src/original.txt - Selects src/original.txt and places it at renamed.txt
  • ::subdir/file.txt=src/file.txt - Selects src/file.txt and places it at subdir/file.txt

Note that ::a/b is equivalent to ::a/::b. Pattern filters (with *) cannot be combined with the destination=source syntax.
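
The two forms of the file filter can be sketched in Python as follows. The tree is modeled as a dict from path to content and file_filter is a hypothetical helper; both are assumptions for illustration, and pattern filters are not covered.

```python
def file_filter(tree, spec):
    """Toy model of ::a and ::destination=source for a single file."""
    dest, sep, src = spec.partition("=")
    if not sep:
        src = dest  # single-argument form: the file keeps its full path
    return {dest: tree[src]} if src in tree else {}

tree = {"src/original.txt": "hello", "file.txt": "top"}

same_path = file_filter(tree, "file.txt")
renamed = file_filter(tree, "renamed.txt=src/original.txt")
print(same_path, renamed)
```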

Prefix :prefix=a

Take the input tree and place it into subdirectory a. Note that :prefix=a/b and :prefix=b:prefix=a are equivalent.
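
A small Python sketch of the equivalence, assuming a tree modeled as a dict from path to content (an illustrative simplification; prefix is a hypothetical helper):

```python
def prefix(tree, path):
    """Toy model of :prefix=path: move every entry below path."""
    head = path.strip("/") + "/"
    return {head + p: c for p, c in tree.items()}

tree = {"x.txt": "x"}

one_step = prefix(tree, "a/b")                # models :prefix=a/b
two_steps = prefix(prefix(tree, "b"), "a")    # models :prefix=b:prefix=a
print(one_step == two_steps)
```

The innermost directory is applied first, so b has to come before a in the chained form.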

Composition :[:filter1,:filter2,...,:filterN]

Compose a tree by overlaying the outputs of :filter1 ... :filterN on top of each other. Each file is guaranteed to appear at most once in the output. The first filter that consumes a file decides its mapped location, so the order in which filters are composed matters.

Inside of a composition x=:filter can be used as an alternative spelling for :filter:prefix=x.
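
The "first filter that consumes a file wins" rule can be sketched in Python. This is a simplified model, not Josh's implementation: each filter is assumed to be a function that maps a source path to a destination path, or to None if it does not consume the file. All helper names are hypothetical.

```python
from typing import Callable, Dict, Optional

PathMap = Callable[[str], Optional[str]]

def subdir(path: str) -> PathMap:
    head = path.strip("/") + "/"
    return lambda p: p[len(head):] if p.startswith(head) else None

def prefix(path: str) -> PathMap:
    head = path.strip("/") + "/"
    return lambda p: head + p

def chain(*maps: PathMap) -> PathMap:
    """Apply path maps left to right, stopping once a file is dropped."""
    def run(p: str) -> Optional[str]:
        cur: Optional[str] = p
        for m in maps:
            if cur is None:
                return None
            cur = m(cur)
        return cur
    return run

def compose(tree: Dict[str, str], *maps: PathMap) -> Dict[str, str]:
    """Each source file is placed by the first map that consumes it."""
    out: Dict[str, str] = {}
    for src, content in tree.items():
        for m in maps:
            dest = m(src)
            if dest is not None:
                out[dest] = content
                break
    return out

tree = {"a/lib/f.c": "f", "a/main.c": "m", "b/n.c": "n"}
libs = chain(subdir("a/lib"), prefix("libs"))  # models libs=:/a/lib
keep_a = chain(subdir("a"), prefix("a"))       # models ::a/

first = compose(tree, libs, keep_a)    # models :[libs=:/a/lib,::a/]
swapped = compose(tree, keep_a, libs)  # models :[::a/,libs=:/a/lib]
print(first)
print(swapped)
```

With libs listed first, a/lib/f.c ends up at libs/f.c; with the order swapped, ::a/ consumes it first and it stays at a/lib/f.c.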

Exclusion :exclude[:filter]

Remove all paths present in the output of :filter from the input tree. In general, :filter should only select paths without altering them; filters that change paths should be avoided here.
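
The following Python sketch illustrates why path-changing filters are a poor fit for exclusion. It models trees as dicts from path to content, which is an assumption for illustration; the helper names are hypothetical.

```python
def exclude(tree, selected):
    """Toy model of :exclude[...]: drop every path present in selected."""
    return {p: c for p, c in tree.items() if p not in selected}

def select_dir(tree, path):
    """Path-preserving selection, in the spirit of ::path/."""
    head = path.strip("/") + "/"
    return {p: c for p, c in tree.items() if p.startswith(head)}

def subdir(tree, path):
    """Path-changing selection, in the spirit of :/path."""
    head = path.strip("/") + "/"
    return {p[len(head):]: c for p, c in tree.items() if p.startswith(head)}

tree = {"docs/a.md": "a", "src/m.rs": "m"}

without_docs = exclude(tree, select_dir(tree, "docs"))
unchanged = exclude(tree, subdir(tree, "docs"))
print(without_docs, unchanged)
```

In this model the path-changing variant yields a.md, which matches no path in the input, so nothing is removed.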

Invert :invert[:filter]

A shorthand syntax that applies the inverse of the composed filter. The inverse of a filter is a filter that undoes the transformation. For example, the inverse of :/sub1 (subdirectory) is :prefix=sub1 (prefix), and vice versa.

Example:

:invert[:/sub1]

This is equivalent to :prefix=sub1, which takes the input tree and places it into the sub1 subdirectory.

Multiple filters can be provided in the composition:

:invert[:/sub1,:/sub2]

This inverts the composition of :/sub1 and :/sub2.
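
For the single-filter case, the relationship between :/sub1 and its inverse can be sketched in Python. The tree is modeled as a dict from path to content and the helpers are hypothetical; this is an illustration, not Josh's implementation.

```python
def subdir(tree, path):
    """Toy model of :/path."""
    head = path.strip("/") + "/"
    return {p[len(head):]: c for p, c in tree.items() if p.startswith(head)}

def prefix(tree, path):
    """Toy model of :prefix=path."""
    head = path.strip("/") + "/"
    return {head + p: c for p, c in tree.items()}

tree = {"sub1/f.txt": "f", "other.txt": "o"}

selected = subdir(tree, "sub1")      # models :/sub1
restored = prefix(selected, "sub1")  # models :invert[:/sub1], i.e. :prefix=sub1
print(selected, restored)
```

In this model the inverse restores the original paths of the files that were selected; files dropped by :/sub1, such as other.txt, are not brought back.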

Scope :<X>[..]

A shorthand syntax that expands to :X:[..]:invert[:X], where:

  • :X is a filter (without built-in compose)
  • :[..] is a compose filter (like in :exclude)

This filter first applies :X to scope the input, then applies the compose filter :[..], and finally inverts :X to restore the original scope. This is useful when you want to apply a composition filter within a specific scope and then restore the original structure.

Example:

:<:/sub1>[::file1,::file2]

This is equivalent to :/sub1:[::file1,::file2]:invert[:/sub1], which:

  1. Selects the sub1 subdirectory (applies :/sub1)
  2. Applies the composition filter to select file1 and file2 (applies :[::file1,::file2])
  3. Restores the original scope by inverting the subdirectory selection (applies :invert[:/sub1])
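
The three steps above can be sketched in Python. Trees are modeled as dicts from path to content, and the helper names are hypothetical, so this is an illustration under those assumptions rather than a description of Josh's internals.

```python
def subdir(tree, path):
    """Toy model of :/path."""
    head = path.strip("/") + "/"
    return {p[len(head):]: c for p, c in tree.items() if p.startswith(head)}

def prefix(tree, path):
    """Toy model of :prefix=path, the inverse of :/path."""
    head = path.strip("/") + "/"
    return {head + p: c for p, c in tree.items()}

def scope_sub1(tree):
    """Toy model of :<:/sub1>[::file1,::file2]."""
    inner = subdir(tree, "sub1")                                      # step 1
    picked = {p: c for p, c in inner.items() if p in ("file1", "file2")}  # step 2
    return prefix(picked, "sub1")                                     # step 3

tree = {"sub1/file1": "1", "sub1/file2": "2", "sub1/file3": "3", "top": "t"}
scoped = scope_sub1(tree)
print(scoped)
```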

Stored :+path/to/file

Looks for a file with a .josh extension at the specified path and applies the filter defined in that file. The path argument should be provided without the .josh extension, as it will be automatically appended.

For example, :+st/config will look for a file at st/config.josh and apply the filter defined in that file. The resulting tree will contain the contents specified by the filter in the .josh file.
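
The path rule can be written down as a tiny Python helper. The function name stored_filter_path is hypothetical and only mirrors the naming rule described above; it does not read or apply the filter.

```python
def stored_filter_path(spec):
    """Return the .josh file path that a :+path filter refers to."""
    if not spec.startswith(":+"):
        raise ValueError("not a stored filter: " + spec)
    return spec[2:] + ".josh"

resolved = stored_filter_path(":+st/config")
print(resolved)
```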

Stored filters apply from the root of the repository, making them useful for configuration files that define filters to be applied at the repository root level.

Workspace :workspace=a

Like stored filters, but with an additional "base" component that first selects the specified directory (called the "workspace root") before applying the filter from the workspace.josh file inside it.

The workspace filter is equivalent to :/a combined with applying a stored filter, where the filter is read from workspace.josh within the selected directory. The resulting tree will contain the contents of the workspace root as well as additional files specified in the workspace.josh file. (see Workspaces)

Text replacement :replace("regex_0":"replacement_0",...,"regex_N":"replacement_N")

Applies the supplied regular expressions to every file in the input tree.
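
A rough Python sketch of the idea follows. It assumes that the expressions are applied to file contents, one after the other in the order given, and it uses Python's re module, whose syntax may differ from the regex engine Josh uses. The tree is modeled as a dict from path to text and replace_all is a hypothetical helper.

```python
import re

def replace_all(tree, rules):
    """Apply each (regex, replacement) pair, in order, to every file's text."""
    out = {}
    for path, text in tree.items():
        for pattern, replacement in rules:
            text = re.sub(pattern, replacement, text)
        out[path] = text
    return out

tree = {"a.txt": "foo 42", "b.txt": "nothing here"}
replaced = replace_all(tree, [(r"foo", "bar"), (r"\d+", "N")])
print(replaced)
```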

Signature removal :unsign

The default behaviour of Josh is to copy the signature of the original commit, if it exists, into the filtered commit. This makes the signature invalid, but allows a perfect round-trip: Josh will be able to recreate the original commit from the filtered one.

This behaviour might not be desirable; this filter drops the signatures from the history.

Pattern filters

The following filters accept a glob-like pattern X that can contain * to match any number of characters. Note that two or more consecutive wildcards (**) are not allowed.

Match directories ::X/

All matching subdirectories in the input root

Match files or directories ::X

All matching files or directories in the input root

Match nested directories ::**/X/

All subdirectories matching the pattern in arbitrarily deep subdirectories of the input

Match nested files ::**/X

All files matching the pattern in arbitrarily deep subdirectories of the input

History filters

These filters do not modify git trees, but instead only operate on the commit graph.

Linearise history :linear

Produce a filtered history that does not contain any merge commits. This is done by simply dropping all parents except the first on every commit.
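
The first-parent rule can be sketched in Python on a toy commit graph. The graph is modeled as a dict from commit name to its list of parents, which is an assumption for illustration; in Josh the remaining commits are rewritten, whereas this sketch only shows which commits stay in the history.

```python
def linearize(parents, head):
    """Follow only first parents from head; returns commits newest first."""
    history = []
    commit = head
    while commit is not None:
        history.append(commit)
        ps = parents.get(commit, [])
        commit = ps[0] if ps else None  # drop every parent except the first
    return history

# M merges C (first parent) and F (second parent); both branch off B.
parents = {"M": ["C", "F"], "C": ["B"], "F": ["B"], "B": ["A"], "A": []}
history = linearize(parents, "M")
print(history)
```

The commit F, reachable only through the second parent of the merge, does not appear in the linear history.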

Filter specific parts of the history :rev(<sha_0>:filter_0,...,<sha_N>:filter_N)

Produce a history where the commits specified by <sha_N> are replaced by the result of applying :filter_N to them.

It will appear as if <sha_N> and all its ancestors were also filtered with :filter_N. If an ancestor also has a matching entry in the :rev(...), its filter will replace :filter_N for all further ancestors (and so on).

The special value 0000000000000000000000000000000000000000 can be used as a <sha_N> to filter commits that don't match any of the other shas.

Start filtering from a specific commit :from(<sha>:filter)

Produce a history that keeps the original history up to and including the specified commit <sha> unchanged, but applies the given :filter to all commits after that commit.

Prune trivial merge commits :prune=trivial-merge

Produce a history that skips all merge commits whose tree is identical to the first parent's tree. Normally Josh will keep all commits in the filtered history whose tree differs from any of its parents.
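
The condition for a trivial merge can be sketched as a Python predicate. Commits are modeled as dicts with a tree identifier and a list of parent commits; this representation and the name is_trivial_merge are assumptions for illustration.

```python
def is_trivial_merge(commit):
    """True for a merge whose tree equals the tree of its first parent."""
    parents = commit["parents"]
    return len(parents) > 1 and commit["tree"] == parents[0]["tree"]

base = {"tree": "t1", "parents": []}
side = {"tree": "t2", "parents": [base]}
main = {"tree": "t3", "parents": [base]}

trivial = {"tree": "t3", "parents": [main, side]}  # same tree as main
real = {"tree": "t4", "parents": [main, side]}     # merge changed the tree

print(is_trivial_merge(trivial), is_trivial_merge(real))
```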

Commit message rewriting :"template" or :"template";"regex"

Rewrite commit messages using a template string. The template can use regex capture groups to extract and reformat parts of the original commit message.

Simple message replacement:

:"New message"

This replaces all commit messages with "New message".

Using regex with named capture groups:

:"[{type}] {message}";"(?s)^(?P<type>fix|feat|docs): (?P<message>.+)$"

This uses a regex to match the original commit message and extract named capture groups ({type} and {message}) which are then used in the template. The regex (?s)^(?P<type>fix|feat|docs): (?P<message>.+)$ matches commit messages starting with "fix:", "feat:", or "docs:" followed by a message, and the template reformats them as [type] message.

Removing text from messages:

:"";"TODO"

This removes all occurrences of "TODO" from commit messages by matching "TODO" and replacing it with an empty string. The regex pattern can use (?s) to enable dot-all mode (so . matches newlines), allowing it to work with multi-line commit messages that include both a subject line and a body.
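
The examples above can be approximated in Python. This sketch assumes that every regex match is replaced by the template with {name} placeholders filled from the named capture groups, and that messages without a match are left unchanged. It uses Python's re module and str.format, so regex syntax and the handling of literal braces may differ from Josh; rewrite_message is a hypothetical helper.

```python
import re

def rewrite_message(message, template, regex=None):
    """Replace the whole message, or each regex match, with the template."""
    if regex is None:
        return template
    return re.sub(regex, lambda m: template.format(**m.groupdict()), message)

pattern = r"(?s)^(?P<type>fix|feat|docs): (?P<message>.+)$"

replaced = rewrite_message("anything", "New message")
reformatted = rewrite_message("fix: crash on start", "[{type}] {message}", pattern)
stripped = rewrite_message("TODO: tidy up TODO", "", "TODO")
print(replaced, reformatted, stripped, sep="\n")
```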

Pin tree contents

Pins the revision of a subtree to its revision in the parent commit.

In practical terms, it means that file and folder updates are "held off", and revisions are "pinned". If a tree entry already existed in the parent revision, that version will be chosen. Otherwise, the tree entry will not appear in the filtered commit.

The source of the parent revision is always the first commit parent.

Note that this filter is only practical when used with :hook or workspace.josh, as it should apply per-revision only. Applying :pin for the whole history will result in the subtree being excluded from all revisions.

Refer to pin_filter_workspace.t and pin_filter_hook.t for reference.
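
The per-commit rule described above can be sketched in Python. Trees are modeled as dicts from path to content, and the pinned subtree is assumed to be a single directory; the helper name pin and its signature are hypothetical and do not reflect the actual filter syntax.

```python
def pin(tree, parent_tree, pinned_dir):
    """Take entries below pinned_dir from the first parent's tree, if present."""
    head = pinned_dir.strip("/") + "/"
    # Everything outside the pinned directory comes from the current commit.
    out = {p: c for p, c in tree.items() if not p.startswith(head)}
    # Pinned entries are held at the parent's version; new ones are dropped.
    out.update({p: c for p, c in parent_tree.items() if p.startswith(head)})
    return out

parent = {"vendor/lib.c": "old", "main.c": "v1"}
tree = {"vendor/lib.c": "new", "vendor/extra.c": "added", "main.c": "v2"}

pinned = pin(tree, parent, "vendor")
print(pinned)
```

Applied to every commit of a history, this rule would leave the pinned directory empty from the root commit onward, which matches the note above about not using the filter for the whole history.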

Filter order matters

Filters are applied in the left-to-right order they are given in the filter specification, and they are not commutative.

For example, this command will extract just the josh documentation and store it in a ref named FILTERED_HEAD:

$ josh-filter :/docs:prefix=josh-docs

However, this command will produce an empty branch:

$ josh-filter :prefix=josh-docs:/docs

In the latter command the prefix filter is applied first, so the entire josh history already lives within the josh-docs directory by the time the subdirectory filter runs, and there is no top-level docs directory left to select. To still get the docs, the command would need to be:

$ josh-filter :prefix=josh-docs:/josh-docs/docs

This produces a tree with the josh documentation at the root, without the josh-docs prefix. The prefix is gone because the subdirectory filter, applied after the tree had already been moved into josh-docs, pulls the documentation out of josh-docs/docs and makes it the root again. The order in which filters are given is therefore crucial: each filter transforms the output of the previous one.
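
The three commands can be replayed on a toy tree in Python. The tree is modeled as a dict from path to content and the helpers are hypothetical, so this only illustrates the ordering effect and does not invoke josh-filter.

```python
def subdir(tree, path):
    """Toy model of :/path."""
    head = path.strip("/") + "/"
    return {p[len(head):]: c for p, c in tree.items() if p.startswith(head)}

def prefix(tree, path):
    """Toy model of :prefix=path."""
    head = path.strip("/") + "/"
    return {head + p: c for p, c in tree.items()}

tree = {"docs/index.md": "d", "src/main.rs": "m"}

docs_prefixed = prefix(subdir(tree, "docs"), "josh-docs")      # :/docs:prefix=josh-docs
empty = subdir(prefix(tree, "josh-docs"), "docs")              # :prefix=josh-docs:/docs
docs_at_root = subdir(prefix(tree, "josh-docs"), "josh-docs/docs")  # :prefix=josh-docs:/josh-docs/docs

print(docs_prefixed)
print(empty)
print(docs_at_root)
```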