Airbyte Protocol Docker Interface
Summary
The Airbyte Protocol describes a series of structs and interfaces for building data pipelines. The Protocol article describes those interfaces in language-agnostic pseudocode; this article transcribes them into Docker commands. Airbyte implements the Protocol entirely in Docker, so this reference gives a more concrete look at how the Protocol is used. It can also serve as a reference for interacting with Airbyte's implementation of the Protocol.
Source
Pseudocode:
spec() -> ConnectorSpecification
check(Config) -> AirbyteConnectionStatus
discover(Config) -> AirbyteCatalog
read(Config, ConfiguredAirbyteCatalog, State) -> Stream<AirbyteRecordMessage | AirbyteStateMessage>
Docker:
docker run --rm -i <source-image-name> spec
docker run --rm -i <source-image-name> check --config <config-file-path>
docker run --rm -i <source-image-name> discover --config <config-file-path>
docker run --rm -i <source-image-name> read --config <config-file-path> --catalog <catalog-file-path> [--state <state-file-path>] > message_stream.json
The read command will emit a stream of records to STDOUT.
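As a rough illustration of the commands above, the following Python sketch assembles the docker argument list for each source action and splits a captured message stream (such as message_stream.json) into records and states. The helper names source_command and split_messages are hypothetical, and the "type", "record", and "state" field names are assumed from the Protocol article rather than defined on this page. The sketch does not deal with making the JSON files visible inside the container (for example through a volume mount).

```python
import json


def source_command(image, action, config=None, catalog=None, state=None):
    """Build the docker argv for one source action (hypothetical helper)."""
    argv = ["docker", "run", "--rm", "-i", image, action]
    if config is not None:
        argv += ["--config", config]
    if catalog is not None:
        argv += ["--catalog", catalog]
    if state is not None:
        # --state is optional for the read action.
        argv += ["--state", state]
    return argv


def split_messages(lines):
    """Split newline-delimited JSON messages into (records, states).

    Assumes each message is a JSON object with a "type" field and a
    payload under "record" or "state". Lines that are not JSON objects
    are skipped; that is a choice made for this sketch.
    """
    records, states = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            message = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(message, dict):
            continue
        if message.get("type") == "RECORD":
            records.append(message.get("record"))
        elif message.get("type") == "STATE":
            states.append(message.get("state"))
    return records, states
```

The list returned by source_command can be passed to subprocess.run, and split_messages accepts any iterable of lines, including an open file handle on the captured stream.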
Destination
Pseudocode:
spec() -> ConnectorSpecification
check(Config) -> AirbyteConnectionStatus
write(Config, AirbyteCatalog, Stream<AirbyteMessage>(stdin)) -> Stream<AirbyteStateMessage>
Docker:
docker run --rm -i <destination-image-name> spec
docker run --rm -i <destination-image-name> check --config <config-file-path>
cat <&0 | docker run --rm -i <destination-image-name> write --config <config-file-path> --catalog <catalog-file-path>
The write command will consume AirbyteMessages from STDIN.
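The shape of write (messages in on STDIN, state messages out) can be sketched in Python as follows. This is a minimal illustration, not Airbyte's implementation: the function name write, the store list standing in for a real destination, and the choice to commit buffered records whenever a state message arrives are all assumptions of this sketch. The "type", "record", and "state" field names are assumed from the Protocol article.

```python
import json


def write(lines, out, store):
    """Consume messages from `lines` and emit state messages to `out`.

    `store` is a plain list standing in for the real destination
    (an assumption of this sketch).
    """
    pending = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        message = json.loads(line)
        kind = message.get("type")
        if kind == "RECORD":
            # Buffer the record until the next state message arrives.
            pending.append(message["record"])
        elif kind == "STATE":
            # Commit everything received so far, then echo the state
            # message so the caller knows those records were written.
            store.extend(pending)
            pending.clear()
            out.write(json.dumps(message) + "\n")
    # Records after the last state message are still written, but no
    # state message is emitted for them.
    store.extend(pending)
```

In a real connector, lines would be sys.stdin and out would be sys.stdout.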
I/O:
- Connectors receive arguments on the command line via JSON files. e.g. --catalog catalog.json
- They read AirbyteMessages from STDIN. The destination write action is the only command that consumes AirbyteMessages.
- They emit AirbyteMessages on STDOUT.
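The I/O conventions above can be illustrated with a skeleton of a source entrypoint in Python. This is a sketch under stated assumptions: the program name example-source, the functions build_parser, emit, and run, the placeholder spec body, and the trivial check logic are all hypothetical. The message envelope ("type" plus "spec" or "connectionStatus") is assumed from the Protocol article. The discover and read actions are parsed but left unimplemented.

```python
import argparse
import json
import sys


def build_parser():
    """Parse the spec/check/discover/read actions and their file flags."""
    parser = argparse.ArgumentParser(prog="example-source")
    sub = parser.add_subparsers(dest="command", required=True)
    sub.add_parser("spec")
    for name in ("check", "discover", "read"):
        action = sub.add_parser(name)
        action.add_argument("--config", required=True)
        if name == "read":
            action.add_argument("--catalog", required=True)
            action.add_argument("--state")  # optional
    return parser


def emit(message, out):
    """Write one message as a single line of JSON."""
    out.write(json.dumps(message) + "\n")


def run(argv, out=sys.stdout):
    args = build_parser().parse_args(argv)
    if args.command == "spec":
        # Placeholder specification; a real connector describes its config here.
        emit({"type": "SPEC", "spec": {"connectionSpecification": {"type": "object"}}}, out)
        return
    # Every other action receives its configuration as a JSON file path.
    with open(args.config) as handle:
        config = json.load(handle)
    if args.command == "check":
        # Placeholder check: a real connector would test the connection.
        status = "SUCCEEDED" if isinstance(config, dict) else "FAILED"
        emit({"type": "CONNECTION_STATUS", "connectionStatus": {"status": status}}, out)
        return
    raise NotImplementedError(f"{args.command} is omitted from this sketch")


if __name__ == "__main__":
    run(sys.argv[1:])
```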
Additional Docker Image Requirements
Environment variable: AIRBYTE_ENTRYPOINT
The Docker image must contain an environment variable called AIRBYTE_ENTRYPOINT. This must be the same as the ENTRYPOINT of the image.
Important: The AIRBYTE_ENTRYPOINT environment variable must use absolute paths to ensure proper execution. Note that the Airbyte platform may change the working directory at runtime (for instance, to /source for sources and /dest for destinations). Using relative paths in the entrypoint can cause execution failures when the working directory is overridden.
Example:
- ✅ Correct: ENV AIRBYTE_ENTRYPOINT="python /airbyte/integration_code/main.py"
- ❌ Incorrect: ENV AIRBYTE_ENTRYPOINT="./main.py"
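One way to catch the incorrect form before publishing an image is a small lint over the entrypoint string. The Python sketch below is a heuristic, and the function name relative_path_tokens is hypothetical: it flags any token that contains a path separator but does not start with /. A bare command name such as python is allowed because it is resolved through PATH, which also means a value like "python main.py" would not be caught.

```python
import shlex


def relative_path_tokens(entrypoint):
    """Return the tokens of an entrypoint string that look like relative paths.

    Heuristic: a token containing "/" that does not start with "/" is
    treated as a relative path. Tokens without "/" are not checked.
    """
    return [
        token
        for token in shlex.split(entrypoint)
        if "/" in token and not token.startswith("/")
    ]
```

An empty result means the heuristic found nothing to flag; it does not prove the entrypoint is safe.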
Non-Root User: airbyte
The Docker image should run under a user named airbyte.
Specified /airbyte directory
The Docker image must have a directory called /airbyte, which the user airbyte owns and can write to.
This is the directory to which temporary files will be mounted, including the config.json and catalog.json files.
Only write file artifacts to directories permitted by the base image
The connector code must only write to directories permitted within the connector's base image.
For a list of permitted write directories, please consult the base image definitions in the airbytehq/airbyte repo, under the docker-images directory.
Must be an amd64 or multi-arch image
To run on Airbyte Platform, the image must be valid for amd64. Since most developers contribute from ARM-based Mac M-series laptops, we recommend creating a multi-arch image that covers both arm64 and amd64 so that the same image tags work on both ARM and AMD runtimes.