Bruno Arine

How to debug an AWS SageMaker training container

This example uses environment variables. They are automatically picked up by docker-composer if you store them in a .env file in the same directory as the composer YAML file.

  1. Build the image locally.

    docker build . -t ${IMAGE_NAME} --build-arg REGION=${AWS_REGION}
    
  2. Create a Docker composer configuration file like the one below. The pdb argument -c continue is necessary if you wish the container to run when launched. If omitted, it will ask for confirmation right from the start.

version: "2"

services:
  training:
    build: ${DOCKERFILE_PATH}
    image: ${IMAGE_NAME}
    volumes:
      - ${TRAIN_PATH}:/opt/ml/input/data/training
      - ${TEST_PATH}:/opt/ml/input/data/testing
      - ${MODEL_PATH}:/opt/ml/model  # after training, the container should store model artifacts here
      - ${CODE_PATH}:/opt/ml/code  # only if you want to replace the code in the Docker image
    command: python3 -m pdb -c continue train.py ${ARGS}
    stdin_open: true
    tty: true
  1. Launch the training container with docker-compose -f <path/to/docker-composer/file.yml> up.

  2. Find the running container ID with docker ps -a and attach to it with docker attach <container-id>.