########################################### Use Version 2.x of the SageMaker Python SDK ########################################### .. contents:: :local: :depth: 2 ************ Installation ************ To install the latest version: .. code:: bash pip install --upgrade sagemaker If you are executing this pip install command in a notebook, make sure to restart your kernel. **************** Breaking Changes **************** This section is for major changes that may require updates to your SageMaker Python SDK code. For the full list of changes, see the `CHANGELOG `_. Removals ============ Python 2 Support ---------------- This library is no longer compatible with Python 2. Python 2 has been EOL since January 1, 2020. Please upgrade to Python 3 if you haven't already. Remove Legacy TensorFlow --------------------------- TensorFlow versions 1.4-1.10 and some variations of versions 1.11-1.12 (see `What Constitutes "Legacy TensorFlow Support" `_) are no longer natively supported by the SageMaker Python SDK. To use those versions of TensorFlow, you must specify the Docker image URI explicitly, and configure settings via hyperparameters or environment variables rather than using SDK parameters. For more information, see `Upgrade from Legacy TensorFlow Support `_. SageMaker Python SDK CLI ------------------------ The SageMaker Python SDK CLI has been removed. (This is different from the AWS CLI.) ``delete_endpoint()`` for Estimators and ``HyperparameterTuner`` ---------------------------------------------------------------- The ``delete_endpoint()`` method for estimators and ``HyperparameterTuner`` is now a no-op. Please use :func:`sagemaker.predictor.Predictor.delete_endpoint` instead. ``update_endpoint`` in ``deploy()`` ----------------------------------- The ``update_endpoint`` argument in ``deploy()`` methods for estimators and models is now a no-op. Please use :func:`sagemaker.predictor.Predictor.update_endpoint` instead. ``serializer`` and ``deserializer`` in ``create_model()`` --------------------------------------------------------- The ``serializer`` and ``deserializer`` arguments in :func:`sagemaker.estimator.Estimator.create_model` are now no-ops. Please specify serializers and deserializers in ``deploy()`` methods instead. ``content_type`` and ``accept`` in the Predictor Constructor ------------------------------------------------------------ The ``content_type`` and ``accept`` parameters are now no-ops in the following classes and methods: - ``sagemaker.predictor.Predictor`` - ``sagemaker.estimator.Estimator.create_model`` - ``sagemaker.algorithms.AlgorithmEstimator.create_model`` - ``sagemaker.tensorflow.model.TensorFlowPredictor`` Please specify content types in a serializer or deserializer class instead. Changes in Default Behavior =========================== Require ``framework_version`` and ``py_version`` for Frameworks --------------------------------------------------------------- Framework estimator and model classes now require ``framework_version`` and ``py_version`` instead of supplying defaults, unless an image URI is explicitly supplied. For example: .. code:: python from sagemaker.tensorflow import TensorFlow TensorFlow( entry_point="script.py", framework_version="2.2.0", # now required py_version="py37", # now required role="my-role", instance_type="ml.m5.xlarge", instance_count=1, ) from sagemaker.mxnet import MXNetModel MXNetModel( model_data="s3://bucket/model.tar.gz", role="my-role", entry_point="inference.py", framework_version="1.6.0", # now required py_version="py3", # now required ) Log Display Behavior with ``attach()`` -------------------------------------- Logs are no longer printed when using ``attach()`` with an estimator. To view logs after attaching a training job to an estimator, use :func:`sagemaker.estimator.EstimatorBase.logs`. ``HyperparameterTuner.fit()`` and ``Transformer.transform()`` ------------------------------------------------------------- :func:`sagemaker.tuner.HyperparameterTuner.fit` and :func:`sagemaker.transformer.Transformer.transform` now wait until the completion of the Hyperparameter Tuning Job or Batch Transform Job, respectively. To make the function non-blocking, use ``wait=False``. XGBoost Predictor ----------------- The default serializer of ``sagemaker.xgboost.model.XGBoostPredictor`` has been changed from ``NumpySerializer`` to ``LibSVMSerializer``. Parameter Order Changes ======================= ``sagemaker.model.Model`` Parameter Order ----------------------------------------- The parameter order for :class:`sagemaker.model.Model` changed: instead of ``model_data`` being first, ``image_uri`` (formerly ``image``) is first. As a result, ``model_data`` has been made into an optional parameter. If you are using the :class:`sagemaker.model.Model` class, your code should be changed as follows: .. code:: python # v1.x Model("s3://bucket/path/model.tar.gz", "my-image:latest") # v2.0 and later Model("my-image:latest", model_data="s3://bucket/path/model.tar.gz") Airflow Parameter Order ----------------------- For :func:`sagemaker.workflow.airflow.model_config` and :func:`sagemaker.workflow.airflow.model_config_from_estimator`, ``instance_type`` is no longer the first positional argument and is now an optional keyword argument. Dependency Changes ================== SciPy ----- SciPy is no longer a required dependency of the SageMaker Python SDK. If you use :func:`sagemaker.amazon.common.write_spmatrix_to_sparse_tensor` and don't already install SciPy in your environment, you can use our ``scipy`` installation target: .. code:: bash pip install sagemaker[scipy] TensorFlow ---------- The ``tensorflow`` installation target has been removed, as it is no longer needed for any SageMaker Python SDK functionality. If you want to install TensorFlow, see `the TensorFlow documentation `_. ******************** Non-Breaking Changes ******************** Deprecations ============ Pre-instantiated Serializer and Deserializer Objects ---------------------------------------------------- The ``csv_serializer``, ``json_serializer``, ``npy_serializer``, ``csv_deserializer``, ``json_deserializer``, and ``numpy_deserializer`` objects have been deprecated. Please instantiate the objects instead. +--------------------------------------------+------------------------------------------------+ | v1.x | v2.0 and later | +============================================+================================================+ | ``sagemaker.predictor.csv_serializer`` | ``sagemaker.serializers.CSVSerializer()`` | +--------------------------------------------+------------------------------------------------+ | ``sagemaker.predictor.json_serializer`` | ``sagemaker.serializers.JSONSerializer()`` | +--------------------------------------------+------------------------------------------------+ | ``sagemaker.predictor.npy_serializer`` | ``sagemaker.serializers.NumpySerializer()`` | +--------------------------------------------+------------------------------------------------+ | ``sagemaker.predictor.csv_deserializer`` | ``sagemaker.deserializers.CSVDeserializer()`` | +--------------------------------------------+------------------------------------------------+ | ``sagemaker.predictor.json_deserializer`` | ``sagemaker.deserializers.JSONDeserializer()`` | +--------------------------------------------+------------------------------------------------+ | ``sagemaker.predictor.numpy_deserializer`` | ``sagemaker.deserializers.NumpyDeserializer()``| +--------------------------------------------+------------------------------------------------+ ``sagemaker.content_types`` --------------------------- The ``sagemaker.content_types`` module is deprecated in v2.0 and later of the SageMaker Python SDK. Instead of importing constants from ``sagemaker.content_types``, explicitly write MIME types as a string. +-------------------------------+--------------------------------+ | v1.x | v2.0 and later | +===============================+================================+ | ``CONTENT_TYPE_JSON`` | ``"application/json"`` | +-------------------------------+--------------------------------+ | ``CONTENT_TYPE_CSV`` | ``"text/csv"`` | +-------------------------------+--------------------------------+ | ``CONTENT_TYPE_OCTET_STREAM`` | ``"application/octet-stream"`` | +-------------------------------+--------------------------------+ | ``CONTENT_TYPE_NPY`` | ``"application/x-npy"`` | +-------------------------------+--------------------------------+ Image URI Functions (e.g. ``get_image_uri``) -------------------------------------------- The following functions have been deprecated in favor of :func:`sagemaker.image_uris.retrieve`: - ``sagemaker.amazon_estimator.get_image_uri()`` - ``sagemaker.fw_utils.create_image_uri()`` - ``sagemaker.fw_registry.registry()`` - ``sagemaker.utils.get_ecr_image_uri_prefix()`` For more information about usage, see :func:`sagemaker.image_uris.retrieve`. ``enable_cloudwatch_metrics`` for Estimators and Models ------------------------------------------------------- The parameter ``enable_cloudwatch_metrics`` has been deprecated. CloudWatch metrics are already emitted for all Training Jobs, etc. ``sagemaker.fw_utils.parse_s3_url`` ----------------------------------- The ``sagemaker.fw_utils.parse_s3_url`` function has been deprecated. Please use :func:`sagemaker.s3.parse_s3_url` instead. ``sagemaker.session.ModelContainer`` ------------------------------------ The class ``sagemaker.session.ModelContainer`` has been deprecated, as it is not needed for creating inference pipelines. ``sagemaker.workflow.condition_step.JsonGet`` --------------------------------------------- The class ``sagemaker.workflow.condition_step.JsonGet`` has been deprecated. Please use :class:`sagemaker.workflow.functions.JsonGet` instead. Parameter and Class Name Changes ================================ Estimators ---------- Renamed Estimator Parameters ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The following estimator parameters have been renamed: +------------------------------+------------------------+ | v1.x | v2.0 and later | +==============================+========================+ | ``train_instance_count`` | ``instance_count`` | +------------------------------+------------------------+ | ``train_instance_type`` | ``instance_type`` | +------------------------------+------------------------+ | ``train_max_run`` | ``max_run`` | +------------------------------+------------------------+ | ``train_use_spot_instances`` | ``use_spot_instances`` | +------------------------------+------------------------+ | ``train_max_wait`` | ``max_wait`` | +------------------------------+------------------------+ | ``train_volume_size`` | ``volume_size`` | +------------------------------+------------------------+ | ``train_volume_kms_key`` | ``volume_kms_key`` | +------------------------------+------------------------+ Serializer and Deserializer Classes ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The follow serializer/deserializer classes have been renamed and/or moved: +--------------------------------------------------------+-------------------------------------------------------+ | v1.x | v2.0 and later | +========================================================+=======================================================+ | ``sagemaker.predictor._CsvDeserializer`` | ``sagemaker.deserializers.CSVDeserializer`` | +--------------------------------------------------------+-------------------------------------------------------+ | ``sagemaker.predictor._CsvSerializer`` | ``sagemaker.serializers.CSVSerializer`` | +--------------------------------------------------------+-------------------------------------------------------+ | ``sagemaker.predictor.BytesDeserializer`` | ``sagemaker.deserializers.BytesDeserializers`` | +--------------------------------------------------------+-------------------------------------------------------+ | ``sagemaker.predictor.StringDeserializer`` | ``sagemaker.deserializers.StringDeserializer`` | +--------------------------------------------------------+-------------------------------------------------------+ | ``sagemaker.predictor.StreamDeserializer`` | ``sagemaker.deserializers.StreamDeserializer`` | +--------------------------------------------------------+-------------------------------------------------------+ | ``sagemaker.predictor._JsonSerializer`` | ``sagemaker.serializers.JSONSerializer`` | +--------------------------------------------------------+-------------------------------------------------------+ | ``sagemaker.predictor._NumpyDeserializer`` | ``sagemaker.deserializers.NumpyDeserializer`` | +--------------------------------------------------------+-------------------------------------------------------+ | ``sagemaker.predictor._NPYSerializer`` | ``sagemaker.serializers.NumpySerializer`` | +--------------------------------------------------------+-------------------------------------------------------+ | ``sagemaker.amazon.common.numpy_to_record_serializer`` | ``sagemaker.amazon.common.RecordSerializer`` | +--------------------------------------------------------+-------------------------------------------------------+ | ``sagemaker.amazon.common.record_deserializer`` | ``sagemaker.amazon.common.RecordDeserializer`` | +--------------------------------------------------------+-------------------------------------------------------+ | ``sagemaker.predictor._JsonDeserializer`` | ``sagemaker.deserializers.JSONDeserializer`` | +--------------------------------------------------------+-------------------------------------------------------+ ``sagemaker.serializers.LibSVMSerializer`` has been added in v2.0. ``distributions`` ~~~~~~~~~~~~~~~~~ For TensorFlow and MXNet estimators, ``distributions`` has been renamed to ``distribution``. Specify Custom Training Images ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The ``image_name`` parameter has been renamed to ``image_uri`` for specifying a custom Docker image URI to use with training. Models ------ Specify Custom Serving Image ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The ``image`` parameter has been renamed to ``image_uri`` for specifying a custom Docker image URI to use with inference. TensorFlow Serving Model ~~~~~~~~~~~~~~~~~~~~~~~~ ``sagemaker.tensorflow.serving.Model`` has been renamed to :class:`sagemaker.tensorflow.model.TensorFlowModel`. (For the previous implementation of that class, see `Remove Legacy TensorFlow <#remove-legacy-tensorflow>`_). Predictors ---------- Generic Predictor Class Name ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ``sagemaker.predictor.RealTimePredictor`` has been renamed to :class:`sagemaker.predictor.Predictor`. Endpoint Argument Name ~~~~~~~~~~~~~~~~~~~~~~ For :class:`sagemaker.predictor.Predictor`, :class:`sagemaker.sparkml.model.SparkMLPredictor`, and predictors for Amazon algorithm (e.g. Factorization Machines, Linear Learner, etc.), the ``endpoint`` attribute has been renamed to ``endpoint_name``. TensorFlow Serving Predictor ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ``sagemaker.tensorflow.serving.Predictor`` has been renamed to :class:`sagemaker.tensorflow.model.TensorFlowPredictor`. (For the previous implementation of that class, see `Remove Legacy TensorFlow <#remove-legacy-tensorflow>`_). Inputs ------ ``s3_input`` ~~~~~~~~~~~~ ``sagemaker.session.s3_input`` has been renamed to :class:`sagemaker.inputs.TrainingInput`. ``ShuffleConfig`` ~~~~~~~~~~~~~~~~~ ``sagemaker.session.ShuffleConfig`` has been renamed to :class:`sagemaker.inputs.ShuffleConfig`. Airflow ------- For :func:`sagemaker.workflow.airflow.model_config`, :func:`sagemaker.workflow.airflow.model_config_from_estimator`, and :func:`sagemaker.workflow.airflow.transform_config_from_estimator`, the ``image`` argument has been renamed to ``image_uri``. ******************************* Automatically Upgrade Your Code ******************************* To help make your transition as seamless as possible, v2 of the SageMaker Python SDK comes with a command-line tool to automate updating your code. It automates as much as possible, but there are still syntactical and stylistic changes that cannot be performed by the script. .. warning:: While the tool is intended to be easy to use, we recommend using it as part of a process that includes testing before and after you run the tool. Usage ===== Currently, the tool supports only converting one file at a time: .. code:: $ sagemaker-upgrade-v2 --in-file input.py --out-file output.py $ sagemaker-upgrade-v2 --in-file input.ipynb --out-file output.ipynb You can apply it to a set of files using a loop: .. code:: bash $ for file in $(find input-dir); do sagemaker-upgrade-v2 --in-file $file --out-file output-dir/$file; done Limitations =========== Jupyter Notebook Cells with Shell Commands ------------------------------------------ If your Jupyter notebook has a code cell with lines that start with either ``%%`` or ``!``, the tool ignores that cell. The other cells in the notebook are still updated. Aliased Imports --------------- The tool checks for a limited number of patterns when looking for constructors. For example, if you are using a TensorFlow estimator, only the following invocation styles are handled: .. code:: python TensorFlow() sagemaker.tensorflow.TensorFlow() sagemaker.tensorflow.estimator.TensorFlow() If you have aliased an import, e.g. ``from sagemaker.tensorflow import TensorFlow as TF``, the tool does not take care of updating its parameters. TensorFlow Serving ------------------ If you are using the ``sagemaker.tensorflow.serving.Model`` class, the tool does not take care of adding a framework version or changing it to ``sagemaker.tensorflow.TensorFlowModel``. ``sagemaker.model.Model`` ------------------------- If you are using the :class:`sagemaker.model.Model` class, the tool does not take care of switching the order between ``model_data`` and ``image_uri`` (formerly ``image``). ``update_endpoint`` and ``delete_endpoint`` ------------------------------------------- The tool does not take care of removing the ``update_endpoint`` argument from a ``deploy`` call. If you are using that argument, please modify your code to use :func:`sagemaker.predictor.Predictor.update_endpoint` instead. The tool also does not handle ``delete_endpoint`` calls on estimators or ``HyperparameterTuner``. If you are using that method, please modify your code to use :func:`sagemaker.predictor.Predictor.delete_endpoint` instead.