## NodeJS - NPM Lambda Builder

### Scope

The scope for this builder is to take an existing
directory containing customer code, including a valid `package.json` manifest
specifying third-party dependencies, and use NPM to produce a package that
includes production dependencies, excludes test resources, and can be deployed
to AWS Lambda.

### Challenges

NPM normally stores all dependencies in a `node_modules` subdirectory. It
supports several dependency categories, such as development dependencies
(usually third-party build utilities and test resources), optional dependencies
(usually required for local execution but already available on the production
environment, or peer-dependencies for optional third-party packages) and
production dependencies (normally the minimum required for correct execution).
All these dependency types are mixed in the same directory.
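
The sketch below shows why the categories are hard to separate. It reads the top-level categories declared in a `package.json` manifest; the helper name is hypothetical. It only sees direct dependencies. Nothing in the manifest says which category each transitive package in `node_modules` belongs to.

```python
import json

# package.json field that npm reads for each dependency category.
CATEGORY_FIELDS = {
    "production": "dependencies",
    "development": "devDependencies",
    "optional": "optionalDependencies",
    "peer": "peerDependencies",
}


def dependencies_by_category(manifest: dict) -> dict:
    """Group the *direct* dependency names declared in a manifest by category."""
    return {
        category: sorted(manifest.get(field, {}))
        for category, field in CATEGORY_FIELDS.items()
    }


manifest = json.loads("""
{
  "name": "example-function",
  "dependencies": {"uuid": "^9.0.0"},
  "devDependencies": {"jest": "^29.0.0"},
  "optionalDependencies": {"aws-sdk": "^2.1000.0"}
}
""")
print(dependencies_by_category(manifest))
```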

To speed up Lambda startup time and optimise usage costs, the correct thing to
do in most cases is just to package up production dependencies. During development 
work we can expect that the local `node_modules` directory contains all the 
various dependency types, and NPM does not provide a way to directly identify
just the ones relevant for production. 

There are two ways to include only production dependencies in a package:

1. **without a bundler**: Copy the source to a clean temporary directory and
   re-run dependency installation there. 

2. **with a bundler**: Apply a JavaScript code bundler (such as `esbuild` or
   `webpack`) to produce a single-file JavaScript bundle by recursively
   resolving included dependencies, starting from the main Lambda handler.
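
Here is a minimal sketch of the second option. It assumes `esbuild` is available on the `PATH`. The hypothetical helper only assembles the command line, which the packager would then pass to `subprocess.run`. `--bundle`, `--platform=node`, `--outfile`, `--external:<name>` and `--sourcemap` are standard `esbuild` CLI flags. Marking a package as external leaves its `require()` call in place instead of inlining it, which is one way to handle `aws-sdk` as discussed below.

```python
import shlex


def esbuild_command(entry_point, outfile, externals=(), sourcemap=False):
    """Build an esbuild CLI invocation that bundles a Lambda handler into one file."""
    cmd = [
        "esbuild",
        entry_point,
        "--bundle",           # recursively inline every imported module
        "--platform=node",    # treat Node built-ins (fs, path, ...) as external
        f"--outfile={outfile}",
    ]
    # Packages marked external stay as require() calls and are not bundled.
    cmd += [f"--external:{name}" for name in externals]
    if sourcemap:
        cmd.append("--sourcemap")
    return cmd


print(shlex.join(esbuild_command("src/app.js", "dist/app.js", externals=["aws-sdk"])))
# esbuild src/app.js --bundle --platform=node --outfile=dist/app.js --external:aws-sdk
```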
  
A frequently used trick to speed up NodeJS Lambda deployment is to avoid
bundling the `aws-sdk`, since the Lambda Node.js runtimes already ship with a
copy of the AWS SDK (the `aws-sdk` v2 package on Node.js 16 and earlier, the
modular v3 `@aws-sdk/*` packages on Node.js 18 and later). This makes
deployment significantly faster, for example for single-file Lambda functions.
Although this is not good from a consistency and compatibility perspective (the
version of the API used in production might differ from the one used during
testing), people do this frequently enough that the packager should handle it
in some way. A common way of marking this with ClaudiaJS is to include
`aws-sdk` as an optional dependency, then deploy without optional dependencies.

Other runtimes have no equivalent of optional dependencies, so instead of adding
an NPM-specific parameter to the SAM CLI, the packager should let users include
or exclude optional dependencies through an environment variable.
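
Here is a sketch of such a flag. It assumes npm 7 or later, where `npm install` accepts `--omit=dev` and `--omit=optional`. The variable name `SAM_NPM_INCLUDE_OPTIONAL` is hypothetical and used only for illustration. Optional dependencies stay included unless the variable is set to a false-like value.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name, used here for illustration only.
INCLUDE_OPTIONAL_VAR = "SAM_NPM_INCLUDE_OPTIONAL"


def npm_install_args(env: Optional[Mapping[str, str]] = None) -> list:
    """Build a production-only `npm install` command (assumes npm 7+)."""
    env = os.environ if env is None else env
    args = ["npm", "install", "--omit=dev"]
    value = env.get(INCLUDE_OPTIONAL_VAR, "true").strip().lower()
    if value in ("0", "false", "no"):
        # e.g. `aws-sdk` declared under optionalDependencies is skipped, and the
        # function relies on the SDK copy provided by the Lambda runtime instead.
        args.append("--omit=optional")
    return args
```

With `SAM_NPM_INCLUDE_OPTIONAL=false`, the ClaudiaJS-style setup above installs everything except `aws-sdk` (and any other optional dependency).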

NPM also provides support for running user-defined scripts as part of the build
process, so this packager needs to support standard NPM script execution.

NPM, since version 5, uses symbolic links to optimise disk space usage, so
cross-project dependencies are just linked to elsewhere on the local disk
instead of being copied into the `node_modules` directory. This means that just
copying the `node_modules` directory (even if symlinks were resolved to actual
paths) is far from optimal for creating a stand-alone module. Copying would lead
to significantly larger packages than necessary, as sub-modules might still
contain test resources, and common references from multiple projects would be
duplicated.
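
To find out which dependencies NPM has linked rather than installed, a packager could scan `node_modules` for symbolic links. The minimal sketch below uses a hypothetical function name. It looks one level into scoped (`@scope/name`) directories and does not look inside any other folder, such as `.bin`.

```python
import os
from pathlib import Path


def linked_packages(node_modules: Path) -> dict:
    """Map each symlinked package in node_modules to the directory it points at."""
    node_modules = Path(node_modules)
    links = {}
    if not node_modules.is_dir():
        return links
    for entry in sorted(node_modules.iterdir()):
        if entry.name.startswith("@") and entry.is_dir() and not entry.is_symlink():
            # Scoped packages (@scope/name) live one directory deeper.
            candidates = sorted(entry.iterdir())
        else:
            candidates = [entry]
        for candidate in candidates:
            if candidate.is_symlink():
                name = candidate.relative_to(node_modules).as_posix()
                links[name] = os.path.realpath(candidate)
    return links
```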

NPM also uses two locking mechanisms (`package-lock.json` and `npm-shrinkwrap.json`)
that can freeze versions of dependencies recursively and provide reproducible
builds. Before version 7, the locking mechanism was in many ways more
broken than functional. In some cases it hard-coded locks to local disk
paths, and it got confused when the same package was included as a dependency
in different categories (development/optional/production) throughout the
project tree. NPM recommends committing the lock file to version control as a
way to pin down dependency versions. However, using it on several machines with
different project layouts can lead to dependencies that cannot be installed.
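
Whichever lock file a project uses, the packager needs to know which one NPM will actually honour. According to the NPM documentation, when both files are present in the project root, `npm-shrinkwrap.json` takes precedence and `package-lock.json` is ignored. Here is a small sketch, with a hypothetical helper name:

```python
from pathlib import Path
from typing import Optional

# Order encodes precedence: npm-shrinkwrap.json wins when both files exist.
LOCK_FILES = ("npm-shrinkwrap.json", "package-lock.json")


def effective_lock_file(project_dir: Path) -> Optional[Path]:
    """Return the lock file npm will honour in project_dir, or None."""
    for name in LOCK_FILES:
        candidate = Path(project_dir) / name
        if candidate.is_file():
            return candidate
    return None
```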

NPM dependencies are usually plain JavaScript libraries, but they may include
native binaries precompiled for a particular platform, or require some system
libraries to be installed. A notable example is `sharp`, a popular image
manipulation library that uses symbolic links to system libraries. Another
notable example is `puppeteer`, a library to control a headless Chrome browser,
which downloads a Chromium binary for the target platform during installation.

To fully deal with those cases, this packager may need to execute the
dependency installation step on a Docker image compatible with the target
Lambda environment.
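
The sketch below only builds a `docker run` command. The command mounts the clean project copy at `/var/task` (the Lambda task root) and runs NPM inside the container. The image is deliberately a parameter, because which build image matches a given Lambda runtime is an assumption left to the caller. The helper name is hypothetical.

```python
import os


def docker_install_command(project_dir, image, npm_args=("install", "--omit=dev")):
    """Build a `docker run` command that installs dependencies inside `image`.

    Which image matches the target Lambda runtime is left to the caller.
    """
    host_dir = os.path.abspath(project_dir)  # bind mounts need an absolute path
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:/var/task",  # mount the clean copy of the project
        "-w", "/var/task",              # run npm from the mounted directory
        image,
        "npm", *npm_args,
    ]
```

Files the container writes may end up owned by `root` on the host. One common way to avoid that is to pass `--user` with the host user and group IDs to `docker run`.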

### Choosing the packaging type

For a large majority of projects, packaging using a bundler has significant
advantages: faster packaging, smaller runtime packages, and support for local
dependencies.

However, there are also some drawbacks to using a bundler for a small set of
use cases (namely including packages with binary dependencies, such as `sharp`, a
popular image processing library). 

Because of this, it's important to support both ways of packaging. The version
without a bundler is slower, but will be correct in case of binary dependencies.
For backwards compatibility, this should be the default.

Users should be able to activate packaging with a bundler for projects where that
is safe to do, such as those without any binary dependencies. 

The proposed approach is to use an `aws-sam` property in the package manifest
(`package.json`). If the `nodejs_npm` Lambda builder finds a matching property, it
knows that it is safe to use the bundler to package.
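
Here is a minimal sketch of that check. The design only says that the builder looks for the property. This hypothetical helper therefore treats the property's presence as the opt-in and ignores its content. The `{"bundler": "esbuild"}` value in the example is made up for illustration.

```python
import json


def uses_bundler(manifest: dict) -> bool:
    """Treat a top-level "aws-sam" property as the opt-in to bundled packaging."""
    return "aws-sam" in manifest


plain = json.loads('{"name": "fn", "dependencies": {"uuid": "^9.0.0"}}')
opted_in = json.loads('{"name": "fn", "aws-sam": {"bundler": "esbuild"}}')
print(uses_bundler(plain), uses_bundler(opted_in))  # False True
```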

The rest of this section outlines the major differences between packaging with
and without a bundler.

#### packaging speed

Packaging without a bundler is slower than using a bundler, as it
requires copying the project to a clean working directory, installing
dependencies and archiving into a single ZIP.  

Packaging with a bundler runs directly on files already on the disk, without
the need to copy or move files around. This approach can use the `npm ci`
command, which installs the exact versions recorded in the lock file without
re-resolving the dependency tree, and reuses packages from the local NPM cache
instead of always downloading them again.

#### additional tools

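Packaging without a bundler does not require additional tools installed on the
development environment or CI systems, as it can just work with NPM. Packaging
with a bundler requires the bundler itself (such as `esbuild`) to be installed,
either globally or as a development dependency of the project.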

#### handling local dependencies

Packaging without a bundler requires complex manifest rewriting and recursive
packaging of archives to handle local dependencies. This was originally planned
for a release after the initial version of the `nodejs_npm` builder. However,
because of issues with container environments and how `aws-lambda-builders`
mounts the working directory, it was not added for several years, and it likely
will not be implemented soon. Packaging with a bundler supports local
dependencies without extra work, as the bundler resolves imported files directly
from the disk.

#### including non-javascript files

Packaging without a bundler zips up the entire contents of NPM packages.

Packaging with a bundler only locates JavaScript files in the dependency tree.

Some NPM packages include important binaries or resources, which a bundler
would leave out of the package. This means that packaging using a bundler is not
universally applicable, and may never fully replace packaging without a bundler.

Some NPM packages include a lot of additional files not required at runtime.
The AWS SDK for JavaScript v2 (`aws-sdk`) is a good example: it includes
TypeScript type definitions, documentation and REST service definitions for
automated code generators. Packaging without a bundler includes these files as
well, unnecessarily increasing Lambda archive size. Packaging with a bundler
ignores all these additional files out of the box.
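
A sketch like the following gives a rough idea of what a bundler would leave out. It lists the files in an installed package that are neither JavaScript nor JSON, which is an assumed approximation of what a bundler resolves by default. The function name is hypothetical. The result is a lower bound, because a bundler also skips JavaScript files that the handler never imports.

```python
from pathlib import Path

# Assumption: what a bundler resolves by default (real bundlers can be
# configured with loaders for other file types).
BUNDLED_EXTENSIONS = {".js", ".cjs", ".mjs", ".json"}


def files_a_bundler_would_skip(package_dir: Path) -> list:
    """List files in an installed package that are not JavaScript or JSON."""
    package_dir = Path(package_dir)
    return sorted(
        path.relative_to(package_dir).as_posix()
        for path in package_dir.rglob("*")
        if path.is_file() and path.suffix not in BUNDLED_EXTENSIONS
    )
```

For a package like `sharp`, this would surface native `.node` binaries and shared libraries, which are exactly the files a bundled package would be missing.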

#### error reporting

Packaging without a bundler leaves original file names and line numbers, ensuring
that any stack traces or exception reports correspond directly to the original 
source files.

Packaging with a bundler creates a single file from all the dependencies, so
stack traces in production no longer correspond to original source files. As a
workaround, bundlers can produce a source map file to allow translating
production stack traces into source stack traces. Prior to Node 14, this
required including a separate NPM package or additional tools. Since Node 14,
stack trace translation can be [activated using an environment
variable](https://serverless.pub/aws-lambda-node-sourcemaps/).
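
For a bundled function, turning this on means two things. First, add Node's `--enable-source-maps` flag to the function's `NODE_OPTIONS` environment variable. Second, ship the `.map` file next to the bundle, for example by running `esbuild` with `--sourcemap`. The hypothetical helper below merges the flag into an existing environment map without dropping other options.

```python
def with_source_maps(environment: dict) -> dict:
    """Return a copy of a Lambda environment map with Node source maps enabled."""
    env = dict(environment)
    options = env.get("NODE_OPTIONS", "").split()
    if "--enable-source-maps" not in options:
        options.append("--enable-source-maps")
    env["NODE_OPTIONS"] = " ".join(options)
    return env
```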

### Implementation without a bundler

The general algorithm for preparing a node package for use on AWS Lambda
without a JavaScript bundler (`esbuild` or `webpack`) is as follows.

#### Step 1: Prepare a clean copy of the project source files

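Execute `npm pack` to perform project-specific packaging using the supplied
`package.json` manifest. `npm pack` respects the manifest's `files` field and
any `.npmignore` (or, failing that, `.gitignore`) file. When the project
configures these correctly, it excludes temporary files, test resources and
other source files unnecessary for running in a production environment.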

This will produce a `tar` archive that needs to be unpacked into the artifacts
directory. Note that all files in the archive sit inside a top-level `package`
subdirectory, so unpacking the archive directly would nest them one level too
deep. The packager needs to strip that prefix (or move the files up) when
extracting.
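
Here is a sketch of the unpacking step. It assumes the archive produced by `npm pack` is a gzipped tarball whose entries all start with `package/`. The function name is hypothetical. It strips that prefix while extracting and refuses entries that try to escape the artifacts directory.

```python
import os
import shutil
import tarfile
from pathlib import Path, PurePosixPath


def unpack_npm_pack_archive(archive: Path, artifacts_dir: Path) -> None:
    """Extract an `npm pack` tarball into artifacts_dir without its `package/` prefix."""
    artifacts_dir = Path(artifacts_dir)
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            parts = PurePosixPath(member.name).parts
            # Keep only entries below package/; skip the prefix directory itself.
            if len(parts) < 2 or parts[0] != "package":
                continue
            if ".." in parts:
                raise ValueError(f"unsafe path in archive: {member.name}")
            target = artifacts_dir.joinpath(*parts[1:])
            if member.isdir():
                target.mkdir(parents=True, exist_ok=True)
            elif member.isfile():
                target.parent.mkdir(parents=True, exist_ok=True)
                with tar.extractfile(member) as src, open(target, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                os.chmod(target, member.mode & 0o777)  # keep executable bits
            # Other entry types (links, devices) are ignored in this sketch.
```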

#### Step 2: Rewrite local dependencies

_(out of scope for the current version)_

To optimise disk space and avoid including development dependencies from other
locally linked packages, inspect the `package.json` manifest looking for dependencies
referring to local file paths (these can be identified because they start with `.` or `file:`),
then run the packaging process for each of them.

Local dependencies may include other local dependencies themselves; this is a very
common way of sharing configuration or development utilities such as linting or testing
tools. This means that for each packaged local dependency this packager needs to
recursively apply the packaging process. It also means that the packager needs to
track local paths and avoid re-packaging directories it has already visited.
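
The recursion with a visited set could look like the sketch below. The `read_manifest` callable is a hypothetical stand-in for loading each `package.json`. The sketch follows only `dependencies` and `optionalDependencies`, on the assumption that local development dependencies should not be packaged. Directories come back leaves first, ending with the project itself, so each dependency can be packed before the packages that need it.

```python
import os
from typing import Callable, List, Optional, Set

# Assumption: only these fields are followed; local devDependencies are skipped.
FOLLOWED_FIELDS = ("dependencies", "optionalDependencies")


def is_local_specifier(spec: str) -> bool:
    """Local dependencies are declared with a path starting with '.' or 'file:'."""
    return spec.startswith(".") or spec.startswith("file:")


def collect_local_packages(
    project_dir: str,
    read_manifest: Callable[[str], dict],
    visited: Optional[Set[str]] = None,
) -> List[str]:
    """Return local package directories to pack, dependencies before dependants."""
    if visited is None:
        visited = set()
    project_dir = os.path.normpath(project_dir)
    if project_dir in visited:
        return []  # already handled: avoids duplicate packing and cycles
    visited.add(project_dir)
    ordered: List[str] = []
    manifest = read_manifest(project_dir)
    for field in FOLLOWED_FIELDS:
        for spec in manifest.get(field, {}).values():
            if is_local_specifier(spec):
                path = spec[len("file:"):] if spec.startswith("file:") else spec
                child = os.path.join(project_dir, path)
                ordered += collect_local_packages(child, read_manifest, visited)
    ordered.append(project_dir)
    return ordered
```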

The `tar` archive that NPM produces while packaging can be directly included as a
dependency, and NPM will then unpack and install a copy of it correctly. Once the
packager produces all `tar` archives required by local dependencies, it should rewrite
the manifest to point to the `tar` files instead of the original locations.
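
Once each local dependency has been packed, the manifest rewrite itself is small. In this sketch, `tarballs` is a hypothetical mapping from dependency name to the `.tgz` path produced for it. The `file:` prefix is NPM's standard way to refer to a local tarball. The original manifest is left untouched.

```python
import copy

DEPENDENCY_FIELDS = ("dependencies", "devDependencies", "optionalDependencies")


def rewrite_local_dependencies(manifest: dict, tarballs: dict) -> dict:
    """Return a copy of manifest whose local dependencies point at packed tarballs."""
    rewritten = copy.deepcopy(manifest)
    for field in DEPENDENCY_FIELDS:
        deps = rewritten.get(field, {})
        for name, spec in list(deps.items()):
            is_local = spec.startswith(".") or spec.startswith("file:")
            if is_local and name in tarballs:
                deps[name] = f"file:{tarballs[name]}"
    return rewritten
```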

If the project contains a package lock file, this will cause NPM to ignore changes
to the package.json manifest. In this case, the packager will need to remove 
`package-lock.json` so that dependency rewrites take effect. 
_(out of scope for the current version)_

#### Step 3: Install dependencies

The packager should then run `npm install` to download and expand all dependencies into
the local `node_modules` subdirectory. This has to be executed in the directory with
the clean copy of the source files.

Note that NPM can be configured to use proxies or private company registries using
a local `.npmrc` file. The packaging process from step 1 excludes this file, so it needs
to be copied into the clean directory before dependency installation, and removed afterwards.
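
A context manager is one way to make sure the `.npmrc` file is always removed again, even if installation fails. Here is a minimal sketch with hypothetical names:

```python
import shutil
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_npmrc(source_dir: Path, build_dir: Path):
    """Copy source_dir/.npmrc into build_dir for the duration of the block."""
    source = Path(source_dir) / ".npmrc"
    target = Path(build_dir) / ".npmrc"
    copied = False
    if source.is_file() and not target.exists():
        shutil.copy2(source, target)
        copied = True
    try:
        yield target if copied else None
    finally:
        if copied:
            # Remove it again so registry settings and tokens stay out of the artifact.
            target.unlink()
```

A caller would wrap the install in it, for example `with temporary_npmrc(source_dir, build_dir): subprocess.run(["npm", "install", "--omit=dev"], cwd=build_dir, check=True)`.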

Some users may want to exclude optional dependencies, or even include development dependencies.
To avoid adding NPM-specific flags to the `sam` CLI that other runtimes could not support,
the packager should allow users to specify options for the `npm install` command using an
environment variable.
_(out of scope for the current version)_
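
Here is a sketch of that option, assuming npm 7 or later. The hypothetical `SAM_NPM_INSTALL_OPTIONS` variable is split using shell-like quoting rules and appended after a production-only default. NPM applies `--include=<type>` regardless of the order of `--omit` and `--include` flags. A user could therefore set the variable to `--include=dev` to bring development dependencies back, or to `--omit=optional` to drop optional ones.

```python
import os
import shlex
from typing import Mapping, Optional

# Hypothetical variable name, used here for illustration only.
EXTRA_OPTIONS_VAR = "SAM_NPM_INSTALL_OPTIONS"


def npm_install_command(env: Optional[Mapping[str, str]] = None) -> list:
    """Production-only `npm install`, extended with user options from the environment."""
    env = os.environ if env is None else env
    extra = shlex.split(env.get(EXTRA_OPTIONS_VAR, ""))
    return ["npm", "install", "--omit=dev", *extra]
```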

To fully support dependencies that download or compile binaries for a target platform, this step
needs to be executed inside a Docker image compatible with AWS Lambda. 
_(out of scope for the current version)_