Lamp allocates data as ATen tensors, which are stored off heap. Native ATen tensors are exposed to the JVM via the aten.Tensor class. Each aten.Tensor JVM object is a handle to the tensor: more precisely, a handle to a native Tensor object, which is itself a handle to the tensor's data. Tensors must be released manually with the aten.Tensor#release or releaseAll methods. A double release may crash the VM.
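For illustration, manual management of a raw tensor might look like the sketch below; the ATen.zeros and TensorOptions.d factory calls are assumptions about the aten binding, while release is the method described above:

import aten.{ATen, Tensor, TensorOptions}

// allocate a 2 x 3 double tensor off heap (the factory names are assumptions)
val t: Tensor = ATen.zeros(Array(2L, 3L), TensorOptions.d)
try {
  // ... use t ...
} finally {
  t.release() // must run exactly once; a double release may crash the VM
}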
In contrast with aten.Tensors, lamp.autograd.Variables and lamp.STens are managed. Allocating these requires a scope (lamp.Scope), which demarcates the lifetime of the variable. Autograd variables own up to two tensors: their value and, optionally, their partial derivatives. lamp.STen is a shallow wrapper around aten.Tensor. It ensures that an appropriate scope is present before allocation, and it provides a more fluent, chainable API.
Example:

import lamp.{STen, Scope}

def squaredEuclideanDistance(v1: STen, v2: STen)(
    implicit scope: Scope // parent scope
): STen = {
  Scope { implicit scope => // a local scope, cleaned up when the block ends
    val outer = v1.mm(v2.t) // these allocations will get released at the end of the block
    val n1 = (v1 * v1).rowSum
    val n2 = (v2 * v2).rowSum
    n1 + n2.t - outer * 2
  } // once the block exits, all resources allocated within it are released,
  // with the exception of the return value, which is moved to the parent scope
}
With IO:

import lamp.{STen, Scope}
import cats.effect.IO

def squaredEuclideanDistanceIO(v1: STen, v2: STen)(
    implicit scope: Scope // parent scope
): IO[STen] = {
  Scope.bracket(scope) { implicit scope => // a local scope, cleaned up when the IO completes
    IO {
      val outer = v1.mm(v2.t) // these allocations will get released once the IO finishes execution
      val n1 = (v1 * v1).rowSum
      val n2 = (v2 * v2).rowSum
      n1 + n2.t - outer * 2
    }
  } // the return value is moved to the parent scope
}
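The returned IO composes like any other cats-effect value. As a sketch, the hypothetical printDistances below chains the computation with a logging step (STen's shape accessor is assumed here):

import lamp.{STen, Scope}
import cats.effect.IO

// chain the distance computation with a side effect
def printDistances(v1: STen, v2: STen)(implicit scope: Scope): IO[Unit] =
  for {
    d <- squaredEuclideanDistanceIO(v1, v2)
    _ <- IO(println(d.shape)) // dimensions of the pairwise distance matrix
  } yield ()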
lamp.Scope
Both STen and Variable own references to aten.Tensors, which need to be managed (released) manually. The constructors of STen and Variable take an instance of Scope and register their tensors with that Scope. A Scope can be released, which releases all of the registered tensors. This simplifies memory management, because a Scope instance can be injected into a Scala lexical block and released once the block exits.
A Scope may be built with any of the factory methods in its companion object: Scope.apply, Scope.bracket and Scope.root. These factories take a lambda and inject the Scope instance into the lexical scope of the lambda.
Scope.apply is meant to be used in a scope which itself has a parent scope. It will not release the return value, but moves it to the parent scope. Consequently, the return type of the lambda it takes is restricted to members of the Movable type class.
The Movable type class provides compile-time introspection so that the library can extract the list of tensors from the return value and move them to the parent scope. It is defined as:

trait Movable[-R] {
  def list(movable: R): List[Tensor]
}

Most regular Scala types and primitives have a Movable instance which returns the empty list.
lamp.STen, Variable, and lamp.GenericModule[_, _] are members of the Movable type class.
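For user-defined composite types, a Movable instance can be written by hand. The following sketch assumes that STen exposes its underlying native tensor as its value field:

import aten.Tensor
import lamp.{Movable, STen}

case class TwoTensors(a: STen, b: STen)

object TwoTensors {
  // list the underlying native tensors so that Scope.apply can move
  // them to the parent scope instead of releasing them
  implicit val movable: Movable[TwoTensors] = new Movable[TwoTensors] {
    def list(m: TwoTensors): List[Tensor] = List(m.a.value, m.b.value)
  }
}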
Scope.root is meant to be used as the outermost Scope. It cannot return anything, thus it takes a lambda with a Unit return type.
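A complete program therefore typically wraps its body in Scope.root. A minimal sketch (the STen.rand factory is an assumption about the allocation API):

import lamp.{STen, Scope}

Scope.root { implicit scope =>
  val v1 = STen.rand(List(3L, 4L)) // assumed factory, allocating into the implicit scope
  val v2 = STen.rand(List(5L, 4L))
  val d = squaredEuclideanDistance(v1, v2) // defined in the example above
  println(d.shape) // List(3, 5)
} // the lambda returns Unit; everything allocated above has been released by now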
Lamp comes with a utility in lamp.data.TensorLogger which records tensor allocations and deallocations. It is disabled by default. You can enable it with the lamp.data.TensorLogger.start method, which has the following signature:
def start(
    frequency: FiniteDuration = 5 seconds
)(
    logger: String => Unit,
    filter: (TensorTraceData, Double) => Boolean,
    detailMinMs: Double,
    detailMaxMs: Double,
    detailNum: Int
): TensorLogger
Once started, it logs a summary and detailed information on live tensors at the specified frequency. It should be stopped with the cancel() method on the TensorLogger instance.
Example: the following will print the stack traces of all live CUDA tensors older than 1 minute but younger than 5 minutes.

import scala.concurrent.duration._

val stop = lamp.data.TensorLogger.start(
  frequency = 5 seconds // FiniteDuration
)(
  logger = line => println(line),
  filter = (tensorInfo, _) => !tensorInfo.getCpu, // keep only non-CPU (i.e. CUDA) tensors
  detailMinMs = 60 * 1000, // older than 1 minute
  detailMaxMs = 5 * 60 * 1000, // younger than 5 minutes
  detailNum = 5
)
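When monitoring is no longer needed, stop the logger:

stop.cancel()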