
case class TransformerEncoderBlock(attention: MultiheadAttention, layerNorm1: LayerNorm, layerNorm2: LayerNorm, w1: Constant, b1: Constant, w2: Constant, b2: Constant, scale1: Constant, scale2: Constant, dropout: Double, train: Boolean, gptOrder: Boolean) extends GenericModule[(Variable, Option[STen]), Variable] with Product with Serializable

A single block of the transformer self-attention encoder, using GELU.

Input is (data, maxLength) where data is a (batch, sequence, input dimension) double tensor and maxLength is a 1D or 2D long tensor used for attention masking.

The order of operations depends on the gptOrder param. If gptOrder is true then:

  • y = attention(norm(input)) + input
  • result = mlp(norm(y)) + y
  • Note that in this case there is no normalization at the end of the transformer. One may want to add one separately. This is how GPT-2 is defined in Hugging Face or nanoGPT.
  • Note that the residual connection has a path which does not flow through the normalization.
  • Additionally, a dimension-wise learnable scale parameter is applied in each residual path.

If gptOrder is false then:

  • y = norm(attention(input) + input)
  • result = norm(mlp(y) + y)
  • This follows chapter 11.7 in d2l.ai v1.0.0-beta0. (Same as in https://arxiv.org/pdf/1706.03762.pdf)
  • Note that the residual connection has a path which flows through the normalization.

Output is (batch, sequence, output dimension).
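
The two orderings above can be sketched as follows. This is an illustrative sketch only, not the actual implementation: `norm1`, `norm2` and `mlp` stand in for `layerNorm1`, `layerNorm2` and the `w1`/`b1`/`w2`/`b2` feed-forward, and the exact placement of `scale1`/`scale2` on the residual paths is an assumption based on the description above.

```scala
// Sketch of the two residual orderings selected by gptOrder.
def forwardSketch(input: Variable): Variable =
  if (gptOrder) {
    // pre-norm: the residual path bypasses the normalization
    val y = scale1 * attention(norm1(input)) + input
    val result = scale2 * mlp(norm2(y)) + y
    result // no final normalization; add one separately if desired
  } else {
    // post-norm, as in Vaswani et al. 2017: the residual flows through the norm
    val y = norm1(attention(input) + input)
    norm2(mlp(y) + y)
  }
```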

Linear Supertypes
Serializable, Product, Equals, GenericModule[(Variable, Option[STen]), Variable], AnyRef, Any

Instance Constructors

  1. new TransformerEncoderBlock(attention: MultiheadAttention, layerNorm1: LayerNorm, layerNorm2: LayerNorm, w1: Constant, b1: Constant, w2: Constant, b2: Constant, scale1: Constant, scale2: Constant, dropout: Double, train: Boolean, gptOrder: Boolean)

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def apply[S](a: (Variable, Option[STen]))(implicit arg0: Sc[S]): Variable

    Alias of forward

    Definition Classes
    GenericModule
  5. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  6. val attention: MultiheadAttention
  7. val b1: Constant
  8. val b2: Constant
  9. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @IntrinsicCandidate() @native()
  10. val dropout: Double
  11. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  12. def forward[S](x: (Variable, Option[STen]))(implicit arg0: Sc[S]): Variable

    The implementation of the function.

    In addition to x it can also use all the state to compute its value.

    Definition Classes
    TransformerEncoderBlock → GenericModule
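
    A minimal usage sketch, assuming an already-constructed `block: TransformerEncoderBlock`, a `(batch, sequence, input dimension)` double-tensor `Variable` named `data`, and an optional mask tensor `maxLength` (all names are illustrative):

```scala
Scope.root { implicit scope =>
  // forward pass with an attention mask
  val out: Variable = block.forward((data, Some(maxLength)))
  // the apply alias is equivalent
  val same: Variable = block((data, Some(maxLength)))
  // out has shape (batch, sequence, output dimension)
}
```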
  13. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @IntrinsicCandidate() @native()
  14. val gptOrder: Boolean
  15. final def gradients(loss: Variable, zeroGrad: Boolean = true): Seq[Option[STen]]

    Computes the gradient of loss with respect to the parameters.

    Definition Classes
    GenericModule
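
    A hedged training-step sketch, assuming a constructed `block` and a forward output `out` as above; the scalar loss below is a placeholder, not a real objective:

```scala
val loss: Variable = out.sum // placeholder scalar loss for illustration
// zeroGrad = true clears previously accumulated gradients first
val grads: Seq[Option[STen]] = block.gradients(loss, zeroGrad = true)
// grads aligns with the parameters; None marks parameters without a gradient
```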
  16. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  17. val layerNorm1: LayerNorm
  18. val layerNorm2: LayerNorm
  19. final def learnableParameters: Long

    Returns the total number of optimizable parameters.

    Definition Classes
    GenericModule
  20. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  21. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  22. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @IntrinsicCandidate() @native()
  23. final def parameters: Seq[(Constant, PTag)]

    Returns the state variables which need gradient computation.

    Definition Classes
    GenericModule
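
    A sketch of inspecting the trainable state (assuming a constructed `block`; the exact shape accessor is illustrative):

```scala
// each entry pairs a parameter tensor with a tag identifying it
block.parameters.foreach { case (param, tag) =>
  println(s"$tag : ${param.value.shape.mkString("x")}")
}
```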
  24. def productElementNames: Iterator[String]
    Definition Classes
    Product
  25. val scale1: Constant
  26. val scale2: Constant
  27. def state: List[(Constant, LeafTag)]

    List of optimizable, or non-optimizable but stateful, parameters.

    Stateful means that the state is carried over the repeated forward calls.

    Definition Classes
    TransformerEncoderBlock → GenericModule
  28. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  29. val train: Boolean
  30. val w1: Constant
  31. val w2: Constant
  32. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  33. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  34. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  35. final def zeroGrad(): Unit
    Definition Classes
    GenericModule

Deprecated Value Members

  1. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable]) @Deprecated
    Deprecated

    (Since version 9)
