package data
Package Members
- package bytesegmentencoding
Greedy contraction of consecutive n-grams
- package distributed
- package schemas
Type Members
- trait BatchStream[+I, S, C] extends AnyRef
A functional stateful stream of items
lamp's training loops work from data presented in BatchStreams.
An instance of BatchStream is a description of the data stream; by itself it neither allocates nor stores any data. The stream needs to be driven by an interpreter. lamp.data.IOLoops and the companion object BatchStream contain interpreters that do something useful with a BatchStream.
See the abstract members and the companion object for more documentation.
- I
the item type; the stream yields items of this type
- S
the state type; the stream carries over and accumulates state of this type
- C
the type of accessory resources (e.g. buffers); the stream might need an instance of this type to operate. The intended use is for fixed, pre-allocated pinned buffer pairs that facilitate host-device copies. See lamp.Device.toBatched and lamp.BufferPair.
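The idea of a stream that is only a description, driven by a separate interpreter, can be illustrated with a toy model. This is a minimal sketch and not lamp's actual API: `ToyStream`, `Control`, `Batch`, `Empty`, `End` and `drain` are hypothetical names, and the resource type parameter C is left out for brevity.

```scala
// Toy model only: all names here are hypothetical, not lamp's API.
object StreamSketch {
  // Control values a stream step may yield (loosely mirroring StreamControl).
  sealed trait Control[+I]
  final case class Batch[I](items: I) extends Control[I]
  case object Empty extends Control[Nothing]
  case object End extends Control[Nothing]

  // The stream is a pure description: an initial state and a step function.
  // Constructing it allocates no data.
  final case class ToyStream[I, S](init: S, next: S => (S, Control[I]))

  // Interpreter: drives the stream until End and collects non-empty batches.
  def drain[I, S](stream: ToyStream[I, S]): List[I] = {
    @annotation.tailrec
    def loop(state: S, acc: List[I]): List[I] =
      stream.next(state) match {
        case (_, End)          => acc.reverse
        case (s, Empty)        => loop(s, acc)
        case (s, Batch(items)) => loop(s, items :: acc)
      }
    loop(stream.init, Nil)
  }

  // Batches of size 2 over the numbers 0 until 5; the state is the offset.
  val example: ToyStream[Vector[Int], Int] =
    ToyStream(0, offset =>
      if (offset >= 5) (offset, End)
      else (offset + 2, Batch((offset until math.min(offset + 2, 5)).toVector)))
}
```

Calling `StreamSketch.drain(StreamSketch.example)` walks the state from offset to offset. The stream value itself can be reused or handed to a different interpreter because it holds no mutable data.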
- trait Codec extends AnyRef
An abstraction around byte to token encodings.
- trait CodecFactory[T <: Codec] extends AnyRef
An abstraction around byte to token encodings.
- sealed trait LoopState extends AnyRef
- case class NonEmptyBatch[I](batch: I) extends StreamControl[I] with Product with Serializable
- case class Peek(label: String) extends Module with Product with Serializable
- case class SWALoopState(model: Seq[STen], optimizer: Seq[STen], epoch: Int, lastValidationLoss: Option[Double], minValidationLoss: Option[Double], numberOfAveragedModels: Int, averagedModels: Option[Seq[Tensor]], learningCurve: List[(Int, Double, Option[Double])]) extends LoopState with Product with Serializable
- case class SimpleLoopState(model: Seq[STen], optimizer: Seq[STen], epoch: Int, lastValidationLoss: Option[Double], minValidationLoss: Option[Double], minValidationLossModel: Option[(Int, Seq[Tensor])], learningCurve: List[(Int, Double, Option[(Double, Double)])]) extends LoopState with Product with Serializable
- case class SimpleThenSWALoopState(simple: SimpleLoopState, swa: Option[SWALoopState]) extends LoopState with Product with Serializable
- sealed trait StreamControl[+I] extends AnyRef
- case class TensorLogger(stop: () => Unit) extends Product with Serializable
Class holding a lambda to stop the logging. See its companion object and lamp.data.TensorLogger#start.
- trait TrainingCallback[M] extends AnyRef
- trait ValidationCallback[M] extends AnyRef
Value Members
- object BatchStream
- object BufferedImageHelper
- object DataParallel
- case object EmptyBatch extends StreamControl[Nothing] with Product with Serializable
- case object EndStream extends StreamControl[Nothing] with Product with Serializable
- object GraphBatchStream
- object IOLoops
Contains training loops and helpers around them
The two training loops implemented here are:
- lamp.data.IOLoops.epochs
- lamp.data.IOLoops.withSWA implements Stochastic Weight Averaging
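Stochastic Weight Averaging keeps a running mean of model weights collected during training. The update can be sketched on plain vectors as below. This is an illustration of the averaging step only, not the code in lamp.data.IOLoops.withSWA; `updateAverage` and its parameters are hypothetical names.

```scala
// Illustrative only: a running mean over flat weight vectors.
object SwaSketch {
  // `avg` is the mean of `n` previously averaged models.
  // The result is the mean of n + 1 models after folding in `weights`.
  def updateAverage(avg: Vector[Double], n: Int, weights: Vector[Double]): Vector[Double] = {
    require(avg.length == weights.length, "weight vectors must have equal length")
    avg.zip(weights).map { case (a, w) => a + (w - a) / (n + 1) }
  }
}
```

The incremental form avoids storing every collected model; only the current average and the count of averaged models are needed.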
- object IdentityCodec extends Codec
- object IdentityCodecFactory extends CodecFactory[IdentityCodec.type]
- object Reader
- object SWA
- object StateIO
Helpers to read and write training loop state
- object StreamControl
- object TensorLogger extends Serializable
Utility to periodically log active tensors. See lamp.data.TensorLogger#start.
- object Text
- object Writer
Serializes tensors
This format is similar to the ONNX external tensor serialization format, but it uses JSON rather than protobuf.
Format specification
Sequences of tensors are serialized into a JSON descriptor and a data blob. The schema of the descriptor is the case class lamp.data.schemas.TensorList. The location field in this schema holds a path to the data blob. If this is a relative POSIX path then it is relative to the file path where the descriptor itself is written. Otherwise it is an absolute path of the data blob file.
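The location rule can be sketched with java.nio.file. This is a minimal sketch which assumes that "relative to the file path where the descriptor itself is written" means relative to the directory containing the descriptor file; `resolveBlob` is a hypothetical helper, not part of lamp.

```scala
import java.nio.file.{Path, Paths}

// Hypothetical helper: resolves the `location` field of a descriptor.
object LocationSketch {
  def resolveBlob(descriptorFile: Path, location: String): Path = {
    val loc = Paths.get(location)
    if (loc.isAbsolute) loc // absolute paths are used as-is
    else
      // Assumption: relative paths are resolved against the descriptor's directory.
      descriptorFile.toAbsolutePath.getParent.resolve(loc).normalize
  }
}
```

On a POSIX file system, a descriptor at `/models/net.json` with location `net.bin` would resolve to `/models/net.bin` under this assumption.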
The descriptor may be embedded into larger JSON structures.
The data blob itself is the raw data in little endian byte order. Floating point is IEEE-754. The descriptor specifies the byte offset and byte length of the tensors inside the data blob. As such, the data blob contains no framing or other control bytes, but it may contain padding bytes between tensors.
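Reading one tensor back from the blob then amounts to slicing by offset and length and decoding little-endian values. The sketch below assumes a 32-bit floating point tensor and takes the offset and length as plain arguments; `readFloats` and its parameter names are hypothetical and do not reflect the fields of lamp.data.schemas.TensorList or the API of lamp.data.Reader.

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Hypothetical helper: decodes a float32 tensor from a raw data blob.
object BlobSketch {
  def readFloats(blob: Array[Byte], byteOffset: Int, byteLength: Int): Array[Float] = {
    require(byteLength % 4 == 0, "float32 data must be a multiple of 4 bytes")
    // View only the tensor's byte range, interpreted as little-endian IEEE-754.
    val buf = ByteBuffer
      .wrap(blob, byteOffset, byteLength)
      .order(ByteOrder.LITTLE_ENDIAN)
      .asFloatBuffer()
    val out = new Array[Float](byteLength / 4)
    buf.get(out)
    out
  }
}
```

Because the blob has no framing, bytes outside the given range (such as padding between tensors) are simply skipped.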