Indexed vector

1D vector with index: Series[K,V]

A Series combines a Vec with an Index that provides an ordered key-value mapping. We’ll talk more about the details of Index later.

The key type of a Series must have a natural ordering (i.e., an Ordering of that type within the implicit scope). However, the Series maintains the order in which its data was supplied unless ordered otherwise.

Let’s look at a few constructions:

import org.saddle.order._
import org.saddle._
import org.saddle.ops.BinOps._
// we already know we can convert a Vec
Series(Vec(32, 12, 9))
// res0: Series[Int, Int] = [3 x 1]
// 0 -> 32
// 1 -> 12
// 2 ->  9
// 

// we can pass a sequence of tuples
Series("a" -> 1, "b" -> 2, "c" -> 3)
// res1: Series[String, Int] = [3 x 1]
// a ->  1
// b ->  2
// c ->  3
// 

// any collection of tuples will work when expanded as varargs, eg:
Series(List("a" -> 1, "b" -> 2, "c" -> 3) : _*)
// res2: Series[String, Int] = [3 x 1]
// a ->  1
// b ->  2
// c ->  3
// 

// can pass data and index separately:
Series(Vec(1,2,3), Index("a", "b", "c"))
// res3: Series[String, Int] = [3 x 1]
// a ->  1
// b ->  2
// c ->  3
// 

// you can create an empty Series like so:
Series.empty[String, Int]
// res4: Series[String, Int] = Empty Series

// supplied order is maintained:
Series(Vec(1,2,3), Index("c", "b", "a"))
// res5: Series[String, Int] = [3 x 1]
// c ->  1
// b ->  2
// a ->  3
// 

// unlike a Map, duplicate keys are entirely fine:
Series(Vec(1,2,3,4), Index("c", "b", "a", "b"))
// res6: Series[String, Int] = [4 x 1]
// c ->  1
// b ->  2
// a ->  3
// b ->  4
//

With construction out of the way, let’s look at a few ways we can get data out of a Series.

val q = Series(Vec(1,3,2,4), Index("c", "b", "a", "b"))
// q: Series[String, Int] = [4 x 1]
// c ->  1
// b ->  3
// a ->  2
// b ->  4
// 

// get the values or index
 q.values
// res7: Vec[Int] = [4 x 1]
// 1
// 3
// 2
// 4
// 
 q.index
// res8: Index[String] = [Index 4 x 1]
// c
// b
// a
// b
// 

// extract value by numerical offset
 q.at(2)
// res9: scalar.Scalar[Int] = Value(el = 2)

 q.at(2,3,1)
// res10: Series[String, Int] = [3 x 1]
// a ->  2
// b ->  4
// b ->  3
// 

// or extract key
 q.keyAt(2)
// res11: scalar.Scalar[String] = Value(el = "a")

 q.keyAt(2,3,1)
// res12: Index[String] = [Index 3 x 1]
// a
// b
// b
// 

// sort by index ordering
 q.sortedIx
// res13: Series[String, Int] = [4 x 1]
// a ->  2
// b ->  3
// b ->  4
// c ->  1
// 

// sort by value ordering
 q.sorted
// res14: Series[String, Int] = [4 x 1]
// c ->  1
// a ->  2
// b ->  3
// b ->  4
// 

// extract elements whose key matches the given key(s)
 q("b")
// res15: Series[String, Int] = [2 x 1]
// b ->  3
// b ->  4
// 

 q("a", "b")
// res16: Series[String, Int] = [3 x 1]
// a ->  2
// b ->  3
// b ->  4
// 

// notice ordering subtleties:
 q("b", "a")
// res17: Series[String, Int] = [3 x 1]
// b ->  3
// b ->  4
// a ->  2
// 

// get first or last values
 q.first
// res18: scalar.Scalar[Int] = Value(el = 1)
 q.last
// res19: scalar.Scalar[Int] = Value(el = 4)

// or key
 q.firstKey
// res20: scalar.Scalar[String] = Value(el = "c")
 q.lastKey
// res21: scalar.Scalar[String] = Value(el = "b")

// "reindex" to a new index:
 q.reindex(Index("a","c","d"))
// res22: Series[String, Int] = [3 x 1]
// a ->  2
// c ->  1
// d -> NA
// 

// or just by a sequence of keys:
 q.reindex("a","c","d")
// res23: Series[String, Int] = [3 x 1]
// a ->  2
// c ->  1
// d -> NA
// 

// notice that 'slicing' ignores unknown keys:
 q("a", "d")
// res24: Series[String, Int] = [1 x 1]
// a ->  2
//

// we cannot reindex with "b", because it isn't unique.
// (the problem is, which "b" would we choose?)
 q.reindex("a", "b")
// java.lang.IllegalArgumentException: requirement failed: Could not reindex unambiguously
// 	at scala.Predef$.require(Predef.scala:337)
// 	at org.saddle.Index.getIndexer(Index.scala:436)
// 	at org.saddle.Index.getIndexer$(Index.scala:432)
// 	at org.saddle.index.IndexAny.getIndexer(IndexAny.scala:28)
// 	at org.saddle.Series$mcI$sp.reindex$mcI$sp(Series.scala:258)
// 	at org.saddle.Series$mcI$sp.reindex$mcI$sp(Series.scala:267)
// 	at repl.MdocSession$MdocApp$$anonfun$1.apply(3_series.md:102)
// 	at repl.MdocSession$MdocApp$$anonfun$1.apply(3_series.md:102)

// we can "reset" the index to integer labels
 q.resetIndex
// res25: Series[Int, Int] = [4 x 1]
// 0 ->  1
// 1 ->  3
// 2 ->  2
// 3 ->  4
// 

// or to a new index altogether
 q.setIndex(Index("w", "x", "y", "z"))
// res26: Series[String, Int] = [4 x 1]
// w ->  1
// x ->  3
// y ->  2
// z ->  4
// 

// to slice by key with 'sliceBy', we need a sorted index; sliceBy is inclusive by default
 val s = q.sortedIx
// s: Series[String, Int] = [4 x 1]
// a ->  2
// b ->  3
// b ->  4
// c ->  1
// 
 s.sliceBy("b", "c")
// res27: Series[String, Int] = [3 x 1]
// b ->  3
// b ->  4
// c ->  1
// 

// syntactic sugar is provided:
 s.sliceBy("b" -> "c")
// res28: Series[String, Int] = [3 x 1]
// b ->  3
// b ->  4
// c ->  1
// 
 s.sliceBy(* -> "b")
// res29: Series[String, Int] = [3 x 1]
// a ->  2
// b ->  3
// b ->  4
// 

// whereas slice is by offset, exclusive by default, and the
// index doesn't have to be sorted:
 q.slice(0,2)
// res30: Series[String, Int] = [2 x 1]
// c ->  1
// b ->  3
// 

// there are head/tail methods:
 q.head(2)
// res31: Series[String, Int] = [2 x 1]
// c ->  1
// b ->  3
// 
 q.tail(2)
// res32: Series[String, Int] = [2 x 1]
// a ->  2
// b ->  4
//
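
The failed reindex above suggests one possible workaround: collapse the duplicate keys first, then reindex. The following is a minimal sketch that uses only calls appearing elsewhere on this page (groupBy.combine and reindex). Summing the duplicates is an arbitrary choice made for illustration, and the names dup, dedup and picked are local to this example.

```scala
import org.saddle.order._
import org.saddle._
import org.saddle.ops.BinOps._

// same data as q above: the key "b" appears twice
val dup = Series(Vec(1,3,2,4), Index("c", "b", "a", "b"))

// collapse duplicate keys by summing their values (an arbitrary choice;
// any reduction that yields one value per key would do)
val dedup = dup.groupBy.combine(_.sum)

// every key is now unique, so asking for "b" is no longer ambiguous
val picked = dedup.reindex("a", "b")
// expected: a -> 2, b -> 7
```

Whether summing, taking the first value, or some other reduction is appropriate depends on what the duplicated observations mean in your data.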

Aside from extracting values, there are many fun ways to compute with Series. Try the following:

q.mapValues(_ + 1)
// res33: Series[String, Int] = [4 x 1]
// c ->  2
// b ->  4
// a ->  3
// b ->  5
// 
q.mapIndex(_ + "x")
// res34: Series[String, Int] = [4 x 1]
// cx ->  1
// bx ->  3
// ax ->  2
// bx ->  4
// 
q.shift(1)
// res35: Series[String, Int] = [4 x 1]
// c -> NA
// b ->  1
// a ->  3
// b ->  2
// 
q.filter(_ > 2)
// res36: Series[String, Int] = [2 x 1]
// b ->  3
// b ->  4
// 
q.filterIx(_ != "b")
// res37: Series[String, Int] = [2 x 1]
// c ->  1
// a ->  2
// 
q.filterAt { case loc => loc != 1 && loc != 3 }
// res38: Series[String, Int] = [2 x 1]
// c ->  1
// a ->  2
// 
q.find(_ == 2)
// res39: Vec[Int] = [1 x 1]
// 2
// 
q.findKey { case x => x == 2 || x == 3 }
// res40: Index[String] = [Index 2 x 1]
// b
// a
// 
q.findOneKey { case x => x == 2 || x == 3 }
// res41: scalar.Scalar[String] = Value(el = "b")
q.minKey
// res42: scalar.Scalar[String] = Value(el = "c")
q.contains("a")
// res43: Boolean = true
q.scanLeft(0) { case (acc, v) => acc + v }
// res44: Series[String, Int] = [4 x 1]
// c ->  1
// b ->  4
// a ->  6
// b -> 10
// 
q.reversed
// res45: Series[String, Int] = [4 x 1]
// b ->  4
// a ->  2
// b ->  3
// c ->  1
// 

val ma = q.mask(q.values > 2)
// ma: Series[String, Int] = [4 x 1]
// c ->  1
// b -> NA
// a ->  2
// b -> NA
// 
ma.hasNA
// res46: Boolean = true
ma.dropNA
// res47: Series[String, Int] = [2 x 1]
// c ->  1
// a ->  2
// 

q.rolling(2, _.minKey)
// res48: Series[String, scalar.Scalar[String]] = [3 x 1]
// b ->  c
// a ->  a
// b ->  a
// 
q.splitAt(2)
// res49: (Series[String, Int], Series[String, Int]) = (
//   [2 x 1]
// c ->  1
// b ->  3
// ,
//   [2 x 1]
// a ->  2
// b ->  4
// 
// )
q.sortedIx.splitBy("b")
// res50: (Series[String, Int], Series[String, Int]) = (
//   [1 x 1]
// a ->  2
// ,
//   [3 x 1]
// b ->  3
// b ->  4
// c ->  1
// 
// )

We can of course convert to a Vec or a Seq if we need to. The Series.toSeq method yields a sequence of key/value tuples.

q.toVec
// res51: Vec[Int] = [4 x 1]
// 1
// 3
// 2
// 4
// 
 q.toSeq
// res52: IndexedSeq[(String, Int)] = ArraySeq(
//   ("c", 1),
//   ("b", 3),
//   ("a", 2),
//   ("b", 4)
// )

We can also group by key in order to transform or combine the groupings, which themselves are Series. For example:

q.groupBy.combine(_.sum)
// res53: Series[String, Int] = [3 x 1]
// a ->  2
// b ->  7
// c ->  1
// 

q.groupBy.transform(s => s - s.mean)
// res54: Series[String, Double] = [4 x 1]
// c ->  0.0000
// b -> -0.5000
// a ->  0.0000
// b ->  0.5000
//

You can also group by another index, or by a transformation of the current index, by passing an argument into groupBy. See the Saddle API for more info.
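
As a hedged sketch of the two variants just mentioned, the snippet below assumes that groupBy accepts either a function over the keys or an Index of the same length as the Series (matched by position). Check the Saddle API for the exact signatures; the grouping labels "other", "x" and "y" are made up for this example.

```scala
import org.saddle.order._
import org.saddle._
import org.saddle.ops.BinOps._

val s = Series(Vec(1,3,2,4), Index("c", "b", "a", "b"))

// group by a transformation of the current keys:
// "b" forms one group, every other key falls into "other"
val byFn = s.groupBy((k: String) => if (k == "b") "b" else "other").combine(_.sum)

// group by a separate index (assumed to be matched to the data by position):
// offsets 0 and 2 go to "x", offsets 1 and 3 go to "y"
val byIx = s.groupBy(Index("x", "y", "x", "y")).combine(_.sum)
```

The parameter type of the function literal is written out explicitly so that the compiler can pick the intended groupBy overload.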

The expressive nature of working with Series becomes apparent when you need to align data:

val a = Series(Vec(1,4,2,3), Index("a","b","c","d"))
// a: Series[String, Int] = [4 x 1]
// a ->  1
// b ->  4
// c ->  2
// d ->  3
// 
val b = Series(Vec(5,2,1,8,7), Index("b","c","d","e","f"))
// b: Series[String, Int] = [5 x 1]
// b ->  5
// c ->  2
// d ->  1
// e ->  8
// f ->  7
// 

a + b
// res55: Series[String, Int] = [6 x 1]
// a -> NA
// b ->  9
// c ->  4
// d ->  4
// e -> NA
// f -> NA
//

As you can see, the indexes are aligned before the operation is performed. Because the labels a, e, and f each lack an observation in one of the two Series, no sum is computed for them and an NA value is inserted into the result instead.

In general, alignment performs a full outer join, so duplicate keys produce every pairwise combination. For instance:

val c = Series(Vec(1,4,2), Index("a","b","b"))
// c: Series[String, Int] = [3 x 1]
// a ->  1
// b ->  4
// b ->  2
// 
val d = Series(Vec(5,2,1), Index("b","b","d"))
// d: Series[String, Int] = [3 x 1]
// b ->  5
// b ->  2
// d ->  1
// 

c + d
// res56: Series[String, Int] = [6 x 1]
// a -> NA
// b ->  9
// b ->  6
// b ->  7
// b ->  4
// d -> NA
//
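
The four b rows above come from pairing every b in c with every b in d. The following plain-Scala sketch mimics that behaviour with ordinary collections; it is not Saddle's implementation. The function name alignAdd is hypothetical, None stands in for NA, and sorting the union of keys is a simplification that happens to match the outputs shown on this page.

```scala
// Plain-Scala illustration of outer-join alignment followed by addition.
// NOT Saddle code: alignAdd is a hypothetical helper, None plays the role of NA.
def alignAdd(
    left: Seq[(String, Int)],
    right: Seq[(String, Int)]
): Seq[(String, Option[Int])] = {
  // union of the keys from both sides, sorted (a simplifying assumption)
  val keys = (left.map(_._1) ++ right.map(_._1)).distinct.sorted
  keys.flatMap { k =>
    val ls = left.filter(_._1 == k).map(_._2)
    val rs = right.filter(_._1 == k).map(_._2)
    if (ls.isEmpty || rs.isEmpty) {
      // key present on one side only: one NA row per unmatched observation
      Seq.fill(math.max(ls.size, rs.size))(k -> Option.empty[Int])
    } else {
      // key present on both sides: every left value pairs with every right value
      for (l <- ls; r <- rs) yield k -> Option(l + r)
    }
  }
}

val sumCD = alignAdd(
  Seq("a" -> 1, "b" -> 4, "b" -> 2),
  Seq("b" -> 5, "b" -> 2, "d" -> 1)
)
// sumCD: a -> None, b -> 9, b -> 6, b -> 7, b -> 4, d -> None
```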

Most basic math and boolean operations are supported between two Series, as well as between a Series and a scalar value.

We mentioned joins. Let’s look at a few join operations; the result is a Frame, which we will touch on a bit later. These are similar in nature to SQL joins.

a.join(b, how=index.LeftJoin)
// res57: Frame[String, Int, Int] = [4 x 2]
//       0  1 
//      -- -- 
// a ->  1 NA 
// b ->  4  5 
// c ->  2  2 
// d ->  3  1 
// 

 a.join(b, how=index.RightJoin)
// res58: Frame[String, Int, Int] = [5 x 2]
//       0  1 
//      -- -- 
// b ->  4  5 
// c ->  2  2 
// d ->  3  1 
// e -> NA  8 
// f -> NA  7 
// 

 a.join(b, how=index.InnerJoin)
// res59: Frame[String, Int, Int] = [3 x 2]
//       0  1 
//      -- -- 
// b ->  4  5 
// c ->  2  2 
// d ->  3  1 
// 

 a.join(b, how=index.OuterJoin)
// res60: Frame[String, Int, Int] = [6 x 2]
//       0  1 
//      -- -- 
// a ->  1 NA 
// b ->  4  5 
// c ->  2  2 
// d ->  3  1 
// e -> NA  8 
// f -> NA  7 
//
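
To make the difference between the four join kinds concrete, here is a plain-Scala sketch of which keys each one keeps. It assumes unique keys on both sides and is not Saddle's implementation: the names Keep, LeftKeys, RightKeys, CommonKeys, AllKeys and joinUnique are hypothetical, None stands in for NA, and the key ordering rules of the real library may differ.

```scala
// Plain-Scala illustration of join semantics for unique keys. NOT Saddle code.
sealed trait Keep
case object LeftKeys extends Keep   // analogous to a left join
case object RightKeys extends Keep  // analogous to a right join
case object CommonKeys extends Keep // analogous to an inner join
case object AllKeys extends Keep    // analogous to an outer join

def joinUnique(
    left: Seq[(String, Int)],
    right: Seq[(String, Int)],
    how: Keep
): Seq[(String, Option[Int], Option[Int])] = {
  val l = left.toMap
  val r = right.toMap
  val lk = left.map(_._1)
  val rk = right.map(_._1)
  // choose the row labels of the result according to the join kind
  val keys = how match {
    case LeftKeys   => lk
    case RightKeys  => rk
    case CommonKeys => lk.filter(r.contains)
    case AllKeys    => (lk ++ rk).distinct
  }
  // look each key up on both sides; a missing value becomes None
  keys.map(k => (k, l.get(k), r.get(k)))
}

val left  = Seq("a" -> 1, "b" -> 4, "c" -> 2, "d" -> 3)
val right = Seq("b" -> 5, "c" -> 2, "d" -> 1, "e" -> 8, "f" -> 7)

val leftRows  = joinUnique(left, right, LeftKeys)   // 4 rows: a, b, c, d
val innerRows = joinUnique(left, right, CommonKeys) // 3 rows: b, c, d
val outerRows = joinUnique(left, right, AllKeys)    // 6 rows: a through f
```

The row counts line up with the Frame shapes shown above for the same data.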