Performance patches in Go 1,11

2020-03-01 308浏览

  • 1.Performance patches in Go 1.11 Aliaksandr Valialkin, VictoriaMetrics
  • 2.About me ● I like Go and performance optimizations ● I’m the author of fasthttp, fastjson, fastrpc, fastrand, quicktemplate and many other projects - seehttps://github.com/valyala/● Now I work on the fastest time series DB - VictoriaMetrics. It is written in Go
  • 3.Agenda ● Compiler and runtime optimizations ● Math/big optimizations (aka ‘crypto-optimizations’) ● Standard library optimizations ● Arch-specific optimizations
  • 4.Compiler and runtime optimizations
  • 5.109918 cmd/compile:refactor inlining parameters; inline panic
  • 6.What is inlining? ● ● ● ● Inlining is the process of embedding function code into the place of function call Inlining eliminates function call overhead Inlining opens additional optimization opportunities for the compiler Inlining may improve performance
  • 7.What is inlining? ● ● ● ● But sometimes inlining may hurt performance Big functions’ inlining may lead to binary size bloat and bad performance if the resulting binary code stops fitting CPU instruction cache So it is better to inline small functions Go compiler performs basic inlining
  • 8.Inline functions with panic ● Go 1.11 may inline functions with panic() ● Panic may beimplicit:○ If slice element is accessed without explicit or provable bounds check, then the compiler translates a[i] into if i < 0 i >= len(a) { panic(“out of bounds access”) } a[i]
  • 9.Inline functions with panic ● Go 1.11 may inline functions with panic() ● Panic may beimplicit:○ If struct field is accessed via struct pointer without explicit or provable nill check, then the compiler may translate foo.bar for foo *T into if foo == nil { panic(“nil dereference”) } foo.bar
  • 10.Inline functions with panic ● Go 1.11 is able to inline the followingfunction:type T struct { N int } func f(a []int, b *T) { b.N = a[2] }
  • 11.110055 cmd/compile:optimize map-clearing range idiom
  • 12.Optimize map-clearing range idiom ● Go 1.11 now detects and optimizes the following code func clearMap(m map[K]V) { for k := range m { delete(m, k) } }
  • 13.Optimize map-clearing range idiom ● It is substituted by somethinglike:func clearMap(m map[K]V) { // fastClearMap is an optimized function // for clearing the map. It preserves m capacity // in order to reduce overhead during // subsequent additions into the map fastClearMap(m) }
  • 14.109517 cmd/compile:optimize append(x, make([]T, y)...) slice extension
  • 15.Optimize append(x, make([]T, y)...) ● Previously the following code was frequently used for fast sliceextension:func growSlice(a []T, itemsToAdd int) []T { newSize := len(a) + itemsToAdd for cap(a) < newSize { a = append(a[:cap(a)], T{}) } return a[:newSize] }
  • 16.Optimize append(x, make([]T, y)...) ● Now this code may be substituted by simpler and fastercode:func growSlice(a []T, itemsToAdd int) []T { return append(x, make([]T, itemsToAdd)...) } ● Previously such code wasn’t optimal, since Go performed an unnecessary allocation for make([]T, itemsToAdd).
  • 17.91557 cmd/compile:avoid extra mapaccess in "m[k] op= r"
  • 18.Avoid extra mapaccess in “m[k] op= r” ● Now the following code worksfaster:func countWords(words []string) map[string]int { m := make(map[string]int) for _, w := range words { m[w] += 1 } return m } // This line works faster in Go 1.11
  • 19.100838 cmd/compile:avoid mapaccess at m[k]=append(m[k]..
  • 20.Avoid mapaccess at m[k]=append(m[k]...) ● Now the following code worksfaster:func groupWordsByLen(words []string) map[int][]string { m := make(map[int][]string) for _, w := range words { wLen := len(w) // The following line works faster in Go 1.11 m[wLen] = append(m[wLen], w) } return m }
  • 21.84055 cmd/compile/internal/ssa:update regalloc in loops
  • 22.Update regalloc in loops ● Improves performance for the following code by using better register allocation insideloops:for ... { if hard_case { call() } // simple case, without call }
  • 23.100718 cmd/compile:specialize Move up to 79B on amd64
  • 24.Specialize Move up to 79B on amd64 ● ● Improves performance when copying structs and arrays with sizes from 32 bytes to 79 bytes Benchmarkresults:CopyFat24-4 0.80ns ± 0% 0.40ns ± 0% -50.00% (p=0.001 n=8+9) CopyFat32-4 2.01ns ± 0% 0.40ns ± 0% -80.10% (p=0.000 n=8+8) CopyFat64-4 2.87ns ± 0% 0.40ns ± 0% -86.07% (p=0.000 n=8+10)
  • 25.Bounds check elimination (BCE) improvements
  • 26.What is bounds check elimination (BCE)? ● ● ● ● For a[i] go checks whether i exceeds a bounds by adding the following guard code before each a[i]: if i < 0 i >= len(a) { panic(“out of bounds access”) } This code sometimes becomes redundant if a similar check already exists before a[i] Detection with subsequent removal of such guard code is called BCE BCE improves performance
  • 27.BCE improvements ● Go 1.11 contains many patches for detecting and eliminating more bounds checks comparing to previous go versions. Here are a few of suchpatches:○ ○ ○ ○ ○ ○ ○ ○ ○ 104037 cmd/compile:'>compile: