oow 2017 in-memory
2020-02-27 58浏览
- 1.
- 2.• The founder of DBGeeK user group • Oracle ACE Associate • Oracle Database Performance geek(10+ years) • Troubleshooter • Worked on WOQU Technologyhttp://woqutech.comLet Data Drive
- 3.• Memory is the future • Why IM high performance • Column format • SIMD • Compression • Data Skipping • When to Use Oracle Database In-Memory • The impact of enabling IM feature on OLTP • The Advantage of Oracle IM compares the other IM databases Let Data Drive
- 4.In December 2013 IDC firm predicted that “Memory Optimized (“In-Memory”) Database Technology is taking over Enterprise Databases”.
- 5.Microsoft SQL Server Oracle IN-Memory IBM DB2 BLU Acceleration Teradata Intelligent Memory 2014 MemSQL SAP HANA 2010 2013 2012 SAP released the first version of its InMemory database product HANA back in November 2011 Let Data Drive
- 6.• Business-driven • Data-driven • The maturity of the technical conditions Let Data Drive
- 7.• Business-driven • Data-driven • The maturity of the technical conditions Let Data Drive
- 8.Latency Comparison Number L1 cache reference Branch mispredict L2 cache reference Main memory reference Compressor 1k bytes with zippy Send 1K bytes over 1 Gbps network Read 4K randomly from SSD* Read 1MB sequentially from memory Round trip within same datacenter Read 1MB sequentially from SSD* Disk seek Read 1MB sequentially from disk Send packet CA->Netherlands->CA Latency Number Every Programmer Should Know 0.5 ns 5 ns 7 ns 14× L1 cache 100 ns 20× L2 cache,200× L2 cache 3,000 ns 3 us 10,000 ns 10 us 150,000 ns 150 us ~1GB/sec SSD 250,000 ns 250 us 500,000 ns 500 us 1,000,000 ns 1,000 us 1 ms ~1GB/sec SSD,4× memory 10,000,000 ns 10,000 us 10 ms 20× datacenter roundtrip 20,000,000 ns 20,000 us 20 ms 80× memory, 20× SSD 150,000,000 ns 150,000 us 150 ms Let Data Drive
- 9.• Business-driven • Data-driven • The maturity of the technical conditions Let Data Drive
- 10.• Driven by business • Driven by the amount of data • The rip of the technical conditions Let Data Drive
- 11.Tape is Dead, Disk is Tape, Flash is Disk, RAM Locality is King. Jim Gray, 2006
- 12.• First introduced in 12.1.0.2 release • Accelerate data analysis,not for oltp • The other IMDB product of oracle , timesten ,for oltp •Dual-Format:Column and Row • Oracle optimizer is smart Let Data Drive
- 13.• The data consistency between the two formats • The data in IM column format only resides in RAM • In 12CR2 can sync the data in column format to disks Let Data Drive
- 14.• IBM DB2 BLU Acceleration is very similar to Oracle IMDB in the dual-format architecture • SAP HANA dual-format architecture,but cannot be both simultaneously • Oracle perfect ? pay some price for the data consistency between row and column format Let Data Drive
- 15.
- 16.• Column Format • SIMD • Compression • Data Skipping Let Data Drive
- 17.• Column Format • SIMD • Compression • Data Skipping Let Data Drive
- 18.• Column Format • SIMD • Compression • Data Skipping Let Data Drive
- 19.• Column Format • SIMD • Compression • Data Skipping Let Data Drive
- 20.Let Data Drive • Column data tightly packed together • Improve access efficiency • Reduce memory traffic • The cost of accessing to any column is the same
- 21.SQL> desc wrh Name Type ------------- --------------- --- ID1 NUMBER ID2 NUMBER ID3 NUMBER ID4 NUMBER ID5 NUMBER ID6 NUMBER ID7 NUMBER ID8 NUMBER ID9 NUMBER ID10 NUMBER ID11 NUMBER ID12 NUMBER ID13 NUMBER ID14 NUMBER ID15 NUMBER ID16 NUMBER ID17 NUMBER ID18 NUMBER ID19 NUMBER ID20 NUMBER Let Data Drive Cached the table in Oracle buffer cache, and populated it into IM Count the total number of rows for column 3, 6, 9, 12, 15, 18 and 20 respectively select count(ID3) from wrh where id1>1 and id2<1000000; select count(ID6) from wrh where id1>1 and id2<1000000; select count(ID9) from wrh where id1>1 and id2<1000000; select count(ID12) from wrh where id1>1 and id2<1000000; select count(ID15) from wrh where id1>1 and id2<1000000;
- 22.# perf stat -d -p 26031 sleep 5 Measure what’s going on inside Performance counter stats for process id '26031': a 11.767587 task-clock (msec) CPU! # 0.002 CPUs utilized 3 context-switches # 0.255 K/sec 1 cpu-migrations # 0.085 K/sec 374 page-faults # 0.032 M/sec 14,850,049 cycles # 1.262 GHz (51.26%) 9,410,174 stalled-cycles-frontend # 63.37% frontend cycles idle (55.56%) 26,277,633 instructions # 1.77 insns per cycle # 0.36 stalled cycles per insn (66.22%) 1,861,488 branches # 158.188 M/sec (66.23%) 30,688 branch-misses # 1.65% of all branches (66.24%) 4,916,126 L1-dcache-loads # 417.768 M/sec (21.91%) 1,054,064 L1-dcache-load-misses # 21.44% of all L1-dcache hits (17.10%) 299,978 LLC-loads # 25.492 M/sec (24.23%) 240,057 LLC-load-misses # 80.02% of all LL-cache hits (32.41%) Let Data Drive
- 23.ROW FORMAT COLUMN FORMAT Let Data Drive The later the column is, the more instructions it consumes Stays the same when accessing different columns
- 24.Row format Column format The difference is pretty obvious - near 200 times. Let Data Drive
- 25.• SIMD,Single instruction, multiple data • Get the CPU to simultaneously process multiple values in a vector • Modern Intel CPUs Have 16-32 SIMD registers • Applies only to column format • The columnar data is packed tightly together, take full advantage of the CPU features such as SIMD, superscalar, the friendly data structure is the key point. Let Data Drive
- 26.Let Data Drive
- 27.Let Data Drive The CPU evaluates the data asfollows:1. Loads the first 8 values from the promo_id column into the SIMD register, and then compares them with the value 9999 in a single instruction. 2. Discards the entries. 3. Loads another 8 values into the SIMD register, and then continues in this way until it has evaluated all entries.
- 28.$ grep "^model name" /proc/cpuinfo sort uniq model name : Intel(R) Xeon(R) CPU E5-4627 v2 @ 3.30GHz $ grep ^flags /proc/cpuinfo egrep "avx sse " sed 's/ /\n/g' egrep "avx sse " sort uniq avx sse In my environment, support AVX and sse2 SSEx extensions, does not support sse4_1 AVX2, AVX512 extensions. sse4_2 ssse3 Let Data Drive
- 29.$ pmap 8527 grep libshpk 00007feeeb310000 2484K r-x-- /u01/app/oracle/product/12.2.0/dbhome_1/lib/libshpkavx12.so 00007feeeb57d000 2044K ----- /u01/app/oracle/product/12.2.0/dbhome_1/lib/libshpkavx12.so 00007feeeb77c000 132K rw--- /u01/app/oracle/product/12.2.0/dbhome_1/lib/libshpkavx12.so In my environment the AVX has been used by oracle. Let Data Drive
- 30.$ find /u01/app/oracle/product -name *avx* /u01/app/oracle/product/12.2.0/dbhome_1/rdbms/admin/libshpkavx12.def /u01/app/oracle/product/12.2.0/dbhome_1/rdbms/admin/libshpkavx212.def /u01/app/oracle/product/12.2.0/dbhome_1/lib/libmkl_avx512.so /u01/app/oracle/product/12.2.0/dbhome_1/lib/libmkl_avx512_mic.so /u01/app/oracle/product/12.2.0/dbhome_1/lib/libmkl_vml_avx512.so /u01/app/oracle/product/12.2.0/dbhome_1/lib/libshpkavx212.so /u01/app/oracle/product/12.2.0/dbhome_1/lib/libmkl_vml_avx512_mic.so • 12CR1, does not support AVX2, AVX512 extensions • 12CR2, supports AVX2,but I am not sure about AVX512, through Oracle lib directory already exists AVX512 lib Let Data Drive
- 31.There are two important benefits : • Less memory traffic • Decompression on‐the‐fly (probably) benefits from CPU L2/L3 cache The general purpose of compression is to save space, But for IM it‘s just a side effect Let Data Drive
- 32.• CPU is faster • RAM access is the bottleneck of modern computers • Want to wait less? Do it less! Let Data Drive
- 33.• Query can read the data without decompression • Only decompress when the data need to return • Read the compressed data can benefit from the CPU L2 / L3 cache • Reducing memory writes Let Data Drive
- 34.COMPRESSION LEVEL NO MEMCOMPRESS MEMCOMPRESS FOR DML MEMCOMPRESS FOR QUERY LOW MEMCOMPRESS FOR QUERY HIGH MEMCOMPRESS FOR CAPACITY LOW MEMCOMPRESS FOR CAPACITY HIGH DESCRIPTION Data is populated without any compression MEMCOMPRESS FOR DML Optimized for query performance (default) Optimized for query performance as well as space saving Balanced with a greater bias towards space saving Optimized for space saving It is recommended to use the FOR QUERY compression algorithm, SQL queries execute directly on the compressed data Let Data Drive
- 35.Let Data Drive • The IM stores data for a single object (table, partition, materialized view) in a set of IMCUs. • An IMCU stores columnar data for one and only one object.
- 36.Let Data Drive • A Column Compression Unit (CU) is contiguous storage for a single column in an IMCU. • Every IMCU has one or more CUs.
- 37.Let Data Drive • A CU is divided into a body and a header • The header contains metadata about the values stored in the CU body • It may also contain a local Dictionary • The local Dictionary is a sorted list of the distinct values in that column and their
- 38.SQL> select /*+ parallel(16) */ count(*),count(distinct id) from c1; COUNT(*) -------------138572154 COUNT(DISTINCTID) ----------------------------- 138572154 The table c1 has only one column, its value is generated according to a sequence. For primary columns, date columns, or the number of distinct value are very high columns,the local dictionary takes up a lot of space Let Data Drive
- 39.Compression method memcompress for query low Original 1688 M After compression 2177 M Bigger than original Ratio -30% Let Data Drive
- 40.• The traditional btree indexes have no advantage in the data analysis • Data skipping technology the major memory database vendors have • Tell the database quickly which blocks do not need to be accessed • Automatically create and maintain,only exists in memory • Reduce memory traffic Let Data Drive
- 41.• The Storage Index is already available in the first release of Exadata in 2008 • Now this feature has been migrated to IM • Tell the database which blocks do not need to visit • Each CU ‘s head records the maximum and minimum values Let Data Drive
- 42.For example, for queries such as Where prod_id> 14 and prod_id <29, according to the maximum minimum information recorded by the CU header ,the IMCU 1 and IMCU 2 are skipped directly during the scan. Let Data Drive
- 43.Retrieve 10% rows out of a 20 GBtable:1. Enable storage index select /*+ full(wxh) INMEMORY_PRUNING */count(object_name),count(object_type) from wxh where id>1 and id<10000000; 2. Disable storage index select /*+ full(wxh) NO_INMEMORY_PRUNING */count(object_name),count(object_type) from wxh where id>1 and id<100000000; Let Data Drive
- 44.SQL Elapsed Time select /*+ full(wxh) INMEMORY_PRUNING */ count(object_name),count(object_type) from wxh where id>1 and id<10000000; 30 ms select /*+ full(wxh) NO_INMEMORY_PRUNING */ count(object_name),count(object_type) from wxh where id>1 and id<100000000; 160 ms Speed up 5x • The (NO_)INMEMORY_PRUNING hint can enable/disable storage indexes. • You haven’t a reason to disable the storage index in the production environment. Let Data Drive
- 45.Retrieve 15% rows out of a 20 GBtable:1. FULL TABLE SCAN BUFFER CACHE(NO INMEMORY) select /*+ full(wxh) */count(object_name) from wxh where object_id>1 and object_id<10000; 2. FULL TABLE SCAN IN MEMORY select /*+ full(wxh) */count(object_name) from wxh where object_id>1 and object_id<10000; Let Data Drive
- 46.The IM is 110 times faster than buffer cache Metric task-clock (msec) cycles instructions insns per cycle Buffer Cache 11684 ms 41,357,721,213 41,453,927,963 1 In-Memory 106 ms 179,523,876 425,354,074 2.37 CPU time Let Data Drive Ratio 110 Speed up 110x 230 97 0.42
- 47.• Parallel Execution doesn’t mean “work smarter” • You’re actually willing to accept to “work harder” • IM is smart • IM+Parallel is a best practice • More slaves • more PGA_AGGREGATE_TARGET • IM has a Bigger IPC, insns per cycle , higher is better Let Data Drive
- 48.Life cycle of aquery:Retrieve • Column format • Storage Index • Compression • SIMD Let Data Drive Process Return
- 49.• In-Memory Joins accelerates join operations through Bloom filter • Join Groups (12CR2 New feature) join groups eliminate the performance overhead of decompressing and hashing column values • In-Memory Aggregation also converts the join into a filter operation • Virtual columns and IM Expressions (12CR2 New feature), you can further improve performance for some CPU resource-intensive queries The effect is very dependent on the actual situation Let Data Drive
- 50.? Why oracle try its best to convert the join into a filter operation Because,Filter is very efficient through SIMD vector Processing. Let Data Drive
- 51.Let Data Drive • IM can indirectly improve the performance of OLTP system • These analytic indexes are no longer needed • Reduce the overhead of maintaining the indexes
- 52.A coin, there are always two sides
- 53.TPS drops by 20% after enabling IM on related tables Based on the swingbench tool, 25 concurrent users, 10G of data volume. Let Data Drive
- 54.• Column Format • SIMD • Data Skipping • Compression The technology about high performance these IM database vendors are using is similar Let Data Drive
- 55.Let Data Drive
- 56.• Scalability • RAC • Active Data Guard These enterprise features is real advantage of oracle IM • Application transparency • Mixed workload support Let Data Drive
- 57.Thank You! Let Data Drive