oow 2017 in-memory

2020-02-27 58浏览

1.
2.• The founder of DBGeeK user group • Oracle ACE Associate • Oracle Database Performance geek(10+ years) • Troubleshooter • Worked on WOQU Technologyhttp://woqutech.comLet Data Drive
3.• Memory is the future • Why IM high performance • Column format • SIMD • Compression • Data Skipping • When to Use Oracle Database In-Memory • The impact of enabling IM feature on OLTP • The Advantage of Oracle IM compares the other IM databases Let Data Drive
4.In December 2013 IDC firm predicted that “Memory Optimized (“In-Memory”) Database Technology is taking over Enterprise Databases”.
5.Microsoft SQL Server Oracle IN-Memory IBM DB2 BLU Acceleration Teradata Intelligent Memory 2014 MemSQL SAP HANA 2010 2013 2012 SAP released the first version of its InMemory database product HANA back in November 2011 Let Data Drive
6.• Business-driven • Data-driven • The maturity of the technical conditions Let Data Drive
7.• Business-driven • Data-driven • The maturity of the technical conditions Let Data Drive
8.Latency Comparison Number L1 cache reference Branch mispredict L2 cache reference Main memory reference Compressor 1k bytes with zippy Send 1K bytes over 1 Gbps network Read 4K randomly from SSD* Read 1MB sequentially from memory Round trip within same datacenter Read 1MB sequentially from SSD* Disk seek Read 1MB sequentially from disk Send packet CA->Netherlands->CA Latency Number Every Programmer Should Know 0.5 ns 5 ns 7 ns 14× L1 cache 100 ns 20× L2 cache,200× L2 cache 3,000 ns 3 us 10,000 ns 10 us 150,000 ns 150 us ~1GB/sec SSD 250,000 ns 250 us 500,000 ns 500 us 1,000,000 ns 1,000 us 1 ms ~1GB/sec SSD,4× memory 10,000,000 ns 10,000 us 10 ms 20× datacenter roundtrip 20,000,000 ns 20,000 us 20 ms 80× memory, 20× SSD 150,000,000 ns 150,000 us 150 ms Let Data Drive
9.• Business-driven • Data-driven • The maturity of the technical conditions Let Data Drive
10.• Driven by business • Driven by the amount of data • The rip of the technical conditions Let Data Drive
11.Tape is Dead, Disk is Tape, Flash is Disk, RAM Locality is King. Jim Gray, 2006
12.• First introduced in 12.1.0.2 release • Accelerate data analysis,not for oltp • The other IMDB product of oracle , timesten ,for oltp •Dual-Format:Column and Row • Oracle optimizer is smart Let Data Drive
13.• The data consistency between the two formats • The data in IM column format only resides in RAM • In 12CR2 can sync the data in column format to disks Let Data Drive
14.• IBM DB2 BLU Acceleration is very similar to Oracle IMDB in the dual-format architecture • SAP HANA dual-format architecture，but cannot be both simultaneously • Oracle perfect ? pay some price for the data consistency between row and column format Let Data Drive
15.
16.• Column Format • SIMD • Compression • Data Skipping Let Data Drive
17.• Column Format • SIMD • Compression • Data Skipping Let Data Drive
18.• Column Format • SIMD • Compression • Data Skipping Let Data Drive
19.• Column Format • SIMD • Compression • Data Skipping Let Data Drive
20.Let Data Drive • Column data tightly packed together • Improve access efficiency • Reduce memory traffic • The cost of accessing to any column is the same
21.SQL> desc wrh Name Type ------------- --------------- --- ID1 NUMBER ID2 NUMBER ID3 NUMBER ID4 NUMBER ID5 NUMBER ID6 NUMBER ID7 NUMBER ID8 NUMBER ID9 NUMBER ID10 NUMBER ID11 NUMBER ID12 NUMBER ID13 NUMBER ID14 NUMBER ID15 NUMBER ID16 NUMBER ID17 NUMBER ID18 NUMBER ID19 NUMBER ID20 NUMBER Let Data Drive Cached the table in Oracle buffer cache, and populated it into IM Count the total number of rows for column 3, 6, 9, 12, 15, 18 and 20 respectively select count(ID3) from wrh where id1>1 and id2<1000000; select count(ID6) from wrh where id1>1 and id2<1000000; select count(ID9) from wrh where id1>1 and id2<1000000; select count(ID12) from wrh where id1>1 and id2<1000000; select count(ID15) from wrh where id1>1 and id2<1000000;
22.# perf stat -d -p 26031 sleep 5 Measure what’s going on inside Performance counter stats for process id '26031': a 11.767587 task-clock (msec) CPU! # 0.002 CPUs utilized 3 context-switches # 0.255 K/sec 1 cpu-migrations # 0.085 K/sec 374 page-faults # 0.032 M/sec 14,850,049 cycles # 1.262 GHz (51.26%) 9,410,174 stalled-cycles-frontend # 63.37% frontend cycles idle (55.56%) 26,277,633 instructions # 1.77 insns per cycle # 0.36 stalled cycles per insn (66.22%) 1,861,488 branches # 158.188 M/sec (66.23%) 30,688 branch-misses # 1.65% of all branches (66.24%) 4,916,126 L1-dcache-loads # 417.768 M/sec (21.91%) 1,054,064 L1-dcache-load-misses # 21.44% of all L1-dcache hits (17.10%) 299,978 LLC-loads # 25.492 M/sec (24.23%) 240,057 LLC-load-misses # 80.02% of all LL-cache hits (32.41%) Let Data Drive
23.ROW FORMAT COLUMN FORMAT Let Data Drive The later the column is, the more instructions it consumes Stays the same when accessing different columns
24.Row format Column format The difference is pretty obvious - near 200 times. Let Data Drive
25.• SIMD,Single instruction, multiple data • Get the CPU to simultaneously process multiple values in a vector • Modern Intel CPUs Have 16-32 SIMD registers • Applies only to column format • The columnar data is packed tightly together, take full advantage of the CPU features such as SIMD, superscalar, the friendly data structure is the key point. Let Data Drive
26.Let Data Drive
27.Let Data Drive The CPU evaluates the data asfollows:1. Loads the first 8 values from the promo_id column into the SIMD register, and then compares them with the value 9999 in a single instruction. 2. Discards the entries. 3. Loads another 8 values into the SIMD register, and then continues in this way until it has evaluated all entries.
28.$ grep "^model name" /proc/cpuinfo sort uniq model name : Intel(R) Xeon(R) CPU E5-4627 v2 @ 3.30GHz $ grep ^flags /proc/cpuinfo egrep "avx sse " sed 's/ /\n/g' egrep "avx sse " sort uniq avx sse In my environment, support AVX and sse2 SSEx extensions, does not support sse4_1 AVX2, AVX512 extensions. sse4_2 ssse3 Let Data Drive
29.$ pmap 8527 grep libshpk 00007feeeb310000 2484K r-x-- /u01/app/oracle/product/12.2.0/dbhome_1/lib/libshpkavx12.so 00007feeeb57d000 2044K ----- /u01/app/oracle/product/12.2.0/dbhome_1/lib/libshpkavx12.so 00007feeeb77c000 132K rw--- /u01/app/oracle/product/12.2.0/dbhome_1/lib/libshpkavx12.so In my environment the AVX has been used by oracle. Let Data Drive
30.$ find /u01/app/oracle/product -name *avx* /u01/app/oracle/product/12.2.0/dbhome_1/rdbms/admin/libshpkavx12.def /u01/app/oracle/product/12.2.0/dbhome_1/rdbms/admin/libshpkavx212.def /u01/app/oracle/product/12.2.0/dbhome_1/lib/libmkl_avx512.so /u01/app/oracle/product/12.2.0/dbhome_1/lib/libmkl_avx512_mic.so /u01/app/oracle/product/12.2.0/dbhome_1/lib/libmkl_vml_avx512.so /u01/app/oracle/product/12.2.0/dbhome_1/lib/libshpkavx212.so /u01/app/oracle/product/12.2.0/dbhome_1/lib/libmkl_vml_avx512_mic.so • 12CR1, does not support AVX2, AVX512 extensions • 12CR2, supports AVX2,but I am not sure about AVX512, through Oracle lib directory already exists AVX512 lib Let Data Drive
31.There are two important benefits : • Less memory traffic • Decompression on‐the‐fly (probably) benefits from CPU L2/L3 cache The general purpose of compression is to save space, But for IM it‘s just a side effect Let Data Drive
32.• CPU is faster • RAM access is the bottleneck of modern computers • Want to wait less? Do it less! Let Data Drive
33.• Query can read the data without decompression • Only decompress when the data need to return • Read the compressed data can benefit from the CPU L2 / L3 cache • Reducing memory writes Let Data Drive
34.COMPRESSION LEVEL NO MEMCOMPRESS MEMCOMPRESS FOR DML MEMCOMPRESS FOR QUERY LOW MEMCOMPRESS FOR QUERY HIGH MEMCOMPRESS FOR CAPACITY LOW MEMCOMPRESS FOR CAPACITY HIGH DESCRIPTION Data is populated without any compression MEMCOMPRESS FOR DML Optimized for query performance (default) Optimized for query performance as well as space saving Balanced with a greater bias towards space saving Optimized for space saving It is recommended to use the FOR QUERY compression algorithm, SQL queries execute directly on the compressed data Let Data Drive
35.Let Data Drive • The IM stores data for a single object (table, partition, materialized view) in a set of IMCUs. • An IMCU stores columnar data for one and only one object.
36.Let Data Drive • A Column Compression Unit (CU) is contiguous storage for a single column in an IMCU. • Every IMCU has one or more CUs.
37.Let Data Drive • A CU is divided into a body and a header • The header contains metadata about the values stored in the CU body • It may also contain a local Dictionary • The local Dictionary is a sorted list of the distinct values in that column and their
38.SQL> select /*+ parallel(16) */ count(*),count(distinct id) from c1; COUNT(*) -------------138572154 COUNT(DISTINCTID) ----------------------------- 138572154 The table c1 has only one column, its value is generated according to a sequence. For primary columns, date columns, or the number of distinct value are very high columns,the local dictionary takes up a lot of space Let Data Drive
39.Compression method memcompress for query low Original 1688 M After compression 2177 M Bigger than original Ratio -30% Let Data Drive
40.• The traditional btree indexes have no advantage in the data analysis • Data skipping technology the major memory database vendors have • Tell the database quickly which blocks do not need to be accessed • Automatically create and maintain,only exists in memory • Reduce memory traffic Let Data Drive
41.• The Storage Index is already available in the first release of Exadata in 2008 • Now this feature has been migrated to IM • Tell the database which blocks do not need to visit • Each CU ‘s head records the maximum and minimum values Let Data Drive
42.For example, for queries such as Where prod_id> 14 and prod_id <29, according to the maximum minimum information recorded by the CU header ,the IMCU 1 and IMCU 2 are skipped directly during the scan. Let Data Drive
43.Retrieve 10% rows out of a 20 GBtable:1. Enable storage index select /*+ full(wxh) INMEMORY_PRUNING */count(object_name),count(object_type) from wxh where id>1 and id<10000000; 2. Disable storage index select /*+ full(wxh) NO_INMEMORY_PRUNING */count(object_name),count(object_type) from wxh where id>1 and id<100000000; Let Data Drive
44.SQL Elapsed Time select /*+ full(wxh) INMEMORY_PRUNING */ count(object_name),count(object_type) from wxh where id>1 and id<10000000; 30 ms select /*+ full(wxh) NO_INMEMORY_PRUNING */ count(object_name),count(object_type) from wxh where id>1 and id<100000000; 160 ms Speed up 5x • The (NO_)INMEMORY_PRUNING hint can enable/disable storage indexes. • You haven’t a reason to disable the storage index in the production environment. Let Data Drive
45.Retrieve 15% rows out of a 20 GBtable:1. FULL TABLE SCAN BUFFER CACHE(NO INMEMORY) select /*+ full(wxh) */count(object_name) from wxh where object_id>1 and object_id<10000; 2. FULL TABLE SCAN IN MEMORY select /*+ full(wxh) */count(object_name) from wxh where object_id>1 and object_id<10000; Let Data Drive
46.The IM is 110 times faster than buffer cache Metric task-clock (msec) cycles instructions insns per cycle Buffer Cache 11684 ms 41,357,721,213 41,453,927,963 1 In-Memory 106 ms 179,523,876 425,354,074 2.37 CPU time Let Data Drive Ratio 110 Speed up 110x 230 97 0.42
47.• Parallel Execution doesn’t mean “work smarter” • You’re actually willing to accept to “work harder” • IM is smart • IM+Parallel is a best practice • More slaves • more PGA_AGGREGATE_TARGET • IM has a Bigger IPC, insns per cycle , higher is better Let Data Drive
48.Life cycle of aquery:Retrieve • Column format • Storage Index • Compression • SIMD Let Data Drive Process Return
49.• In-Memory Joins accelerates join operations through Bloom filter • Join Groups (12CR2 New feature) join groups eliminate the performance overhead of decompressing and hashing column values • In-Memory Aggregation also converts the join into a filter operation • Virtual columns and IM Expressions (12CR2 New feature), you can further improve performance for some CPU resource-intensive queries The effect is very dependent on the actual situation Let Data Drive
50.? Why oracle try its best to convert the join into a filter operation Because,Filter is very efficient through SIMD vector Processing. Let Data Drive
51.Let Data Drive • IM can indirectly improve the performance of OLTP system • These analytic indexes are no longer needed • Reduce the overhead of maintaining the indexes
52.A coin, there are always two sides
53.TPS drops by 20% after enabling IM on related tables Based on the swingbench tool, 25 concurrent users, 10G of data volume. Let Data Drive
54.• Column Format • SIMD • Data Skipping • Compression The technology about high performance these IM database vendors are using is similar Let Data Drive
55.Let Data Drive
56.• Scalability • RAC • Active Data Guard These enterprise features is real advantage of oracle IM • Application transparency • Mixed workload support Let Data Drive
57.Thank You! Let Data Drive