TiDB Design and Architecture, by Shen Li (申砾)
2020-02-27
- 2. TiDB Design and Architecture. Shen Li, PingCAP.
- 3. Shen Li (申砾): Tech Lead of TiDB, VP of Engineering. Netease / 360 / PingCAP. Infrastructure software engineer.
- 4. Why we need a new database; the goal of TiDB; design and architecture (storage layer, scheduler, SQL layer); Spark integration; TiDB on Kubernetes.
- 5. From scratch: what's wrong with the existing DBs? RDBMS; NoSQL & middleware; NewSQL (F1 & Spanner). Timeline (1970s, 2010, 2015, present): RDBMS (MySQL, PostgreSQL, Oracle, DB2, ...); NoSQL (Redis, HBase, Cassandra, MongoDB, ...); NewSQL (Google Spanner, Google F1, TiDB).
- 6. Scalability; high availability; ACID transactions; SQL. A distributed, consistent, scalable SQL database that supports the best features of both traditional RDBMS and NoSQL. Open source, of course.
- 7. Data storage; data distribution; data replication; auto balance; ACID transactions; SQL at scale.
- 8. (Diagram) Applications -> MySQL drivers (e.g. JDBC) -> MySQL protocol -> TiDB (SQL layer) -> RPC -> TiKV (storage layer).
- 10. Good start! RocksDB is fast and stable: atomic batch write, snapshot. However, it's a locally embedded KV store: it can't tolerate machine failures, and scalability depends on the capacity of the disk.
- 11. Fault tolerance: use Raft to replicate data. Key features of Raft: strong leader (the leader does most of the work and issues all log updates), leader election, membership changes. Implementation: ported from etcd. Replicas are distributed across machines/racks/data centers.
- 12. Fault tolerance (diagram): three RocksDB instances, on Machine 1, Machine 2, and Machine 3, kept in sync through Raft.
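The three-machine picture can be illustrated with Raft's commit rule: a log entry counts as committed once a majority of replicas have stored it, which is why one of three machines can fail without losing data. The following is a minimal sketch, not TiKV code; `match_index` is a hypothetical list holding the highest log index stored on each replica (leader included), and the current-term check of full Raft is left out.

```python
def commit_index(match_index):
    """Return the highest log index that a majority of replicas have stored.

    match_index: one entry per replica (hypothetical input for this sketch).
    """
    ordered = sorted(match_index, reverse=True)
    majority = len(ordered) // 2 + 1
    # The majority-th largest value is stored on at least `majority` replicas.
    return ordered[majority - 1]


# Three replicas; the third one is lagging or down.
print(commit_index([5, 5, 0]))
```

With three replicas the majority is two, so a single lagging or failed machine does not hold back commits.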
- 13. Scalability: what if we SPLIT data into many regions? We get many Raft groups. Region = contiguous keys. Hash partitioning or range partitioning? Redis: hash partitioning. HBase: range partitioning. Range scan: SELECT * FROM t WHERE c > 10 AND c < 100;
- 14. Key: byte array. The logical key space (-∞, +∞) is a globally ordered (sorted) map, so we can't use hash partitioning; use range partitioning: Region 1 -> [a - d], Region 2 -> [e - h], …, Region n -> [w - z]. Data is stored/replicated/scheduled in regions. Meta: [start_key, end_key).
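Range partitioning over a sorted key space makes both point lookups and range scans a matter of binary search over region start keys. The sketch below assumes a hypothetical three-region table with made-up names and boundaries; it only illustrates the [start_key, end_key) metadata idea and is not how TiKV or PD stores region metadata.

```python
import bisect

# Hypothetical region table: each region covers [start_key, end_key).
# b"" as a start key stands for -infinity, None as an end key for +infinity.
REGIONS = [
    (b"", b"e", "region-1"),
    (b"e", b"i", "region-2"),
    (b"i", None, "region-3"),
]
START_KEYS = [start for start, _, _ in REGIONS]


def locate(key: bytes) -> str:
    """Find the region whose [start_key, end_key) range contains key."""
    idx = bisect.bisect_right(START_KEYS, key) - 1
    return REGIONS[idx][2]


def regions_for_scan(lo: bytes, hi: bytes):
    """Regions touched by a scan over [lo, hi), assuming lo < hi."""
    first = bisect.bisect_right(START_KEYS, lo) - 1
    last = bisect.bisect_left(START_KEYS, hi) - 1
    return [name for _, _, name in REGIONS[first:last + 1]]


print(locate(b"f"))
print(regions_for_scan(b"c", b"f"))
```

Because neighbouring keys live in the same or adjacent regions, a range scan touches a contiguous run of regions; under hash partitioning the same scan would have to visit every partition.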
- 15. That's simple: logical split. Just split && move. Split safely using Raft. (Diagram: Region 1 becomes Region 1 and Region 2.)
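A logical split only changes metadata: one [start_key, end_key) range becomes two adjacent ranges, and no data has to move at split time. The sketch below is an illustration under assumptions; the `Region` class, the `split` function, and the choice that the left half keeps the original ID are all hypothetical, and the Raft step that makes the split safe across replicas is not modelled.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Region:
    region_id: int
    start_key: bytes          # inclusive
    end_key: Optional[bytes]  # exclusive; None means +infinity


def split(region: Region, split_key: bytes, new_id: int) -> Tuple[Region, Region]:
    """Split one region into two adjacent regions at split_key (metadata only)."""
    inside = region.start_key < split_key and (
        region.end_key is None or split_key < region.end_key
    )
    if not inside:
        raise ValueError("split_key must fall strictly inside the region")
    left = Region(region.region_id, region.start_key, split_key)
    right = Region(new_id, split_key, region.end_key)
    return left, right


left, right = split(Region(1, b"a", b"z"), b"m", new_id=2)
print(left, right)
```

After the split, either half can be moved to another node independently, which is the "split && move" step on the slide.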
- 16. (Diagram) Replicas of Regions 1, 2, and 3 spread across Nodes A, B, C, and D; one Region 1 replica is marked *.
- 17. (Diagram) The same cluster with a new Node E; a Region 1 replica marked ^ appears.
- 18. (Diagram) The same cluster with Node E; the * mark now sits on a different Region 1 replica.
- 19. (Diagram) The same cluster with Node E; one Region 1 replica is marked *.
- 20. (Diagram) A client talks over RPC to four TiKV nodes, each hosting one store: Store 1 (Regions 1, 3, 5, 4), Store 2 (Regions 1, 2, 4, 3), Store 3 (Regions 2, 5, 3), Store 4 (Regions 1, 2, 5, 4). The replicas of a region on different stores form a Raft group. Placement Driver: PD 1, PD 2, PD 3.
- 21. MVCC data layout: key1_version2 -> value; key1_version1 -> value; key2_version3 -> value. Lock-free snapshot reads. Transactions: inspired by Google Percolator; 'almost' decentralized 2-phase commit.
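The key_version layout is what makes snapshot reads lock-free: a reader at timestamp T returns the newest version written at or before T and never blocks writers that create newer versions. The sketch below is a toy in-memory model under assumptions; the `MVCCStore` class and its method names are hypothetical, and the Percolator-style locks and 2-phase commit are not modelled.

```python
import bisect


class MVCCStore:
    """Toy multi-version store: every write adds a new (key, version) entry."""

    def __init__(self):
        self._versions = {}  # key -> sorted list of commit timestamps
        self._values = {}    # (key, commit_ts) -> value

    def put(self, key, value, commit_ts):
        if (key, commit_ts) not in self._values:
            bisect.insort(self._versions.setdefault(key, []), commit_ts)
        self._values[(key, commit_ts)] = value

    def get(self, key, read_ts):
        """Snapshot read: newest version with commit_ts <= read_ts, else None."""
        versions = self._versions.get(key, [])
        idx = bisect.bisect_right(versions, read_ts)
        if idx == 0:
            return None
        return self._values[(key, versions[idx - 1])]


store = MVCCStore()
store.put("key1", "old", commit_ts=1)
store.put("key1", "new", commit_ts=2)
store.put("key2", "x", commit_ts=3)
print(store.get("key1", read_ts=1), store.get("key1", read_ts=5))
```

A reader that started at timestamp 1 keeps seeing the old value even after the write at timestamp 2, without taking any lock.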
- 22. Highly layered: Transaction / MVCC / RaftKV / local KV storage (RocksDB). Raft for consistency and scalability. No distributed file system, for better performance and lower latency.
- 24. Placement Driver (PD): provides the God's view of the entire cluster. Stores the metadata; clients have a cache of placement information. Maintains the replication constraint: 3 replicas, by default. Data movement for balancing the workload. It's a cluster too, of course. Thanks to Raft.
- 25. (Diagram) Nodes (Node 1 with Region A and Region B, Node 2 with Region C) send heartbeats with info to PD. PD holds the cluster info and the scheduling strategy, takes config from the admin, and sends back scheduling commands that trigger region movement.
- 26. Scheduling takes into account: replica number in a Raft group; replica geo-distribution; read/write workload; leaders and followers; tables and TiKV instances; other customized scheduling strategies.
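One of the simplest balancing strategies is to move a replica from the store that hosts the most regions to the store that hosts the fewest. The sketch below is an illustration only and not PD's actual algorithm; `plan_move` and its input shape (a dict built from heartbeat data, mapping a store name to the set of region IDs it hosts) are hypothetical, and it ignores leaders, workload, and geo-distribution.

```python
def plan_move(store_regions):
    """Pick one replica move that reduces imbalance, or None if balanced.

    store_regions: dict mapping store name -> set of region ids hosted there.
    Returns (region_id, source_store, target_store).
    """
    src = max(store_regions, key=lambda s: len(store_regions[s]))
    dst = min(store_regions, key=lambda s: len(store_regions[s]))
    if len(store_regions[src]) - len(store_regions[dst]) < 2:
        return None  # moving a replica would not improve the balance
    # Never place two replicas of the same region on one store.
    candidates = sorted(store_regions[src] - store_regions[dst])
    return candidates[0], src, dst


cluster = {"store-1": {1, 2, 3, 4}, "store-2": {1, 2}, "store-3": {3}}
print(plan_move(cluster))
```

A real scheduler would weigh the other factors listed on the slide before issuing a command; the point here is only the heartbeat-in, command-out loop.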
- 28. SQL is simple and very productive. We want to write code like this: SELECT COUNT(*) FROM user WHERE age > 20 AND age < 30;
- 29. Mapping the relational model to the key-value model; full-featured SQL layer; cost-based optimizer (CBO); distributed execution engine.
- 30. Row Key:'>Key: