RailsConf 2017 - The Secret Life of SQL: How to Optimize Database Performance

2020-02-27 254浏览

  • 1.! The Secret Life of SQL How to Optimize Database Performance @bryanaknight " How people build software
  • 2.What’s this talk about? ! ! How people build software Some “best practices” aren’t always “best practices” after all 2
  • 3.Overview ! ! WHAT ARE INDEXES • Let’s visualize them • How do they work COMMON RULES OF INDEXING, QUERYING, DATA MODELING ! • What are they • When/Why do they break down • Tools you can use • How to move forward ! How people build software 3
  • 4.What even is an index? ! • An index is a copy of selected columns of data from a table that can be searched quickly and efficiently • Index is sorted • Has as many rows as the table (1 row in the index for 1 row in the table itself) • Indexes and tables are really the same thing - but you can get to the data you need in an index faster and then lookup whatever else you need in the main table knowing exactly where to look ! How people build software 4
  • 5.All about indexes. ! How people build software ! 5
  • 6.All about indexes. ! • Filter data • Sort data • Retrieve records • Get to the smallest amount of data the soonest ! How people build software 6
  • 7.All about indexes. • SELECT * from repositories WHERE owner_id ! > 500 • without any index, full table scan Repositories ! How people build software 7
  • 8.All about indexes. ! • SELECT * from repositories WHERE owner_id > 500 Repositories ! How people build software Index over owner_id 8
  • 9.All about indexes. Repositories ! How people build software ! Index over owner_id 9
  • 10.All about indexes. ! • SELECT name from repositories WHERE owner_id > 500 Repositories ! How people build software Index over owner_id_and_name 10
  • 11.All about indexes. ! Index over owner_id_and_name • Index contents are used left to right • Example • ‘SELECT * FROM repositories where name = ‘rails’ ‘ would NOT use the index ! How people build software 11
  • 12.! COMMON RULES OF INDEXING, QUERYING, AND DATA MODELING • What are they • When/Why do they break down • Tools you can use • How to move forward ! How people build software 12
  • 13.Rule #1 ! ! How people build software Any columns involved in queries should be covered by an index 13
  • 14.Rule #1: Any columns in queries should be indexed ! `index_labels_on_repository_id` ! How people build software 14
  • 15.Rule #1: Any columns in queries should be indexed ! `index_labels_on_repository_id` `index_labels_on_repository_id_and_name` ! How people build software 15
  • 16.Redundant indexes `index_labels_on_repository_id` ! How people build software ! `index_labels_on_repository_id_and_name` 16
  • 17.When to break the rule. ! • We’re not really breaking it, we’re adding a caveat of “unless there is already a covering index.” Rule #1: Any columns involved in queries should be covered by an index ! How people build software 17
  • 18.Why does it matter? ! • So we have a redundant index. So what? • Takes up precious space • ! Adding an index slows down UPDATEs, INSERTs and DELETEs, because those operations need to be performed on the table and the index How people build software 18
  • 19.Redundant indexes tools ! How people build software ! 19
  • 20.Rule #1 ! Any columns involved in queries should be covered by an index* *BUT avoid redundant or unused indexes *Sometimes it’s better to use an index prefix ! How people build software 20
  • 21.What is an index prefix? Index over repository name ! How people build software ! Index prefix over repository name 21
  • 22.Rule #1: Any columns in queries should be indexed ! “Give me the recent code reviewers for a certain code path” ! How people build software 22
  • 23.Rule #1: Any columns in queries should be indexed ! `index_pull_request_reviews_on_repository_id_and_path_and_created_at` ! How people build software 23
  • 24.Redundant indexes tools ! How people build software ! 24
  • 25.When to break the rule. ! • When should you use an index prefix instead of a regular index? • long datatype (text, blob, varbinary) • querying the data works well with a prefix (i.e. usernames starting with ‘a’) Rule #1: Any columns involved in queries should be covered by an index ! How people build software 25
  • 26.Sometimes it’s better to use an index prefix ! • How long should we make the prefix? • long enough to differentiate values within this field • base it off of real data if possible “I chose 255 bytes as the prefix because the longest path in github/github is 199 bytes” ! How people build software 26
  • 27.Sometimes it’s better to use an index prefix ! • What do we gain from using an index prefix? ! • less space required for the index • ability to index big data types • comparable performance improvement as a full index if length calculated thoughtfully or query tuned for index prefix How people build software 27
  • 28.Rule #1 ! Any columns involved in queries should be covered by an index* *BUT avoid redundant or unused indexes *Sometimes it’s better to use an index prefix ! How people build software 28
  • 29.Rule #2 ! ! How people build software Use an OR to return records satisfying one or more of several conditions 29
  • 30.Rule #2: Use an “OR” ! “SELECT MAX(id) FROM users WHERE spammy=1 OR suspended_at IS NOT NULL” ! How people build software 30
  • 31.EXPLAIN it to me, MySQL ! • MySQL explains how it would process the statements • How are tables JOINed and in which order • Which indexes, if any, are used ! How people build software 31
  • 32.EXPLAIN it to me, MySQL ! How people build software ! 32
  • 33.What indexes are we working with? ! • Indexes with columns in ourquery:• `index_users_on_spammy` (`spammy`) • `index_users_on_suspended_at` (`suspended_at`) • the `OR` operator is limited because MySQL can only use one index per table during a query, so it chooses to use neither ! How people build software 33
  • 34.When to break the rule. ! • The table being queried isn’t small, so a full table scan is not performant • The use of the ‘OR’ is preventing any index from being used Rule #2: Use an OR to return records satisfying one or more of several conditions ! How people build software 34
  • 35.Rule #2: Use “OR” “UNION” ! How people build software ! 35
  • 36.Rule #2: Use an “OR” ! “SELECT MAX(id) FROM users WHERE spammy=1 OR suspended_at IS NOT NULL” PostgreSQL has an index merge feature! .01 sec…yay! ! How people build software 36
  • 37.Rule #2 ! Use an OR to return records satisfying one of more of several conditions* *except when the OR is preventing an index from being used and you have to do a full table scan ! How people build software 37
  • 38.Rule #3 ! ! How people build software If there’s an index over all the fields in your query, you’re all set 38
  • 39.Rule #3: If there’s an index, you’re all set ! “Give me the commit comments for this repository, ordered by id” ! How people build software 39
  • 40.Rule #3: If there’s an index, you’re all set ! • EXPLAIN it to me, MySQL `index_commit_comments_on_repository_id_and_user_id` ! How people build software 40
  • 41.Rule #3: If there’s an index, you’re all set ! • MySQL thinks its better to scan the PK index just so that the rows are generated in the right order ! How people build software 41
  • 42.When to break the rule. ! • MySQL is choosing to use an index that isn’t performant, or no index at all when one is available Rule #3: If there’s an index over all the fields in your query, you’re all set ! How people build software 42
  • 43.Let’s help MySQL Out! ! How people build software ! 43
  • 44.Rule #3: If there’s an index, you’re all set ! • Other options • USE INDEX - more of a hint • IGNORE INDEX - explicitly state which index not to use, but let MySQL decide from the remaining indexes ! How people build software 44
  • 45.Index Hint gotchas. ! FUTURE PROOFING # # ! How people build software • Beware if you use one and the index is deleted POSTGRESQL DOESN’T HAVE THEM • Sorry! 45
  • 46.Rule #3 ! If there’s an index over all the fields in your query, you’re all set *except when the query planner doesn’t know to use the index, in which case, help it out ! How people build software 46
  • 47.Rule #4 ! ! How people build software Avoid redundant data across tables 47
  • 48.A normalized database ! How people build software ! 48
  • 49.A normalized database ! “Give me all these Pull Requests, excluding those opened by spammy user” “Give me all these users’ gists, unless the user is spammy” “Give me this group of repositories, except those owned by a spammy user” ! How people build software 49
  • 50.When to break the rule. ! • Additional reads (JOINs) are causing noticeable performance degradation • High ratio of reads to writes Rule #4: Avoid redundant data across tables ! How people build software 50
  • 51.A denormalized database ! How people build software ! 51
  • 52.Rule #4: normalize denormalize data ! “Give me this issue’s comments from non-spammy users” ! How people build software 52
  • 53.Results. ! How people build software ! 53
  • 54.Gotchas. ! DATABASE CHANGES # # • Lots of migrations to add new column to tables • Data transitions to backfill data DATA QUALITY • The `user_hidden` column is updated in the tables via a background job after a user is marked as spammy • Nightly job to resolve any mismatches b/w the tables ! How people build software 54
  • 55.Rule #4 ! Avoid redundant data across tables* *except when you are JOINing on another table for almost every request, and it’s getting costly, and you have a high volume of reads to writes for this data ! How people build software 55
  • 56.Recap ! • Rule #1: Any columns involved in queries should be covered by an index • *but avoid redundant and unused indexes • *sometimes an index prefix is needed and sufficient • Rule #2: Use an OR to return records satisfying one or more of several conditions • *but a UNION might be necessary to exploit the indexes • Rule #3: If there’s an index over all the fields in your query, you’re all set • *if MySQL is making the wrong choice, help it make the right one • Rule #4: Avoid redundant data across tables • *except when reads are slow, read/write ratio is high, you can denormalize ! How people build software 56
  • 57.Takeaways ! • Index, but don’t over index • Tune your query to exploit the indexes that give you the best results • There are tools to help you • explain queries • examine existing columns and indexes You can “do everything right” and still have performance problems • ! How people build software 57
  • 58.! come work with us! bit.ly/platform-data Thanks! " How people build software