JD.com, Wu Guoxiao: Spark With Cloud Native JVM Profiling

2020-03-01

  • 1.
  • 2.About Me • • wuguoxiao@jd.com • @ • • Spark • 15 & TPS
  • 3.Agenda • Apache Spark at JD • JVM • Cloud Native JVM Profiling • •
  • 4.Apache Spark at JD • 100,000+ Spark workloads • 100,000+ Hive workloads [Chart: Spark App Scale, applications (x 10000) by quarter, 2018 Q1 to 2019 Q1]
  • 5.Agenda • Apache Spark at JD • JVM • Cloud Native JVM Profiling • •
  • 6.JVM • JVM • • • JVM • • JVM
  • 7.JVM • High CPU / Hot Method ✓ ✓ ✓
  • 8.JVM • Memory Leak ✓ ✓ ✓ Live Set ✓ Used Heap GC Pause
  • 9.JVM • Allocations ✓ ✓ ✓ Allocations
  • 10.Agenda • Apache Spark at JD • JVM • Cloud Native JVM Profiling • •
  • 11.Cloud Native JVM Profiling Spark Runtime Executor Program Driver Program JD-JVM javaagent JD-JVM javaagent Troubleshooting Service JVM Profiling Web
  • 12.Cloud Native JVM Profiling Kubernetes Profiling Service K8s API Server Troubleshooting Service JVM Profiling Web Pod JD-JVM
  • 13.Agenda • Apache Spark at JD • JVM • Cloud Native JVM Profiling • •
  • 14.– JDK Flight Recorder • • •
  • 15.– JDK Flight Recorder • • 7u4 Oracle JDK • • JVM 3% • • Ad-hoc Always-running “after-the-fact”
  • 16.– JDK Flight Recorder • Application • • JVM Internal • OS
  • 17.– JDK Flight Recorder • High JVM CPU Load • Method Profiling • Free Memory Application • GC Pressure • File I/O JVM Internal • Socket I/O • Thrown Exceptions/Errors OS
  • 18.– JDK Flight Recorder • Fatal Error • GC Pauses Application • Metaspace Live Set Trend • TLAB Allocation Ratio JVM Internal OS
  • 19.– JDK Flight Recorder • Competing CPU Usage • Competing Processes Application • Passwords in Environment • Passwords in Properties JVM Internal OS
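A minimal sketch of an ad-hoc recording driven from code, in the spirit of the JDK Flight Recorder slides above. It assumes a JDK that ships the public `jdk.jfr` API (JDK 11 or later); whether JD-JDK exposes the same API is not stated in the slides. The class name `JfrSketch`, the chosen events, the 20 ms sampling period, and the temp-file prefix are illustrative assumptions, not the speaker's configuration.

```java
import jdk.jfr.Recording;

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

public class JfrSketch {

    /** Records for about {@code duration}, then dumps the data to a temporary .jfr file. */
    public static Path record(Duration duration) {
        try (Recording recording = new Recording()) {
            // Method profiling samples (the "Hot Method" use case).
            recording.enable("jdk.ExecutionSample").withPeriod(Duration.ofMillis(20));
            // Garbage collection events ("GC Pressure", "GC Pauses").
            recording.enable("jdk.GarbageCollection");
            // Allocations that required a new TLAB ("Allocations").
            recording.enable("jdk.ObjectAllocationInNewTLAB");

            recording.start();
            Thread.sleep(duration.toMillis());
            recording.stop();

            // Hypothetical output location; a real agent would pick its own path.
            Path out = Files.createTempFile("jfr-sketch-", ".jfr");
            recording.dump(out);
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while recording", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("JFR data written to " + record(Duration.ofSeconds(5)));
    }
}
```

The resulting file can be opened in JDK Mission Control or printed with the `jfr` tool bundled with recent JDKs.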
  • 20.– Container Awareness • • Backport to JD-JDK • Container.metrics() • thread thread gc jit compile
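To make the effect of container awareness concrete, here is a small sketch that prints what the JVM believes its CPU and heap budget to be. It uses only public `Runtime` methods, not the internal `Container.metrics()` call named on the slide. On a container-aware JDK these values reflect the cgroup limits; on an unaware JDK they reflect the host. The class name and the `poolSize` rule are hypothetical and only illustrate how thread counts might be derived from the visible CPU count.

```java
public class ContainerAwareSketch {

    /** CPUs the JVM believes it may use; limited by cgroups on container-aware JDKs. */
    public static int visibleCpus() {
        return Runtime.getRuntime().availableProcessors();
    }

    /** Maximum heap, in MiB, that the JVM will attempt to use. */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Hypothetical sizing rule: leave one CPU free, but never go below one thread. */
    public static int poolSize(int cpus) {
        return Math.max(1, cpus - 1);
    }

    public static void main(String[] args) {
        int cpus = visibleCpus();
        System.out.println("visible CPUs : " + cpus);
        System.out.println("max heap MiB : " + maxHeapMiB());
        System.out.println("pool size    : " + poolSize(cpus));
    }
}
```

Running this inside and outside a CPU-limited pod is a quick way to check whether a given JDK build honors the container limits.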
  • 21.– AppCDS • • • • Spark SQL workload startup footprint container /day [Chart: startup, Normal vs. With AppCDS; y-axis 0 to 0.45]
  • 22.Kubernetes Spark Flink Storm K8s API Server Profiling Service Binlog Sync
  • 23.Agenda • Apache Spark at JD • JVM • Cloud Native JVM Profiling • •
  • 24.– Hot Method • Driver Hang 6 + • • Spark • JVM Profiling
  • 25.– Hot Method • • •
  • 26.– Hot Method • • • LZO
  • 27.– Allocations • Spark Job Failed • Mem • Container Killed • RSS •
  • 28.– Allocations • • • map outputs Reduce Task Buffer 2 3
  • 29.– Native Memory Tracking Task • kill • NMT • Cloud Native JVM Profiling NMT
  • 30.– Native Memory Tracking • NMT -XX:NativeMemoryTracking=[summary | detail] • • jcmd <pid> VM.native_memory baseline jcmd <pid> VM.native_memory detail.diff • Spark • javaagent / NMT
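The baseline-then-diff workflow for Native Memory Tracking can be sketched as a small Java helper that shells out to `jcmd`. It assumes `jcmd` is on the `PATH` and that the target JVM was started with `-XX:NativeMemoryTracking=detail`; otherwise `jcmd` only reports that tracking is not enabled. The class `NmtJcmdSketch` and its methods are hypothetical names for illustration, and the slides do not say how the JD javaagent actually triggers NMT.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

public class NmtJcmdSketch {

    /** Builds the jcmd command line, e.g. jcmd 42 VM.native_memory baseline. */
    public static List<String> command(long pid, String action) {
        return Arrays.asList("jcmd", Long.toString(pid), "VM.native_memory", action);
    }

    /** Runs jcmd against the given JVM and returns its combined stdout/stderr. */
    public static String run(long pid, String action) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(pid, action))
                .redirectErrorStream(true)
                .start();
        String output = new String(process.getInputStream().readAllBytes(), StandardCharsets.UTF_8);
        process.waitFor();
        return output;
    }

    public static void main(String[] args) throws Exception {
        long pid = ProcessHandle.current().pid();    // profile this JVM for the demo
        System.out.print(run(pid, "baseline"));      // remember current native memory usage
        // ... let the workload run for a while ...
        System.out.print(run(pid, "detail.diff"));   // report changes since the baseline
    }
}
```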
  • 31.– Native Memory Tracking Native Memory Tracking: Total: reserved=28483532KB +9683KB, committed=27185624KB +9683KB <--- total memory changes vs. earlier baseline - Java Heap (reserved=25165824KB, committed=25165824KB) (mmap:
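When NMT diffs are collected automatically, the `reserved=` / `committed=` pairs are the numbers worth extracting. The following is a rough sketch of a parser for such a line, using the figures shown on the slide as input. The class name `NmtDiffSketch` and the array layout of the result are assumptions made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NmtDiffSketch {

    // Matches "reserved=<n>KB [+/-<d>KB], committed=<n>KB [+/-<d>KB]"; the deltas are optional.
    private static final Pattern PAIR = Pattern.compile(
            "reserved=(\\d+)KB(?:\\s+([+-]\\d+)KB)?,\\s*committed=(\\d+)KB(?:\\s+([+-]\\d+)KB)?");

    /**
     * Returns {reservedKb, reservedDeltaKb, committedKb, committedDeltaKb}.
     * A delta is 0 when the line carries no change against the baseline.
     */
    public static long[] parse(String line) {
        Matcher m = PAIR.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("no reserved/committed pair in: " + line);
        }
        return new long[] {
            Long.parseLong(m.group(1)),
            m.group(2) == null ? 0L : Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3)),
            m.group(4) == null ? 0L : Long.parseLong(m.group(4))
        };
    }

    public static void main(String[] args) {
        long[] total = parse("Total: reserved=28483532KB +9683KB, committed=27185624KB +9683KB");
        System.out.println("committed KB = " + total[2] + ", change vs. baseline KB = " + total[3]);
    }
}
```

Tracking the committed delta across successive diffs is one way to see whether native memory keeps growing until the container is killed.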