CSV-W12: Ephemeral DevOps: Adventures in Managing Short-Lived Systems

2020-03-01, 56 views

  • 1. #RSAC Session ID: CSV-W12. Ephemeral DevOps: Adventures in Managing Short-Lived Systems. Todd Carr, DevOps Engineer, Unity Technologies, @frozenfoxx
  • 2. Who am I? DevOps Engineer at Unity Technologies. Security enthusiast. Enormous fan of config management. GitHub: frozenfoxx. Keybase: frozenfoxx. Twitter: @frozenfoxx
  • 3. WHAT ARE EPHEMERAL SYSTEMS?
  • 9. What are Ephemeral Systems? Short-lived; light, middle, or heavyweight VMs; dynamically deployed; dynamically configured; dynamically destroyed; usually heterogeneous.
  • 10. What did I build? Create and destroy about 600 to 1,000 heavyweight virtual machines an hour. Most of those run extremely CPU- and disk-intensive operations. Updating existing and new VM configurations takes seconds. Upgrades can be rolled out or rolled back in production extremely quickly. A small team (three people) maintains it. Bootstrapped with vSphere + Puppet.
  • 11. WHY EPHEMERAL SYSTEMS?
  • 15. Why Ephemeral Systems? Multiple immediately-available VMs. Non-containerized applications: desktop apps, legacy apps, complex VMs. Heterogeneous target pools: multiple OSes, multiple configurations, multiple patch targets, lots of iterative testing. Existing infrastructure: new flexibility without breaking anything; doesn't require buying new hardware.
  • 20. Why Ephemeral Systems? Testing: rapid, immediate feedback with new code. Experimenting: rapidly deploy on-the-fly changes. Simulating: fully leverage dynamic environment configuration management tools, such as r10k (Puppet) and grinder (Salt). Parallelization: building, testing. Don't have budget for new data centers or administrators.
  • 21. Why Ephemeral Systems in Security? Exploit Development: Write a revision, grab a target from multiple different pools of targets, destroy when done! Make a pool for every target. Hook the grab, use, and destroy VM loop for every test script.
  • 22. Why Ephemeral Systems in Security? Clean Slate Experimentation: Rapidly deploy on-the-fly changes. Simply call the API to destroy a machine at the conclusion of every test. New machines for every run. No more restore from snapshot.
  • 23. Why Ephemeral Systems in Security? Dynamic Behavior: Simulate changes in active installations. Simply commit a change to a Hiera data file, then run Puppet. Need something even more dynamic? Make a Puppet Environment branch, deploy, and run the same machine against both branches. No need to manually modify machines; all are still built from the same template.
  • 24. Why Ephemeral Systems in Security? Narrowed Attack Window: Non-containerized applications tend to stick around a long time: complex VM requirements, non-Linux OSes, specific patch levels, custom software installations. Treat these VMs as containers. Create, use, destroy, loop, all via API.
  • 25. Why Ephemeral Systems in Security? Information Isolation: No more wiping machines or rolling back to snapshots and hoping nothing is left on disk. Grab a VM, use it, and dump it. When the old one is destroyed it takes its environment with it, ensuring no disk recovery within the VM.
  • 26. TOOLS
  • 31. Tools: vSphere (VMs, VM parameters); Puppet 4 (https://puppet.com/): Agent, Server, PuppetDB; r10k; VmPooler (https://github.com/puppetlabs/vmpooler); Redis; BIND; ISC-DHCP-Server; Dynamic DNS updates from the DHCP server; rbvmomi; coffee.
  • 32. BUILD
  • 35. Build: Concepts. Pools. Self configuration: Puppet, Hiera, VMware GuestInfo variables (hostname, pool, DNS, etc.). Cleanup scripts.
  • 36. Build: Flow (diagram slide; image not captured in this transcript)
  • 37. Build: Flow (diagram slide; image not captured in this transcript)
  • 40. Build: Support. Puppet: Autosigner (https://github.com/frozenfoxx/util/blob/master/puppet/puppet-autosign); certificate cleanup, which removes old and dead node certs and reinventories (https://github.com/frozenfoxx/util/blob/master/puppet/puppet-reap); nodes cleaning script for reports, facts, and nodes (https://github.com/frozenfoxx/util/blob/master/puppet/puppet-cleanup-nodes). VmPooler: logrotate for vmpooler.log; install the provided init script. vSphere: ramdisk cleaner.
  • 41. Build: Monitoring. Pools empty; PuppetServer or PuppetDB down; full disk; too many files in a dir to remove; certificates; BIND/DHCP issues; logging can get massive; weird vSphere things; ramdisk fills up from creating/destroying VMs.
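The "too many files in a dir to remove" item can be illustrated with a small sketch. It assumes the problem is the familiar one where `rm dir/*` fails because the expanded argument list is too long, which the slide does not spell out. The function names and any directory you pass in (for example a Puppet reports directory) are hypothetical and not taken from the talk.

```shell
#!/bin/sh
# Sketch only: function names and thresholds are illustrative assumptions.

# Count regular files under a directory without expanding a shell glob.
count_files() {
  find "$1" -type f | wc -l | tr -d ' '
}

# Succeed (exit 0) when the directory holds more than the given number of files.
too_many_files() {
  [ "$(count_files "$1")" -gt "$2" ]
}

# Delete regular files older than N days. find batches the rm calls itself,
# so a huge directory never hits the argument-length limit that a glob can.
prune_old_files() {
  find "$1" -type f -mtime +"$2" -exec rm -f {} +
}
```

A monitoring job could call `too_many_files` on a schedule and alert, or call `prune_old_files` directly, before the directory grows past the point where ordinary cleanup commands stop working.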
  • 42. PERFORMANCE
  • 46. Performance. PuppetServer holds up well: 4 cores, 16 GB RAM, Linux; around 600 to 1,000 VMs per hour; load average 3.0 to 5.0; creating certs, deleting certs, signing certs, compiling catalogs. PuppetDB holds up extremely well: not even fazed by this usage, very low load. vSphere holds up okay: linked clones are instantaneous (!); the vSphere VM itself may fall over, taking the API with it; needs restarting every six to nine months, YMMV. DHCP/BIND holds up okay... mostly: once a year or so it stops adding/removing; just restart.
  • 47. USAGE
  • 48. Usage: General. Get a box (curl -d --url vmpooler.somewhere.com:4567/api/v1/vm/[vm-type]), which checks out a box, [box hostname]. Use that box. All done? Dump the box (curl -X DELETE --url vmpooler.somewhere.com:4567/api/v1/vm/[box hostname]). Loop.
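The get/use/dump loop above can be wrapped in two small shell functions. This is a minimal sketch, not the speaker's tooling: the `VMPOOLER_URL` default reuses the placeholder host from the slide, the function names are made up for illustration, and the `sed` extraction assumes the checkout response is JSON containing a single `"hostname": "..."` string, as in the VmPooler API examples.

```shell
#!/bin/sh
# Sketch only: VMPOOLER_URL is a placeholder, not a real endpoint.
VMPOOLER_URL="${VMPOOLER_URL:-http://vmpooler.somewhere.com:4567/api/v1}"

# Pull the first "hostname" value out of a JSON response on stdin.
# sed keeps the sketch dependency-free; a real script would likely use jq.
extract_hostname() {
  sed -n 's/.*"hostname"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Check out one VM from the given pool (POST) and print its hostname.
vm_checkout() {
  curl -s -d '' "${VMPOOLER_URL}/vm/$1" | extract_hostname
}

# Hand a VM back so it is destroyed (DELETE).
vm_destroy() {
  curl -s -X DELETE "${VMPOOLER_URL}/vm/$1"
}
```

A test script would then run something like `host=$(vm_checkout some-pool)`, use the box, and finish with `vm_destroy "$host"`, where `some-pool` stands in for whatever `[vm-type]` the pool configuration defines.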
  • 49. Usage: Parallel Testing Batches. Start with an array of tests. Get boxes: loop over retrieval for an array of boxes (curl -d --url vmpooler.somewhere.com:4567/api/v1/vm/[vm-type]). Run the block of tests against the array of boxes. All done? Dump the boxes: loop over the array of boxes (curl -X DELETE --url vmpooler.somewhere.com:4567/api/v1/vm/[box hostname]). Loop.
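One way to sketch the batch pattern is with background jobs and `wait`. The names below (`checkout_many`, `run_parallel`, `destroy_many`) are hypothetical, the URL is the slide's placeholder, and the test command is whatever function or script you supply; the talk does not show its actual batch runner.

```shell
#!/bin/sh
# Sketch only: VMPOOLER_URL is a placeholder, not a real endpoint.
VMPOOLER_URL="${VMPOOLER_URL:-http://vmpooler.somewhere.com:4567/api/v1}"

# Check out COUNT boxes from POOL, printing one hostname per line.
checkout_many() {
  pool=$1
  count=$2
  i=0
  while [ "$i" -lt "$count" ]; do
    curl -s -d '' "${VMPOOLER_URL}/vm/${pool}" |
      sed -n 's/.*"hostname"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' |
      head -n 1
    i=$((i + 1))
  done
}

# Run TEST_CMD once per host in the background, then wait for every job.
# Returns non-zero if any single run failed.
run_parallel() {
  test_cmd=$1
  shift
  pids=""
  for host in "$@"; do
    "$test_cmd" "$host" &
    pids="$pids $!"
  done
  failures=0
  for pid in $pids; do
    wait "$pid" || failures=$((failures + 1))
  done
  [ "$failures" -eq 0 ]
}

# Dump every box in the batch.
destroy_many() {
  for host in "$@"; do
    curl -s -X DELETE "${VMPOOLER_URL}/vm/${host}" > /dev/null
  done
}
```

Collecting the process IDs and waiting on each one individually is what lets the batch report a failure from any box, rather than only the status of the last job started.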
  • 50. Usage: Dynamic Environments. New Puppet branch, need to test. Get a box (curl -d --url vmpooler.somewhere.com:4567/api/v1/vm/[vm-type]), which checks out a box, [box hostname]. SSH to that box. Let's config that box. Normal mode: puppet agent --test. New feature: puppet agent --test --environment [feature branch]. All done? Dump the box (curl -X DELETE --url vmpooler.somewhere.com:4567/api/v1/vm/[box hostname]). Loop. Merge the Puppet branch.
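When scripting the branch test, it helps to know that `puppet agent --test` turns on detailed exit codes: 0 means no changes, 2 means changes were applied successfully, and 1, 4, or 6 indicate a failed run. The sketch below builds the agent command for an optional environment and interprets the result. The function names and the `root@` SSH login are assumptions for illustration, not details from the talk.

```shell
#!/bin/sh
# Sketch only: function names and the SSH user are illustrative assumptions.

# Build the agent command; pass a branch name to target that Puppet environment.
puppet_agent_cmd() {
  if [ -n "$1" ]; then
    printf 'puppet agent --test --environment %s\n' "$1"
  else
    printf 'puppet agent --test\n'
  fi
}

# Interpret the detailed exit code: 0 (no changes) and 2 (changes applied)
# count as success; anything else (1, 4, 6, ...) counts as a failed run.
puppet_run_ok() {
  case "$1" in
    0|2) return 0 ;;
    *)   return 1 ;;
  esac
}

# Run the agent on a checked-out box over SSH and report success or failure.
test_branch_on_box() {
  host=$1
  branch=$2
  ssh "root@${host}" "$(puppet_agent_cmd "$branch")"
  puppet_run_ok "$?"
}
```

Treating exit code 2 as success matters here, because a fresh box running a feature branch will almost always apply changes, and a naive `if ssh ...; then` check would report that as a failure.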
  • 51. Usage: Dynamic App Behavior. Make a new Puppet environment, [new behavior]: users, configs, whatever needs to be simulated in Hiera and manifests. Deploy with r10k. Get a box (curl -d --url vmpooler.somewhere.com:4567/api/v1/vm/[vm-type]). SSH to that box, alter the app behavior. Normalbehavior:'>behavior: