MongoDB Kernel Developer Siyuan Zhou: MongoDB and Hadoop, the Perfect Big Data Solution

2020-02-27

  • 1.MongoDB and Hadoop • Siyuan Zhou, Software Engineer, MongoDB
  • 2.Agenda • Complementary Approaches to Data • MongoDB & Hadoop Use Cases • MongoDB Connector Overview and Features • Examples
  • 3.Complementary Approaches to Data
  • 4.Operational: MongoDB • Real-Time Analytics • Product/Asset Catalogs • Churn Analysis • Recommender • Security & Fraud • Internet of Things • Warehouse & ETL • Risk Modeling • Mobile Apps • Customer Data Management • Trade Surveillance • Predictive Analytics • Single View • Social • Ad Targeting • Sentiment Analysis
  • 5.MongoDB • Fast storage and retrieval • Easy administration • Built-in analytical tools – Aggregation framework – JavaScript MapReduce – Geo/text indexes
  • 6.Analytical: Hadoop (same use-case grid as slide 4)
  • 7.Hadoop: The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. • Terabyte and petabyte datasets • Data warehousing • Advanced analytics • Ecosystem
  • 8.Operational vs. Analytical: Lifecycle (same use-case grid as slide 4)
  • 9.Enterprise IT Stack • Applications: CRM, ERP, Collaboration, Mobile, BI • Data Management – Operational: RDBMS, RDBMS – Analytical: EDW • Infrastructure: OS & Virtualization, Compute, Storage, Network • Management & Monitoring • Security & Auditing
  • 10.MongoDB & Hadoop Use Cases
  • 11.Commerce • Applications powered by MongoDB: Products & Inventory, Recommended products, Customer profile, Session management • Analysis powered by Hadoop: Elastic pricing, Recommendation models, Predictive analytics, Clickstream history • The Connector links the two
  • 12.Insurance • Applications powered by MongoDB: Customer profiles, Insurance policies, Session data, Call center data • Analysis powered by Hadoop: Customer action analysis, Churn analysis, Churn prediction, Policy rates • The Connector links the two
  • 13.Fraud Detection • Payments: online payments processing (query only) • Fraud Detection (query only) • MongoDB Connector for Hadoop • Results Cache • Nightly Analysis: fraud modeling • 3rd Party Data Sources
  • 14.MongoDB Connector for Hadoop
  • 15.Connector Overview • Hadoop: MapReduce, Hive, Pig, Spark • HDFS / S3: Text Files; BSON Files via the Hadoop Connector • MongoDB (Single Node, Replica Set, Cluster) via the Hadoop Connector • Distributions: Apache Hadoop / Cloudera CDH / Hortonworks HDP / Amazon EMR
  • 16.Data Movement: dynamic queries to MongoDB vs. BSON snapshots in HDFS • Dynamic queries work on the latest data – but put load on the operational database • Snapshots move the load to Hadoop – and add only predictable load to MongoDB
  • 17.Connector Features and Functionality • Computes splits to read data – Single Node, Replica Sets, Sharded Clusters • Mappings for Pig and Hive – MongoDB as a standard data source/destination • Support for – Filtering data with MongoDB queries – Authentication – Reading from Replica Set tags – Appending to existing collections
  • 18.Data Split • Standalone and replica set – Split data into chunks • Cluster – Unsharded, same as standalone – Sharded, split per chunk – Sharded, split per shard • BSON Files – .split file stores metadata
  • 19.MapReduce Configuration • MongoDB input – mongo.job.input.format = com.mongodb.hadoop.MongoInputFormat – mongo.input.uri = mongodb://m…
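Slide 5 lists the aggregation framework among MongoDB's built-in analytical tools. The sketch below is only an illustration: the pipeline is written in the list-of-stages shape that MongoDB's aggregate command accepts (for example through PyMongo's `Collection.aggregate`), but instead of calling a server it is evaluated by a small local function, `run_pipeline`, which is hypothetical and supports only equality `$match` and `$group` with `$sum` over a field path. The `orders` data and field names are made up.

```python
# Hypothetical sample data; field names are illustrative only.
orders = [
    {"customer": "alice", "status": "paid", "amount": 30},
    {"customer": "bob", "status": "paid", "amount": 20},
    {"customer": "alice", "status": "paid", "amount": 12},
    {"customer": "bob", "status": "pending", "amount": 99},
]

# Same shape as a MongoDB aggregation pipeline: a list of stage documents.
pipeline = [
    {"$match": {"status": "paid"}},
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
]


def run_pipeline(docs, stages):
    """Locally emulate a tiny subset of aggregation: equality-only $match,
    and $group on one field with $sum accumulators over field paths."""
    for stage in stages:
        if "$match" in stage:
            cond = stage["$match"]
            docs = [d for d in docs if all(d.get(k) == v for k, v in cond.items())]
        elif "$group" in stage:
            spec = stage["$group"]
            key_field = spec["_id"][1:]  # "$customer" -> "customer"
            sums = {name: acc["$sum"][1:] for name, acc in spec.items() if name != "_id"}
            groups = {}  # dicts keep insertion order: groups appear in first-seen order
            for d in docs:
                key = d.get(key_field)
                out = groups.setdefault(key, {"_id": key, **{name: 0 for name in sums}})
                for name, field in sums.items():
                    out[name] += d.get(field, 0)
            docs = list(groups.values())
        else:
            raise ValueError(f"unsupported stage: {stage}")
    return docs


print(run_pipeline(orders, pipeline))
# [{'_id': 'alice', 'total': 42}, {'_id': 'bob', 'total': 20}]
```

Against a real deployment the same `pipeline` list would be sent to the server, which does the filtering and grouping next to the data; the local version only shows what the two stages compute.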
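Slides 16 and 18 mention BSON snapshots in HDFS and splitting BSON files. A `mongodump` `.bson` file is a plain concatenation of BSON documents, each starting with a little-endian int32 that gives its total size (including those 4 bytes and the trailing 0x00 byte), so document boundaries can be found without decoding any fields. The sketch below assumes exactly that layout and groups whole documents into byte ranges of at most `max_split_bytes`; it does not reproduce the connector's actual split logic or the format of its `.split` metadata file. `encode_simple_doc` is a hypothetical minimal encoder (string values only) used just to build sample input.

```python
import struct


def encode_simple_doc(fields):
    """Hypothetical minimal BSON encoder: flat documents with string values
    only (element type 0x02). Used here just to build sample input."""
    body = b""
    for key, value in fields.items():
        raw = value.encode("utf-8") + b"\x00"
        body += b"\x02" + key.encode("utf-8") + b"\x00" + struct.pack("<i", len(raw)) + raw
    # The int32 size counts the 4-byte prefix itself and the trailing 0x00.
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


def document_offsets(data):
    """Yield (offset, length) for each BSON document concatenated in `data`."""
    pos = 0
    while pos < len(data):
        if pos + 4 > len(data):
            raise ValueError(f"truncated size prefix at offset {pos}")
        (length,) = struct.unpack_from("<i", data, pos)
        if length < 5 or pos + length > len(data):
            raise ValueError(f"corrupt document at offset {pos}")
        yield pos, length
        pos += length


def compute_splits(data, max_split_bytes):
    """Group whole documents into (start, end) byte ranges of at most
    max_split_bytes; a single oversized document gets a range of its own."""
    splits = []
    start = end = 0
    for offset, length in document_offsets(data):
        if end > start and offset + length - start > max_split_bytes:
            splits.append((start, end))
            start = offset
        end = offset + length
    if end > start:
        splits.append((start, end))
    return splits


snapshot = b"".join(encode_simple_doc({"k": "v"}) for _ in range(5))  # 5 docs x 14 bytes
print(compute_splits(snapshot, 30))  # [(0, 28), (28, 56), (56, 70)]
```

Because every range begins on a document boundary, each range can be parsed by a separate task without coordinating with the others.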
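Slide 17 lists authentication and reading from replica set tags among the connector's features, and slide 19 shows that the input is given as a `mongodb://` URI. The sketch below builds such a URI with percent-encoded credentials and the standard connection-string options `replicaSet`, `readPreference`, and `readPreferenceTags`, using a `database.collection` path as we understand the connector's `mongo.input.uri` to expect. Host names, credentials, and the `build_input_uri` helper itself are hypothetical, and only a single tag set is handled.

```python
from urllib.parse import quote, urlencode


def build_input_uri(hosts, database, collection, user=None, password=None,
                    replica_set=None, read_preference=None, tags=None):
    """Hypothetical helper: assemble a mongodb:// URI naming one collection."""
    creds = ""
    if user is not None:
        # Credentials must be percent-encoded ("@" -> %40, space -> %20).
        creds = f"{quote(user, safe='')}:{quote(password or '', safe='')}@"
    options = []
    if replica_set:
        options.append(("replicaSet", replica_set))
    if read_preference:
        options.append(("readPreference", read_preference))
    if tags:
        # One tag set, e.g. {"dc": "ny"} -> readPreferenceTags=dc:ny
        options.append(("readPreferenceTags", ",".join(f"{k}:{v}" for k, v in tags.items())))
    query = "?" + urlencode(options, safe=":,") if options else ""
    return f"mongodb://{creds}{','.join(hosts)}/{database}.{collection}{query}"


uri = build_input_uri(
    ["db1.example.net:27017", "db2.example.net:27017"], "shop", "orders",
    user="analyst", password="p@ss word",
    replica_set="rs0", read_preference="secondary", tags={"dc": "ny"},
)
print(uri)
```

Reading from a tagged secondary this way keeps analytical reads off the primary that serves the operational workload.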
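Slide 18 describes how input is split: key ranges for a standalone server or replica set, and either one split per chunk or one split per shard for a sharded collection. The sketch below models only that bookkeeping. `range_splits` cuts a sorted list of keys into half-open ranges, and `sharded_splits` turns chunk entries (loosely modeled on the `shard`, `min`, and `max` fields of `config.chunks` documents) into per-chunk or per-shard splits. Both function names are hypothetical, `None` stands in for an open bound, and as we understand it the real connector obtains split points from the server rather than from an in-memory key list.

```python
def range_splits(sorted_keys, docs_per_split):
    """Standalone / replica set: cut sorted keys into half-open [min, max)
    ranges of about docs_per_split documents. None marks an open bound."""
    if docs_per_split <= 0:
        raise ValueError("docs_per_split must be positive")
    cut_points = sorted_keys[docs_per_split::docs_per_split]
    edges = [None, *cut_points, None]
    return list(zip(edges[:-1], edges[1:]))


def sharded_splits(chunks, per_shard=False):
    """Sharded collection: one split per chunk, or one split per shard that
    covers all of that shard's chunk ranges. Returns (shard, ranges) pairs."""
    if not per_shard:
        return [(c["shard"], [(c["min"], c["max"])]) for c in chunks]
    by_shard = {}
    for c in chunks:
        by_shard.setdefault(c["shard"], []).append((c["min"], c["max"]))
    return list(by_shard.items())


# Hypothetical chunk table for a collection sharded on a numeric key.
chunks = [
    {"shard": "s0", "min": None, "max": 100},
    {"shard": "s1", "min": 100, "max": 200},
    {"shard": "s0", "min": 200, "max": None},
]
print(range_splits(list(range(1, 11)), 4))  # [(None, 5), (5, 9), (9, None)]
print(sharded_splits(chunks, per_shard=True))
# [('s0', [(None, 100), (200, None)]), ('s1', [(100, 200)])]
```

Per-chunk splits give more parallelism; per-shard splits produce fewer, larger tasks, each of which reads from a single shard.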
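Slide 19 configures a MapReduce job through the properties `mongo.job.input.format` and `mongo.input.uri`. One common way to hand such properties to Hadoop is a configuration XML file of `<property>` elements, each holding a `<name>` and a `<value>`. The sketch below renders a dict into that layout with the Python standard library. The input-format class is the connector's `com.mongodb.hadoop.MongoInputFormat`; the URI (host, database `shop`, collection `orders`) is hypothetical.

```python
import xml.etree.ElementTree as ET


def hadoop_config_xml(properties):
    """Render name/value pairs in Hadoop's configuration-file layout:
    <configuration><property><name/><value/></property>...</configuration>."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


job_props = {
    "mongo.job.input.format": "com.mongodb.hadoop.MongoInputFormat",
    # Hypothetical host, database ("shop") and collection ("orders").
    "mongo.input.uri": "mongodb://localhost:27017/shop.orders",
}
print(hadoop_config_xml(job_props))
```

If the job's driver goes through Hadoop's `ToolRunner`, such a file can be supplied with the generic `-conf` option, and single properties can be overridden with `-D name=value`.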