MongoDB Kernel Engineer Siyuan Zhou: MongoDB and Hadoop, the Perfect Big Data Solution
2020-02-27
- 1.MongoDB and Hadoop • Siyuan Zhou, Software Engineer, MongoDB
- 2.Agenda • Complementary Approaches to Data • MongoDB & Hadoop Use Cases • MongoDB Connector Overview and Features • Examples
- 3.Complementary Approaches to Data
- 4.Operational: MongoDB • Real-Time Analytics • Product/Asset Catalogs • Churn Analysis • Recommender • Security & Fraud • Internet of Things • Warehouse & ETL • Risk Modeling • Mobile Apps • Customer Data Mgmt • Trade Surveillance • Predictive Analytics • Single View • Social • Ad Targeting • Sentiment Analysis
- 5.MongoDB • Fast storage and retrieval • Easy administration • Built-in analytical tools – Aggregation framework – JavaScript MapReduce – Geo/text indexes
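As a rough illustration of the aggregation framework named on this slide: a pipeline is an ordered list of stage documents. The field names below (`status`, `customer_id`, `amount`) are hypothetical and not from the talk; with PyMongo, such a list would be passed to `collection.aggregate(pipeline)`.

```python
# Hypothetical pipeline: total order amount per customer for shipped orders.
pipeline = [
    {"$match": {"status": "shipped"}},                                   # filter first
    {"$group": {"_id": "$customer_id", "total": {"$sum": "$amount"}}},   # then aggregate
    {"$sort": {"total": -1}},                                            # largest totals first
]


def stage_names(stages):
    """Return the operator name of each stage, in execution order."""
    return [next(iter(stage)) for stage in stages]


print(stage_names(pipeline))
```

Placing `$match` first keeps the later stages working on a smaller set of documents.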
- 6.Analytical: Hadoop • Real-Time Analytics • Product/Asset Catalogs • Churn Analysis • Recommender • Security & Fraud • Internet of Things • Warehouse & ETL • Risk Modeling • Mobile Apps • Customer Data Mgmt • Trade Surveillance • Predictive Analytics • Single View • Social • Ad Targeting • Sentiment Analysis
- 7.Hadoop The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. • Terabyte and Petabyte datasets • Data warehousing • Advanced analytics • Ecosystem
- 8.Operational vs. Analytical: Lifecycle • Real-Time Analytics • Product/Asset Catalogs • Churn Analysis • Recommender • Security & Fraud • Internet of Things • Warehouse & ETL • Risk Modeling • Mobile Apps • Customer Data Mgmt • Trade Surveillance • Predictive Analytics • Single View • Social • Ad Targeting • Sentiment Analysis
- 9.Enterprise IT Stack • Applications: CRM, ERP, Collaboration, Mobile, BI • Data Management: Operational (RDBMS, RDBMS), Analytical (EDW) • Infrastructure: OS & Virtualization, Compute, Storage, Network • Management & Monitoring • Security & Auditing
- 10.MongoDB & Hadoop Use Cases
- 11.Commerce • Applications powered by MongoDB: Products & Inventory, Recommended products, Customer profile, Session management • Hadoop Connector • Analysis powered by Hadoop: Elastic pricing, Recommendation models, Predictive analytics, Clickstream history
- 12.Insurance • Applications powered by MongoDB: Customer profiles, Insurance policies, Session data, Call center data • Hadoop Connector • Analysis powered by Hadoop: Customer action analysis, Churn analysis, Churn prediction, Policy rates
- 13.Fraud Detection • Payments • Online payments processing • query only • Fraud Detection • query only • MongoDB Connector for Hadoop • Results Cache • Nightly Analysis • Fraud modeling • 3rd Party Data Sources
- 14.MongoDB Connector for Hadoop
- 15.Connector Overview • Hadoop: MapReduce, Hive, Pig, Spark • HDFS / S3: Text Files, BSON Files (via Hadoop Connector) • MongoDB: Single Node, Replica Set, Cluster (via Hadoop Connector) • Apache Hadoop / Cloudera CDH / Hortonworks HDP / Amazon EMR
- 16.Data Movement • Dynamic queries to MongoDB vs. BSON snapshots in HDFS • Dynamic queries work with the latest data • Dynamic queries put load on the operational database • Snapshots move load to Hadoop • Snapshots add predictable load to MongoDB
- 17.Connector Features and Functionality • Computes splits to read data – Single Node, Replica Sets, Sharded Clusters • Mappings for Pig and Hive – MongoDB as a standard data source/destination • Support for – Filtering data with MongoDB queries – Authentication – Reading from Replica Set tags – Appending to existing collections
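The filtering feature listed above can be sketched as job properties. This is a minimal sketch, assuming the connector's `mongo.input.uri`, `mongo.input.query`, and `mongo.output.uri` properties; the host, database, and collection names are hypothetical, and the helper `connector_args` is only for illustration.

```python
import json


def connector_args(properties):
    """Render Hadoop job properties as -Dkey=value arguments."""
    return [f"-D{key}={value}" for key, value in properties.items()]


# Hypothetical host, database, and collection names.
properties = {
    # Read from a secondary so the job does not compete with primary traffic.
    "mongo.input.uri": "mongodb://db.example.com:27017/shop.orders?readPreference=secondary",
    # Only documents matching this MongoDB query are handed to the job.
    "mongo.input.query": json.dumps({"status": "shipped"}),
    # Results are written back to another collection.
    "mongo.output.uri": "mongodb://db.example.com:27017/shop.order_totals",
}

for arg in connector_args(properties):
    print(arg)
```

Pushing the filter into `mongo.input.query` means MongoDB evaluates it, so documents that do not match never reach the Hadoop job.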
- 18.Data Split • Standalone and replica set – Split data into chunks • Cluster – Unsharded, same as standalone – Sharded, split per chunk – Sharded, split per shard • BSON Files – .split file stores metadata
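The difference between "split per chunk" and "split per shard" can be illustrated with a toy model. This is not the connector's implementation, only a sketch; the chunk metadata and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical chunk metadata for a sharded collection: (shard, min_key, max_key).
chunks = [
    ("shard0", 0, 100),
    ("shard0", 100, 200),
    ("shard1", 200, 300),
]


def splits_per_chunk(chunks):
    """One input split per chunk: each split is a single key range."""
    return [{"shard": shard, "min": lo, "max": hi} for shard, lo, hi in chunks]


def splits_per_shard(chunks):
    """One input split per shard: each split covers every chunk on that shard."""
    by_shard = defaultdict(list)
    for shard, lo, hi in chunks:
        by_shard[shard].append((lo, hi))
    return [{"shard": shard, "ranges": ranges} for shard, ranges in by_shard.items()]


print(len(splits_per_chunk(chunks)))
print(len(splits_per_shard(chunks)))
```

Per-chunk splits give more, smaller tasks and therefore more parallelism; per-shard splits give fewer, larger tasks.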
- 19.MapReduce Configuration • MongoDB input – mongo.job.input.format = com.mongodb.hadoop.MongoInputFormat – mongo.input.uri = mongodb://m