Chen Liang, Head of Huawei's Big Data Open Source Strategy Department - Apache CarbonData: Second-Level Response for Ad Hoc Big Data Queries

2020-02-27, 1549 views

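  • 1.Second-level response for ad hoc big data queries
  • 2.Liang Chen (陈亮), Leader of Huawei's Big Data Open Source Development Department, Apache CarbonData PMC & Committer. Email: chenliang613@apache.org. More than 10 years of development and hands-on experience in big data and BI projects, with a deep understanding of open-source big data technologies (Hadoop, Spark, CarbonData, etc.).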
  • 3.Big data is profoundly changing telecom operators, now and in the future • Network efficiency: network performance management and SQM policy assurance; fast decision-making and root-cause analysis and localization; network problems and planning • Customer care and CEM: 360° customer insight; customer loyalty retention; customer care and process optimization • Market analysis: real-time marketing and recommendation; fine-grained customer segmentation and personalized recommendation; prediction and influence analysis • Data monetization: turning data into revenue; OTT open co-opetition; M2M and location analysis • [Architecture diagram: Smarter SoftCom, the intelligent convergence of business and operations; labeled components include CloudEdge, CloudDSL/OLT, CloudBB, Cloud OS/OpenStack, SDN, Big Data Suits, OSS suits, BSS suits, E2E ICT Resource Orchestration Engine, and Apps & Services] • Use cases numbered in the diagram: 1 SDN real-time elephant-flow mining; 2 IPRAN traffic simulation; 3 SON automatic real-time network optimization; 4 fast fault correlation handling; 5 dynamic cell congestion control; 6 retention of potential churn users; 7 one-stop service optimization; 8 open monetization
  • 4.How to choose storage for complex big data requirements?
  • 5.NoSQL Database • Key-Value store: low latency, <5ms • Cannot support multi-dimensional queries
  • 6.Multi-dimensional problem • Pre-compute all aggregation combinations • Complexity: O(2^n) • Dimension < 10 • Too much space • Slow loading speed
  • 7.Shared nothing database • Parallel scan + distributed compute • Questionable scalability and fault-tolerance • Cluster size < 100 data nodes • Not suitable for big batch jobs • Cannot integrate with the Hadoop ecosystem
  • 8.Search engine • All columns indexed • Fast searching • Simple aggregation • Designed for search but not OLAP • Complex computation: TopN, join, multi-level aggregation • No SQL support
  • 9.SQL on Hadoop • Modern distributed architecture, scales well in computation. • Pipeline based:
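
As a side note on slide 6: the O(2^n) complexity of pre-computing all aggregation combinations can be illustrated with a minimal Python sketch. It is not taken from the slides; the dimension names are hypothetical, and the code only enumerates the group-by combinations a full pre-computed cube would need, without computing any aggregates.

```python
from itertools import combinations


def cuboids(dimensions):
    """Return every subset of the dimensions, i.e. every group-by
    combination that a fully pre-computed cube would have to store."""
    result = []
    for k in range(len(dimensions) + 1):
        result.extend(combinations(dimensions, k))
    return result


# Hypothetical dimension names, used only for illustration.
dims = ["region", "device", "day"]
all_groupings = cuboids(dims)
print(len(all_groupings))  # 8, which equals 2 ** 3

# The count doubles with every added dimension.
for n in (10, 20, 30):
    print(n, 2 ** n)
```

With 10 dimensions there are already 1024 combinations to materialize, which is consistent with the slide's "Dimension < 10" limit and its space and loading-speed concerns.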