Quickly builds ElasticSearch indexes for massive data sets, based on the distributed computing power of Hadoop Map/Reduce
Introduction
ES-Fastloader uses the fault tolerance and parallelism of Hadoop to build individual ElasticSearch shards on multiple reducer nodes, then transfers the shards to the ElasticSearch cluster for serving. The loader creates a Hadoop job that reads data from files in HDFS, repartitions it on a per-node basis, and writes the generated indices to ES shards. At DiDi, we have been using ES-Fastloader to create large-scale ElasticSearch indices from TB/PB-level sequence files in Hive.
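The repartitioning step described above can be illustrated with a small, self-contained sketch. It is not taken from the ES-Fastloader source: the class name `ShardPartitionSketch`, the methods `shardFor` and `partition`, and the sample document IDs are all hypothetical. The hash formula is the one used by Hadoop's default `HashPartitioner`. ElasticSearch itself routes documents with a Murmur3 hash of the routing value, so a real loader would need to match that routing, which this sketch does not attempt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of ES-Fastloader.
class ShardPartitionSketch {

    // Assigns a document ID to a shard (one reducer per shard is assumed).
    // Same formula as Hadoop's default HashPartitioner: clear the sign bit,
    // then take the remainder, so the result is always in [0, numShards).
    static int shardFor(String docId, int numShards) {
        if (numShards <= 0) {
            throw new IllegalArgumentException("numShards must be positive");
        }
        return (docId.hashCode() & Integer.MAX_VALUE) % numShards;
    }

    // Groups document IDs by target shard, mimicking what the shuffle phase
    // does when it delivers each record to the reducer that builds its shard.
    static Map<Integer, List<String>> partition(List<String> docIds, int numShards) {
        Map<Integer, List<String>> byShard = new TreeMap<>();
        for (String id : docIds) {
            byShard.computeIfAbsent(shardFor(id, numShards), k -> new ArrayList<>()).add(id);
        }
        return byShard;
    }

    public static void main(String[] args) {
        // Sample IDs are made up for the demonstration.
        List<String> ids = Arrays.asList("order-1", "order-2", "order-3", "order-4", "order-5");
        partition(ids, 3).forEach((shard, docs) ->
                System.out.println("shard " + shard + " <- " + docs));
    }
}
```

Because the assignment depends only on the document ID and the shard count, every record with the same ID lands on the same reducer, which is what lets each reducer build one shard independently of the others.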
Features
- Supports batch construction of ES indexes: it can process dozens of terabytes of data in 1-2 hours, which addresses the low efficiency of building very large ES index files.
- Supports horizontal scaling of computing power: adding machine resources further increases both the index construction speed and the amount of data that can be processed.
Requirements
- JDK: 8 or greater
- ElasticSearch: 6.6.X or greater
Developer guide
- API document wiki
- Development document wiki
- Read core library source code
- Read main class
- Read Release notes
Contributing
Contributions are welcome: create an issue or send a pull request. See the Contributing Guide for guidelines.
Who is using ES-Fastloader?
License
ES-Fastloader is licensed under the Apache License 2.0. See the LICENSE file.