bigd: a command-line tool for scraping files from a webpage, and a concurrent file downloader
bigd : a command-line tool for scraping files from a webpage.
Usage examples are based on the following options:
Allowed options:
  -h [ --help ]                    produce help message
  -u [ --url ] arg                 page to download from
  -t [ --type ] arg                type of file to download
  -m [ --match ] arg               wildcard pattern (supersedes type)
  -n [ --threads ] arg (=10)       number of files to simultaneously download
  -d [ --depth ] arg (=0)          recursive depth
  -a [ --download-archive ] arg    archive file path
  -f [ --folder ] arg (=./)        folder of where to download content to
For example, to concurrently download files of type mp3 from a given URL:
./bigd --url <URL> --type mp3
Alternatively, provide a wildcard pattern, which supersedes the --type parameter:
./bigd --url <URL> --match "*.mp3"
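To illustrate what a pattern such as "*.mp3" means, here is a minimal sketch of wildcard matching. It is not taken from the bigd source: the function name wildcardMatch is hypothetical, and the sketch assumes that '*' matches any run of characters (including none) while every other character matches literally.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of bigd: returns true if `text` matches
// `pattern`. Assumption: '*' matches any run of characters (including an
// empty run); all other characters must match literally.
bool wildcardMatch(const std::string& pattern, const std::string& text) {
    std::size_t p = 0, t = 0;
    std::size_t starPos = std::string::npos; // index of the last '*' seen
    std::size_t resumeAt = 0;                // text index to retry from

    while (t < text.size()) {
        if (p < pattern.size() && pattern[p] == '*') {
            // Remember the star and first try matching it against nothing.
            starPos = p++;
            resumeAt = t;
        } else if (p < pattern.size() && pattern[p] == text[t]) {
            ++p;
            ++t;
        } else if (starPos != std::string::npos) {
            // Mismatch: let the last star absorb one more character.
            p = starPos + 1;
            t = ++resumeAt;
        } else {
            return false;
        }
    }
    // Any trailing stars can match the empty remainder.
    while (p < pattern.size() && pattern[p] == '*') ++p;
    return p == pattern.size();
}
```

Under these assumptions, wildcardMatch("*.mp3", "track01.mp3") is true and wildcardMatch("*.mp3", "track01.jpg") is false.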
Note:
This tool works by first scraping the given URL for href links. If a link matches the type of content to download, bigd downloads it; otherwise it tries to recurse into the link (provided the --depth flag is set). Because of this, bigd works best with Apache-style directory listings and with webpages that link directly to the 'type' of content one wishes to scrape. bigd does not emulate a browser DOM and does not execute JavaScript.
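The scrape-then-decide step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual bigd implementation: the names extractHrefs, hasType, Action and classify are hypothetical, only double-quoted href attributes are recognised, and a regular expression stands in for real HTML parsing.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect the values of double-quoted href attributes.
// A regex is used for brevity; it is not a full HTML parser.
std::vector<std::string> extractHrefs(const std::string& html) {
    static const std::regex hrefRe("href\\s*=\\s*\"([^\"]*)\"",
                                   std::regex::ECMAScript | std::regex::icase);
    std::vector<std::string> links;
    for (std::sregex_iterator it(html.begin(), html.end(), hrefRe), end;
         it != end; ++it) {
        links.push_back((*it)[1].str());
    }
    return links;
}

// Hypothetical helper: true if `link` ends with "." + type (e.g. "mp3").
bool hasType(const std::string& link, const std::string& type) {
    const std::string suffix = "." + type;
    return link.size() >= suffix.size() &&
           link.compare(link.size() - suffix.size(), suffix.size(), suffix) == 0;
}

enum class Action { Download, Recurse, Skip };

// Decide what to do with a link: download it if it has the wanted type,
// otherwise recurse only while some recursion depth remains.
Action classify(const std::string& link, const std::string& type, int depthLeft) {
    if (hasType(link, type)) return Action::Download;
    return depthLeft > 0 ? Action::Recurse : Action::Skip;
}
```

With a remaining depth of 0 (the default), links that do not match the type are simply skipped, which mirrors recursion being off unless --depth is given.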
Usage notes:
- Multiple file types can be specified by repeating --type (e.g. -t mp3 -t jpg).
- A wildcard pattern, specified with --match, can be provided instead of --type and supersedes it (for example -m "*.jpg").
- Unless specified using the --folder flag, all content is downloaded to the current working directory.
- A threadpool is used to concurrently scrape content (so should prove quicker than tools like wget).
- The default threading value results in simultaneous downloading of 10 files. This can be overridden via the --threads flag.
- An optional history of downloaded content (a 'download archive') will be written to a file when specified by the --download-archive flag.
- The download archive is also used to ensure that bigd doesn't attempt to re-download content already downloaded.
- Recursive downloading is supported with the --depth argument but is disabled by default (a depth of zero).
Building
Using Homebrew is the most straightforward option. Add the tap and install:
brew tap benhj/bigd
brew install bigd
Or use cmake:
mkdir build
cd build
cmake ..
make
make install
Or compile directly:
clang++ -std=c++11 bigd.cpp -lcurl -lboost_program_options -lboost_filesystem -lboost_system -o bigd
Contributing
Please create an issue if you find a bug or have an enhancement request, or follow the usual fork and pull request workflow.
License
Adheres to The Hacky As Fuck Software License.