Diskover – File System Crawler, Storage Search Engine And Analytics Powered By Elasticsearch

diskover is an open source file system crawler and disk usage software that uses Elasticsearch to index and manage data across heterogeneous storage systems. Using diskover, you can search and organize files more effectively, and system administrators can manage storage infrastructure, provision storage efficiently, monitor and report on storage use, and make informed decisions about new infrastructure purchases.

As the amount of file data generated by businesses continues to grow, so does the pressure on expensive storage infrastructure, users and system administrators, and IT budgets.

Using diskover, users can identify old and unused files and gain better insight into data change, file duplication and wasted space.

diskover is written and maintained by Chris Park (shirosai) and runs on Linux and OS X/macOS using Python 2/3.

diskover-web (diskover's web file manager, analytics app, file system search engine, rest-api)

Kibana dashboards/saved searches/visualizations and support for Gource

Diskover Gource videos

Installation Guide


  • Linux or OS X/macOS (tested on OS X 10.11.6, Ubuntu 16.04)
  • Python 2.7 or Python 3.5/3.6 (tested on Python 2.7.14, 3.5.3, 3.6.4)
  • Python elasticsearch client module
  • Python requests module
  • Python scandir module
  • Python progressbar2 module
  • Python redis module
  • Python rq module
  • Elasticsearch 5 (local or AWS ES Service, tested on Elasticsearch 5.4.2, 5.6.4). Elasticsearch 6 is not supported yet.
  • Redis (tested on 4.0.8)

Install the above Python modules using pip.
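
Before crawling, it can help to confirm that the modules listed above are importable. The following is a minimal sketch for Python 3 and is not part of diskover. The helper name missing_modules is hypothetical, and the list assumes the usual import names for these packages (progressbar2 is imported as progressbar).

```python
import importlib.util

# Import names for the Python modules listed in the requirements.
# Assumption: progressbar2 installs a package imported as "progressbar".
REQUIRED_MODULES = ["elasticsearch", "requests", "scandir",
                    "progressbar", "redis", "rq"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the names from the list that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All required modules are importable.")
```

Any module reported as missing can then be installed with pip.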

Optional Installs

  • diskover-web (diskover's web file manager and analytics app)
  • Redis RQ Dashboard (for monitoring redis queue)
  • sharesniffer (for scanning your network for file shares and auto-mounting for crawls)
  • Kibana (for visualizing Elasticsearch data, tested on Kibana 5.4.2, 5.6.4)
  • X-Pack (Kibana plugin for graphs, reports, monitoring and http auth)
  • Gource (for Gource visualizations of diskover Elasticsearch data, see videos above)


$ git clone https://github.com/shirosaidev/diskover.git
$ cd diskover

Download latest version

You need at least Python 2.7 or Python 3.5, and the required Python dependencies must be installed using pip.

$ pip install -r requirements.txt

Getting Started
Copy the diskover config diskover.cfg.sample to diskover.cfg and edit it for your environment.
Start diskover worker bots (as many as you want, a good number might be cores x 2) with:
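
The copy step can also be scripted. This is a small sketch, not something shipped with diskover: the function name ensure_config is hypothetical, and it assumes both files live in the diskover directory. It refuses to overwrite an existing diskover.cfg so that earlier edits are kept.

```python
import os
import shutil


def ensure_config(diskover_dir):
    """Copy diskover.cfg.sample to diskover.cfg unless diskover.cfg exists.

    Returns True if a copy was made, False if diskover.cfg was already there.
    """
    sample = os.path.join(diskover_dir, "diskover.cfg.sample")
    target = os.path.join(diskover_dir, "diskover.cfg")
    if os.path.exists(target):
        return False  # keep any edits that were already made
    shutil.copyfile(sample, target)
    return True
```

After the copy, diskover.cfg still needs to be edited by hand for your environment.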

$ cd /path/with/diskover
$ python diskover_worker_bot.py

Worker bots can be added during a crawl to help with the queue. To run a worker bot in burst mode (quit after all jobs are done), use the -b flag. If the queue is empty these bots will die, so use rq info or rq-dashboard to see if they are running. Run diskover-bot-launcher.sh to spawn and kill multiple bots.
Start the diskover main job dispatcher and file tree crawler with:
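
As an illustration of the "cores x 2" rule of thumb, the sketch below builds one command line per worker bot and can start them with subprocess. It is not a replacement for diskover-bot-launcher.sh. The function names worker_bot_commands and launch_worker_bots are hypothetical, and the only bot option used is the -b flag described above.

```python
import multiprocessing
import subprocess
import sys


def worker_bot_commands(script="diskover_worker_bot.py", count=None, burst=False):
    """Build one command line per worker bot.

    count defaults to cores x 2, the rule of thumb mentioned above.
    burst=True appends -b so each bot quits once all jobs are done.
    """
    if count is None:
        count = multiprocessing.cpu_count() * 2
    cmd = [sys.executable, script]
    if burst:
        cmd.append("-b")
    return [list(cmd) for _ in range(count)]


def launch_worker_bots(diskover_dir, **kwargs):
    """Start the bots from the diskover directory; return the Popen handles."""
    return [subprocess.Popen(cmd, cwd=diskover_dir)
            for cmd in worker_bot_commands(**kwargs)]
```

For example, launch_worker_bots("/path/with/diskover", burst=True) would start cores x 2 burst-mode bots from that directory. Keeping the command building separate from the launching makes it easy to inspect what will be run first.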

$ python /path/to/diskover.py -d /rootpath/you/want/to/crawl -i diskover-indexname -a

The defaults for a crawl with no flags are to index from . (current directory) and files >0 Bytes and 0 days modified time. Empty files and directories are skipped (unless you use the -s 0 and -e flags). Use -h to see cli options.
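
To make the default file filter concrete, here is a rough sketch of how such thresholds might be applied. It is not diskover's own code. The names matches_defaults and find_files are hypothetical, and it assumes the size threshold is an inclusive minimum in bytes (so a minimum of 0 admits empty files) and that the modified-time threshold means "last modified at least N days ago". Empty-directory handling (the -e flag) is not covered.

```python
import os
import time

SECONDS_PER_DAY = 86400


def matches_defaults(size_bytes, mtime, min_size=1, min_days=0, now=None):
    """Return True if a file passes the assumed size and mtime thresholds."""
    if now is None:
        now = time.time()
    # Clamp at zero so a timestamp slightly ahead of "now" still counts as age 0.
    age_days = max(0.0, (now - mtime) / SECONDS_PER_DAY)
    return size_bytes >= min_size and age_days >= min_days


def find_files(root=".", min_size=1, min_days=0):
    """Walk root and return the paths of files that pass the thresholds."""
    matched = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # unreadable or vanished file: skip it
            if matches_defaults(st.st_size, st.st_mtime, min_size, min_days):
                matched.append(path)
    return matched
```

With the defaults, an empty file is left out. Calling find_files(root, min_size=0) includes it, which mirrors the effect described for -s 0.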

User Guide
Read the wiki for more documentation on how to use diskover.



