Strelka - Scanning Files At Scale With Python And ZeroMQ

Strelka is a real-time file scanning system used for threat hunting, threat detection, and incident response. Based on the design established by Lockheed Martin’s Laika BOSS and similar projects (see: related projects), Strelka’s goal is to perform file extraction and metadata collection at enormous scale.
Strelka differs from its sibling projects in a few important ways:

  • Codebase is Python 3 (minimum supported version is 3.6)
  • Designed for non-interactive, distributed systems (network security monitoring sensors, live response scripts, disk/memory extraction, etc.)
  • Supports direct and remote file requests (Amazon S3, Google Cloud Storage, etc.) with optional encryption and authentication
  • Uses widely supported networking, messaging, and data libraries/formats (ZeroMQ, protocol buffers, YAML, JSON)
  • Built-in scan result logging and log management (compatible with Filebeat/Elastic Stack, Splunk, etc.)


Frequently Asked Questions

“Who is Strelka?”
Strelka is one of the second generation of Soviet space dogs to achieve orbital spaceflight; the name is an homage to Lockheed Martin’s Laika BOSS, one of the first public projects of this type and the project on which Strelka’s core design is based.

“Why would I want a file scanning system?”
File metadata is an additional pillar of data (alongside network, endpoint, authentication, and cloud) that is effective in enabling threat hunting, threat detection, and incident response and can help event analysts and incident responders bridge visibility gaps in their environment. This type of system is especially useful for identifying threat actors during KC3 and KC7. For examples of what Strelka can do, please read the use cases.

“Should I switch from my current file scanning system to Strelka?”
It depends; we recommend reviewing the features of each and choosing the most appropriate tool for your needs. We believe the most significant motivating factors for switching to Strelka are:

“Are Strelka’s scanners compatible with Laika BOSS, File Scanning Framework, or Assemblyline?”
Due to differences in design, Strelka’s scanners are not directly compatible with Laika BOSS, File Scanning Framework, or Assemblyline. With some effort, most scanners can likely be ported to the other projects.

“Is Strelka an intrusion detection system (IDS)?”
Strelka should not be considered an IDS, but it can be used for threat detection through YARA rule matching and downstream metadata interpretation. Strelka’s design follows the philosophy established by other popular metadata collection systems (Bro, Sysmon, Volatility, etc.): it extracts data and leaves the decision-making up to the user.

“Does it work at scale?”
Everyone has their own definition of “at scale,” but we have been using Strelka and systems like it to scan up to 100 million files each day for over a year and have never reached a point where the system could not scale to our needs; as file volume and diversity increases, horizontally scaling the system should allow you to scan any number of files.

“Doesn’t this use a lot of bandwidth?”
Yep! Strelka is not designed to operate in limited bandwidth environments, but we have experimented with solutions to this and there are tricks you can use to reduce bandwidth. These are what we have found most successful:

  • Reduce the total volume of files sent to Strelka
  • Use a tracking system to only send unique files to Strelka (networked Redis servers are especially useful for this)
  • Use traffic control (tc) to shape connections to Strelka (see the sketch after this list)
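
Traffic shaping is environment-specific, but as a minimal sketch (the interface name eth0 and the 50 Mbit/s rate are assumptions to adapt), a token bucket filter on the sensor can cap the bandwidth used by connections to Strelka:

# cap outbound traffic on eth0 at 50 Mbit/s
tc qdisc add dev eth0 root tbf rate 50mbit burst 256kb latency 70ms
# remove the shaping rule when finished
tc qdisc del dev eth0 root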

“Should I run my Strelka cluster on my Bro/Suricata network sensor?”
No! Strelka clusters run CPU-intensive processes that can negatively impact system-critical applications like Bro and Suricata. If you want to integrate a network sensor with Strelka, then use strelka_dirstream.py. This utility is capable of sending millions of files per day from a single network sensor to a Strelka cluster without impacting system-critical applications.

“I have other questions!”
Please file an issue or contact the project team at [email protected]. The project lead can also be reached on Twitter at @jshlbrd.

Installation
The recommended operating system for Strelka is Ubuntu 18.04 LTS (Bionic Beaver); it may work with earlier versions of Ubuntu if the appropriate packages are installed. We recommend using the Docker container for production deployments and welcome pull requests that add instructions for installing on other operating systems.

Ubuntu 18.04 LTS

  1. Update packages and install build packages
apt-get update && apt-get install --no-install-recommends automake build-essential curl gcc git libtool make python3-dev python3-pip python3-wheel
  2. Install runtime packages
apt-get install --no-install-recommends antiword libarchive-dev libfuzzy-dev libimage-exiftool-perl libmagic-dev libssl-dev python3-setuptools tesseract-ocr unrar upx jq
  3. Install pip3 packages
    pip3 install beautifulsoup4 boltons boto3 gevent google-cloud-storage html5lib inflection interruptingcow jsbeautifier libarchive-c lxml git+https://github.com/aaronst/macholibre.git olefile oletools pdfminer.six pefile pgpdump3 protobuf pyelftools pygments pyjsparser pylzma git+https://github.com/jshlbrd/pyopenssl.git python-docx git+https://github.com/jshlbrd/python-entropy.git python-keystoneclient python-magic python-swiftclient pyyaml pyzmq rarfile requests rpmfile schedule ssdeep tnefparse
  4. Install YARA
curl -OL https://github.com/VirusTotal/yara/archive/v3.8.1.tar.gz
tar -zxvf v3.8.1.tar.gz
cd yara-3.8.1/
./bootstrap.sh
./configure --with-crypto --enable-dotnet --enable-magic
make && make install && make check
echo "/usr/local/lib" >> /etc/ld.so.conf
ldconfig
  5. Install yara-python
curl -OL https://github.com/VirusTotal/yara-python/archive/v3.8.1.tar.gz  
tar -zxvf v3.8.1.tar.gz  
cd yara-python-3.8.1/  
python3 setup.py build --dynamic-linking
python3 setup.py install
  6. Create Strelka directories
    mkdir /var/log/strelka/ && mkdir /opt/strelka/
  7. Clone this repository
    git clone https://github.com/target/strelka.git /opt/strelka/
  8. Compile the Strelka protobuf
    cd /opt/strelka/server/ && protoc --python_out=. strelka.proto
  9. (Optional) Install the Strelka utilities
    cd /opt/strelka/ && python3 setup.py -q build && python3 setup.py -q install && python3 setup.py -q clean --all

Docker

  1. Clone this repository
    git clone https://github.com/target/strelka.git /opt/strelka/
  2. Build the container
    cd /opt/strelka/ && docker build -t strelka .

Quickstart
By default, Strelka is configured to use a minimal “quickstart” deployment that allows users to test the system. This configuration is not recommended for production deployments. Using two Terminal windows, do the following:
Terminal 1:

$ strelka.py

Terminal 2:

$ strelka_user_client.py --broker 127.0.0.1:5558 --path <path to the file to scan>
$ cat /var/log/strelka/*.log | jq .

Terminal 1 runs a Strelka cluster (broker, four workers, and log rotation) with debug logging and Terminal 2 is used to send file requests to the cluster and read the scan results.

Deployment

Utilities
Strelka’s design as a distributed system creates the need for client-side and server-side utilities. Client-side utilities provide methods for sending file requests to a cluster and server-side utilities provide methods for distributing and scanning files sent to a cluster.

strelka.py
strelka.py is a non-interactive, server-side utility that contains everything needed for running a large-scale, distributed Strelka cluster. This includes:

  • Capability to run servers in any combination of broker/workers
    • Broker distributes file tasks to workers
    • Workers perform file analysis on tasks
  • On-disk scan result logging
    • Configurable log rotation and management
    • Compatible with external log shippers (e.g. Filebeat, Splunk Universal Forwarder, etc.)
  • Supports encryption and authentication for connections between clients and brokers
  • Self-healing child processes (brokers, workers, log management)

This utility is managed with two configuration files: etc/strelka/strelka.yml and etc/strelka/pylogging.ini.
The help page for strelka.py is shown below:

usage: strelka.py [options]

runs Strelka as a distributed cluster.

optional arguments:
  -h, --help            show this help message and exit
  -d, --debug           enable debug messages to the console
  -c STRELKA_CFG, --strelka-config STRELKA_CFG
                        path to strelka configuration file
  -l LOGGING_INI, --logging-ini LOGGING_INI
                        path to python logging configuration file
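
As an illustrative invocation (the configuration paths shown are assumptions; adjust them to wherever your deployment keeps etc/strelka/), the cluster can be started in the foreground with debug logging:

$ strelka.py -d -c /etc/strelka/strelka.yml -l /etc/strelka/pylogging.ini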

strelka_dirstream.py
strelka_dirstream.py is a non-interactive, client-side utility used for sending files from a directory to a Strelka cluster in near real-time. This utility uses inotify to watch the directory and sends files to the cluster as soon as possible after they are written.
Additionally, for select file sources, this utility can parse metadata embedded in the file’s filename and send it to the cluster as external metadata. Bro network sensors are currently the only supported file source, but other application-specific sources can be added.
Using the utility with Bro requires no modification of the Bro source code, but it does require the network sensor to run a Bro script that enables file extraction. We recommend using our stub Bro script (etc/bro/extract-strelka.bro) to extract files. Other extraction scripts will also work, but they will not parse Bro’s metadata.
This utility is managed with one configuration file: etc/dirstream/dirstream.yml.
The help page for strelka_dirstream.py is shown below:

usage: strelka_dirstream.py [options]

sends files from a directory to a Strelka cluster in near real-time.

optional arguments:
  -h, --help            show this help message and exit
  -d, --debug           enable debug messages to the console
  -c DIRSTREAM_CFG, --dirstream-config DIRSTREAM_CFG
                        path to dirstream configuration file
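
As an example (the configuration path is an assumption), the utility can be run in the foreground with debug logging:

$ strelka_dirstream.py -d -c /etc/dirstream/dirstream.yml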

strelka_user_client.py
strelka_user_client.py is a user-driven, client-side utility that is used for sending ad-hoc file requests to a cluster. This client should be used when file analysis is needed for a specific file or group of files; it is explicitly designed for users and should not be expected to perform long-lived or fully automated file requests. We recommend using this utility as an example of what is required in building new client utilities.
Using this utility, users can send three types of file requests:
The help page for strelka_user_client.py is shown below:

usage: strelka_user_client.py [options]

sends ad-hoc file requests to a Strelka cluster.

optional arguments:
  -h, --help            show this help message and exit
  -d, --debug           enable debug messages to the console
  -b BROKER, --broker BROKER
                        network address and network port of the broker (e.g.
                        127.0.0.1:5558)
  -p PATH, --path PATH  path to the file or directory of files to send to the
                        broker
  -l LOCATION, --location LOCATION
                        JSON representation of a location for the cluster to
                        retrieve files from
  -t TIMEOUT, --timeout TIMEOUT
                        amount of time (in seconds) to wait until a file
                        transfer times out
  -bpk BROKER_PUBLIC_KEY, --broker-public-key BROKER_PUBLIC_KEY
                        location of the broker Curve public key certificate
                        (this option enables curve encryption and must be
                        used if the broker has curve enabled)
  -csk CLIENT_SECRET_KEY, --client-secret-key CLIENT_SECRET_KEY
                        location of the client Curve secret key certificate
                        (this option enables curve encryption and must be
                        used if the broker has curve enabled)
  -ug, --use-green      determines if PyZMQ green should be used, which can
                        improve performance at the risk of message loss
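
Two illustrative requests are shown below (the broker address, file path, and certificate locations are assumptions): the first sends a local file directly, the second sends the same request over a Curve-encrypted connection.

$ strelka_user_client.py --broker 10.0.1.5:5558 --path ./suspicious.exe
$ strelka_user_client.py --broker 10.0.1.5:5558 --path ./suspicious.exe -bpk ./broker.key -csk ./client.key_secret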

generate_curve_certificates.py
generate_curve_certificates.py is a utility used for generating broker and worker Curve certificates. This utility is required for setting up Curve encryption/authentication.
The help page for generate_curve_certificates.py is shown below:

usage: generate_curve_certificates.py [options]

generates curve certificates used by brokers and clients.

optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  path to store keys in (defaults to current working
                        directory)
  -b, --broker          generate curve certificates for a broker
  -c, --client          generate curve certificates for a client
  -cf CLIENT_FILE, --client-file CLIENT_FILE
                        path to a file containing a line-separated list of
                        clients to generate keys for, useful for creating
                        many client keys at once
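
For example, the commands below (paths and filenames are illustrative) generate a broker certificate and then client certificates for every client named in a line-separated file:

$ generate_curve_certificates.py -p /etc/strelka/keys/ -b
$ generate_curve_certificates.py -p /etc/strelka/keys/ -c -cf clients.txt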

validate_yara.py
validate_yara.py is a utility used for recursively validating a directory of YARA rules files. This can be useful when debugging issues related to the ScanYara scanner.
The help page for validate_yara.py is shown below:

usage: validate_yara.py [options]

validates YARA rules files.

optional arguments:
  -h, --help            show this help message and exit
  -p PATH, --path PATH  path to directory containing YARA rules
  -e, --error           boolean that determines if warnings should cause
                        errors
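
For example, the command below validates the default taste rules directory referenced later in this article (the path is an assumption) and treats warnings as errors:

$ validate_yara.py -p etc/strelka/taste/ -e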

Configuration Files
Strelka uses YAML for configuring client-side and server-side utilities. We recommend using the default configurations and modifying the options as needed.

Strelka Configuration (strelka.py)
Strelka’s cluster configuration file is stored in etc/strelka/strelka.yml and contains three sections: daemon, remote, and scan.

Daemon Configuration
The daemon configuration contains five sub-sections: processes, network, broker, workers, and logrotate. A combined sample follows the option descriptions below.
The “processes” section controls the processes launched by the daemon. The configuration options are:

  • “run_broker”: boolean that determines if the server should run a Strelka broker process (defaults to True)
  • “run_workers”: boolean that determines if the server should run Strelka worker processes (defaults to True)
  • “run_logrotate”: boolean that determines if the server should run a Strelka log rotation process (defaults to True)
  • “worker_count”: number of workers to spawn (defaults to 4)
  • “shutdown_timeout”: amount of time (in seconds) that can elapse before the daemon forcibly kills child processes after they have received a shutdown command (defaults to 45 seconds)

The “network” section controls network connectivity. The configuration options are:

  • “broker”: network address of the broker (defaults to 127.0.0.1)
  • “request_socket_port”: network port used by clients to send file requests to the broker (defaults to 5558)
  • “task_socket_port”: network port used by workers to receive tasks from the broker (defaults to 5559)

The “broker” section controls settings related to the broker process. The configuration options are:

  • “poller_timeout”: amount of time (in milliseconds) that the broker polls for client requests and worker statuses (defaults to 1000 milliseconds)
  • “broker_secret_key”: location of the broker Curve secret key certificate (enables Curve encryption, requires clients to use Curve, defaults to None)
  • “client_public_keys”: location of the directory containing client Curve public key certificates (enables Curve encryption and authentication, requires clients to use Curve, defaults to None)
  • “prune_frequency”: frequency (in seconds) at which the broker prunes dead workers (defaults to 5 seconds)
  • “prune_delta”: delta (in seconds) that must pass since a worker last checked in with the broker before it is considered dead and is pruned (defaults to 10 seconds)

The “workers” section controls settings related to worker processes. The configuration options are:

  • “task_socket_reconnect”: amount of time (in milliseconds) that the task socket will attempt to reconnect in the event of a TCP disconnection, this has additional jitter applied (defaults to 100ms plus jitter)
  • “task_socket_reconnect_max”: maximum amount of time (in milliseconds) that the task socket will attempt to reconnect in the event of a TCP disconnection, this has additional jitter applied (defaults to 4000ms plus jitter)
  • “poller_timeout”: amount of time (in milliseconds) that workers poll for file tasks (defaults to 1000 milliseconds)
  • “file_max”: number of files a worker will process before shutting down (defaults to 10000)
  • “time_to_live”: amount of time (in minutes) that a worker will run before shutting down (defaults to 30 minutes)
  • “heartbeat_frequency”: frequency (in seconds) at which a worker sends a heartbeat to the broker if it has not received any file tasks (defaults to 10 seconds)
  • “log_directory”: location where worker scan results are logged to (defaults to /var/log/strelka/)
  • “log_field_case”: field case (“camel” or “snake”) of the scan result log file data (defaults to camel)
  • “log_bundle_events”: boolean that determines if scan results should be bundled in a single event as an array or in multiple events (defaults to True)

The “logrotate” section controls settings related to the log rotation process. The configuration options are:

  • “directory”: directory to run log rotation on (defaults to /var/log/strelka/)
  • “compression_delta”: delta (in minutes) that must pass since a log file was last modified before it is compressed (defaults to 15 minutes)
  • “deletion_delta”: delta (in minutes) that must pass since a compressed log file was last modified before it is deleted (defaults to 360 minutes / 6 hours)
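
Putting these options together, below is a minimal sketch of the daemon section; the nesting is an assumption based on the sub-section names and the values shown are the documented defaults, so verify both against the default etc/strelka/strelka.yml shipped with the project.

daemon:
  processes:
    run_broker: True
    run_workers: True
    run_logrotate: True
    worker_count: 4
    shutdown_timeout: 45
  network:
    broker: 127.0.0.1
    request_socket_port: 5558
    task_socket_port: 5559
  broker:
    poller_timeout: 1000
    broker_secret_key: null
    client_public_keys: null
    prune_frequency: 5
    prune_delta: 10
  workers:
    task_socket_reconnect: 100
    task_socket_reconnect_max: 4000
    poller_timeout: 1000
    file_max: 10000
    time_to_live: 30
    heartbeat_frequency: 10
    log_directory: '/var/log/strelka/'
    log_field_case: 'camel'
    log_bundle_events: True
  logrotate:
    directory: '/var/log/strelka/'
    compression_delta: 15
    deletion_delta: 360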

Remote Configuration
The remote configuration contains one sub-section: remote.
The “remote” section controls how workers retrieve files from remote file stores. Google Cloud Storage, Amazon S3, OpenStack Swift, and HTTP file stores are supported. All options in this configuration file are optionally read from environment variables if they are “null”. The configuration options are (a sample follows the list):

  • “remote_timeout”: amount of time (in seconds) to wait before timing out individual file retrieval
  • “remote_retries”: number of times individual file retrieval will be re-tried in the event of a timeout
  • “google_application_credentials”: path to the Google Cloud Storage JSON credentials file
  • “aws_access_key_id”: AWS entry key ID
  • “aws_secret_access_key”: AWS secret entry key
  • “aws_default_region”: default AWS area
  • “st_auth_version”: OpenStack authentication version (defaults to 3)
  • “os_auth_url”: OpenStack Keystone authentication URL
  • “os_username”: OpenStack username
  • “os_password”: OpenStack password
  • “os_cert”: OpenStack Keystone certificate
  • “os_cacert”: OpenStack Keystone CA Certificate
  • “os_user_domain_name”: OpenStack user domain
  • “os_project_name”: OpenStack project name
  • “os_project_domain_name”: OpenStack project domain
  • “http_basic_user”: HTTP Basic authentication username
  • “http_basic_pass”: HTTP Basic authentication password
  • “http_verify”: path to the CA bundle (file or directory) used for SSL verification (defaults to False, no verification)
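
Below is a sketch of the remote section using a subset of these options; the nesting is an assumption, the timeout and retry values are placeholders rather than documented defaults, and options left as null fall back to environment variables.

remote:
  remote_timeout: 10
  remote_retries: 2
  google_application_credentials: null
  aws_access_key_id: null
  aws_secret_access_key: null
  aws_default_region: null
  st_auth_version: 3
  http_basic_user: null
  http_basic_pass: null
  http_verify: False
  # remaining OpenStack options follow the same pattern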

Scan Configuration
The scan configuration contains two sub-sections: distribution and scanners.
The “distribution” section controls how files are distributed through the system. The configuration options are (a sample follows the list):

  • “close_timeout”: amount of time (in seconds) that a scanner can spend closing itself (defaults to 30 seconds)
  • “distribution_timeout”: amount of time (in seconds) that a single file can be distributed to all scanners (defaults to 1800 seconds / 30 minutes)
  • “scanner_timeout”: amount of time (in seconds) that a scanner can spend scanning a file (defaults to 600 seconds / 10 minutes, can be overridden per-scanner)
  • “maximum_depth”: maximum depth that child files will be processed by scanners
  • “taste_mime_db”: location of the MIME database used to taste files (defaults to None, system default)
  • “taste_yara_rules”: location of the directory of YARA files that contains rules used to taste files (defaults to etc/strelka/taste/)
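
Below is a sketch of the distribution sub-section populated with the documented defaults; the nesting under scan is an assumption, and maximum_depth is shown with an arbitrary value because no default is documented.

scan:
  distribution:
    close_timeout: 30
    distribution_timeout: 1800
    scanner_timeout: 600
    maximum_depth: 15
    taste_mime_db: null
    taste_yara_rules: 'etc/strelka/taste/'
  scanners:
    # scanner mappings go here (see the samples below)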

The “scanners” section controls which scanners are assigned to each file; each scanner is assigned by mapping flavors, filenames, and sources from this configuration to the file. “scanners” must always be a dictionary where the key is the scanner name (e.g. ScanZip) and the value is a list of dictionaries containing values for mappings, scanner priority, and scanner options.
Assignment occurs through a system of positive and negative matches: any negative match causes the scanner to skip assignment and at least one positive match causes the scanner to be assigned. A unique identifier (*) is used to assign scanners to all flavors. See File Distribution, Scanners, Flavors, and Tasting for more details on flavors.
Below is a sample configuration that runs the scanner “ScanHeader” on all files and the scanner “ScanRar” on files that match a YARA rule named “rar_file”.

scanners:
  'ScanHeader':
    - positive:
        flavors:
          - '*'
      priority: 5
      options:
        length: 50
  'ScanRar':
    - positive:
        flavors:
          - 'rar_file'
      priority: 5
      options:
        limit: 1000

The “positive” dictionary determines which flavors, filenames, and sources cause the scanner to be assigned. Flavors is a list of literal strings while filenames and sources are regular expressions. One positive match will assign the scanner to the file.
Below is a sample configuration that shows how RAR files can be matched against a YARA rule (rar_file), a MIME type (application/x-rar), and a filename (any that end with .rar).

scanners:
  'ScanRar':
    - positive:
        flavors:
          - 'application/x-rar'
          - 'rar_file'
        filename: '.rar$'
      priority: 5
      options:
        limit: 1000

Each scanner also supports negative matching through the “negative” dictionary. Negative matches occur before positive matches, so any negative match guarantees that the scanner will not be assigned. Similar to positive matches, negative matches support flavors, filenames, and sources.
Below is a sample configuration that shows how RAR files can be positively matched against a YARA rule (rar_file) and a MIME type (application/x-rar), but only if they are not negatively matched against a filename (.rar$). This configuration would cause ScanRar to only be assigned to RAR files that do not have the extension “.rar”.

scanners:
  'ScanRar':
    - negative:
        filename: '.rar$'
      positive:
        flavors:
          - 'application/x-rar'
          - 'rar_file'
      priority: 5
      options:
        limit: 1000

Each scanner supports multiple mappings; this makes it possible to assign different priorities and options to the scanner based on the mapping variables. If a scanner has multiple mappings that match a file, then the first mapping wins.
Below is a sample configuration that shows how a single scanner can apply different options depending on the mapping.

scanners:
  'ScanX509':
    - positive:
        flavors:
          - 'x509_der_file'
      priority: 5
      options:
        type: 'der'
    - positive:
        flavors:
          - 'x509_pem_file'
      priority: 5
      options:
        type: 'pem'

Python Logging Configuration (strelka.py)
strelka.py uses an ini file (etc/strelka/pylogging.ini) to manage cluster-level statistics and data output by the Python logger. By default, this configuration file will log data to stdout and disable logging for packages imported by scanners.

DirStream Configuration (strelka_dirstream.py)
Strelka’s dirstream configuration file is stored in etc/dirstream/dirstream.yml and contains two sub-sections: processes and workers.
The “processes” section controls the processes launched by the utility. The configuration options are:

  • “shutdown_timeout”: amount of time (in seconds) that can elapse before the utility forcibly kills child processes after they have received a shutdown command (defaults to 10 seconds)

The “workers” section controls directory settings and network settings for each worker that sends files to the Strelka cluster. This section is a list; adding multiple directory/network settings makes it so multiple directories can be monitored at once. The configuration options are (a sample follows the list):

  • “directory”: directory that files are sent from (defaults to None)
  • “source”: application that writes files to the directory, used to control metadata parsing functionality (defaults to None)
  • “meta_separator”: unique string used to separate pieces of metadata in a filename, used to parse metadata and send it along with the file to the cluster (defaults to “S^E^P”)
  • “file_mtime_delta”: delta (in seconds) that must pass since a file was last modified before it is sent to the cluster (defaults to 5 seconds)
  • “delete_files”: boolean that determines if files should be deleted after they are sent to the cluster (defaults to False)
  • “broker”: network address and network port of the broker (defaults to “127.0.0.1:5558”)
  • “timeout”: amount of time (in seconds) to wait for a file to be successfully sent to the broker (defaults to 10)
  • “use_green”: boolean that determines if PyZMQ green should be used (this can improve performance at the risk of message loss, defaults to True)
  • “broker_public_key”: location of the broker Curve public key certificate (enables Curve encryption, must be used if the broker has Curve enabled)
  • “client_secret_key”: location of the client Curve secret key certificate (enables Curve encryption, must be used if the broker has Curve enabled)
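
Below is a sketch of etc/dirstream/dirstream.yml built from these options; the extraction directory, source value, and delete_files setting are assumptions for a Bro sensor, and the remaining values are the documented defaults.

processes:
  shutdown_timeout: 10

workers:
  - directory: '/nsm/extracted/'
    source: 'bro'
    meta_separator: 'S^E^P'
    file_mtime_delta: 5
    delete_files: True
    broker: '127.0.0.1:5558'
    timeout: 10
    use_green: True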

To enable Bro support, a Bro file extraction script must be run by the Bro application; Strelka’s file extraction script is stored in etc/bro/extract-strelka.bro and includes variables that can be redefined at Bro runtime. These variables are (a sketch of loading the script follows the list):

  • “mime_table”: table of strings (Bro source) mapped to a set of strings (Bro mime_type); this variable defines which file MIME types Bro extracts and is configurable based on the location Bro identified the file (e.g. extract application/x-dosexec files from SMTP, but not SMB or FTP)
  • “filename_re”: regex pattern that can extract files based on Bro filename
  • “unknown_mime_source”: set of strings (Bro source) that determines if files of an unknown MIME type should be extracted based on the location Bro identified the file (e.g. extract unknown files from SMTP, but not SMB or FTP)
  • “meta_separator”: string used in extracted filenames to separate embedded Bro metadata; this must match the equivalent value in etc/dirstream/dirstream.yml
  • “directory_count_interval”: interval used to schedule how often the script checks the file count in the extraction directory
  • “directory_count_threshold”: int that is used as a trigger to temporarily disable file extraction if the file count in the extraction directory reaches the threshold
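
As a sketch, the script can be loaded from the sensor’s site policy and its variables tuned with Bro’s redef mechanism; the load path and threshold value are illustrative, and the variables may live in a module namespace, so check extract-strelka.bro for the exact names before copying this.

@load /opt/strelka/etc/bro/extract-strelka.bro
redef meta_separator = "S^E^P";
redef directory_count_threshold = 20000;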

Encryption and Authentication
Strelka has built-in, optional encryption and authentication for client connections provided by CurveZMQ.

CurveZMQ
CurveZMQ (Curve) is ZMQ’s encryption and authentication protocol. Read more about it here.

Using Curve
Strelka uses Curve to encrypt and authenticate connections between clients and brokers. By default, Strelka’s Curve support is set up to enable encryption but not authentication.
To enable Curve encryption, the broker must be loaded with a private key; any clients connecting to the broker must have the broker’s public key to successfully connect.
To enable Curve encryption and authentication, the broker must be loaded with a private key and a directory of client public keys; any clients connecting to the broker must have the broker’s public key and have their client key loaded on the broker to successfully connect.
The generate_curve_certificates.py utility can be used to create client and broker certificates.

Clusters
The following are recommendations and considerations to keep in mind when deploying clusters.

General Recommendations
The following recommendations apply to all clusters:

  • Do not run workers on the same server as a broker
    • This puts the health of the entire cluster at risk if the server becomes over-utilized
  • Do not over-allocate workers to CPUs
  • Allocate at least 1GB RAM per worker
    • If workers do not have enough RAM, then there will be excessive memory errors
    • Big files (especially compressed files) require more RAM
    • In large clusters, diminishing returns begin above 4GB RAM per worker
  • Allocate as much RAM as reasonable to the broker
    • ZMQ messages are stored entirely in memory; in large deployments with many clients, the broker may use a lot of RAM if the workers cannot keep up with the number of file tasks

Sizing Considerations
Multiple variables should be considered when determining the appropriate size for a cluster:

  • Number of file requests per second
  • Type of file requests
    • Remote file requests take longer to process than direct file requests
  • Diversity of files requested
    • Binary files take longer to scan than text files
  • Number of YARA rules deployed
    • Scanning a file with 50,000 rules takes longer than scanning a file with 50 rules

The best way to properly size a cluster is to start small, measure performance, and scale out as needed.

Docker Considerations
Below is a list of considerations to keep in mind when running a cluster with Docker containers (an example docker run command follows the list):

  • Share volumes, not files, with the container
    • Strelka’s workers will read configuration files and YARA rules files when they start up; sharing volumes with the container ensures that updated copies of these files on the localhost are reflected accurately inside the container without needing to restart the container
  • Increase stop-timeout
    • By default, Docker will forcibly kill a container if it has not stopped after 10 seconds; this value should be increased to greater than the shutdown_timeout value in etc/strelka/strelka.yml
  • Increase shm-size
    • By default, Docker limits a container’s shm size to 64MB; this can cause errors with Strelka scanners that utilize tempfile
  • Set logging options
    • By default, Docker has no log limit for logs output by a container
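
As an illustrative docker run command that applies these considerations (the mount path, timeout, shm size, and log limits are assumptions to adapt, not prescriptive values):

docker run -d --name strelka \
    -v /etc/strelka/:/etc/strelka/ \
    --stop-timeout 60 \
    --shm-size 512m \
    --log-opt max-size=100m --log-opt max-file=2 \
    strelka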

Management
Due to its distributed design, we recommend using container orchestration (e.g. Kubernetes) or configuration management/provisioning (e.g. Ansible, SaltStack, etc.) systems for managing clusters.
