Capsule8 Console Docs

Getting investigations data with Presto

Overview

Capsule8’s recommended deployment for non-cloud environments is to use HDFS (Hadoop Distributed File System) for storage and Presto as the query engine.

Configuration

Investigations is configured in the /etc/capsule8/capsule8-sensor.yaml file. By default, the Process Events, Sensor, and Container Events tables are enabled. This means that if no table key is provided, those tables are turned on automatically with a default row size that is unique to each MetaEvent type. To configure additional tables, specify them directly.

This is a complete example with every MetaEvent type configured to write to HDFS:

Investigations:
    reporting_interval: 5m
    sinks:
        - name: "[The address of the name node here]:9000/[directory on hdfs to store data, absolute path]"
          backend: hdfs
          automated: true
          credentials:
              blob_storage_hdfs_user: "[hadoop username that has write access]"
              blob_storage_create_buckets_enabled: true
    flight_recorder:
        enabled: true
        Tables:
            - name: "shell_commands"
              enabled: true
            - name: "tty_data"
              enabled: true
            - name: "file_events"
              enabled: false
            - name: "connections"
              enabled: true
            - name: "sensor_metadata"
              enabled: true
            - name: "alerts"
              enabled: true
            - name: "sensors"
              enabled: true
            - name: "process_events"
              enabled: true
            - name: "container_events"
              enabled: true

Storage Solutions

This section provides guides to aid in the installation/setup of storage solutions for Capsule8’s investigations data.

HDFS

Credentials

Currently, only insecure HDFS is supported. The only credential required is the username of a user that has write access to the directory that will store Investigations data. In addition, every sensor writing to HDFS must be able to reach the namenodes on ports 8020/9000 and all of the datanodes on ports 50010 and 50020.
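
If the target directory does not yet exist, it can be created and handed to that user ahead of time with the standard HDFS CLI. The commands below are a sketch only; the path /capsule8/investigations and the username capsule8 are placeholders for your own values, and the commands should be run as an HDFS superuser:

# Create the directory that will hold Investigations data (path is a placeholder)
hdfs dfs -mkdir -p /capsule8/investigations

# Hand ownership to the user configured as blob_storage_hdfs_user (username is a placeholder)
hdfs dfs -chown -R capsule8 /capsule8/investigations

The username given to -chown should match the blob_storage_hdfs_user value in the sensor configuration below.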

Sensor Configuration

To write MetaEvents to HDFS, configure a sink with the address and port of the name node and an HDFS user:

Investigations:
    reporting_interval: 5m
    sinks:
        - name: "[The address of the name node here]:9000/[directory on hdfs to store data, absolute path]"
          backend: hdfs
          automated: true
          credentials:
              blob_storage_hdfs_user: "[hadoop username that has write access]"
              blob_storage_create_buckets_enabled: true

Create Bucket

It is highly recommended to set blob_storage_create_buckets_enabled: true for HDFS. Because HDFS is hierarchical rather than flat like blob storage, the sensor will fail to write if a table subdirectory or partition folder does not exist.

Automatic

The following settings will ensure that folders are created in HDFS if they do not exist. In /etc/capsule8/capsule8-sensor.yaml enable the blob_storage_create_buckets_enabled field. See the example configuration below.

blob_storage_create_buckets_enabled: true
blob_storage_hdfs_user: <hdfs user>

Query Solutions

This section provides guides to aid in the installation/setup of query solutions for Capsule8’s investigations data.

Presto: Manual

Create and Configure Bucket

See HDFS in the Storage Solutions section above.

HDFS Configuration

For HDFS, see Presto’s guide.
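
Presto reaches data on HDFS through its Hive connector, so the manual setup amounts to registering a catalog on the Presto coordinator and workers. The snippet below is a sketch only, assuming the Investigations tables are registered in a Hive metastore; the file name, metastore address, and Hadoop configuration paths are placeholders for your environment:

# etc/catalog/hive.properties (all values below are placeholders)
connector.name=hive-hadoop2
# Thrift address of the Hive metastore that holds the table definitions
hive.metastore.uri=thrift://metastore.example.com:9083
# Hadoop client configs so Presto can reach the HDFS namenode and datanodes
hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml

Once the catalog is loaded, the example queries in the next section can be run against it from the Presto CLI or any Presto client.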

Example Queries

Queries are run using SQL syntax. This section provides a few example queries that might be of use during an investigation. For a complete reference of all the available fields that can be queried, see the MetaEvents section at the end of this guide.

Who Has Run a Command Through Sudo?

SELECT from_unixtime(process_events.unix_nano_timestamp / 1000000000) AS timestamp,
    pid, path, username, login_username
FROM process_events
WHERE event_type = 0 AND username != login_username;

Which Programs and their Users Connected to a Given IP?

SELECT DISTINCT from_unixtime(connections.unix_nano_timestamp / 1000000000) AS timestamp,
    sensors.hostname,
    process_events.path,
    container_events.container_name,
    container_events.image_name,
    connections.dst_addr,
    connections.dst_port
FROM connections, sensors, container_events, process_events
WHERE connections.process_c8id = process_events.process_c8id
AND container_events.process_c8id = process_events.process_c8id
AND connections.dst_addr = '$DESTINATION_IP';

What Containers or Images Ran on My Cluster and Where?

SELECT sensors.hostname,
    container_events.image_name,
    from_unixtime(container_events.unix_nano_timestamp / 1000000000) AS timestamp
FROM sensors, container_events;

Get All Alerts that Are Part of an Incident

SELECT *
FROM alerts where incident_id = '$INCIDENT_ID';

Get All Shell Commands That are Part of an Incident

SELECT from_unixtime(shell_commands.unix_nano_timestamp / 1000000000) AS timestamp,
    sensors.hostname,
    array_join(shell_commands.program_arguments, ' ') AS args,
    shell_commands.username
FROM shell_commands
JOIN sensors ON sensors.sensor_id = shell_commands.sensor_id
WHERE shell_commands.incident_id = '$INCIDENT_ID';