Apache Spot Open Data Models

Many organizations have built threat detection capabilities leveraging myriad vendor solutions. This approach leads to many silos of data corresponding to each vendor and often results in storing multiple copies of the same data, as each vendor's capability operates independently from the others. There is no single vendor able to cost-effectively store and analyze all the data required to detect threats and facilitate incident investigations and remediation.

Apache Spot ODM brings together all security-related data (event, user, network, endpoint, etc.) into a singular view that can be used to detect threats more effectively than ever before. This consolidated view can be leveraged to create new analytic models that were not previously possible and to provide needed context at the event level to effectively determine whether or not there is a threat. The Apache Spot ODM enables the sharing and reuse of threat detection models, algorithms and analytics, because of a shared, open data model.

The open data model (ODM) provides a common taxonomy for describing security telemetry data used to detect threats. It uses schemas, data structures, file formats and configurations in the underlying Hadoop platform for collecting, storing and analyzing security telemetry data at scale. Spot defines relationships amongst the various security data types for joining log data with user, network and endpoint entity data.

The Apache Spot ODM enables organizations to:

  • Store one copy of the security telemetry data and apply UNLIMITED analytics
    • Leverage out-of-the-box analytics powered by machine learning to detect threats in DNS, Flow and Proxy data
    • Build custom analytics to your desired specification
    • Plug in third-party vendor analytics that interoperate with the ODM
  • Share and/or reuse threat detection models, algorithms, ingest pipelines, visualizations and analytics across the Apache Spot community, due to a common data model.
  • Leverage all your security telemetry data to establish the context needed to better detect threats
    • Security logs
    • User, endpoint and network entity data
    • Threat intelligence data
  • Avoid "lock-in" to a specific technology and gain the analytic flexibility that results from a shared, open data model.

Data Models

In order to provide a framework for effectively analyzing data for cyber threats, it is necessary to collect and analyze standard security event logs/alerts and contextual data regarding the entities referenced in these logs/alerts. The most common entities include network, user and endpoint, but there are others such as file and certificate.

In the diagram below, the raw event tells us that user "jsmith" successfully logged in to an Oracle database from the IP address 10.1.1.3. Based on the raw event only, we don't know if this event is a legitimate threat or not. After injecting user and endpoint context, the enriched event tells us this event is a potential threat that requires further investigation.

[Diagram: Data Models]
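The enrichment step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the context tables, their contents and the `enrich` helper are hypothetical and not part of Apache Spot; only the attribute names (user_name, src_ip4, user_title, end_criticality, etc.) are taken from the models in this document.

```python
import ipaddress

# Hypothetical context tables, keyed by the attribute used for the join.
# The values are made up for illustration.
USER_CONTEXT = {
    "jsmith": {"user_title": "Director of IT", "user_departm": "IT"},
}
ENDPOINT_CONTEXT = {
    # Keyed by the integer form of 10.1.1.3, as the ODM stores IPv4 addresses.
    int(ipaddress.IPv4Address("10.1.1.3")): {
        "end_host": "tes1.companya.com",
        "end_criticality": "Very High",
    },
}

# Raw event: user "jsmith" logged in from 10.1.1.3.
RAW_EVENT = {
    "name": "Successful login",
    "user_name": "jsmith",
    "src_ip4": int(ipaddress.IPv4Address("10.1.1.3")),
}


def enrich(event, user_ctx, endpoint_ctx):
    """Return a copy of the event with user and endpoint context attached.

    Context that cannot be resolved is set to None rather than guessed.
    """
    enriched = dict(event)
    enriched["user_context"] = user_ctx.get(event.get("user_name"))
    enriched["src_endpoint_context"] = endpoint_ctx.get(event.get("src_ip4"))
    return enriched


enriched_event = enrich(RAW_EVENT, USER_CONTEXT, ENDPOINT_CONTEXT)
```

An analyst or model can then reason over `enriched_event` as a whole (who the user is, how critical the source endpoint is) instead of over the raw event alone.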

Based on the need to collect and analyze both security event logs/alerts and contextual data, support for the following types of security information is included in the Spot open data model:

Security event logs/alerts

  • This data type includes event logs from common data sources used to detect threats and includes network flows, operating system logs, IPS/IDS logs, firewall logs, proxy logs, web logs, DLP logs, etc.

Network context data

  • This data type includes information about the network, which can be gleaned from Whois servers, asset databases and other similar data sources.

User context data

  • This data type includes information from user and identity management systems including Active Directory, Centrify, and other similar systems.

Endpoint context data

  • This data includes information about endpoint systems (servers, workstations, routers, switches, etc.) and can be sourced from asset management systems, vulnerability scanners, and endpoint management/detection/response systems such as Webroot, Tanium, Sophos, Endgame, CarbonBlack and others.

Threat intelligence context data

  • This data includes contextual information about URLs, domains, websites, files and others.

Vulnerability context data

  • This data includes contextual information about vulnerabilities and is typically sourced from vulnerability management systems (e.g. Qualys, Tenable, etc.).

Roadmap Items:

  • File context data
  • Certificate context data

Naming Convention

A naming convention is needed for the open data model to represent attributes across vendor products and technologies. The naming convention is composed of prefixes (net, http, src, dst, etc.) and common attribute names (ip4, user_name, etc.). It is common to use multiple prefixes in combination with an attribute. The following examples are provided to illustrate the naming convention.

src_ip4

  • "src" - this prefix indicates the attribute pertains to details about the "source" entity referenced in the event (src_ip4, src_user_name, src_host, etc.)
  • "ip4" - this attribute name corresponds to an IP address (version 4)
  • Summary: This attribute represents the source ip address (version 4) within the referenced event

prx_browser

  • "prx" - this prefix indicates the attribute pertains to a "Proxy" event
  • "browser" - this attribute name corresponds to the "browser" referenced within the event
  • Summary: This attribute represents the browser (e.g. "Mozilla", "Internet Explorer", etc.) referenced in the Proxy event

dvc_host

  • "dvc" - This prefix indicates the attribute pertains to the "Device" that is the source of the event
  • "host" - This attribute name corresponds to the "hostname"
  • Summary: This attribute represents the hostname of the device where the event was generated

Prefixes

Prefix | Description
src | Corresponds to the "source" fields within a given event (e.g. source address)
dst | Corresponds to the "destination" fields within a given event (e.g. destination address)
dvc | Corresponds to the "device" applicable fields within a given event (e.g. device address) and represents where the event originated
fwd | Forwarded from device
request | Corresponds to requested values (vs. those returned, e.g. "requested URI")
response | Corresponds to response values (vs. those requested)
file | Corresponds to the "file" fields within a given event (e.g. file type)
user | Corresponds to user attributes (e.g. name, id, etc.)
xlate | Corresponds to translated values within a given event (e.g. src_xlate_ip for "translated source IP address")
in | Ingress
out | Egress
new | New value
orig | Original value
app | Corresponds to values associated with application events
net | Corresponds to values associated with network attributes (direction, flags)
end | Corresponds to values associated with endpoint attributes
dns | Corresponds to attributes within the DNS protocol
prx | Corresponds to attributes within Proxy events
av | Corresponds to attributes within Antivirus events
http | Corresponds to attributes within the HTTP protocol
smtp | Corresponds to attributes within the SMTP protocol
ftp | Corresponds to attributes within the FTP protocol
snmp | Corresponds to attributes within the SNMP protocol
tls | Corresponds to attributes within the TLS protocol
ssh | Corresponds to attributes within the SSH protocol
dhcp | Corresponds to attributes within the DHCP protocol
irc | Corresponds to attributes within the IRC protocol
flow | Corresponds to attributes within FLOW events
ti | Corresponds to attributes within Threat Intelligence context data
vuln | Corresponds to attributes within vulnerability management data
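To make the convention concrete, the following Python sketch splits an attribute name into its leading prefixes and the remaining attribute name. The `split_attribute` helper and its greedy left-to-right rule are assumptions made for illustration, not something defined by the ODM; the prefix set is copied from the table above.

```python
# Prefixes listed in the table above.
PREFIXES = {
    "src", "dst", "dvc", "fwd", "request", "response", "file", "user",
    "xlate", "in", "out", "new", "orig", "app", "net", "end", "dns", "prx",
    "av", "http", "smtp", "ftp", "snmp", "tls", "ssh", "dhcp", "irc",
    "flow", "ti", "vuln",
}


def split_attribute(name):
    """Split an ODM attribute name into (prefixes, attribute).

    Prefixes are peeled off from the left for as long as the leading token
    is a known prefix, always leaving at least one token as the attribute.
    """
    tokens = name.split("_")
    prefixes = []
    while len(tokens) > 1 and tokens[0] in PREFIXES:
        prefixes.append(tokens.pop(0))
    return prefixes, "_".join(tokens)


examples = {name: split_attribute(name) for name in
            ["src_ip4", "prx_browser", "dvc_host", "src_xlate_ip", "event_time"]}
```

Because "user" is itself a prefix, this greedy rule reads src_user_name as the prefixes src and user plus the attribute name, whereas the text above treats user_name as a single common attribute name. A stricter parser would need an explicit list of attribute names to resolve that ambiguity.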

Security Event Log/Alert Data Model

The data model for security event logs/alerts is detailed in the table below. The attributes are categorized as follows:

Common

  • Attributes that are common across many device types

Device

  • Attributes that are applicable to the device that generated the event

Network

  • Attributes that are applicable to the network components of the event

File

  • Attributes that are applicable to file objects referenced in the event

Endpoint

  • Attributes that are applicable to the endpoints referenced in the event

User

  • Attributes that are applicable to the user referenced in the event

Proxy

  • Attributes that are applicable to proxy events

Antivirus

  • Attributes that are applicable to antivirus events

Vulnerability

  • Attributes that are applicable to vulnerability management events

Protocol

  • DNS - attributes that are specific to the DNS protocol
  • HTTP - attributes that are specific to the HTTP protocol
  • SMTP, SSH, TLS, DHCP, IRC, SNMP and FTP - attributes that are specific to each of these protocols

Note: The model will evolve to include reserved attributes for additional device types that are not currently represented. The model can currently be extended to support ANY attribute for ANY device type by following the guidance outlined in the section titled "Extensibility of Data Model".

Category | Attribute | Data type | Description | Sample values
Common | event_time | long | timestamp of event (UTC) | 1472653952
Common | begintime | long | timestamp | 1472653952
Common | endtime | long | timestamp | 1472653952
Common | event_insertime | long | timestamp | 1472653952
Common | lastupdatetime | long | timestamp | 1472653952
Common | duration | float | Time duration (milliseconds) | 2345
Common | event_id | string | Unique identifier for event | x:2388
Common | name | string | Name of event | "Successful login …"
Common | org | string | Organization | "HR" or "Finance" or "CustomerA"
Common | type | string | Type information | "Informational", "image/gif"
Common | n_proto | string | Network protocol of event | TCP, UDP, ICMP
Common | a_proto | string | Application protocol of event | HTTP, NFS, FTP
Common | msg | string | Message (details of action taken on object) | Some long string
Common | mac | string | MAC address | 94:94:26:3:86:16
Common | severity | string | Severity of event | High, 10, 1
Common | raw | string | Raw text message of entire event | Complete copy of log entry
Common | risk | Floating point | Risk score | 95.67
Common | code | string | Response or error code | 404
Common | category | string | Event category | /Application/Start
Common | query | string | Query (DNS query, URI query, SQL query, etc.) | Select * from table
Common | service | string | Service (e.g. service name, type of service) | sshd
Common | state | string | State of object | Running, Paused, stopped
Common | in_bytes | int | Bytes in | 1025
Common | out_bytes | int | Bytes out | 9344
Common | xref | string | External reference to public description | http://www.oracle.com/technetwork/java/javase/2col/6u85-bugfixes-2298235.html
Common | version | string | Version | 5.4
Common | api | string | API label | "somestring"
Common | parameter | string | Parameter label | "somestring"
Common | action | string | Action label | "somestring"
Common | proc | string | Process label | "somestring"
Common | app | string | Application label | "somestring"
Common | disposition | string | Disposition label | "somestring"
Common | prevalence | string | Prevalence label | "somestring"
Common | confidence | string | Confidence label | "somestring"
Common | sensitivity | string | Sensitivity label | "somestring"
Common | count | int | Generic count | 20
Common | company | string | Company label | "somestring"
Common | additional_attrs | String (JSON Map) | Custom event attributes | "building":"729","cube":"401"
Common | totrust | string | Coming soon | Coming soon
Common | fromtrust | string | Coming soon | Coming soon
Common | rule | string | Coming soon | Coming soon
Common | threat | string | Coming soon | Coming soon
Common | pcap_id | int | Coming soon | Coming soon
Device | dvc_time | long | UTC timestamp from device where event/alert originates or is received | 1472653952
Device | dvc_ip4/dvc_ip6 | long | IP address of device | Integer representation of 10.1.1.1
Device | dvc_group | string | Device group label | "somestring"
Device | dvc_server | string | Server label | "somestring"
Device | dvc_host | string | Hostname of device | test.companyA.com
Device | dvc_domain | string | Domain of device | "somestring"
Device | dvc_type | string | Device type that generated the log | Unix, Windows, Sonicwall
Device | dvc_vendor | string | Vendor | Microsoft, Fireeye
Device | dvc_fwd_ip4/fwd_ip6 | long | Forwarded from device | Integer representation of 10.1.1.1
Device | dvc_version | string | Version | "3.2.2"
Network | src_ip4/src_ip6 | bigint | Source IP address of event | Integer representation of 10.1.1.1
Network | src_host | string | Source FQDN of event | test.companyA.com
Network | src_domain | string | Domain name of source address | companyA.com
Network | src_port | int | Source port of event | 1025
Network | src_country_code | string | Source country code | cn
Network | src_country_name | string | Source country name | China
Network | src_region | string | Source region | string
Network | src_city | string | Source city | Shanghai
Network | src_lat | int | Source latitude | 90
Network | src_long | int | Source longitude | 90
Network | dst_ip4/dst_ip6 | bigint | Destination IP address of event | Integer representation of 10.1.1.1
Network | dst_host | string | Destination FQDN of event | test.companyA.com
Network | dst_domain | string | Domain name of destination address | companyA.com
Network | dst_port | int | Destination port of event | 80
Network | dst_country_code | string | Destination country code | cn
Network | dst_country_name | string | Destination country name | China
Network | dst_region | string | Destination region | string
Network | dst_city | string | Destination city | Shanghai
Network | dst_lat | int | Destination latitude | 90
Network | dst_long | int | Destination longitude | 90
Network | src_asn | int | Autonomous system number | 33
Network | dst_asn | int | Autonomous system number | 33
Network | net_direction | string | Direction | In, inbound, outbound, ingress, egress
Network | net_flags | string | TCP flags | .AP.SF
File | file_name | string | Filename from event | output.csv
File | file_path | string | File path | /root/output.csv
File | file_atime | bigint | Timestamp (UTC) of file access | 1472653952
File | file_acls | string | File permissions | rwx-rwx-rwx
File | file_type | string | Type of file | ".doc"
File | file_size | int | Size of file in bytes | 1244
File | file_desc | string | Description of file | Project Plan for Project xyz
File | file_hash | string | Hash of file |
File | file_hash_type | string | Type of hash | MD5, SHA1, SHA256
Endpoint | end_object | string | File/Process/Registry | File, Registry, Process
Endpoint | end_action | string | Action taken on object (open/delete/edit) | Open, Edit
Endpoint | end_msg | string | Message (details of action taken on object) | Some long string
Endpoint | end_app | string | Application | Microsoft Powerpoint
Endpoint | end_location | string | Location | Atlanta, GA
Endpoint | end_proc | string | Process | SSHD
User | user_name (src_user_name, dst_user_name) | string | username from event | jsmith
User | user_email | string | Email address | test@companyA.com
User | user_id | string | userid | 234456
User | user_loc | string | location | Herndon, VA
User | user_desc | string | Description of user | "somestring"
DNS | dns_class | string | DNS class | 1
DNS | dns_len | int | DNS frame length | 188
DNS | dns_query | string | Requested DNS query | test.test.com
DNS | dns_response_code | string | Response code | 0x00000001
DNS | dns_answers | string | Response to DNS Query | 178.2.1.99
DNS | dns_type | int | DNS query type | 1
Proxy | prx_category | string | Event category | SG-HTTP-SERVICE
Proxy | prx_browser | string | Web browser | Internet Explorer
Proxy | prx_code | string | Error or response code | 404
Proxy | prx_referrer | string | Referrer | www.usatoday.com
Proxy | prx_host | string | Requested URI | /wcm/assets/images/imagefileicon.gif
Proxy | prx_filter_rule | string | Applied filter or rule | Internet, Rule 6
Proxy | prx_filter_result | string | Result of applied filter or rule | Proxied, Blocked
Proxy | prx_query | string | URI query | ?func=S_senseHTML&Page=a26815a313504697a126279
Proxy | prx_action | string | Action taken on object | TCP_HIT, TCP_MISS, TCP_TUNNELED
Proxy | prx_method | string | HTTP method | GET, CONNECT, POST
Proxy | prx_type | string | Type of request | image/gif
HTTP | http_request_method | string | HTTP method | GET, CONNECT, POST
HTTP | http_request_uri | string | Requested URI | /wcm/assets/images/imagefileicon.gif
HTTP | http_request_body_len | int | Length of request body | 98
HTTP | http_request_user_name | string | username from event | jsmith
HTTP | http_request_password | string | Password from event | abc123
HTTP | http_request_proxied | string | Proxy request label | "somestring"
HTTP | http_request_headers | MAP | HTTP request headers | request_headers['HOST'], request_headers['USER-AGENT'], request_headers['ACCEPT']
HTTP | http_response_status_code | int | HTTP response status code | 404
HTTP | http_response_status_msg | string | HTTP response status message | "Not found"
HTTP | http_response_body_len | int | Length of response body | 98
HTTP | http_response_info_code | int | HTTP response info code | 100
HTTP | http_response_info_msg | string | HTTP response info message | "somestring"
HTTP | http_response_resp_fuids | string | Response FUIDS | "somestring"
HTTP | http_response_mime_types | string | Mime types | "cgi,bat,exe"
HTTP | http_response_headers | MAP | Response headers | response_headers['SERVER'], response_headers['SET-COOKIE'], response_headers['DATE']
SMTP | smtp_trans_depth | int | Depth of email into SMTP exchange | 2
SMTP | smtp_headers_helo | string | Helo header | "somestring"
SMTP | smtp_headers_mailfrom | string | Mailfrom header | "somestring"
SMTP | smtp_headers_rcptto | string | Rcptto header | "somestring"
SMTP | smtp_headers_date | string | Header date | "somestring"
SMTP | smtp_headers_from | string | From header | "somestring"
SMTP | smtp_headers_to | string | To header | "somestring"
SMTP | smtp_headers_reply_to | string | Reply to header | "somestring"
SMTP | smtp_headers_msg_id | string | Message ID | "somestring"
SMTP | smtp_headers_in_reply_to | string | In reply to header | "somestring"
SMTP | smtp_headers_subject | string | Subject | "somestring"
SMTP | smtp_headers_x_originating_ip4 | bigint | Originating IP address | 1203743731
SMTP | smtp_headers_first_received | string | First to receive message | "somestring"
SMTP | smtp_headers_second_received | string | Second to receive message | "somestring"
SMTP | smtp_last_reply | string | Last reply in message chain | "somestring"
SMTP | smtp_path | string | Path of message | "somestring"
SMTP | smtp_user_agent | string | User agent | "somestring"
SMTP | smtp_tls | boolean | Indication of TLS use | 1
SMTP | smtp_is_webmail | boolean | Indication of webmail | 0
FTP | ftp_user_name | string | Username | "somestring"
FTP | ftp_password | string | Password | "somestring"
FTP | ftp_command | string | FTP command | "somestring"
FTP | ftp_arg | string | Argument | "somestring"
FTP | ftp_mime_type | string | Mime type | "somestring"
FTP | ftp_file_size | int | File size | 1024
FTP | ftp_reply_code | int | Reply code | 3
FTP | ftp_reply_msg | string | Reply message | "somestring"
FTP | ftp_data_channel_passive | boolean | Passive data channel? | 1
FTP | ftp_data_channel_rsp_p | string | | "somestring"
FTP | ftp_cwd | string | Current working directory | "somestring"
FTP | ftp_cmdarg_ts | float | Coming soon |
FTP | ftp_cmdarg_cmd | string | Command | "somestring"
FTP | ftp_cmdarg_arg | string | Command argument | "somestring"
FTP | ftp_cmdarg_seq | int | Sequence | 2
FTP | ftp_pending_commands | string | Pending commands | "somestring"
FTP | ftp_is_passive | boolean | Passive mode enabled | 0
FTP | ftp_fuid | string | Coming soon | "somestring"
FTP | ftp_last_auth_requested | string | Coming soon | "somestring"
SNMP | snmp_version | string | Coming soon | "somestring"
SNMP | snmp_community | string | Coming soon | "somestring"
SNMP | snmp_get_requests | int | Coming soon | Coming soon
SNMP | snmp_get_bulk_requests | int | Coming soon | Coming soon
SNMP | snmp_get_responses | int | Coming soon | Coming soon
SNMP | snmp_set_requests | int | Coming soon | Coming soon
SNMP | snmp_display_string | string | Coming soon | Coming soon
SNMP | snmp_up_since | float | Coming soon | Coming soon
TLS | tls_version | string | Coming soon | Coming soon
TLS | tls_cipher | string | Coming soon | Coming soon
TLS | tls_curve | string | Coming soon | Coming soon
TLS | tls_server_name | string | Coming soon | Coming soon
TLS | tls_resumed | boolean | Coming soon | Coming soon
TLS | tls_next_protocol | string | Coming soon | Coming soon
TLS | tls_established | boolean | Coming soon | Coming soon
TLS | tls_cert_chain_fuids | string | Coming soon | Coming soon
TLS | tls_client_cert_chain_fuids | string | Coming soon | Coming soon
TLS | tls_subject | string | Coming soon | Coming soon
TLS | tls_issuer | string | Coming soon | Coming soon
SSH | ssh_version | string | Coming soon | Coming soon
SSH | ssh_auth_success | boolean | Coming soon | Coming soon
SSH | ssh_client | string | Coming soon | Coming soon
SSH | ssh_server | string | Coming soon | Coming soon
SSH | ssh_cipher_algorithm | string | Coming soon | Coming soon
SSH | ssh_mac_algorithm | string | Coming soon | Coming soon
SSH | ssh_compression_algorithm | string | Coming soon | Coming soon
SSH | ssh_key_exchange_algorithm | string | Coming soon | Coming soon
SSH | ssh_host_key_algorithm | string | Coming soon | Coming soon
DHCP | dhcp_assigned_ip4 | bigint | Coming soon | Coming soon
DHCP | dhcp_mac | string | Coming soon | Coming soon
DHCP | dhcp_lease_time | double | Coming soon | Coming soon
IRC | irc_user | string | Coming soon | Coming soon
IRC | irc_nickname | string | Coming soon | Coming soon
IRC | irc_command | string | Coming soon | Coming soon
IRC | irc_value | string | Coming soon | Coming soon
IRC | irc_additional_data | string | Coming soon | Coming soon
Flow | flow_in_packets | int | Coming soon | Coming soon
Flow | flow_out_packets | int | Coming soon | Coming soon
Flow | flow_conn_state | string | Coming soon | Coming soon
Flow | flow_history | string | Coming soon | Coming soon
Flow | flow_src_dscp | string | Coming soon | Coming soon
Flow | flow_dst_dscp | string | Coming soon | Coming soon
Flow | flow_input | string | Coming soon | Coming soon
Flow | flow_output | string | Coming soon | Coming soon
Vulnerability | vuln_id | string | Unique vulnerability identifier | 10748
Vulnerability | vuln_type | string | Vulnerability title | Wireshark Multiple Vulnerabilities
Vulnerability | vuln_status | string | Vulnerability type | Potential, Confirmed, etc.
Vulnerability | vuln_severity | string | Vulnerability severity | Critical, High, etc.
Vulnerability | created | bigint | Timestamp of vulnerability identification |
Antivirus | av_riskname | string | Coming soon | Coming soon
Antivirus | av_actualaction | string | Coming soon | Coming soon
Antivirus | av_requestedaction | string | Coming soon | Coming soon
Antivirus | av_secondaryaction | string | Coming soon | Coming soon
Antivirus | av_downloadsite | string | Coming soon | Coming soon
Antivirus | av_downloadedby | string | Coming soon | Coming soon
Antivirus | av_tracking_status | string | Coming soon | Coming soon
Antivirus | av_firstseen | bigint | Coming soon | Coming soon
Antivirus | application_hash | string | Coming soon | Coming soon
Antivirus | application_hash_type | string | Coming soon | Coming soon
Antivirus | application_name | string | Coming soon | Coming soon
Antivirus | application_version | string | Coming soon | Coming soon
Antivirus | application_type | string | Coming soon | Coming soon
Antivirus | av_categoryset | string | Coming soon | Coming soon
Antivirus | av_categorytype | string | Coming soon | Coming soon
Antivirus | av_threat_count | int | Coming soon | Coming soon
Antivirus | av_infected_count | int | Coming soon | Coming soon
Antivirus | av_omitted_count | int | Coming soon | Coming soon
Antivirus | av_scanid | int | Coming soon | Coming soon
Antivirus | av_startmessage | string | Coming soon | Coming soon
Antivirus | av_stopmessage | string | Coming soon | Coming soon
Antivirus | av_totalfiles | int | Coming soon | Coming soon
Antivirus | av_signatureid | string | Coming soon | Coming soon
Antivirus | av_signaturestring | string | Coming soon | Coming soon
Antivirus | av_signaturesubid | string | Coming soon | Coming soon
Antivirus | av_intrusionurl | string | Coming soon | Coming soon
Antivirus | av_intrusionpayloadurl | string | Coming soon | Coming soon
Antivirus | objectname | string | Coming soon | Coming soon

Note, it is not necessary to populate all of the attributes within the model. For attributes not populated in a single security event log/alert, contextual data may not be available. For example, the sample event below can be enriched with contextual data about the referenced endpoints (10.1.1.1 and 192.168.10.10), but not a user, because username is not populated.

{
"date":"12/12/2015",
"time":"23:14:56",
"source_ip":"10.1.1.1",
"source_port":1025,
"protocol":"tcp",
"destination_ip":"192.168.10.10",
"destination_port":443,
"bytes":"1183"
}
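As a rough sketch of how the sample event above might be normalized into ODM attribute names, the Python below converts the timestamp to epoch seconds and the IP addresses to their integer representation. The `to_odm` and `joinable_context` helpers are hypothetical. The code assumes the event's date and time are in UTC, and it leaves the "bytes" field unmapped because the sample does not say whether it counts ingress or egress traffic.

```python
import ipaddress
from datetime import datetime, timezone

SAMPLE_EVENT = {
    "date": "12/12/2015",
    "time": "23:14:56",
    "source_ip": "10.1.1.1",
    "source_port": 1025,
    "protocol": "tcp",
    "destination_ip": "192.168.10.10",
    "destination_port": 443,
    "bytes": "1183",
}


def to_odm(raw):
    """Map the raw sample event onto ODM attribute names."""
    # Assumption: the raw date/time fields are expressed in UTC.
    ts = datetime.strptime(
        f'{raw["date"]} {raw["time"]}', "%m/%d/%Y %H:%M:%S"
    ).replace(tzinfo=timezone.utc)
    return {
        "event_time": int(ts.timestamp()),
        "src_ip4": int(ipaddress.IPv4Address(raw["source_ip"])),
        "src_port": raw["source_port"],
        "n_proto": raw["protocol"].upper(),
        "dst_ip4": int(ipaddress.IPv4Address(raw["destination_ip"])),
        "dst_port": raw["destination_port"],
        # "bytes" is intentionally not mapped: in_bytes vs. out_bytes is unknown.
    }


def joinable_context(event):
    """List the context models this event carries a join key for."""
    available = []
    if "src_ip4" in event or "dst_ip4" in event:
        available.append("endpoint")
    if event.get("user_name"):
        available.append("user")
    return available


odm_event = to_odm(SAMPLE_EVENT)
```

For this event `joinable_context(odm_event)` reports only the endpoint model, which matches the note above: without a populated username there is nothing to join against the user context.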

Context Models

The recommended approach for populating the context models (user, endpoint, network, threat intelligence, etc.) involves consuming information from the systems most capable of providing the needed context. Populating the user context model is best accomplished by leveraging user/identity management systems such as Active Directory or Centrify and populating the model with details such as the user's full name, job title, phone number, manager's name, physical address, entitlements, etc. Similarly, an endpoint model can be populated by consuming information from endpoint/asset management systems (Tanium, Webroot, etc.), which provide information such as the services running on the system, system owner, business context, etc.

User Context Model

Attribute | Data type | Description | Sample values
dvc_time | bigint | Timestamp from when the user context information is obtained | 1472653952
user_created | bigint | Timestamp from when user was created | 1472653952
user_changed | bigint | Timestamp from when user was updated | 1472653952
user_last_logon | bigint | Timestamp from when user last logged on | 1472653952
user_logon_count | int | Number of times account has logged on | 232
user_last_reset | bigint | Timestamp from when user last reset password | 1472653952
user_expiration | bigint | Date/time when user expires | 1472653952
user_id | string | Unique user id | 1234
user_image | binary | Image/picture of user |
user_name | string | Username in event log/alert | jsmith
user_name_first | string | First name | John
user_name_middle | string | Middle name | Henry
user_name_last | string | Last name | Smith
user_name_mgr | string | Manager's name | Ronald Reagan
user_phone | string | Phone number | 703-555-1212
user_email | string | Email address | jsmith@company.com
user_code | string | Job code | 3455
user_loc | string | Location | US
user_departm | string | Department | IT
user_dn | string | Distinguished name | "CN=scm-admin-mej-test2-adk,OU=app-admins,DC=ad,DC=halxg,DC=companya,DC=com"
user_ou | string | Organizational unit | EAST
user_empid | string | Employee ID | 12345
user_title | string | Job Title | Director of IT
user_groups | array (Comma separated) | Groups to which the user belongs | "Domain Admins", "Domain Users"
dvc_type | string | Device type that generated the user context data | Active Directory
dvc_vendor | string | Vendor | Microsoft
user_risk | Floating point | Risk score | 95.67
dvc_version | string | Version | 8.1.2
additional_attrs | map | Additional attributes of user | Key value pairs
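A record in the user context model could be built from a directory entry along the following lines. This is a sketch only: it assumes the entry has already been fetched and flattened into a plain Python dict keyed by common Active Directory attribute names (sAMAccountName, givenName, sn, and so on), and the `user_context_from_directory` helper is hypothetical rather than part of Apache Spot.

```python
import time


def user_context_from_directory(entry, collected_at=None):
    """Build a user context record from a directory entry (a plain dict).

    Attributes missing from the entry are left as None instead of guessed.
    """
    return {
        # Time at which this context snapshot was collected.
        "dvc_time": int(collected_at if collected_at is not None else time.time()),
        "user_name": entry.get("sAMAccountName"),
        "user_name_first": entry.get("givenName"),
        "user_name_last": entry.get("sn"),
        "user_email": entry.get("mail"),
        "user_phone": entry.get("telephoneNumber"),
        "user_title": entry.get("title"),
        "user_departm": entry.get("department"),
        "user_dn": entry.get("distinguishedName"),
        "user_groups": list(entry.get("memberOf", [])),
        "dvc_type": "Active Directory",
        "dvc_vendor": "Microsoft",
    }


# Illustrative entry using sample values from the table above.
record = user_context_from_directory(
    {
        "sAMAccountName": "jsmith",
        "givenName": "John",
        "sn": "Smith",
        "mail": "jsmith@company.com",
        "title": "Director of IT",
        "department": "IT",
        "memberOf": ["Domain Admins", "Domain Users"],
    },
    collected_at=1472653952,
)
```

Keeping dvc_type and dvc_vendor on every record preserves which system supplied the context, which helps when several identity sources feed the same model.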

Endpoint Context Model

Attribute | Data type | Description | Sample values
dvc_time | bigint | Timestamp from when the endpoint context information is obtained | 1472653952
end_ip4 | bigint | IP address of endpoint | Integer representation of 10.1.1.1
end_ip6 | bigint | IP address of endpoint | Integer representation of 10.1.1.1
end_os | string | Operating system | Redhat Linux 6.5.1
end_os_version | string | Version of OS | 5.4
end_os_sp | string | Service pack | SP 2.3.4.55
end_tz | string | timezone | EST
end_hotfixes | array (Comma separated) | Applied hotfixes | 993.2
end_disks | array (Comma separated) | Available disks | \\Device\\HarddiskVolume1, \\Device\\HarddiskVolume2
end_removables | array (Comma separated) | Removable media devices | USB Key
end_nics | array (Comma separated) | Network interfaces | fe10::28f4:1a47:658b:d6e8, fe82::28f4:1a47:658b:d6e8
end_drivers | array (Comma separated) | Installed kernel drivers | ntoskrnl.exe, hal.dll
end_users | array (Comma separated) | Local user accounts | administrator, jsmith
end_host | string | Hostname of endpoint | tes1.companya.com
end_mac | string | MAC address of endpoint | 94:94:26:3:86:16
end_owner | string | Endpoint owner (name) | John Smith
end_vulns | array (Comma separated) | Vulnerability identifiers (CVE identifier) | CVE-123, CVE-456
end_loc | string | Location | US
end_departm | string | Department name | IT
end_company | string | Company name | CompanyA
end_regs | array (Comma separated) | Applicable regulations | HIPAA, SOX
end_svcs | array (Comma separated) | Services running on system | Cisco Systems, Inc. VPN Service, Adobe LM Service
end_procs | array (Comma separated) | Processes | svchost.exe, sppsvc.exe
end_criticality | string | Criticality of device | Very High
end_apps | array (Comma separated) | Applications running on system | Microsoft Word, Chrome
end_desc | string | Endpoint descriptor | Some string
dvc_type | string | Device type that generated the log | Microsoft Windows 7
dvc_vendor | string | Vendor | Endgame
dvc_version | string | Version | 2.1
end_architecture | string | CPU architecture | x86
end_uuid | string | Universally unique identifier | a59ba71e-18b0-f762-2f02-0deaf95076c6
end_risk | Floating point | Risk score | 95.67
end_memtotal | int | Total memory (bytes) | 844564433
additional_attrs | map | Additional attributes | Key value pairs

Vulnerability Context Model

Attribute | Data type | Description | Sample values
vuln_id | string | Unique vulnerability identifier | 10748
vuln_title | string | Vulnerability title | "Wireshark Multiple Vulnerabilities"
vuln_description | string | Vulnerability description |
vuln_solution | string | Vulnerability remediation description | "Patch: The following URLs provide patch procedures .."
vuln_type | string | Vulnerability type | Potential, Confirmed, etc.
vuln_category | string | Vulnerability category | Ubuntu, Windows, etc.
vuln_status | string | Vulnerability status | Active, Fixed, etc.
vuln_severity | string | Vulnerability severity | Critical, High, Medium, etc.
vuln_created | bigint | Vulnerability creation timestamp | timestamp
vuln_updated | bigint | Vulnerability updated timestamp | timestamp
additional_attrs | map | Additional attributes | Key value pairs

Network Context Model

Attribute | Data type | Description | Sample values
net_domain_name | string | Domain name |
net_registry_domain_id | string | Registry Domain ID |
net_registrar_whois_server | string | Registrar WHOIS Server |
net_registrar_url | string | Registrar URL |
net_update_date | bigint | UTC timestamp |
net_creation_date | bigint | Creation Date |
net_registrar_registration_expiration_date | bigint | Registrar Registration Expiration Date |
net_registrar | string | Registrar |
net_registrar_iana_id | string | Registrar IANA ID |
net_registrar_abuse_contact_email | string | Registrar Abuse Contact Email |
net_registrar_abuse_contact_phone | string | Registrar Abuse Contact Phone |
net_domain_status | string | Domain Status |
net_registry_registrant_id | string | Registry Registrant ID |
net_registrant_name | string | Registrant Name |
net_registrant_organization | string | Registrant Organization |
net_registrant_street | string | Registrant Street |
net_registrant_city | string | Registrant City |
net_registrant_state_province | string | Registrant State/Province |
net_registrant_postal_code | string | Registrant Postal Code |
net_registrant_country | string | Registrant Country |
net_registrant_phone | string | Registrant Phone |
net_registrant_email | string | Registrant Email |
net_registry_admin_id | string | Registry Admin ID |
net_name_servers | string | Name Server |
net_dnssec | string | DNSSEC |
net_risk | Floating point | Risk score | 95.67

Threat Intelligence Context Model

Attribute Data Type Description
ti_source String TI Provider, Open Source List, Internally Developed, LE Tip, Other
ti_provider_id String Anomali, CrowdStrike, Mandiant, Alienvault OTX, USCERT, etc
ti_indicator_id String Unique IQ from the provider
ti_indicator_desc String Full Text descriptor and links of the Indicator and associated information
ti_date_added UTC Timestamp Date first added by the provider
ti_date_modified UTC Timestamp Date last updated by the provider.
ti_risk_impact String Likely Targets what function within the organization?
ti_severity String Nation State, Targeted, Advanced, Commodity, Other
ti_category String Ecrime, Hacktivism, Geo Pollitical, Foreign Intelligence Service
ti_campaign_name String Internal Campaign designation
ti_deployed_location array (Comma separated) Where this indicator should be matched for applicability (Core, Perimeter, Network, Endpoint, Logs, ALL, etc)
ti_associated_incidents String Known Associated Incident ID's
ti_adversarial_identification_group String Adversary Group designation usually provided by the provider.
ti_adversarial_identification_tactics String Known Adversary Tactics as indicated by the source provider.
ti_adversarial_identification_reports String Linked Adversary reports.
ti_phase String Discovery, Weaponization, Delivery, C2, Exploitation, Actions on Objectives, etc
ti_indicator_cve String MITRE CVE Link(s)
ti_indicator_ip4 array CIDR noted IPv4 Address Indicated by Threat Intelligence
ti_indicator_ip6 array IPv6 Address Indicated by Threat Intelligence
ti_indicator_domain String Domain Name(s)
ti_indicator_hostname String Host or Subdomain Name(es)
ti_indicator_email array (Comma separated) Email addresses associated with Indicator
ti_indicator_url array (Comma separated) URL(s) associated with indicator
ti_indicator_uri array (Comma separated) URI(s) associated with indicator
ti_indicator_file_hash String File Hash Value associated with the indicator.
ti_indicator_file_path String File Path Value associated with the indicator.
ti_indicator_mutex String MUTEX Value associated with the indicator.
ti_indicator_md5 String MD5 Hash Sum Value
ti_indicator_sha1 String SHA1 Hash Sum Value
ti_indicator_sha256 String SHA256 Hash Sum Value
ti_indicator_device_path String Device Path Value associated with the indicator.
ti_indicator_drive String Drive Value associated with the indicator.
ti_indicator_file_name String File Name Value associated with the indicator.
ti_indicator_file_extension String File Extension Value associated with the indicator.
ti_indicator_file_size String File Size Value associated with the indicator.
ti_indicator_file_created bigint Date File value associated with the indicator was created.
ti_indicator_file_accessed bigint Date File value associated with the indicator was last accessed.
ti_indicator_file_changed bigint Date File value associated with the indicator was last changed.
ti_indicator_file_entropy String Calculated entropy value associated with the file indicated.
ti_indicator_file_attributes array (Comma separated) Read Only, System, Hidden, Directory, Archive, Device, Temporary, SparseFile, Compressed, Encrypted, Index, Deleted, etc
ti_indicator_user_name String username associated with the indicator.
ti_indicator_security_id String if known securityID associated with the indicator.
ti_indicator_pe_info array (Comma separated) Subsystem, BaseAddress, PETimeStamp, Expert, JumpCodes, DetectedAnomalies, DigitalSignatures, VersionInfo, ResourceInfo, Imported Modules
ti_indicator_pe_type array (Comma separated) Executable, DLL, Invalid, Unknown, Native, Windows_GUI, OS2, POSIX, EFI, etc
ti_indicator_strings array (Comma separated) Any strings associated with the file indicated that might be useful in identification or further indicator development or adversary identification.
ti_indicator_org String Name of the business that owns the IP address associated with the indicator
ti_indicator_reg_name String Name of the person who registered the domain
ti_indicator_reg_email String Email address of the person who registered the domain
ti_indicator_reg_org String Name of the organisation that registered the domain
ti_indicator_reg_phone String Phone number associated with the domain registered
ti_tags String Additional comments/associations from the feed
ti_threat_type String malware, compromised, apt, c2, etc...

Extensibility of Data Models

The aforementioned data model can be extended to accommodate custom attributes by embedding key-value pairs within the log/alert/context entries.

Each model supports an additional attribute named additional_attrs whose value is a JSON string. This JSON string must contain a map (and only a map) of the additional attributes that cannot be expressed in the specified model description. Regardless of their original type, these additional attributes are always interpreted as strings; it is up to the user to translate them to the appropriate types in the analytics layer, if necessary. It is also the user's responsibility to populate additional_attrs as a map, presumably by parsing these attributes out of the original message.

For example, if a user wanted to extend the user context model to include string attributes for "Desk Location" and "City", the following string would be set for additional_attrs:

Attribute key Attribute value
additional_attrs {"dsk_location":"B3-F2-W3", "city":"Palo Alto"}

Something similar can be done for the endpoint context model, the security event log/alert model and other entities.
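As a rough illustration of the convention described above, the following Python sketch decodes an additional_attrs string into a map and applies per-key type conversions in the analytics layer. The helper names (parse_additional_attrs, build_additional_attrs) and the "floor" attribute are hypothetical and are not part of Spot; only the standard json module is used.

```python
import json


def parse_additional_attrs(raw, casts=None):
    """Decode an additional_attrs JSON string into a dict.

    Values are treated as strings unless `casts` maps the key to a
    converter (for example int). Hypothetical helper, not a Spot API.
    """
    attrs = json.loads(raw) if raw else {}
    if not isinstance(attrs, dict):
        raise ValueError("additional_attrs must contain a JSON map")
    casts = casts or {}
    result = {}
    for key, value in attrs.items():
        # Normalize every value to a string first, as the model prescribes.
        text = value if isinstance(value, str) else json.dumps(value)
        result[key] = casts.get(key, str)(text)
    return result


def build_additional_attrs(attrs):
    """Serialize a dict to an additional_attrs string with string values."""
    return json.dumps({key: str(value) for key, value in attrs.items()},
                      sort_keys=True)


raw = '{"dsk_location":"B3-F2-W3", "city":"Palo Alto"}'
parsed = parse_additional_attrs(raw)

# "floor" is an invented attribute used only to show a type conversion.
typed = parse_additional_attrs('{"floor":"2"}', casts={"floor": int})
```

Rejecting anything other than a map at parse time keeps malformed entries from reaching the analytics layer silently.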

Note: This UDF library can be used for converting to/from JSON.

Model Relationships

The relationships between the data model entities are illustrated below.

[Figure: Model Relationship]

Data Formats

The following data formats are recommended for use with the Spot open data model.

Avro

Avro is the recommended data format due to its schema representation, compatibility checks, and interoperability with Hadoop. Avro supports a pure JSON representation for readability and ease of use but also a binary representation of the data for efficient storage. Avro is the optimal format for streaming-based analytic use cases.

A sample event and corresponding schema representation are detailed below.

{
  "event_time":1469562994,
  "net_src_ip4":"192.168.1.1",
  "net_src_host":"test1.companyA.com",
  "net_src_port":1029,
  "net_dst_ip4":"192.168.21.22",
  "net_dst_host":"test3.companyB.com",
  "net_dst_port":443,
  "dvc_type":"sshd",
  "category":"auth",
  "a_proto":"sshd",
  "msg":"user:jsmith successfully logged in to test3.companyB.com from 192.168.1.1",
  "user_name":"jsmith",
  "severity":3
}

Schema

{
  "type": "record",
  "doc": "This event records SSHD activity",
  "name": "auth",
  "fields":
  [
    {"name":"event_time", "type":"long", "doc":"Stop time of event"},
    {"name":"net_src_ip4", "type":"string", "doc":"Source IP Address"},
    {"name":"net_src_host", "type":"string", "doc":"Source hostname"},
    {"name":"net_src_port", "type":"int", "doc":"Source port"},
    {"name":"net_dst_ip4", "type":"string", "doc":"Destination IP Address"},
    {"name":"net_dst_host", "type":"string", "doc":"Destination hostname"},
    {"name":"net_dst_port", "type":"int", "doc":"Destination port"},
    {"name":"dvc_type", "type":"string", "doc":"Source device type"},
    {"name":"category", "type":"string", "doc":"category/type of event message"},
    {"name":"a_proto", "type":"string", "doc":"Application or network protocol"},
    {"name":"msg", "type":"string", "doc":"event message"},
    {"name":"user_name", "type":"string", "doc":"user name associated with the event"},
    {"name":"severity", "type":"int", "doc":"severity of event on scale of 1-10"}
  ]
}
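To show how a record schema constrains an event, here is a minimal Python sketch that compares a JSON event with a flat Avro-style record schema using only the standard library. It is not an Avro implementation: the check_event helper is hypothetical, the schema is a trimmed subset chosen for illustration, and only the string, int and long primitives are handled. A real deployment would rely on an Avro library for encoding and schema resolution.

```python
import json

# Python types accepted for the Avro primitives handled by this sketch.
AVRO_TO_PYTHON = {"string": str, "int": int, "long": int}

# Trimmed, illustrative schema (an assumption, not the full ODM schema).
schema = json.loads("""
{
  "type": "record",
  "name": "auth",
  "fields": [
    {"name": "event_time", "type": "long"},
    {"name": "net_src_host", "type": "string"},
    {"name": "net_src_port", "type": "int"},
    {"name": "user_name", "type": "string"},
    {"name": "severity", "type": "int"}
  ]
}
""")


def check_event(event, schema):
    """Return a list of mismatches between an event and a flat record schema."""
    problems = []
    declared = {field["name"]: field["type"] for field in schema["fields"]}
    for name, avro_type in declared.items():
        if name not in event:
            problems.append(f"missing field: {name}")
            continue
        expected = AVRO_TO_PYTHON.get(avro_type)
        value = event[name]
        if expected is None:
            problems.append(f"{name}: unsupported type {avro_type}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif isinstance(value, bool) or not isinstance(value, expected):
            problems.append(
                f"{name}: expected {avro_type}, got {type(value).__name__}")
    # Field names are case sensitive, so report anything not declared.
    for name in event:
        if name not in declared:
            problems.append(f"undeclared field: {name}")
    return problems


event = {
    "event_time": 1469562994,
    "net_src_host": "test1.companyA.com",
    "net_src_port": 1029,
    "user_name": "jsmith",
    "severity": 3,
}
problems = check_event(event, schema)
```

Running the same check on events before they are written makes naming and typing mistakes visible early, for instance a field whose name differs from the schema only by case.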

Parquet

Parquet is a columnar storage format that offers the benefits of compression and efficient columnar data representation and is optimal for batch analytic use cases. More information is available in the Apache Parquet documentation.

It should be noted that conversion from Avro to Parquet is supported. This allows for data collected and analyzed for stream-based use cases to be easily converted to Parquet for longer-term batch analytics.

ODM Resultant Capability - A Singular View

The resultant capability provided by the Spot ODM is the ability to bring together all the security-relevant data from the entities referenced (event, user, network, endpoint, etc.) into a singular view that can be used to detect threats more effectively than ever before. The singular view can be leveraged to create new analytic models that were not previously possible and to provide needed context at the event level to effectively determine whether or not there is a threat.

Example - Advanced Threat Modeling

In this example, the ODM is leveraged to build an "event" table for a threat model that uses both attributes native to the ODM and derived attributes, which are calculated from the aggregate data stored in the model. In this context, an "event" table is defined by the attributes to be evaluated for predictive power in identifying threats and by the actual attribute values (i.e., the rows in the table). In the example below, the event table is composed of the following attributes, which are then leveraged to identify threats via a Risk Score analytic model:

  • "net_src_ip4" - This attribute is native to the security event log component of the ODM and represents the source IP address of the corresponding table row
  • "os" - This attribute is native to the endpoint context component of the ODM and represents the operating system of the endpoint system in the table row
  • "net_dst_domain" - This attribute is native to the security event log component of the ODM and represents the destination domain
  • Days since "creation_date" - "creation_date" is native to the network context component of the ODM and represents the date the referenced domain was registered. This derived attribute calculates the days since the domain was created/registered.
  • SUM (in_bytes + out_bytes) for the last 7 days - "in_bytes" and "out_bytes" are native to the security event log component of the ODM. This derived attribute represents a summation of bytes between the source address and destination domain over the last 7 days

net_src_ip4 | os | net_dst_domain | Days since "creation_date" | SUM (in_bytes + out_bytes) | Risk Score (1-100)
10.1.1.10 | Microsoft | dajdkwk.com | 39 | 3021 MB | 99
192.168.8.9 | Redhat | usatoday.com | 3027 | 2 MB | 2
172.16.32.3 | Apple | box.com | 1532 | 76 MB | 10
192.168.4.4 | Microsoft | kzjkeljr.ru | 3 | 0.9 MB | 92

The "Risk Score" attribute represents potential output from a threat detection model based on the attributes and values represented in the "event" table and is provided as an example of what is enabled by the ODM. Can you tell which attributes and values hold predictive power for threat detection?
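The two derived attributes in this example can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than Spot code: the helper names (days_since, bytes_last_7_days), the dictionary layout of the events, and all sample values are hypothetical, and event_time is assumed to be a UTC epoch timestamp in seconds.

```python
from datetime import date, datetime, timedelta, timezone


def days_since(creation_date, today):
    """Days elapsed between an ISO date string (YYYY-MM-DD) and `today`."""
    return (today - date.fromisoformat(creation_date)).days


def bytes_last_7_days(events, src_ip, dst_domain, now):
    """Sum in_bytes + out_bytes for one source/destination pair over 7 days."""
    cutoff = now - timedelta(days=7)
    total = 0
    for event in events:
        seen = datetime.fromtimestamp(event["event_time"], tz=timezone.utc)
        if (event["net_src_ip4"] == src_ip
                and event["net_dst_domain"] == dst_domain
                and cutoff <= seen <= now):
            total += event["in_bytes"] + event["out_bytes"]
    return total


# Hypothetical reference time and sample events, invented for this sketch.
now = datetime(2016, 10, 8, tzinfo=timezone.utc)


def _ts(days_ago):
    """Epoch seconds for a moment `days_ago` days before `now`."""
    return int((now - timedelta(days=days_ago)).timestamp())


events = [
    {"event_time": _ts(1), "net_src_ip4": "10.1.1.10",
     "net_dst_domain": "dajdkwk.com", "in_bytes": 1000, "out_bytes": 500},
    {"event_time": _ts(6), "net_src_ip4": "10.1.1.10",
     "net_dst_domain": "dajdkwk.com", "in_bytes": 200, "out_bytes": 300},
    # Older than 7 days, so it is excluded from the sum.
    {"event_time": _ts(8), "net_src_ip4": "10.1.1.10",
     "net_dst_domain": "dajdkwk.com", "in_bytes": 9999, "out_bytes": 1},
    # Different destination domain, so it is excluded as well.
    {"event_time": _ts(2), "net_src_ip4": "10.1.1.10",
     "net_dst_domain": "box.com", "in_bytes": 50, "out_bytes": 50},
]

row = {
    "net_src_ip4": "10.1.1.10",
    "net_dst_domain": "dajdkwk.com",
    "days_since_creation": days_since("2016-08-30", now.date()),
    "bytes_7d": bytes_last_7_days(events, "10.1.1.10", "dajdkwk.com", now),
}
```

At scale the same aggregation would more likely be expressed as a grouped query over the stored events; the loop here only makes the windowing logic explicit.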

Example - Singular Data View for Complete Context

The table below shows a logical, "denormalized" view of what the ODM offers. In this example, the raw DNS event is mapped to the ODM, which enriches the DNS event with the endpoint and network context needed to make a proper threat determination. For large datasets, this type of view is neither performant nor practical to provide with the databases on which legacy security analytic technologies are built. However, this singular, denormalized data representation is feasible with Spot.

RAW DNS EVENT

1463702961,169,10.0.0.101,172.16.36.157,www.kzjkeljr.ru,1,0x00000001,0,49.52.46.49

DNS EVENT + ODM

ODM Attribute | Value | Description | ODM Context Attributes
event_time | 1463702961 | UTC timestamp of DNS query |
length | 169 | DNS Frame length |
net_dst_ip4 | 10.0.0.101 | Destination address (DNS server) | Endpoint Context: os="Redhat 6.3" host="dns.companyA.com" mac="94:94:26:3:86:16" departm="IT" regs="PCI" vulns="CVE-123, CVE-456,..." ...
net_src_ip4 | 172.16.36.157 | Source address (DNS query initiator) | Endpoint Context: os="Microsoft Windows 7" host="jsmith.companyA.com" mac="94:94:26:3:86:17" departm="FCE" regs="Corporate" apps="Office 365, Visio 12.2, Chrome 52.0.3…." vulns="CVE-123, CVE-456,..." ...
dns_query | www.kzjkeljr.ru | DNS query | Network Context: domain_name="kzjkeljr.ru" creation_date="2016-08-30" registrar_registration_expiration_date="2016-09-30" registration_country="Russia" ...
dns_class | 1 | DNS query class |
dns_code | 0x00000001 | DNS response code |
dns_answer | 49.52.46.49 | A record, DNS query response |
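The enrichment shown in the table can be approximated with a short Python sketch: parse the raw comma-separated DNS event into ODM attribute names, then attach endpoint and network context by lookup. Everything here is an assumption made for illustration, including the field order, the helper names, the in-memory context dictionaries, and the naive "last two labels" rule for deriving the registered domain. The eighth raw field has no row in the table, so the sketch skips it.

```python
import csv

# Assumed positional mapping of the raw event to ODM attribute names.
# None marks a raw field that this sketch does not map.
DNS_FIELDS = ["event_time", "length", "net_dst_ip4", "net_src_ip4",
              "dns_query", "dns_class", "dns_code", None, "dns_answer"]


def parse_dns_event(line):
    """Turn one raw CSV line into a dict keyed by ODM attribute names."""
    values = next(csv.reader([line]))
    event = {name: value
             for name, value in zip(DNS_FIELDS, values) if name is not None}
    for numeric in ("event_time", "length", "dns_class"):
        event[numeric] = int(event[numeric])
    return event


def registered_domain(hostname):
    """Naive simplification: keep the last two labels of the hostname."""
    return ".".join(hostname.split(".")[-2:])


def enrich(event, endpoint_ctx, network_ctx):
    """Return a denormalized copy of the event with context attached."""
    view = dict(event)
    view["src_endpoint"] = endpoint_ctx.get(event["net_src_ip4"], {})
    view["dst_endpoint"] = endpoint_ctx.get(event["net_dst_ip4"], {})
    view["network"] = network_ctx.get(
        registered_domain(event["dns_query"]), {})
    return view


# Hypothetical context stores keyed by IP address and by domain name.
endpoint_ctx = {
    "10.0.0.101": {"os": "Redhat 6.3", "host": "dns.companyA.com"},
    "172.16.36.157": {"os": "Microsoft Windows 7",
                      "host": "jsmith.companyA.com"},
}
network_ctx = {
    "kzjkeljr.ru": {"creation_date": "2016-08-30",
                    "registration_country": "Russia"},
}

raw = ("1463702961,169,10.0.0.101,172.16.36.157,"
       "www.kzjkeljr.ru,1,0x00000001,0,49.52.46.49")
view = enrich(parse_dns_event(raw), endpoint_ctx, network_ctx)
```

In a real deployment the lookups would be joins against the endpoint and network context tables rather than in-memory dictionaries, but the shape of the resulting record is the same.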

More Info

Apache Incubator

Apache Spot is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of all newly accepted projects until a further review indicates that the infrastructure, communications, and decision making process have stabilized in a manner consistent with other successful ASF projects. While incubation status is not necessarily a reflection of the completeness or stability of the code, it does indicate that the project has yet to be fully endorsed by the ASF.

The contents of this website are © 2020 Apache Software Foundation under the terms of the Apache License v2. Apache Spot and its logo are trademarks of the Apache Software Foundation.
