Security-Datasets

Re-play Security Events

1,666

245

Top Related Projects

Azure-Sentinel

5,146

Cloud-native SIEM for intelligent security analytics for your entire enterprise.

MISP

5,775

MISP (core software) - Open Source Threat Intelligence and Sharing Platform

TheHive

3,712

TheHive: a Scalable, Open Source and Free Security Incident Response Platform

Quick Overview

The OTRF/Security-Datasets repository is a collection of security-related datasets and logs for research, threat hunting, and data analysis. It provides a variety of data sources, including Windows event logs, network traffic, and application logs, to help security professionals and researchers analyze and understand different security scenarios.

Pros

Diverse collection of security-related datasets from various sources
Well-organized and categorized for easy navigation and access
Regularly updated with new datasets and contributions from the community
Includes detailed metadata and context for each dataset

Cons

Some datasets may be large and require significant storage and processing power
Not all datasets are consistently formatted, which may require additional preprocessing
Limited documentation on how to effectively use or analyze some of the datasets
Some datasets may be outdated or no longer relevant to current security landscapes

Getting Started

To get started with the OTRF/Security-Datasets repository:

Clone the repository:

git clone https://github.com/OTRF/Security-Datasets.git

Navigate to the desired dataset folder:

cd Security-Datasets/datasets/<category>/<dataset_name>

Read the README.md file in the dataset folder for specific information about the dataset and how to use it.
Download or access the dataset files as needed for your analysis or research.

Note: Some datasets may require additional tools or software for processing and analysis. Refer to the dataset-specific documentation for more information.
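Security Datasets files are generally distributed as compressed archives of JSON events, one event per line. As a minimal sketch under that assumption, the following reads every member of a local zip archive using only the Python standard library. The function name `load_events` is hypothetical, and individual datasets may use a different archive type or layout, so check the dataset's documentation first.

```python
import json
import zipfile
from typing import Any, Dict, List


def load_events(archive_path: str) -> List[Dict[str, Any]]:
    """Read JSON-lines events from every file inside a zip archive."""
    events: List[Dict[str, Any]] = []
    with zipfile.ZipFile(archive_path) as archive:
        for member in archive.namelist():
            if member.endswith("/"):
                continue  # skip directory entries
            with archive.open(member) as handle:
                for raw_line in handle:
                    line = raw_line.decode("utf-8").strip()
                    if line:  # ignore blank lines
                        events.append(json.loads(line))
    return events
```

Calling `load_events` with the path of a downloaded archive returns a list of dictionaries, one per event, which can then be filtered directly or passed to a data-analysis library.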

Competitor Comparisons

Azure-Sentinel

5,146

Cloud-native SIEM for intelligent security analytics for your entire enterprise.

Pros of Azure-Sentinel

Comprehensive cloud-native SIEM and SOAR solution
Extensive integration with Azure services and third-party tools
Active development and regular updates from Microsoft

Cons of Azure-Sentinel

Requires Azure subscription and associated costs
Steeper learning curve for users unfamiliar with Azure ecosystem
Limited customization options compared to open-source alternatives

Code Comparison

Security-Datasets:

{
  "title": "Windows Security Event Log",
  "description": "Windows Security events collected from a Windows workstation",
  "platform": "Windows",
  "log_source": "Security",
  "log_name": "Security.evtx",
  "file_type": "evtx"
}

Azure-Sentinel:

id: 123456789
name: Suspicious PowerShell Command Line
description: Detects suspicious PowerShell command line parameters
severity: Medium
requiredDataConnectors:
  - connectorId: WindowsSecurityEvents
    dataTypes:
      - SecurityEvent
queryFrequency: 1h
queryPeriod: 1h

The Security-Datasets repository focuses on providing sample datasets for security analysis, while Azure-Sentinel offers a complete SIEM solution with detection rules, analytics, and automation capabilities. Security-Datasets is more suitable for research and testing, whereas Azure-Sentinel is designed for production environments and real-time threat detection.

security_content

1,465

Splunk Security Content

Pros of security_content

Extensive collection of pre-built detection rules and analytics
Regular updates and contributions from the Splunk security community
Includes machine learning models for advanced threat detection

Cons of security_content

Primarily focused on Splunk-specific content and formats
May require more setup and configuration for non-Splunk environments
Less diverse in terms of raw datasets compared to Security-Datasets

Code Comparison

Security-Datasets example (YAML):

title: Windows Security Event Log
platform: Windows
log_source:
  product: Windows
  service: Security

security_content example (YAML):

name: Detect Suspicious Process Creation
search: |
  index=windows sourcetype=WinEventLog:Security EventCode=4688
  | stats count by NewProcessName

Both repositories use YAML for configuration, but security_content focuses on Splunk search queries, while Security-Datasets provides more general metadata about datasets.
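To try the same idea outside Splunk, the aggregation in the search above (`EventCode=4688 | stats count by NewProcessName`) can be approximated in plain Python over events loaded from a dataset. This is a rough sketch, not Splunk's implementation: it assumes each event is a dictionary with `EventID` and `NewProcessName` keys, and the sample events are invented for illustration.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def count_new_processes(events: Iterable[Dict[str, Any]]) -> Counter:
    """Count process-creation events (ID 4688) per NewProcessName."""
    counts: Counter = Counter()
    for event in events:
        # Field names are assumptions; adjust them to the dataset's schema.
        if event.get("EventID") == 4688 and "NewProcessName" in event:
            counts[event["NewProcessName"]] += 1
    return counts


# Invented sample events, for illustration only.
sample_events = [
    {"EventID": 4688, "NewProcessName": "C:\\Windows\\System32\\cmd.exe"},
    {"EventID": 4688, "NewProcessName": "C:\\Windows\\System32\\mshta.exe"},
    {"EventID": 4688, "NewProcessName": "C:\\Windows\\System32\\cmd.exe"},
    {"EventID": 4624, "TargetUserName": "Admin"},
]

# Print process names from most to least frequent.
for name, total in count_new_processes(sample_events).most_common():
    print(name, total)
```

Events with other IDs, such as the 4624 logon in the sample, are skipped, mirroring the `EventCode=4688` filter in the search.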

Summary

Security-Datasets offers a broader range of security-related datasets across various platforms, making it more versatile for different security tools and environments. security_content, on the other hand, provides a rich set of pre-built detection rules and analytics specifically tailored for Splunk environments, making it more immediately actionable for Splunk users but potentially less flexible for other platforms.

MISP

5,775

MISP (core software) - Open Source Threat Intelligence and Sharing Platform

Pros of MISP

Comprehensive threat intelligence platform with extensive sharing capabilities
Active community and regular updates
Supports various data formats and integrations with other security tools

Cons of MISP

Steeper learning curve due to its complexity
Requires more resources to set up and maintain
May be overkill for smaller organizations or simpler use cases

Code Comparison

MISP (Python):

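from pymisp import MISPEvent, PyMISP
misp = PyMISP('https://misp.example.com', 'YOUR_API_KEY')  # placeholder URL and API key
event = MISPEvent()  # build the event locally before sending it
event.from_dict(info='Suspicious Activity', distribution=0, threat_level_id=2, analysis=0)
event = misp.add_event(event, pythonify=True)  # recent PyMISP releases use add_event() rather than new_event()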

Security-Datasets (No specific code, as it's a collection of datasets):

# No direct code comparison available
# Security-Datasets provides pre-formatted datasets for analysis

Summary

MISP is a powerful threat intelligence platform with extensive features, while Security-Datasets offers pre-formatted datasets for security analysis. MISP provides more comprehensive capabilities but requires more setup and maintenance. Security-Datasets is simpler to use but lacks the advanced sharing and analysis features of MISP. The choice between them depends on specific organizational needs and resources available for threat intelligence management.

TheHive

3,712

TheHive: a Scalable, Open Source and Free Security Incident Response Platform

Pros of TheHive

Comprehensive incident response platform with case management features
Integrates with other security tools and supports automation workflows
Active community and regular updates

Cons of TheHive

Steeper learning curve and more complex setup
Requires more resources to run and maintain
Focused on incident response rather than providing diverse security datasets

Code Comparison

TheHive (Python API example):

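# Written against the thehive4py 1.x client (thehive4py.api / thehive4py.models)
from thehive4py.api import TheHiveApi
from thehive4py.models import Case

# URL and API key are placeholders
api = TheHiveApi('http://localhost:9000', 'api_key')
case = Case(title='Suspicious Activity', description='Investigating unusual network traffic')
response = api.create_case(case)  # HTTP response describing the new case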

Security-Datasets (Sample dataset usage):

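import pandas as pd

# Illustrative only: the file path and the 'process_name' column are placeholders.
# Datasets are typically published as compressed JSON, so check the dataset's
# metadata for the real download link and field names before running this.
df = pd.read_csv('https://raw.githubusercontent.com/OTRF/Security-Datasets/master/datasets/atomic/windows/process_creation/process_creation_win_mshta_javascript.csv')
suspicious_processes = df[df['process_name'] == 'mshta.exe']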

While TheHive focuses on incident response management with API interactions, Security-Datasets provides ready-to-use datasets for analysis and testing. TheHive is more suitable for operational security teams, whereas Security-Datasets is ideal for researchers and analysts looking for pre-compiled security data.

detection-rules

2,379

Pros of detection-rules

Focuses on providing ready-to-use detection rules for Elastic Security
Offers a comprehensive set of rules covering various attack techniques
Includes a rule testing framework for validation and quality assurance

Cons of detection-rules

Limited to Elastic Security ecosystem, less versatile for other platforms
Requires Elastic Stack knowledge for optimal use and customization
May have a steeper learning curve for users unfamiliar with Elastic products

Code Comparison

detection-rules:

name: Suspicious Process Creation in Unusual Location
type: eql
risk_score: 50
description: Detects process creation in unusual directories
query: |
  process where event.type == "creation" and
    not process.executable : ("C:\\Windows\\*", "C:\\Program Files\\*")

Security-Datasets:

{
  "EventID": 1,
  "Image": "C:\\Users\\Admin\\AppData\\Local\\Temp\\suspicious.exe",
  "CommandLine": "C:\\Users\\Admin\\AppData\\Local\\Temp\\suspicious.exe -enc payload",
  "ParentImage": "C:\\Windows\\System32\\cmd.exe"
}
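To show how the two samples relate, the rule's location check can be approximated in Python and applied to the event above. This is a simplified sketch rather than Elastic's EQL engine: it only emulates the case-insensitive wildcard match on the executable path, assumes the Sysmon-style `Image` field holds that path, and ignores the `event.type` condition. The function name `runs_from_unusual_location` is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict

# Wildcard patterns taken from the rule's exclusion list.
TRUSTED_PATTERNS = ("C:\\Windows\\*", "C:\\Program Files\\*")


def runs_from_unusual_location(event: Dict[str, Any]) -> bool:
    """Return True when the process image is outside the trusted directories."""
    image = str(event.get("Image", "")).lower()
    if not image:
        return False  # nothing to evaluate
    # Lower-case both sides to emulate a case-insensitive wildcard match.
    return not any(fnmatchcase(image, p.lower()) for p in TRUSTED_PATTERNS)


# The sample event shown above, reduced to the fields used here.
sample_event = {
    "EventID": 1,
    "Image": "C:\\Users\\Admin\\AppData\\Local\\Temp\\suspicious.exe",
    "ParentImage": "C:\\Windows\\System32\\cmd.exe",
}

print(runs_from_unusual_location(sample_event))  # True
```

Replaying recorded events through a check like this is one way to confirm that a rule fires on the behavior a dataset was captured to demonstrate.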

Summary

detection-rules provides a robust set of detection rules specifically for Elastic Security, with built-in testing capabilities. Security-Datasets offers a broader collection of security-related datasets for various platforms and use cases. While detection-rules is more focused and integrated with Elastic products, Security-Datasets provides greater flexibility for different security tools and analysis approaches.

README

Security Datasets

The Security Datasets project is an open-source initiative that contributes malicious and benign datasets, from different platforms, to the infosec community to expedite data analysis and threat research.

Docs

https://securitydatasets.com

Goals

Provide open portable datasets to expedite the development of data analytics.
Facilitate and expedite adversary techniques simulation.
Allow security analysts around the world to test their skills with real data.
Improve the testing and validation of detection analytics in an easier, practical, modular and more affordable way.
Enable data scientists to have labeled and unlabeled data for initial research and features development.
Help the community map datasets to other open source projects such as Sigma, Atomic Red Team, Threat Hunter Playbook (Jupyter Notebooks) and MITRE ATT&CK.
Provide datasets for other social/community events such as Capture The Flags (CTFs) or hackathons to encourage collaboration.

Projects Using `Security Datasets`

ThreatHunter-Playbook

Authors

Roberto Rodriguez @Cyb3rWard0g
Jose Luis Rodriguez @Cyb3rPandaH

Contributing

Help us build the largest library of datasets for the InfoSec community! Learn more about how to contribute here.

License: GPL-3.0

Security Datasets is released under the GNU General Public License v3.0.
