Top Related Projects
Cloud-native SIEM for intelligent security analytics for your entire enterprise.
Splunk Security Content
MISP (core software) - Open Source Threat Intelligence and Sharing Platform
TheHive: a Scalable, Open Source and Free Security Incident Response Platform
Quick Overview
The OTRF/Security-Datasets repository is a collection of security-related datasets and logs for research, threat hunting, and data analysis. It provides a variety of data sources, including Windows event logs, network traffic, and application logs, to help security professionals and researchers analyze and understand different security scenarios.
Pros
- Diverse collection of security-related datasets from various sources
- Well-organized and categorized for easy navigation and access
- Regularly updated with new datasets and contributions from the community
- Includes detailed metadata and context for each dataset
Cons
- Some datasets may be large and require significant storage and processing power
- Not all datasets are consistently formatted, which may require additional preprocessing
- Limited documentation on how to effectively use or analyze some of the datasets
- Some datasets may be outdated or no longer relevant to current security landscapes
Getting Started
To get started with the OTRF/Security-Datasets repository:
- Clone the repository:
  git clone https://github.com/OTRF/Security-Datasets.git
- Navigate to the desired dataset folder:
  cd Security-Datasets/datasets/<category>/<dataset_name>
- Read the README.md file in the dataset folder for specific information about the dataset and how to use it.
- Download or access the dataset files as needed for your analysis or research.
Note: Some datasets may require additional tools or software for processing and analysis. Refer to the dataset-specific documentation for more information.
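Once a dataset has been downloaded, loading it usually takes only a few lines. The sketch below assumes a dataset shipped as a zip archive whose members hold one JSON event per line. Both the helper name `load_events` and that archive layout are assumptions made for illustration, so check the dataset's README for its actual format.

```python
import json
import zipfile


def load_events(archive_path):
    """Return every JSON event found in the members of a zip archive.

    Assumption (hypothetical layout): each member is a text file with
    one JSON object per line. Blank lines are skipped.
    """
    events = []
    with zipfile.ZipFile(archive_path) as archive:
        for member in archive.namelist():
            if member.endswith("/"):
                continue  # skip directory entries
            with archive.open(member) as handle:
                for raw_line in handle:
                    line = raw_line.decode("utf-8").strip()
                    if line:
                        events.append(json.loads(line))
    return events
```

A call such as `events = load_events("dataset.zip")` (file name hypothetical) returns a list of dictionaries that can be filtered directly or handed to a data-frame library.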
Competitor Comparisons
Cloud-native SIEM for intelligent security analytics for your entire enterprise.
Pros of Azure-Sentinel
- Comprehensive cloud-native SIEM and SOAR solution
- Extensive integration with Azure services and third-party tools
- Active development and regular updates from Microsoft
Cons of Azure-Sentinel
- Requires Azure subscription and associated costs
- Steeper learning curve for users unfamiliar with Azure ecosystem
- Limited customization options compared to open-source alternatives
Code Comparison
Security-Datasets:
{
  "title": "Windows Security Event Log",
  "description": "Windows Security events collected from a Windows workstation",
  "platform": "Windows",
  "log_source": "Security",
  "log_name": "Security.evtx",
  "file_type": "evtx"
}
Azure-Sentinel:
id: 123456789
name: Suspicious PowerShell Command Line
description: Detects suspicious PowerShell command line parameters
severity: Medium
requiredDataConnectors:
  - connectorId: WindowsSecurityEvents
    dataTypes:
      - SecurityEvent
queryFrequency: 1h
queryPeriod: 1h
The Security-Datasets repository focuses on providing sample datasets for security analysis, while Azure-Sentinel offers a complete SIEM solution with detection rules, analytics, and automation capabilities. Security-Datasets is more suitable for research and testing, whereas Azure-Sentinel is designed for production environments and real-time threat detection.
Splunk Security Content
Pros of security_content
- Extensive collection of pre-built detection rules and analytics
- Regular updates and contributions from the Splunk security community
- Includes machine learning models for advanced threat detection
Cons of security_content
- Primarily focused on Splunk-specific content and formats
- May require more setup and configuration for non-Splunk environments
- Less diverse in terms of raw datasets compared to Security-Datasets
Code Comparison
Security-Datasets example (YAML):
title: Windows Security Event Log
platform: Windows
log_source:
  product: Windows
  service: Security
security_content example (YAML):
name: Detect Suspicious Process Creation
search: |
  index=windows sourcetype=WinEventLog:Security EventCode=4688
  | stats count by NewProcessName
Both repositories use YAML for configuration, but security_content focuses on Splunk search queries, while Security-Datasets provides more general metadata about datasets.
Summary
Security-Datasets offers a broader range of security-related datasets across various platforms, making it more versatile for different security tools and environments. security_content, on the other hand, provides a rich set of pre-built detection rules and analytics specifically tailored for Splunk environments, making it more immediately actionable for Splunk users but potentially less flexible for other platforms.
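To show how a search like the one above could be exercised against dataset events outside Splunk, here is a minimal Python sketch of the same aggregation: keep events with code 4688 and count them per process name. The field names `EventCode` and `NewProcessName` are taken from the search itself. Whether a given dataset uses those exact names is an assumption, and the sample events are made up for illustration.

```python
from collections import Counter


def count_process_creations(events):
    """Count process-creation events (EventCode 4688) per NewProcessName."""
    counts = Counter()
    for event in events:
        # Compare as strings so both 4688 and "4688" match.
        if str(event.get("EventCode")) == "4688":
            counts[event.get("NewProcessName", "unknown")] += 1
    return counts


# Hypothetical sample events, not taken from any real dataset.
sample_events = [
    {"EventCode": 4688, "NewProcessName": "C:\\Windows\\System32\\cmd.exe"},
    {"EventCode": "4688", "NewProcessName": "C:\\Windows\\System32\\cmd.exe"},
    {"EventCode": 4624, "TargetUserName": "alice"},
]
print(count_process_creations(sample_events))
```

Running the same logic in plain Python is one way to sanity-check what a detection should return on a dataset before loading the data into a SIEM.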
MISP (core software) - Open Source Threat Intelligence and Sharing Platform
Pros of MISP
- Comprehensive threat intelligence platform with extensive sharing capabilities
- Active community and regular updates
- Supports various data formats and integrations with other security tools
Cons of MISP
- Steeper learning curve due to its complexity
- Requires more resources to set up and maintain
- May be overkill for smaller organizations or simpler use cases
Code Comparison
MISP (Python):
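from pymisp import MISPEvent, PyMISP
misp = PyMISP('https://misp.example.com', 'YOUR_API_KEY')
# Build the event locally, then submit it to the MISP instance.
event = MISPEvent()
event.from_dict(info='Suspicious Activity', distribution=0, threat_level_id=2, analysis=0)
event = misp.add_event(event)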
Security-Datasets (No specific code, as it's a collection of datasets):
# No direct code comparison available
# Security-Datasets provides pre-formatted datasets for analysis
Summary
MISP is a powerful threat intelligence platform with extensive features, while Security-Datasets offers pre-formatted datasets for security analysis. MISP provides more comprehensive capabilities but requires more setup and maintenance. Security-Datasets is simpler to use but lacks the advanced sharing and analysis features of MISP. The choice between them depends on specific organizational needs and resources available for threat intelligence management.
TheHive: a Scalable, Open Source and Free Security Incident Response Platform
Pros of TheHive
- Comprehensive incident response platform with case management features
- Integrates with other security tools and supports automation workflows
- Active community and regular updates
Cons of TheHive
- Steeper learning curve and more complex setup
- Requires more resources to run and maintain
- Focused on incident response rather than providing diverse security datasets
Code Comparison
TheHive (Python API example):
from thehive4py.api import TheHiveApi
from thehive4py.models import Case
api = TheHiveApi('http://localhost:9000', 'api_key')
case = Case(title='Suspicious Activity', description='Investigating unusual network traffic')
response = api.create_case(case)
Security-Datasets (Sample dataset usage):
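import pandas as pd
# The path below is illustrative; point it at a CSV file that exists in the repository.
df = pd.read_csv('https://raw.githubusercontent.com/OTRF/Security-Datasets/master/datasets/atomic/windows/process_creation/process_creation_win_mshta_javascript.csv')
# Keep only the rows whose process name is mshta.exe.
suspicious_processes = df[df['process_name'] == 'mshta.exe']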
While TheHive focuses on incident response management with API interactions, Security-Datasets provides ready-to-use datasets for analysis and testing. TheHive is more suitable for operational security teams, whereas Security-Datasets is ideal for researchers and analysts looking for pre-compiled security data.
Pros of detection-rules
- Focuses on providing ready-to-use detection rules for Elastic Security
- Offers a comprehensive set of rules covering various attack techniques
- Includes a rule testing framework for validation and quality assurance
Cons of detection-rules
- Limited to Elastic Security ecosystem, less versatile for other platforms
- Requires Elastic Stack knowledge for optimal use and customization
- May have a steeper learning curve for users unfamiliar with Elastic products
Code Comparison
detection-rules:
name: Suspicious Process Creation in Unusual Location
type: eql
risk_score: 50
description: Detects process creation in unusual directories
query: |
  process where event.type == "creation" and
    not process.executable : ("C:\\Windows\\*", "C:\\Program Files\\*")
Security-Datasets:
{
  "EventID": 1,
  "Image": "C:\\Users\\Admin\\AppData\\Local\\Temp\\suspicious.exe",
  "CommandLine": "C:\\Users\\Admin\\AppData\\Local\\Temp\\suspicious.exe -enc payload",
  "ParentImage": "C:\\Windows\\System32\\cmd.exe"
}
Summary
detection-rules provides a robust set of detection rules specifically for Elastic Security, with built-in testing capabilities. Security-Datasets offers a broader collection of security-related datasets for various platforms and use cases. While detection-rules is more focused and integrated with Elastic products, Security-Datasets provides greater flexibility for different security tools and analysis approaches.
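The two snippets in the code comparison fit together naturally: a dataset event can be used to test the logic of a rule. The sketch below is a rough Python approximation of the rule's path condition, applied to the sample event. Mapping the rule's `process.executable` to the event's `Image` field is an assumption, and the function name `is_unusual_location` is hypothetical. It does not reproduce EQL semantics in full.

```python
from fnmatch import fnmatchcase

# Path patterns taken from the rule's query, lower-cased so the
# comparison is case-insensitive.
EXPECTED_LOCATIONS = ("c:\\windows\\*", "c:\\program files\\*")


def is_unusual_location(executable):
    """Return True when the path matches none of the expected locations."""
    path = executable.lower()
    return not any(fnmatchcase(path, pattern) for pattern in EXPECTED_LOCATIONS)


# Sample event from the comparison above; "Image" is assumed to hold
# the executable path.
event = {
    "EventID": 1,
    "Image": "C:\\Users\\Admin\\AppData\\Local\\Temp\\suspicious.exe",
    "ParentImage": "C:\\Windows\\System32\\cmd.exe",
}
print(is_unusual_location(event["Image"]))        # True
print(is_unusual_location(event["ParentImage"]))  # False
```

Checking rule logic against recorded events in this way is the kind of detection validation the datasets are meant to support.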
README
Security Datasets
The Security Datasets project is an open-source initiative that contributes malicious and benign datasets, from different platforms, to the infosec community to expedite data analysis and threat research.
Docs
Goals
- Provide open portable datasets to expedite the development of data analytics.
- Facilitate and expedite adversary techniques simulation.
- Allow security analysts around the world to test their skills with real data.
- Improve the testing and validation of detection analytics in an easier, practical, modular and more affordable way.
- Enable data scientists to have labeled and unlabeled data for initial research and features development.
- Help the community map datasets to other open source projects such as Sigma, Atomic Red Team, Threat Hunter Playbook (Jupyter Notebooks) and MITRE ATT&CK.
- Provide datasets for other social/community events such as Capture The Flags (CTFs) or hackathons to encourage collaboration.
Projects Using Security Datasets
Authors
- Roberto Rodriguez @Cyb3rWard0g
- Jose Luis Rodriguez @Cyb3rPandaH
Contributing
Help us build the largest library of datasets for the InfoSec community! Learn more about how you could do it here!
License: GPL-3.0