GridStratLLM Dataset
This dataset contains coordinated cyberattacks generated using the GridStratLLM agent framework against a hardware-in-the-loop testbed of a distributed generation environment.
It comprises one normal-operation dataset and five attack datasets, each attack dataset generated with a different LLM.
Every dataset captures network traffic, process data from SCADA, log messages, and metadata from the attack scripts.
Paper: https://doi.org/10.1145/3765611.3815147
GridStratLLM source code: https://github.com/nbke/GridStratLLM
Each dataset directory contains:
attack_session_llm.parquet: LLM prompts, plans, chain-of-thought reasoning, token usage
attack_worker.parquet: Network interface info (MAC, IP, interface name)
packet_metadata.parquet: Packet metadata (timestamps, addresses, ports, protocol)
packets.pcap: Raw packet capture
attack_datamod_history.parquet: Packet modification log with delta values
attack_exec_steps.parquet: Attack execution timeline per worker
process_data.parquet: WinCC SCADA data
logs.parquet: Logs from PLC 1512 and PLC 1516
See network.json for a list of network devices. modbus.json contains a mapping of Modbus registers to signal names.
s7_connections.json contains all signals transmitted via S7.
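A quick completeness check for a downloaded dataset directory can be sketched as follows. The file names are taken from the list above; the function name `missing_files` and the idea of passing the directory path as an argument are illustrative assumptions, not part of the dataset tooling.

```python
from pathlib import Path

# File names as listed in this README for each dataset directory.
EXPECTED_FILES = (
    "attack_session_llm.parquet",
    "attack_worker.parquet",
    "packet_metadata.parquet",
    "packets.pcap",
    "attack_datamod_history.parquet",
    "attack_exec_steps.parquet",
    "process_data.parquet",
    "logs.parquet",
)


def missing_files(dataset_dir):
    """Return the expected file names that are absent from dataset_dir.

    `missing_files` is a hypothetical helper, not shipped with the dataset.
    """
    root = Path(dataset_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty list means every file named above is present in the directory.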
Parquet Files
attack_session_llm.parquet
The structure of the plan column is explained in appendix E of the paper.
coordinated is true if an attack session uses multiple attack workers.
The column all_messages may be NULL due to a data capture issue.
packet_metadata.parquet
The entries in packet_metadata.parquet are in the same order as packets.pcap.
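Because both files share the same ordering, the i-th metadata row can be paired with the i-th capture record. The sketch below uses only the standard library and assumes that packets.pcap is in the classic pcap format (not pcapng), which this README does not state explicitly. The helper names and the `metadata_rows` argument (any sequence of already loaded rows) are hypothetical.

```python
import struct
from typing import BinaryIO, Iterator

# Classic pcap magic numbers mapped to the struct byte-order prefix.
PCAP_MAGICS = {
    b"\xd4\xc3\xb2\xa1": "<",  # little-endian, microsecond timestamps
    b"\xa1\xb2\xc3\xd4": ">",  # big-endian, microsecond timestamps
    b"\x4d\x3c\xb2\xa1": "<",  # little-endian, nanosecond timestamps
    b"\xa1\xb2\x3c\x4d": ">",  # big-endian, nanosecond timestamps
}


def iter_pcap_payloads(stream: BinaryIO) -> Iterator[bytes]:
    """Yield the raw bytes of each record in a classic pcap stream."""
    header = stream.read(24)  # the global header is 24 bytes long
    if len(header) < 24 or header[:4] not in PCAP_MAGICS:
        raise ValueError("not a classic pcap stream (pcapng is not handled)")
    endian = PCAP_MAGICS[header[:4]]
    while True:
        record_header = stream.read(16)  # ts_sec, ts_frac, incl_len, orig_len
        if len(record_header) < 16:
            return
        _sec, _frac, incl_len, _orig_len = struct.unpack(endian + "IIII", record_header)
        payload = stream.read(incl_len)
        if len(payload) < incl_len:
            return  # truncated final record
        yield payload


def pair_with_metadata(metadata_rows, stream: BinaryIO):
    """Pair each metadata row with the capture record at the same position."""
    rows = list(metadata_rows)
    payloads = list(iter_pcap_payloads(stream))
    if len(rows) != len(payloads):
        raise ValueError("metadata and capture contain different record counts")
    return list(zip(rows, payloads))
```

This version loads all records into memory, which keeps the length check simple; for large captures, iterating lazily over `iter_pcap_payloads` may be preferable.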
If the UUID of a packet (id in packet_metadata.parquet) appears in the packet_id column of attack_datamod_history.parquet, then the packet originates from an attack script.
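The membership rule above can be sketched as a simple set lookup. The example assumes that the id values in the Parquet files are 16-byte blobs holding raw UUID bytes; the function names are hypothetical.

```python
import uuid
from typing import Iterable, List


def label_attack_packets(
    packet_ids: Iterable[bytes], modified_packet_ids: Iterable[bytes]
) -> List[bool]:
    """Return one flag per packet: True if its id occurs among the modified ids.

    packet_ids: values of the id column in packet_metadata.parquet.
    modified_packet_ids: values of the packet_id column in
    attack_datamod_history.parquet.
    """
    modified = set(modified_packet_ids)
    return [packet_id in modified for packet_id in packet_ids]


def blob_to_uuid(blob: bytes) -> uuid.UUID:
    """Convert a 16-byte blob to a UUID (assumes raw UUID bytes)."""
    return uuid.UUID(bytes=blob)
```

If both tables are loaded as pandas DataFrames (here called `packets` and `history` for illustration), `packets["id"].isin(history["packet_id"])` expresses the same lookup.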
SCADA process data: process_data.parquet
PV:
- Control Signals: on_off
- Monitor Signals: temp_air, poa_direct, wind_speed, poa_diffuse, cell_temperature, inverter_ac_power, inverter_dc_power
Wind:
- Control Signals: blade_rotation, rotation_speed
- Monitor Signals: power, height, pressure, wind_speed_a, wind_speed_b, temperature_a, temperature_b
Battery:
- Control Signals: on_off, target_power
- Monitor Signals: current, voltage, temperature, state_of_charge, actual_charge_power
Log messages: logs.parquet
Log messages are in German and are only sent when a signal value changes. Example (Wertänderung: value change, Altwert: old value, aktueller Wert: current value):
Wertänderung "SysLogDaten".Inverter_ac_power Altwert: 239,0 aktueller Wert: 20,0 CPU:SECCPU16
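A message of this shape can be split into its fields with a regular expression. This is a minimal sketch that only covers the format of the example above (numeric values with a decimal comma); other message variants in logs.parquet may need additional patterns. The function name and the keys of the returned dictionary are illustrative choices.

```python
import re
from typing import Optional

# Matches messages shaped like the example above; German decimal commas only.
LOG_PATTERN = re.compile(
    r'Wertänderung\s+"(?P<block>[^"]+)"\.(?P<signal>\S+)\s+'
    r'Altwert:\s*(?P<old>-?\d+(?:,\d+)?)\s+'
    r'aktueller Wert:\s*(?P<new>-?\d+(?:,\d+)?)\s+'
    r'CPU:(?P<cpu>\S+)'
)


def _to_float(text: str) -> float:
    """Convert a number written with a decimal comma to a float."""
    return float(text.replace(",", "."))


def parse_log_message(message: str) -> Optional[dict]:
    """Return the parsed fields, or None if the message has another shape."""
    match = LOG_PATTERN.search(message)
    if match is None:
        return None
    return {
        "block": match["block"],
        "signal": match["signal"],
        "old_value": _to_float(match["old"]),
        "new_value": _to_float(match["new"]),
        "cpu": match["cpu"],
    }
```

Returning None for unmatched messages lets callers count and inspect lines the pattern does not cover instead of failing on them.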
DuckDB File: merged_datasets.duckdb
The merged_datasets.duckdb file combines all data from the Parquet files with the raw packet data.
Differences from the Parquet files:
process_data is split into four tables: wind_process_data, pv_process_data, battery_process_data, demand_process_data (one column per signal instead of JSON values).
attack_session_llm is renamed to attack_session.
attack_exec_steps is renamed to exec_steps and setup_duration is stored as an interval instead of a bigint.
packet_metadata is renamed to packets. The packets from the PCAP files are stored in the raw_packet BLOB column.
The l2_flow_id, l3_flow_id, and l4_flow_id columns, which are always null in the Parquet files, are omitted.
id columns use uuid type instead of blob.
Categorical columns (model_name, transport, state, kind, etc.) use DuckDB enum types instead of string.
Column name prefix for process data tables:
C_: Control signals (commands sent to power plants)
M_: Monitor signals (measured values sent to SCADA)
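The prefix convention makes it easy to separate command columns from measurement columns. The sketch below assumes that the prefix is followed directly by the signal name (for example `C_on_off`); that exact spelling, the helper name, and the `timestamp` column in the usage example are assumptions, not confirmed by this README.

```python
from typing import Dict, Iterable, List


def split_by_prefix(columns: Iterable[str]) -> Dict[str, List[str]]:
    """Group column names into control, monitor, and other columns.

    The C_ or M_ prefix is stripped from the returned signal names.
    """
    groups: Dict[str, List[str]] = {"control": [], "monitor": [], "other": []}
    for name in columns:
        if name.startswith("C_"):
            groups["control"].append(name[2:])
        elif name.startswith("M_"):
            groups["monitor"].append(name[2:])
        else:
            groups["other"].append(name)
    return groups


# Usage with hypothetical column names of battery_process_data:
example = split_by_prefix(["timestamp", "C_on_off", "C_target_power", "M_voltage"])
```

The actual column names of a table can be listed in DuckDB with `DESCRIBE battery_process_data` before applying the split.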
Funding
This research is supported in part by funding from the topic Engineering Secure Systems of the Helmholtz Association (HGF) and by KASTEL Security Research Labs (structure 46.23.02).