<?xml version="1.0" encoding="UTF-8" ?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-05-25T14:53:16Z</responseDate><request identifier="10.35097/bx5337kcykte438h" metadataPrefix="datacite" verb="GetRecord">https://www.radar-service.eu/oai/OAIHandler</request><GetRecord><record><header><identifier>10.35097/bx5337kcykte438h</identifier><datestamp>2026-05-19T10:30:36Z</datestamp><setSpec>radar4kit</setSpec></header><metadata><resource xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://datacite.org/schema/kernel-4" xsi:schemaLocation="http://datacite.org/schema/kernel-4 https://schema.datacite.org/meta/kernel-4.6/metadata.xsd">
  <identifier identifierType="DOI">10.35097/bx5337kcykte438h</identifier>
  <creators>
    <creator>
      <creatorName>Kellerer, Nicolai</creatorName>
      <givenName>Nicolai</givenName>
      <familyName>Kellerer</familyName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0009-0008-1298-4787</nameIdentifier>
      <affiliation>Institut für Automation und angewandte Informatik (IAI), Karlsruher Institut für Technologie (KIT)</affiliation>
    </creator>
    <creator>
      <creatorName>Hagenmeyer, Veit</creatorName>
      <givenName>Veit</givenName>
      <familyName>Hagenmeyer</familyName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="http://orcid.org/">0000-0002-3572-9083</nameIdentifier>
      <affiliation>Institut für Automation und angewandte Informatik (IAI), Karlsruher Institut für Technologie (KIT)</affiliation>
    </creator>
  </creators>
  <titles>
    <title>Dataset: GridStratLLM: Agent Framework for Coordinated Cyberattacks on the Smart Grid with Large Language Models</title>
  </titles>
  <publisher>Karlsruhe Institute of Technology</publisher>
  <dates>
    <date dateType="Created">2026</date>
  </dates>
  <publicationYear>2026</publicationYear>
  <subjects>
    <subject>Computer Science</subject>
    <subject>Attack Plan</subject>
    <subject>LLM</subject>
    <subject>Smart Grid</subject>
    <subject>Modbus</subject>
    <subject>S7</subject>
    <subject>Data Modification</subject>
  </subjects>
  <resourceType resourceTypeGeneral="Dataset"/>
  <rightsList>
    <rights rightsURI="info:eu-repo/semantics/openAccess">Open Access</rights>
    <rights schemeURI="https://spdx.org/licenses/" rightsIdentifierScheme="SPDX" rightsIdentifier="CC-BY-4.0" rightsURI="https://creativecommons.org/licenses/by/4.0/legalcode">Creative Commons Attribution 4.0 International</rights>
  </rightsList>
  <contributors>
    <contributor contributorType="RightsHolder">
      <contributorName>Kellerer, Nicolai</contributorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="https://orcid.org/">0009-0008-1298-4787</nameIdentifier>
    </contributor>
    <contributor contributorType="RightsHolder">
      <contributorName>Hagenmeyer, Veit</contributorName>
      <nameIdentifier nameIdentifierScheme="ORCID" schemeURI="https://orcid.org/">0000-0002-3572-9083</nameIdentifier>
    </contributor>
  </contributors>
  <descriptions>
    <description descriptionType="Abstract">A new cybersecurity threat emerges: Recent Large Language Models (LLMs) with advanced reasoning and tool calling enable even attackers lacking expert knowledge to coordinate large-scale attacks on Smart Grids (SG).&#13;
These LLMs can orchestrate multiple malware instances, select appropriate signals and deltas, and execute data-modification attacks on the S7 and Modbus protocols.&#13;
Thereby, the automatically generated attack progresses towards the targeted unsafe state and evades detection by the Intrusion Detection System (IDS).&#13;
To assess this emerging threat, we introduce GridStratLLM, a novel agent framework for coordinated attacks on industrial networks.&#13;
Furthermore, we evaluate attack plans generated by four frontier Large Language Models using the open-source Network Security Monitor (NSM) Zeek and a commercial NSM.&#13;
Finally, we contribute a dataset recorded in a Hardware-in-the-Loop (HIL) testbed to support the training of IDS solutions against these attacks.&#13;
The dataset spans 24 hours and 11 minutes and contains 436 attacks, 212 of which are coordinated.</description>
    <description descriptionType="TechnicalInfo"># GridStratLLM Dataset&#13;
&#13;
This dataset contains coordinated cyberattacks generated using the GridStratLLM agent framework against a hardware-in-the-loop testbed of a distributed generation environment.&#13;
It comprises one normal-operation dataset and five attack datasets, each generated with a different LLM.&#13;
Every dataset captures network traffic, process data from SCADA, log messages, and metadata from the attack scripts.&#13;
&#13;
Paper: https://doi.org/10.1145/3765611.3815147&#13;
GridStratLLM source code: https://github.com/nbke/GridStratLLM&#13;
&#13;
Each dataset directory contains:&#13;
- `attack_session_llm.parquet`: LLM prompts, plans, chain-of-thought reasoning, token usage&#13;
- `attack_worker.parquet`: Network interface info (MAC, IP, interface name)&#13;
- `packet_metadata.parquet`: Packet metadata (timestamps, addresses, ports, protocol)&#13;
- `packets.pcap`: Raw packet capture&#13;
- `attack_datamod_history.parquet`: Packet modification log with delta values&#13;
- `attack_exec_steps.parquet`: Attack execution timeline per worker&#13;
- `process_data.parquet`: WinCC SCADA data&#13;
- `logs.parquet`: Logs from PLC 1512 and PLC 1516&#13;
&#13;
See `network.json` for a list of network devices. `modbus.json` contains a mapping of Modbus registers to signal names.&#13;
`s7_connections.json` contains all signals transmitted via S7.&#13;
&#13;
## Parquet Files&#13;
&#13;
### attack_session_llm.parquet&#13;
&#13;
The structure of the `plan` column is explained in Appendix E of the paper.&#13;
`coordinated` is true if an attack session uses multiple attack workers.&#13;
The column `all_messages` may be NULL due to a data capture issue.&#13;
&#13;
### packet_metadata.parquet&#13;
&#13;
The entries in `packet_metadata.parquet` are in the same order as the packets in `packets.pcap`.&#13;
If the UUID of a packet (`id` in `packet_metadata.parquet`) appears in the `packet_id` column of `attack_datamod_history.parquet`, the packet originates from an attack script.&#13;
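
The membership rule above can be sketched in Python. This is a minimal sketch, not the authors' tooling: the function name `label_attack_packets` and the sample IDs are hypothetical, and loading the two Parquet files is left out so the block stays self-contained. Only the column names `id` and `packet_id` are taken from this description.

```python
from typing import Hashable, Iterable, List


def label_attack_packets(packet_ids: Iterable[Hashable],
                         modified_packet_ids: Iterable[Hashable]) -> List[bool]:
    """Return one flag per packet: True if its id appears in the modification log.

    packet_ids          -- values of the `id` column in packet_metadata.parquet
    modified_packet_ids -- values of the `packet_id` column in
                           attack_datamod_history.parquet
    """
    modified = set(modified_packet_ids)  # set lookup keeps the pass linear
    return [pid in modified for pid in packet_ids]


# Hypothetical sample IDs; the Parquet files store the real IDs as blobs.
packets = [b"id-a", b"id-b", b"id-c"]
modified = [b"id-b"]
flags = label_attack_packets(packets, modified)
```

Because `packet_metadata.parquet` and `packets.pcap` share the same order, the resulting flags can also be applied to the raw packets by position.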
&#13;
### SCADA process data: `process_data.parquet`&#13;
&#13;
PV:&#13;
- Control Signals: on_off&#13;
- Monitor Signals: temp_air, poa_direct, wind_speed, poa_diffuse, cell_temperature, inverter_ac_power, inverter_dc_power&#13;
&#13;
Wind:&#13;
- Control Signals: blade_rotation, rotation_speed&#13;
- Monitor Signals: power, height, pressure, wind_speed_a, wind_speed_b, temperature_a, temperature_b&#13;
&#13;
Battery:&#13;
- Control Signals: on_off, target_power&#13;
- Monitor Signals: current, voltage, temperature, state_of_charge, actual_charge_power&#13;
&#13;
### Log messages: `logs.parquet`&#13;
&#13;
Log messages are in German and are sent only when a signal value changes. Example:&#13;
```&#13;
Wertänderung "SysLogDaten".Inverter_ac_power Altwert: 239,0 aktueller Wert: 20,0 CPU:SECCPU16&#13;
```&#13;
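
In the example, "Wertänderung" means value change, "Altwert" the old value, and "aktueller Wert" the current value. A message of this shape could be parsed as sketched below. The pattern is an assumption inferred from the single example above, so other messages in `logs.parquet` may need a different expression. The names `LOG_PATTERN` and `parse_log_message` are hypothetical.

```python
import re
from typing import Optional, Tuple

# Assumed layout, inferred from one example message:
# Wertänderung "BLOCK".SIGNAL Altwert: OLD aktueller Wert: NEW CPU:NAME
LOG_PATTERN = re.compile(
    r'Wertänderung "([^"]+)"\.(\S+) '
    r'Altwert: (-?[0-9]+(?:,[0-9]+)?) '
    r'aktueller Wert: (-?[0-9]+(?:,[0-9]+)?) '
    r'CPU:(\S+)'
)


def parse_log_message(message: str) -> Optional[Tuple[str, float, float, str]]:
    """Return (signal, old_value, new_value, cpu), or None if the text does not match."""
    match = LOG_PATTERN.search(message)
    if match is None:
        return None
    _block, signal, old, new, cpu = match.groups()
    # The messages use a decimal comma, so convert it before calling float().
    return signal, float(old.replace(",", ".")), float(new.replace(",", ".")), cpu


example = ('Wertänderung "SysLogDaten".Inverter_ac_power '
           'Altwert: 239,0 aktueller Wert: 20,0 CPU:SECCPU16')
parsed = parse_log_message(example)
# parsed is ("Inverter_ac_power", 239.0, 20.0, "SECCPU16")
```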
&#13;
## DuckDB File: `merged_datasets.duckdb`&#13;
&#13;
The combined `merged_datasets.duckdb` file contains all data from the Parquet files plus raw packet data.&#13;
Differences from the Parquet files:&#13;
&#13;
- `process_data` is split into four tables: `wind_process_data`, `pv_process_data`, `battery_process_data`, `demand_process_data` (one column per signal instead of JSON `values`).&#13;
- `attack_session_llm` is renamed to `attack_session`.&#13;
- `attack_exec_steps` is renamed to `exec_steps` and `setup_duration` is stored as an `interval` instead of a bigint.&#13;
- `packet_metadata` is renamed to `packets`. The packets from the PCAP files are stored in the `raw_packet` BLOB column.&#13;
   The `l2_flow_id`, `l3_flow_id`, and `l4_flow_id` columns, which are always null in the Parquet files, are omitted.&#13;
- `id` columns use `uuid` type instead of `blob`.&#13;
- Categorical columns (`model_name`, `transport`, `state`, `kind`, etc.) use DuckDB `enum` types instead of `string`.&#13;
&#13;
Column name prefix for process data tables:&#13;
- `C_`: Control signals (commands sent to power plants)&#13;
- `M_`: Monitor signals (measured values sent to SCADA)&#13;
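
The prefix convention above can be used to separate control from monitor columns. The following is a minimal sketch: the function name `split_signal_columns` is hypothetical, and the sample column names are assumptions composed from the prefixes and the battery signal names listed earlier, not names verified against the DuckDB file.

```python
from typing import Dict, Iterable, List


def split_signal_columns(columns: Iterable[str]) -> Dict[str, List[str]]:
    """Group column names into control, monitor, and other by their prefix."""
    groups: Dict[str, List[str]] = {"control": [], "monitor": [], "other": []}
    for name in columns:
        if name.startswith("C_"):
            groups["control"].append(name)
        elif name.startswith("M_"):
            groups["monitor"].append(name)
        else:
            groups["other"].append(name)
    return groups


# Hypothetical column names for a table such as battery_process_data.
columns = ["timestamp", "C_on_off", "C_target_power", "M_state_of_charge"]
groups = split_signal_columns(columns)
```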
&#13;
# Funding&#13;
&#13;
This research is supported in part by funding from the topic Engineering Secure Systems of the Helmholtz Association (HGF) and by KASTEL Security Research Labs (structure 46.23.02).</description>
  </descriptions>
  <relatedIdentifiers>
    <relatedIdentifier relatedIdentifierType="URL" relationType="IsIdenticalTo">https://publikationen.bibliothek.kit.edu/1000193145</relatedIdentifier>
  </relatedIdentifiers>
  <sizes>
    <size>6.5 GB</size>
  </sizes>
  <formats>
    <format>application/x-tar</format>
  </formats>
</resource></metadata></record></GetRecord></OAI-PMH>