Persistent identifier:
(DOI) 10.22000/53
Alternative identifier:
-
Related identifier:
-
Creator/Author:
Lindlar, Michelle [TIB] https://orcid.org/0000-0003-3709-5608
Tunnat, Yvonne [ZBW]
Carl, Wilson [OPF]
Contributors:
-
Title:
Synthetic PDF Testset for File Format Validation
Additional titles:
-
Description:
(Abstract) This data set presents a corpus of lightweight files designed to test the validation criteria of JHOVE's PDF module for well-formedness. The test cases are based on the structural requirements for PDF files set out in the ISO 32000-1:2008 standard. All test files derive from a single-page, one-line document with no special features such as linearization. While such a lightweight document only allows checking against a fraction of the standard's requirements, the focus was put on basic structure violations at the header, trailer, document catalog, page tree node and cross-reference levels. The test set also checks for basic violations at the page node, page resource and stream object levels. The accompanying spreadsheet briefly categorizes and describes the test set and includes the outcome of running the test set against JHOVE 1.16, PDF-hul 1.8 as well as Adobe Acrobat Professional XI Pro (11.0.15). The spreadsheet also includes a codecov coverage statistic for the test set in relation to the JHOVE 1.16, PDF-hul 1.8 module. Further information can be found in the paper "A PDF Test-Set for Well-Formedness Validation in JHOVE - The Good, the Bad and the Ugly", published in the proceedings of the 14th International Conference on Digital Preservation (Kyoto, Japan, September 25-29, 2017). While the spreadsheet only contains results of running the test set against JHOVE, it can be used as a ground truth for any file format validation process.
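To illustrate the kind of basic structure violations the description mentions (header, trailer and cross-reference level), the following is a minimal Python sketch. It is not JHOVE's validation logic and it is not part of the data set. The function name `check_basic_structure` and the problem messages are hypothetical, and the sketch assumes a simple, non-linearized file with a classic cross-reference table (no cross-reference streams), as the test files are described.

```python
import re
import sys
from typing import List


def check_basic_structure(data: bytes) -> List[str]:
    """Return a list of basic structural problems found in raw PDF bytes.

    Illustrative only: covers a tiny subset of ISO 32000-1 requirements.
    """
    problems = []

    # Header level: ISO 32000-1 expects "%PDF-1.N" with N from 0 to 7.
    if not re.match(rb"%PDF-1\.[0-7]", data):
        problems.append("header: missing or malformed %PDF-1.x marker")

    # Trailer level: inspect only the end of the file (assumed window: 1024 bytes).
    tail = data[-1024:]
    if not tail.rstrip().endswith(b"%%EOF"):
        problems.append("trailer: file does not end with %%EOF")

    # Cross-reference level: the last startxref entry should give the byte
    # offset of the "xref" keyword (assumes a classic cross-reference table).
    offsets = re.findall(rb"startxref\s+(\d+)", tail)
    if not offsets:
        problems.append("trailer: no startxref entry found")
    elif not data[int(offsets[-1]):].startswith(b"xref"):
        problems.append("cross-reference: startxref offset does not point at 'xref'")

    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_pdf.py some_test_file.pdf  (file name is hypothetical)
    with open(sys.argv[1], "rb") as handle:
        for problem in check_basic_structure(handle.read()):
            print(problem)
```

An empty result only means that none of these three checks failed. It says nothing about the document catalog, page tree or stream objects, which the test set also covers.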
Keywords:
PDF, file format validation, digital preservation, ISO 32000-1:2008
Related information:
-
Language:
English
Publisher:
Michelle Lindlar, Yvonne Tunnat
Year of creation:
Subject area:
Software Technology
Resource type:
Text
Data source:
-
Software used:
-
Data processing:
-
Year of publication:
License:
CC BY-SA 4.0 Attribution-ShareAlike
Rights holder:
Michelle Lindlar
Funding:
-
Access statistics for the last six months:
Dataset page views: 502
Dataset downloads: 10

Overall statistics:
Period     Dataset page views   Dataset downloads
Dec 2019   20                   3
Nov 2019   75                   2
Oct 2019   74                   0
Sep 2019   98                   2
Aug 2019   132                  3
Jul 2019   103                  0
Earlier    1266                 46
Total      1768                 56
Status:
PUBLISHED
Submitted by:
lindlar
Created on:
Archiving date:
2017-11-05
Archive size:
613.9 kB
Archive creator:
lindlar
Archive checksum:
b19d4d5668bc8cb3cc8a1a3b0280e246 (MD5)
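A downloaded copy of the archive can be compared against the MD5 checksum above. The following is a minimal Python sketch using the standard `hashlib` module. The function names and the archive file name in the usage note are hypothetical, since the record does not state a file name.

```python
import hashlib

# MD5 checksum as stated in this record.
EXPECTED_MD5 = "b19d4d5668bc8cb3cc8a1a3b0280e246"


def md5_of_file(path: str, chunk_size: int = 65536) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path: str) -> bool:
    """Return True if the file's MD5 digest equals the recorded checksum."""
    return md5_of_file(path) == EXPECTED_MD5
```

For example, `matches_record("archive.tar")` (a placeholder file name) should return `True` for an unmodified download. MD5 is suitable here only as a check against accidental corruption, not as protection against deliberate tampering.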
End of embargo period:
-
Dataset status:
PUBLISHED
Cite this dataset:
Lindlar, Michelle; Tunnat, Yvonne; Carl, Wilson (2017): Synthetic PDF Testset for File Format Validation. Michelle Lindlar, Yvonne Tunnat.
DOI: 10.22000/53