Alternative identifier:
-
Related identifier:
-
Creator(s):
Hertlein, Felix [Institut für Angewandte Informatik und Formale Beschreibungsverfahren]

Naumann, Alexander [Naumann, Alexander]

Philipp, Patrick [Philipp, Patrick]
Contributors:
-
Title:
Inv3D: a high-resolution 3D invoice dataset for template-guided single-image document unwarping - Validation split
Other titles:
-
Description:
(Abstract) Numerous business workflows involve printed forms, such as invoices or receipts, which are often digitized manually so that the data can be searched or stored persistently. As hardware scanners are costly and inflexible, smartphones are increasingly used for digitization. Here, processing algorithms need to deal with prevailing environmental factors, such as shadows or crumples. Current state-of-the-art approaches learn supervised image dewarping models based on pairs of raw images and rectification meshes. These approaches achieve promising dewarping accuracy, but their residual errors still lead to sub-optimal information retrieval. In this paper, we explore the potential of improving dewarping models using additional, structured information in the form of invoice templates. We provide two core contributions: (1) a novel dataset, referred to as Inv3D, comprising synthetic and real-world high-resolution invoice images with structural templates, rectification meshes, and multiple per-pixel supervision signals, and (2) a novel image dewarping algorithm, which extends the state-of-the-art approach GeoTr to leverage structural templates using attention. Our extensive evaluation includes an implementation of DewarpNet and shows that exploiting structured templates can improve image dewarping performance. We report superior performance for the proposed algorithm on our new benchmark for all metrics, including a 26.1 % improvement in local distortion. We make our new dataset and all code publicly available at https://felixhertlein.github.io/inv3d.
(Technical Remarks) Each sample contains the following files:
- "flat_document.png" (2200x1700x3, uint8, 0-255): the document in perfect condition.
- "flat_information_delta.png" (2200x1700x3, uint8, 0-255): all texts that represent invoice data.
- "flat_template.png" (2200x1700x3, uint8, 0-255): the empty invoice template.
- "flat_text_mask.png" (2200x1700x3, uint8, 0-255): all texts shown in the given document.
- "warped_angle.png" (1600x1600x2, float32, -Pi to Pi): the warping-induced angles along the x- and y-axis.
- "warped_albedo.png" (1600x1600x3, uint8, 0-255): the albedo map.
- "warped_BM.npz" (1600x1600x2, float32, 0-1): the backward mapping, i.e. the relative pixel shift from the warped to the normalized image for each pixel.
- "warped_curvature.npz" (1600x1600x1, float32, 0-inf): the pixel-wise curvature of the warped document.
- "warped_depth.npz" (1600x1600x3, float32, 0-inf): the per-pixel depth between camera and document.
- "warped_document.png" (1600x1600x3, uint8, 0-255): the warped document.
- "warped_normal.npz" (1600x1600x3, float32, -inf to inf): the warped document normals.
- "warped_recon.png" (1600x1600x3, uint8, 0-255): the warped document with a chessboard texture.
- "warped_text_mask.npz" (1600x1600x1, bool8, True/False): a boolean text pixel mask.
- "warped_UV.npz" (1600x1600x3, float32, 0-1): the warped texture coordinates.
- "warped_WC.npz" (1600x1600x3, float32, -inf to inf): the document coordinates in 3D space.
For more details, see https://github.com/FelixHertlein/inv3d-generator. Released under CC BY-NC-SA 4.0, except for the files listed in 'restricted-license-files.txt' (located in the record with DOI 10.35097/1730, "Inv3D: a high-resolution 3D invoice dataset for template-driven Single-Image Document Unwarping - Metadata"), which are for academic use only.
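The file descriptions in the technical remarks suggest a simple workflow: load "warped_document.png" and "warped_BM.npz", then use the backward map to resample the warped image into a flat one. The Python sketch below illustrates this under stated assumptions; it is not the authors' code. It assumes that each .npz file stores a single array (the key names are not documented in this record), that the backward map is defined on the output grid with channel 0 holding the normalized x (column) and channel 1 the normalized y (row) position to sample in the warped image, and that NumPy and Pillow are installed. The helper names `load_png`, `load_npz_array` and `unwarp` are hypothetical.

```python
from pathlib import Path

import numpy as np
from PIL import Image


def load_png(path: Path) -> np.ndarray:
    """Load an 8-bit PNG (e.g. warped_document.png) as an (H, W, 3) uint8 array."""
    return np.asarray(Image.open(path).convert("RGB"))


def load_npz_array(path: Path) -> np.ndarray:
    """Load the array stored in an .npz file (e.g. warped_BM.npz).

    Assumption: the key names inside the archives are not documented here,
    so this sketch expects one array per file and returns the first one.
    """
    with np.load(path) as archive:
        return archive[archive.files[0]]


def unwarp(warped: np.ndarray, bm: np.ndarray) -> np.ndarray:
    """Resample a warped image with a backward map (bilinear interpolation).

    Assumption: bm has shape (H_out, W_out, 2) with values in [0, 1], where
    bm[..., 0] is the normalized column (x) and bm[..., 1] the normalized
    row (y) in the warped image that each output pixel is sampled from.
    """
    img = warped.astype(np.float32)
    if img.ndim == 2:  # treat grayscale input as a single channel
        img = img[..., None]
    h_in, w_in = img.shape[:2]

    # Convert normalized coordinates to pixel coordinates in the warped image.
    x = np.clip(bm[..., 0], 0.0, 1.0) * (w_in - 1)
    y = np.clip(bm[..., 1], 0.0, 1.0) * (h_in - 1)

    # Integer neighbours and fractional weights for bilinear interpolation.
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w_in - 1)
    y1 = np.minimum(y0 + 1, h_in - 1)
    wx = (x - x0)[..., None]
    wy = (y - y0)[..., None]

    # Blend horizontally along the two neighbouring rows, then vertically.
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bottom = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    out = np.clip(np.rint(top * (1 - wy) + bottom * wy), 0, 255).astype(np.uint8)
    return out[..., 0] if warped.ndim == 2 else out
```

For a hypothetical sample directory `sample_dir`, `unwarp(load_png(sample_dir / "warped_document.png"), load_npz_array(sample_dir / "warped_BM.npz"))` would return a rectified image at the backward map's resolution. If the result looks transposed or mirrored, the channel-order assumption is the first thing to revisit. The interpolation is written in plain NumPy to avoid extra dependencies; `cv2.remap` or `torch.nn.functional.grid_sample` (which expects coordinates in [-1, 1]) are common alternatives. The inv3d-generator repository remains the authoritative reference for the exact file formats.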
Keywords:
Document Unwarping
Illumination Correction
Template
OCR
Transformer
Instance Segmentation
Related information:
-
Language:
-
Year of creation:
Subject area:
Computer Science
Object type:
Dataset
Data source:
-
Software used:
-
Data processing:
-
Year of publication:
License:
CC BY-NC-SA 4.0, except for the files listed in 'restricted-license-files.txt', which are for academic use only.
Rights holder(s):
Hertlein, Felix

Naumann, Alexander

Philipp, Patrick
Funding:
-
Access statistics (last six months)

Period      Dataset page views   Dataset downloads
Feb. 2024   31                   4
Jan. 2024   63                   9
Dec. 2023   38                   2
Nov. 2023   50                   5
Oct. 2023   59                   7
Sep. 2023   99                   4
Earlier     0                    0
Total       340                  31
Status:
Published
Uploaded by:
kitopen
Created on:
Archiving date:
2023-09-01
Archive size:
128.1 GB
Archive created by:
kitopen
Archive checksum:
cd9c13b52a7f74b8e4bf55fb7b318719 (MD5)
End of embargo period:
-
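Given an archive of about 128 GB, it is worth checking a completed download against the MD5 checksum listed in this record. The following is a minimal sketch using only the Python standard library; it reads the file in chunks so memory use stays constant regardless of file size. The function names are illustrative, and the local file name of the archive is whatever your download produced.

```python
import hashlib
from pathlib import Path

# Checksum as listed in this record (MD5).
EXPECTED_MD5 = "cd9c13b52a7f74b8e4bf55fb7b318719"


def md5_of_file(path: Path, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Compute the MD5 hex digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()
```

On Linux, `md5sum` applied to the downloaded archive produces the same hexadecimal digest from the command line.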