The choice of file formats should be considered at the very start of a research project, because the long-term preservation and reusability of datasets depend on the use of suitable file formats.
File formats define the syntax (rules for the arrangement of characters) and semantics (meaning of characters) of a file. Knowledge of the file format is therefore essential for interpreting the information stored in a file correctly.
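File signatures ("magic numbers") show why interpreting a file depends on knowing its format. Many format specifications require a file to begin with a fixed byte sequence. The following Python sketch checks a small, illustrative subset of well-known signatures. The names `SIGNATURES`, `sniff_format` and `sniff_file` are hypothetical helpers for this example and are not part of RADAR or any particular tool.

```python
# Illustrative subset of well-known file signatures ("magic numbers").
# Each entry pairs the leading bytes of a file with a human-readable label.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"II*\x00", "TIFF (little-endian)"),
    (b"MM\x00*", "TIFF (big-endian)"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"PK\x03\x04", "ZIP container (e.g. .docx, .xlsx, .odt)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. legacy .doc, .xls)"),
]


def sniff_format(data: bytes) -> str:
    """Return the label of the first signature that data starts with, else 'unknown'."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first bytes of a file and check them against SIGNATURES."""
    # The longest signature above is 8 bytes, so 16 bytes are more than enough.
    with open(path, "rb") as f:
        return sniff_format(f.read(16))
```

For example, `sniff_format(b"%PDF-1.7")` returns `"PDF"`. A signature only identifies the outer structure. The .docx, .xlsx and .odt formats are all ZIP archives, so telling them apart requires reading further into the file according to each format's own rules. Because the check looks at content rather than the file name, a legacy .doc file that has been renamed to .docx is still reported as an OLE2 compound file.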
File formats are usually developed for a specific purpose, for example for storing text or images, or for executable applications.
File formats can be proprietary, i.e. owned by a company and legally protected (for example by copyright), which creates dependence on particular programs or software vendors. For open formats, by contrast, the specifications are published. Open formats can be implemented in free and open-source software, which reduces dependence on specific programs or vendors.
Choosing a file format
By choosing a suitable file format, you help ensure the preservation and reusability of your data. Ideally, choose a format that meets the requirements of long-term preservation (e.g. good documentation, an open standard) as well as those of optimal reusability (e.g. wide availability of the software needed to open it).
In general, your choice of format should also follow the national and international requirements of your discipline.
RADAR lets you archive and publish research data in any file format. However, to ensure the best possible conditions for long-term preservation and reusability of your data, we recommend the formats listed in the table below.
| Data type | Recommended formats | Other suitable formats | Unsuitable formats |
|---|---|---|---|
| Text document | XML-based formats such as Office Open XML (.docx) and OpenOffice.org XML (.sxw); OpenDocument Format (.odt); structured text/markup (.xml, .sgml, .html, .dtd, .xsd and others) | Portable Document Format: PDF, PDF/A-1, PDF/A-2 (.pdf); Rich Text Format (.rtf); plain text (.txt) | Microsoft Word (.doc) |
| Tables | Comma-Separated Values (.csv); XML-based formats such as Office Open XML (.xlsx) | Portable Document Format: PDF, PDF/A-1, PDF/A-2 (.pdf) | Microsoft Excel (.xls) |
| Databases | ANSI SQL (.sql); Comma-Separated Values (.csv) | | |
| Statistical information | SPSS Portable (.por); SAS Transport (.xpt); Stata (.dta) | | |
| Raster graphic | TIFF v6, uncompressed (.tif, .tiff); GeoTIFF (.geotiff, for georeferenced images); Adobe Digital Negative (.dng, for raw data from cameras) | Portable Network Graphics (.png); Joint Photographic Experts Group (.jpeg, .jpg); Graphics Interchange Format (.gif); Microsoft Bitmap (.bmp); Adobe Photoshop (.psd); Corel Photo-Paint (.cpt); JPEG 2000 (.jp2, .jpx); camera RAW formats (.nef, .crw and others) | |
| Vector graphic | Scalable Vector Graphics (.svg) | Portable Document Format: PDF, PDF/A-1, PDF/A-2 (.pdf) | |
| Video | Lossless AVI (.avi); MPEG-1, MPEG-2 (.mpg, .mpeg); MPEG-4/H.264 (.mp4); Flash Video (.flv) | | |
| Audio | WAVE (.wav); AIFF (.aiff) | | |
| Computer-aided design (CAD) | AutoCAD DWG, version 2000 (.dwg); DXF, Release 12/14 (.dxf) | | |
| Geographic information | MapInfo Interchange Format (.mif, .mid), alternatively Esri Shapefile (.shp + .shx + .dbf), for vector data; GeoTIFF for raster (grid) data | | |
| Virtual reality, 3D | X3D (.x3d); OBJ (.obj); COLLADA (.dae); PLY (.ply) | Virtual Reality Modeling Language (.wrl) together with .avi, .mpg or .jpeg; Universal 3D Format (.u3d) together with .avi, .mpg or .jpeg; STL (.stl) together with .jpeg; DXF (.dxf) together with .jpeg | |
| Nuclear magnetic resonance (NMR) | Device-specific data formats (Varian/Agilent, Bruker, JEOL) | Subject-specific or proprietary formats are common here. They are often prescribed by discipline-specific guidelines or by manufacturers and must then be used within the respective field. | |
| Difference gel electrophoresis (2D-DIGE) | TIFF (.tif, .tiff); Gel Image (.gel) | Subject-specific or proprietary formats are common here. They are often prescribed by discipline-specific guidelines or by manufacturers and must then be used within the respective field. | |
These recommendations draw on international standards and guidance, such as the recommendations of the Library of Congress, as well as on current developments in research data management. They are reviewed and updated regularly.
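If you want to check files before uploading a dataset, the recommendations can be encoded as a simple lookup. The Python sketch below transcribes only the text document, tables and raster graphic rows. `FORMAT_TABLE` and `classify` are hypothetical names and are not part of RADAR. The check relies on file extensions alone, which can be missing or misleading.

```python
from pathlib import Path

# Hypothetical transcription of three rows of the format table, keyed by data type.
# Each category maps to a set of lowercase extensions (including the leading dot).
FORMAT_TABLE = {
    "text": {
        "recommended": {".docx", ".sxw", ".odt", ".xml", ".sgml", ".html", ".dtd", ".xsd"},
        "other suitable": {".pdf", ".rtf", ".txt"},
        "unsuitable": {".doc"},
    },
    "tables": {
        "recommended": {".csv", ".xlsx"},
        "other suitable": {".pdf"},
        "unsuitable": {".xls"},
    },
    "raster graphic": {
        "recommended": {".tif", ".tiff", ".geotiff", ".dng"},
        "other suitable": {".png", ".jpeg", ".jpg", ".gif", ".bmp", ".psd",
                           ".cpt", ".jp2", ".jpx", ".nef", ".crw"},
        "unsuitable": set(),
    },
}


def classify(filename: str, data_type: str) -> str:
    """Return the table category of a file's extension for a data type, or 'not listed'."""
    ext = Path(filename).suffix.lower()  # compare extensions case-insensitively
    for category, extensions in FORMAT_TABLE[data_type].items():
        if ext in extensions:
            return category
    return "not listed"


if __name__ == "__main__":
    print(classify("survey.CSV", "tables"))          # recommended
    print(classify("survey.xls", "tables"))          # unsuitable
    print(classify("thesis.pdf", "text"))            # other suitable
    print(classify("photo.webp", "raster graphic"))  # not listed
```

The lookup is keyed by data type because the table rates the same extension differently depending on the data type. For example, .pdf is only an "other suitable" format for text, tables and vector graphics, and .dxf is recommended for CAD but listed as "other suitable" for 3D data.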