Alternate identifier:
(KITopen-DOI) 10.5445/IR/1000142435
Related identifier:
-
Creator:
Demir, Nurullah

Große-Kampmann, Matteo

Urban, Tobias

Wressnegger, Christian

Holz, Thorsten

Pohlmann, Norbert
Contributors:
-
Title:
Reproducibility and Replicability of Web Measurement Studies
Additional titles:
-
Description:
(Abstract) Web measurement studies can shed light on not yet fully understood phenomena and thus are essential for analyzing how the modern Web works. This often requires building new and adjusting existing crawling setups, which has led to a wide variety of analysis tools for different (but related) aspects. If these efforts are not sufficiently documented, the reproducibility and replicability of the measurements may suffer, and both properties are crucial to sustainable research. In this paper, we survey 117 recent research papers to derive best practices for Web-based measurement studies and specify criteria that need to be met in practice. When applying these criteria to the surveyed papers, we find that the experimental setup and other aspects essential to reproducing and replicating results are often missing. We underline the criticality of this finding by performing a large-scale Web measurement study on 4.5 million pages with 24 different measurement setups to demonstrate the influence of the individual criteria. Our experiments show that slight differences in the experimental setup directly affect the overall results and must be documented accurately and carefully.
(Technical Remarks) This dataset holds supplementary material for the paper "Reproducibility and Replicability of Web Measurement Studies", submitted to the ACM Web Conference 2022. It contains the measurement data (requests, responses, visited URLs, cookies, and LocalStorage objects) that we collected from 25 different profiles. All data is in CSV format (exported from the Google BigQuery service) and can be imported into any database.
Table sizes (according to Google BigQuery):
Cookies: 2.8 GB
LocalStorage: 6 GB
Requests: 626.6 GB
Responses: 501.6 GB
URL: 38 MB
Visits: 935 MB
Note: Although our paper does not include an analysis of the collected Cookie and LocalStorage objects, we publish them for further studies. Further information about our study is available in [our GitHub repository](https://github.com/awareseven/Reproducibility-and-Replicability-of-Web-Measurement-Studies).
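Because the technical remarks state that the CSV exports can be imported into any database, the following is a minimal sketch of one way to do that with SQLite. It is an illustration under assumptions, not the authors' tooling: the function name `import_csv`, the file name `cookies.csv`, and the database name are hypothetical, every column is stored as text, and the first CSV row is assumed to be a header. Rows are streamed in batches, since the larger tables (several hundred GB) will not fit in memory.

```python
import csv
import sqlite3
from itertools import islice


def import_csv(conn, csv_path, table, batch_size=10_000):
    """Stream a CSV file into a SQLite table and return the number of rows inserted.

    Assumption: the first row of the file is a header with the column names.
    All columns are created as TEXT because the real schema is not known here.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        columns = ", ".join(f'"{name}" TEXT' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')

        total = 0
        while True:
            # Read at most batch_size rows at a time to keep memory use bounded.
            batch = list(islice(reader, batch_size))
            if not batch:
                break
            conn.executemany(
                f'INSERT INTO "{table}" VALUES ({placeholders})', batch
            )
            total += len(batch)
        conn.commit()
    return total


if __name__ == "__main__":
    import os

    # Hypothetical file and database names; adjust to the extracted archive.
    if os.path.exists("cookies.csv"):
        connection = sqlite3.connect("web_measurements.db")
        print(import_csv(connection, "cookies.csv", "cookies"))
        connection.close()
```

For the Requests and Responses tables, a server-based database or a columnar store is likely a better fit than SQLite; the batching pattern stays the same.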
Keywords:
Web measurements
reproducibility
replicability
security
privacy
Related information:
-
Language:
-
Year of creation:
Subject area:
Computer Science
Resource type:
Dataset
Data source:
-
Software used:
-
Data processing:
-
Year of publication:
License:
Rights holder:
Demir, Nurullah

Große-Kampmann, Matteo

Urban, Tobias

Wressnegger, Christian

Holz, Thorsten

Pohlmann, Norbert
Funding:
-
Access in the last six months

Data package page views: 69

Data package downloads: 0

Overall statistics

Period      Data package page views   Data package downloads
Feb. 2024   12                        0
Jan. 2024   11                        0
Dec. 2023   10                        0
Nov. 2023   21                        0
Oct. 2023   10                        0
Sep. 2023   5                         0
Before      35                        1
Total       104                       1
Status:
Published
Submitted by:
kitopen
Created on:
Archiving date:
2023-06-24
Archive size:
294.1 GB
Archive creator:
kitopen
Archive checksum:
60b4b49da8d51e4b4bd136f93a31bc2f (MD5)
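A downloaded copy of the archive can be checked against the MD5 value listed above. The sketch below shows one way to do this with Python's standard library. The function names and the file name `archive.tar` are hypothetical, since this record does not state the name of the archive file, and the file is hashed in chunks because the archive is several hundred GB in size.

```python
import hashlib

# Checksum as listed in this record.
EXPECTED_MD5 = "60b4b49da8d51e4b4bd136f93a31bc2f"


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of a file, read in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected=EXPECTED_MD5):
    """Compare a file's MD5 digest with the expected value, ignoring letter case."""
    return md5_of_file(path).lower() == expected.lower()


if __name__ == "__main__":
    import os

    # Hypothetical file name; replace with the path of the downloaded archive.
    if os.path.exists("archive.tar"):
        print(matches_checksum("archive.tar"))
```

MD5 is adequate for detecting transfer errors, but it is not a defense against deliberate tampering.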
End of embargo period:
-