Data Documentation

Data documentation explains how data were collected (the context of data collection), what they mean, what they contain and how they are structured, and which manipulations, if any, have been applied to them. Good documentation is best practice when creating, organising, and managing data, and is equally important for data preservation. Data files must be clean and clear, i.e., interpretable and searchable by other researchers.

The data should be accompanied by adequate documentation that explains the context of data collection (history, aims, and hypotheses), the methodology, and any material relevant to the data. This includes general information about the study itself (metadata), compatible with international standards such as DDI so that the metadata can be shared, together with enough supporting material to make the study as suitable as possible for secondary use. Accompanying material could include:

Methodological reports
Codebooks
Questionnaires
Coding instructions
Interviewer guides
Database dictionaries
Bibliographies of publications related to the data
Links to online tools

Study level

Study level metadata is a subset of core information about the conducted research. It covers the research context, data collection methods, preparation of data for analysis, analytical methods, and an overview of results. Study level documentation should follow the Data Documentation Initiative (DDI) metadata standard, which is used for describing social science data, and should include the following:

The context of data collection: project history, aims, objectives, and hypotheses
Data collection methods: data collection protocols, sampling design, instruments used, hardware and software used, data scale and resolution, temporal coverage and geographic coverage, and digitization or transcription methods
Structure of data files, number of cases, records, variables and relationships between files
Data sources used and provenance of materials, e.g., for transcribed or derived data
Data validation, checking, proofing, cleaning and other quality assurance procedures carried out, such as checking for equipment and transcription errors, calibration procedures, data capture resolution and repetitions, or editing, proofing or quality control of materials
Modifications made to data over time since their original creation and identification of different versions of datasets
For time series or longitudinal surveys, changes made to methodology, variable content, question text, variable labeling, measurements or sampling
Information on data confidentiality, access, and use conditions, where applicable
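
The study level items above can be treated as a checklist. The following is a minimal sketch, assuming the documentation is kept as a simple key-value record. The field names are hypothetical labels chosen for this illustration and are not DDI element names, and the conditional item on longitudinal changes is left out for brevity.

```python
# Hypothetical field names for this sketch; they are NOT DDI element names.
REQUIRED_FIELDS = [
    "context",                 # project history, aims, objectives, hypotheses
    "collection_methods",      # protocols, sampling design, instruments
    "file_structure",          # cases, records, variables, file relationships
    "sources_and_provenance",  # origin of transcribed or derived data
    "quality_assurance",       # validation, checking, cleaning procedures
    "version_history",         # modifications made and dataset versions
    "access_conditions",       # confidentiality, access, and use conditions
]


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or empty in a study record."""
    missing = []
    for field in required:
        value = record.get(field)
        # Treat None and whitespace-only text as "not documented".
        if value is None or not str(value).strip():
            missing.append(field)
    return missing
```

Calling `missing_fields(record)` before deposit gives a quick list of the sections that still need to be written. A real deposit would map these sections onto the corresponding DDI elements.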

Dataset level

Dataset metadata can be embedded in the data files themselves, as variable and code descriptions in databases or data tables, or as headers in transcript files. Such information includes:

Names, labels and descriptions for variables, records and their values
Explanation of codes and classification schemes used
Codes of, and reasons for, missing values
Derived data created after collection, with code, algorithm or command file used to create them
Weighting and grossing variables created and how they should be used
Data list describing cases, individuals or items studied, for example for logging qualitative interviews
Additional information about the dataset can be recorded in an external structured document if needed
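
As an illustration of dataset level documentation, the sketch below shows one possible way to hold variable labels, value labels, and missing value codes in a small codebook structure. The variable names, codes, and helper functions are all hypothetical and serve only as an example; they do not describe any particular dataset or tool.

```python
# Hypothetical codebook for an illustrative survey file; every variable name
# and code below is an assumption made for this sketch.
CODEBOOK = {
    "sex": {
        "label": "Sex of respondent",
        "values": {1: "Male", 2: "Female"},
        "missing": {9: "No answer"},
    },
    "income": {
        "label": "Monthly household income (EUR)",
        "values": {},  # continuous variable, so no value labels
        "missing": {-1: "Refused", -2: "Don't know"},
    },
}


def describe(variable, raw_value, codebook=CODEBOOK):
    """Return a readable description of one raw value using the codebook."""
    entry = codebook[variable]
    if raw_value in entry["missing"]:
        reason = entry["missing"][raw_value]
        return f"{entry['label']}: missing ({reason})"
    # Fall back to the raw value when no value label is defined.
    label = entry["values"].get(raw_value, raw_value)
    return f"{entry['label']}: {label}"


def undocumented_variables(column_names, codebook=CODEBOOK):
    """List columns present in the data but absent from the codebook."""
    return [name for name in column_names if name not in codebook]
```

Keeping the missing value codes separate from the substantive value labels makes the reason for each missing value explicit, and a check such as `undocumented_variables` helps confirm that every variable in a file has been documented.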

Location

Institute of Economic Sciences

Zmaj Jovina 12, 11000 Belgrade, Serbia

Work Hours

Mon-Fri: 9 am - 3 pm
Weekends: Closed

Phone & Email

Tel. + 381 64 12 33 254

data-centar@ien.bg.ac.rs
