Preparation for data acquisition

Preparing quality data for storage in a digital archive is a complex process that requires thorough knowledge of the structure of the data being handled.

According to specific archiving guidelines, datasets must be cleaned and understandable to other researchers. The term “cleaned” refers to corrected data: data cleared of invalid, incomplete, incorrectly formatted or duplicate records. The accompanying documentation should also contain as many materials as possible, so that the study is suitable for secondary use. Supplementary materials may include methodological reports, codebooks, questionnaires, coding instructions, instructions to interviewers, data dictionaries, bibliographies of publications resulting from analysis of the dataset, and the links and online tools that were used. The basic dataset, together with these additional materials, makes it easier for subsequent users (researchers) to become familiar with the data and their basic purpose.

Instructions and tips for data acquisition

  • Use consistent names that reflect the content of the files, including the year(s) to which the data relate; avoid spaces and special characters;
  • If the data are sensitive or subject to access limitations, indicate this in the file name;
  • Remove all direct identifiers from your dataset (names, addresses, phone numbers, and all other variables that would allow individuals to be identified);
  • Name variables clearly and logically, briefly explain their meaning, link them to the relevant questions and questionnaires, and briefly state which response options were available;
  • Ensure that variables are not repeated, paying special attention to derived variables;
  • Replace missing values with an explicit code (e.g., 88 = ‘not known’); missing values should not appear as empty records or as the default missing value of the statistical program you are using;
  • Check the frequencies; if you find inconsistencies or anomalies in your data, correct them or remove the affected variable;
  • Prepare, in separate files, the instruments used in data collection (questionnaires, instructions to interviewers, etc.), as well as all materials sent to respondents in advance (e.g. invitation letters), all materials presented to respondents during interviews (e.g. showcards …) and any instructions or materials used by the interviewers (e.g. explanations, frequently asked questions, etc.);
  • Prepare appropriate documentation containing the following information: the context of the data (project history, goals), research design, target group, sampling procedure and sample size, unit of analysis, data collection methods (CATI, CAPI, mail, web, etc.), response rate, temporal and spatial coverage, structure of files, cases, links between files (if applicable), validation, proofing, cleaning and other quality procedures, measures that ensure the confidentiality of information and data, conditions of access and use, weighting factors, and recorded and derived variables generated after data collection, together with the code, algorithm, or command file used to create them.
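
Several of the tips above (consistent file names, removal of direct identifiers, explicit missing-value codes, removal of duplicate records and frequency checks) can be sketched in Python as follows. This is only an illustrative sketch: the function names, the set of identifier columns and the use of 88 as the missing-value code are assumptions made for the example, not requirements of any particular archive.

```python
import re
from collections import Counter

# Hypothetical choices for illustration; an archive may prescribe its own.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}
MISSING_CODE = "88"  # explicit 'not known' code, as in the example in the tips


def build_file_name(study, years, restricted=False, ext="csv"):
    """Build a consistent file name without spaces or special characters."""
    # Replace every run of characters other than letters and digits with "_".
    slug = re.sub(r"[^A-Za-z0-9]+", "_", study).strip("_").lower()
    parts = [slug, "-".join(str(y) for y in years)]
    if restricted:
        parts.append("restricted")  # flag sensitive or limited data in the name
    return "_".join(parts) + "." + ext


def prepare_records(rows, identifiers=DIRECT_IDENTIFIERS, missing_code=MISSING_CODE):
    """Drop direct identifiers, code missing values and remove duplicate rows.

    Values are normalised to text, as they would be in a delimited text file.
    """
    cleaned, seen = [], set()
    for row in rows:
        record = {}
        for key, value in row.items():
            if key.lower() in identifiers:
                continue  # remove direct identifiers
            text = "" if value is None else str(value).strip()
            record[key] = text if text else missing_code
        fingerprint = tuple(sorted(record.items()))
        if fingerprint not in seen:  # keep only the first of identical records
            seen.add(fingerprint)
            cleaned.append(record)
    return cleaned


def frequencies(rows, variable):
    """Count how often each value of one variable occurs, to spot anomalies."""
    return Counter(row[variable] for row in rows)
```

For example, build_file_name("Youth Survey (wave 2)", [2019, 2020], restricted=True) yields youth_survey_wave_2_2019-2020_restricted.csv. Because duplicates are detected after the identifiers have been removed, each record should keep a non-identifying case number so that distinct respondents with identical answers are not merged.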

Data formats that are most likely to remain accessible in the future are preferred: in other words, non-proprietary, openly documented, non-encrypted and uncompressed formats that are commonly used in the research community. The following formats are therefore considered suitable for deposit:

  • Tabular data: SPSS portable format (.por), SPSS (.sav), Stata (.dta), Excel or other spreadsheet files that can be converted to tab- or comma-delimited text;
  • Text: Adobe Portable Document Format (PDF/A, PDF) (.pdf), plain text data, ASCII (.txt);
  • Audio: Waveform Audio Format (WAV) (.wav) from Microsoft, Audio Interchange File Format (AIFF) (.aif) from Apple, FLAC (.flac);
  • Images: TIFF (.tif), ideally version 6 uncompressed, JPEG (.jpeg, .jpg) only when created in this format, Adobe Portable Document Format (PDF/A, PDF) (.pdf), RAW image format (.raw), Photoshop files (.psd);
  • Video: MPEG-4 (.mpg4), motion JPEG 2000 (.mj2);
  • Compressed files: accepted as long as they can be uncompressed using open and freely available software, such as 7-Zip or WinZip.
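
Before deposit, a list of files can be checked against the formats above. The following Python sketch shows one possible way to do this; the names PREFERRED_FORMATS, classify and unsuitable are hypothetical, and the spreadsheet extensions (.xls, .xlsx, .csv, .tab) are assumptions, since the list only mentions "Excel or other spreadsheet" files. Compressed archives are not covered here.

```python
from pathlib import Path

# Extensions taken from the list above, grouped by type of material.
# The spreadsheet and delimited-text extensions are assumed for illustration.
PREFERRED_FORMATS = {
    "tabular": {".por", ".sav", ".dta", ".xls", ".xlsx", ".csv", ".tab"},
    "text": {".pdf", ".txt"},
    "audio": {".wav", ".aif", ".flac"},
    "image": {".tif", ".jpeg", ".jpg", ".pdf", ".raw", ".psd"},
    "video": {".mpg4", ".mj2"},
}


def classify(file_name):
    """Return the sorted categories whose formats include the file's extension."""
    ext = Path(file_name).suffix.lower()  # compare extensions case-insensitively
    return sorted(cat for cat, exts in PREFERRED_FORMATS.items() if ext in exts)


def unsuitable(file_names):
    """List the files whose extension matches none of the preferred formats."""
    return [name for name in file_names if not classify(name)]
```

A file such as survey_2020.sav is classified as tabular, while a file with an extension outside the list (for instance .docx) is reported by unsuitable() and may need conversion before deposit.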


