Integrate

What is it?

You might want to integrate a dataset that you discovered and that matches your own data, whether to verify your results, to provide a starting point for an integrative study, or to test a new hypothesis for a follow-up study. Integration is the merging of multiple datasets from different sources, for example your recently collected data with older data from other owners, resulting in a new, larger dataset. You can integrate data manually or save time by using automatic integration procedures. Automatic integration requires the use of a common syntax and terminology right from the start (see Fact-Sheets ‘Collect’ and ‘Describe’). When other authors’ data are re-used, it is essential to credit the data creators through a robust data citation practice, which works best when data are equipped with a persistent identifier (PID).
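As a rough illustration of what merging datasets from different sources can look like, the following Python sketch combines two small record lists and tags every record with the identifier of the dataset it came from, so that credit can later be given to the data creators. The field names, the example values and the DOI-like strings are placeholders invented for this example; they do not refer to real datasets.

```python
# Hypothetical example records; field names and identifiers are placeholders.
own_data = [
    {"species": "Fagus sylvatica", "count": 12},
    {"species": "Quercus robur", "count": 7},
]
external_data = [
    {"species": "Fagus sylvatica", "count": 30},
]


def merge_with_source(datasets):
    """Merge several datasets into one list and tag each record with its source PID."""
    merged = []
    for pid, records in datasets.items():
        for record in records:
            # Copy the record and add the identifier of the originating dataset.
            merged.append({**record, "source_pid": pid})
    return merged


combined = merge_with_source({
    "doi:10.xxxx/own-dataset": own_data,            # placeholder identifier
    "doi:10.xxxx/external-dataset": external_data,  # placeholder identifier
})

for row in combined:
    print(row)
```

Keeping the source identifier on every record is one possible design choice; it means the origin of each value stays traceable even after the datasets have been combined.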


How to do it?

  1. Good data management planning is the key to efficient data integration. Make data management easy by using existing tools and workbenches for data collection, assurance, description, submission and discovery.
  2. Understand the data and assess suitability for the required purpose. With visualization and aggregation tools (like the GFBio VAT system), you can create geographic maps or descriptive statistics for a better understanding.
  3. Use agreed terms, e.g. from the GFBio Terminology Service, during the collection and description stages. This will enrich your data semantically and facilitate the integration process later on.
  4. Ensure that formats and parameters are compatible (date, resolution, metric units).
  5. Document the relationships among data sets from different sources.
  6. Remember to use unique identifiers to prevent duplication of the data set.
  7. Document the data integration process (script, workflow) and describe it in the metadata.
  8. Document your data analysis.
  9. Establish a proper citation of the data sets you used (data provenance)!
  10. Keep data quality and metadata quality in mind.  
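
Steps 4, 6 and 7 above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the record layout (`record_id`, `date`, `height`, `unit`, `source`), the accepted date formats and the unit table are all invented for this example and would need to be adapted to the data sets at hand.

```python
from datetime import datetime

# Assumed conversion factors to metres for the units used in this example.
TO_METRES = {"m": 1.0, "cm": 0.01, "ft": 0.3048}

# Assumed input date formats. Ambiguous dates such as 03/04/2020 cannot be
# resolved automatically; the format of each source must be known in advance.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y", "%m/%d/%Y")


def normalize_date(text):
    """Convert a date string in one of the assumed formats to ISO 8601."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognized date format: {text!r}")


def harmonize(record):
    """Bring one record to a common date format and a common unit (step 4)."""
    return {
        "record_id": record["record_id"],
        "date": normalize_date(record["date"]),
        "height_m": round(record["height"] * TO_METRES[record["unit"]], 4),
        "source": record["source"],
    }


def integrate_records(records):
    """Harmonize records, drop duplicates by identifier (step 6), keep a log (step 7)."""
    seen = {}
    log = []
    for raw in records:
        rec = harmonize(raw)
        if rec["record_id"] in seen:
            log.append(f"skipped duplicate {rec['record_id']} from {rec['source']}")
            continue
        seen[rec["record_id"]] = rec
    return list(seen.values()), log


# Hypothetical input records from two sources.
records = [
    {"record_id": "A-001", "date": "2020-05-17", "height": 1250, "unit": "cm", "source": "own survey"},
    {"record_id": "B-014", "date": "17.05.2020", "height": 41.0, "unit": "ft", "source": "external dataset"},
    {"record_id": "A-001", "date": "05/17/2020", "height": 12.5, "unit": "m", "source": "external dataset"},
]

integrated, log = integrate_records(records)
for rec in integrated:
    print(rec)
print(log)
```

The log returned by `integrate_records` is a simple stand-in for the documentation of the integration process; in practice the script itself and a description of it would be referenced in the metadata.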

Who does it?

Data re-users in general, e.g. modelers or researchers performing integrative or comparative studies.

Key Elements

  • Proper data management practices in preceding data life cycle stages
  • Documentation of the integration and analysis workflows
  • Data citation
  • Data provenance
  • Data and metadata quality

GFBio Services

Data Visualization and Analysis

  • GFBio VAT System for visualization, aggregation and transformation
    • Upload your own data and compare them with other datasets
    • Get data you found with the GFBio Search via the basket option, and further explore them with the VAT system
    • Graphical overlays
    • Standard GIS operations
    • Statistical analysis
    • Option to access data with R statistical software

Terminology Service

Useful Links

http://www.dataone.org/education-modules (DataONE - Education Modules)
https://www.dataone.org/best-practices/document-integration-multiple-datasets (DataONE - Integration)
http://www.dcc.ac.uk/sites/default/files/documents/DC%20101%20Transform.pdf (DCC - Digital Curation)
http://ukdataservice.ac.uk/media/104397/data_citation_online.pdf (Economic & Social Research Council)


License: Creative Commons BY-NC

Recommended citation:
German Federation for Biological Data (2021). GFBio Training Materials: Data Life Cycle Fact-Sheet: Data Life Cycle: Integrate. Retrieved 16 Dec 2021 from https://www.gfbio.org/training/materials/data-lifecycle/integrate.