Analyze

What is it?

You analyse your data to gather information from a single dataset or an integrated dataset. To use the GFBio analysis tools, you first have to upload your data. The data can then be explored through visualisation, overlay, transformation, statistical analysis or modelling. These steps should provide new insight into your research questions and lead to conclusions that decision makers can act upon. The software and hardware used for data analysis, as well as the analyses themselves, vary among disciplines.
Despite enormous differences in characteristics, quantity and format, research data from all scientific fields can be categorized into three basic types:
Raw Data: initial unprocessed data. Examples are measured data without corrections, photos or samples.
Primary Data: processed data. They can result from the conversion or correction of measurements. Other examples are specific data compilations or annotations.
Secondary Data: re-used data. They were originally collected for a different purpose and are now used in a new context.
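
As a small illustration of the step from raw to primary data, the sketch below corrects and converts a few temperature readings. The readings, the sensor offset and the function name `to_primary` are hypothetical and serve only as an example; they are not part of any GFBio tool.

```python
# Hypothetical raw readings: water temperature in degrees Fahrenheit from a
# sensor assumed to read 0.5 degrees too high (all values are illustrative).
raw_readings_f = [59.5, 61.3, 63.1]
SENSOR_OFFSET_F = 0.5


def to_primary(reading_f, offset_f=SENSOR_OFFSET_F):
    """Correct the assumed sensor offset, then convert Fahrenheit to Celsius."""
    corrected_f = reading_f - offset_f
    return round((corrected_f - 32) * 5 / 9, 2)


# The raw list is left untouched; the primary data are a new, derived list.
primary_c = [to_primary(r) for r in raw_readings_f]
print(primary_c)
```

Keeping the raw values unchanged and deriving the primary data in a separate step makes the correction traceable and repeatable.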

DLC_Analyze.png

How to do it?

  1. Reproducibility is a crucial element for verification of results or for data reuse. Therefore, it is important to document the workflows of analysis and visualisation (e.g. computer scripts or text file notes).
  2. Be as detailed as possible when writing the process metadata. Record which results stem from which step.
  3. Find appropriate software that is ideally open-source (e.g. R). Ask yourself what you want to find out and which tools you need. Which software has proven its worth in similar studies and best fits your expertise and time budget?
  4. If pre-processing is required, state why. Normalize or transform raw data into meaningful values and name the algorithms used.
  5. Describe how the algorithm is applied.
  6. Discover new patterns or outliers (in large datasets) via data mining and through plots (statistical or graphical output possible).
  7. Decide which statistical model fits the data best and perform your analysis.
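
The steps above can be sketched in a short script. This is a minimal example under stated assumptions, not a recommended workflow: the measurements are invented, the helper names (`record`, `z_scores`, `fit_line`) are hypothetical, and the z-score threshold of 2.0 and the linear model are arbitrary choices made for illustration. It uses only the Python standard library.

```python
import statistics

# Hypothetical raw measurements (illustrative values, not real data).
days = [1, 2, 3, 4, 5, 6, 7]
raw = [4.1, 3.9, 4.3, 4.0, 9.8, 4.2, 3.8]

process_log = []  # process metadata: one entry per analysis step


def record(step, reason, **params):
    """Note what was done, why, and with which parameters."""
    process_log.append({"step": step, "reason": reason, "params": params})


def z_scores(values):
    """Standardise values to mean 0 and sample standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Pre-processing (steps 4 and 5): standardise and state why and how.
z = z_scores(raw)
record("standardise", "put values on a common scale", method="z-score")

# Exploration (step 6): flag values far from the mean for inspection.
threshold = 2.0
outliers = [i for i, zi in enumerate(z) if abs(zi) > threshold]
record("flag_outliers", "inspect unusual values",
       threshold=threshold, flagged_indices=outliers)

# Modelling (step 7): fit a linear trend to the values that were not flagged.
kept = [i for i in range(len(raw)) if i not in outliers]
slope, intercept = fit_line([days[i] for i in kept], [raw[i] for i in kept])
record("fit_line", "test for a trend over time", n=len(kept))

# The log documents which result is due to which step (steps 1 and 2).
for entry in process_log:
    print(entry)
```

Writing each step to a log together with its reason and parameters is one simple way to keep the analysis reproducible; a script under version control or a notebook can serve the same purpose.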

Who does it?

Any researcher who works with biodiversity-related data, whether as a data producer or as a data re-user.

Key Elements

GFBio Services

Data Visualization and Analysis

Useful Links

https://kepler-project.org (Scientific workflow interface)
http://www.vistrails.org/index.php/Main_Page (Scientific workflow interface)
http://cran.r-project.org (R open-source software for statistics)
http://www.rstudio.com/products/rstudio/download (RStudio open-source tool for R)


License: CC BY-NC

Recommended citation:
German Federation for Biological Data (2021). GFBio Training Materials: Data Life Cycle Fact-Sheet: Data Life Cycle: Analyze. Retrieved 16 Dec 2021 from https://www.gfbio.org/training/materials/data-lifecycle/analyze.