QaqC-tutorial



Welcome to the QaqC Dashboard, your one-stop tool for validating, cleaning, and exploring phenotypic datasets before statistical analysis. This tutorial walks you through each step of the app with detailed instructions and visual guides.



———————————– Step 1: Upload the Dataset ———————————


📥 Purpose: This is where everything starts. You’ll load your phenotypic CSV file into the app, preview the data, and check for basic issues like missing values or duplicates.

🎯 What You Can Do:

🖱️ How to Use:


1. Click “Browse…” to upload your .csv file. Make sure the file has column headers in the first row.


2. After loading, the dataset preview appears on the right side under Raw Data Preview.

[Image: Uploading the CSV file]

3. Click “Find Missing Values” to highlight any missing entries (NA or blank).

[Image: Finding missing values in the CSV file]

4. To check for duplicates, select the column(s) to check and click “Find Duplicates”. You can then choose “Remove Duplicates” if needed.

[Image: Finding and removing duplicates]

5. Click “Reset Data” to clear your current session so you can upload a file again.

[Image: Resetting the file upload]
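The missing-value and duplicate checks above can also be reproduced outside the dashboard. What follows is a minimal Python sketch, not the app's own code. It treats `NA` and blank cells as missing, which are the tokens the tutorial mentions; the app may recognize others. For duplicates, it keeps the first row of each repeated key, which is an assumed policy. The function names `load_rows`, `find_missing`, `find_duplicates`, and `remove_duplicates` are hypothetical.

```python
import csv
import io

# Cell values treated as missing. "NA" and blank come from the tutorial;
# any other tokens the dashboard recognizes are not covered here.
MISSING_TOKENS = {"", "NA"}


def load_rows(text):
    """Parse CSV text whose first row holds the column headers."""
    return list(csv.DictReader(io.StringIO(text)))


def find_missing(rows):
    """Map each column to the row indices where its value is missing."""
    missing = {}
    for i, row in enumerate(rows):
        for col, value in row.items():
            if value is None or value.strip() in MISSING_TOKENS:
                missing.setdefault(col, []).append(i)
    return missing


def find_duplicates(rows, columns):
    """Indices of rows whose values in `columns` repeat an earlier row."""
    seen, dupes = set(), []
    for i, row in enumerate(rows):
        key = tuple(row[c] for c in columns)
        if key in seen:
            dupes.append(i)
        else:
            seen.add(key)
    return dupes


def remove_duplicates(rows, columns):
    """Keep the first occurrence of each key; drop later repeats (assumed policy)."""
    drop = set(find_duplicates(rows, columns))
    return [row for i, row in enumerate(rows) if i not in drop]


data = """plot,entry,yield
101,A,52.3
102,B,NA
103,C,
101,A,52.3
"""
rows = load_rows(data)
print(find_missing(rows))                               # {'yield': [1, 2]}
print(find_duplicates(rows, ["plot", "entry"]))         # [3]
print(len(remove_duplicates(rows, ["plot", "entry"])))  # 3
```

Choosing which columns form the duplicate key matters: checking only `plot` flags repeated plot IDs, while checking every column flags fully identical rows.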

🧠 Tip:





———————————– Step 2: Column Summary Tab ———————————–



🔍 Purpose: Get descriptive statistics for any numeric column — a fast way to assess data spread, potential outliers, and normality.

🎯 What You Can Do:

🖱️ How to Use:

  1. Choose a trait/column from the “Select Column to Summarize” dropdown.
  2. The Summary Statistics table will update instantly.
  3. Click “Show Definitions” to view explanations of each metric.

📘 Metric Definitions (examples):

[Image: Column summary]
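The metric definitions are not reproduced in this tutorial, so the sketch below computes a typical set of descriptive statistics in plain Python. The dashboard's exact metrics may differ. Quartiles use `statistics.quantiles(..., method="inclusive")`, which corresponds to R's default quantile type 7. Skewness uses the simple moment estimator as a rough normality check. The function name `summarize` is hypothetical.

```python
import statistics


def summarize(values):
    """Descriptive statistics for one numeric column.

    `None` marks a missing entry; missing values are counted, then excluded.
    """
    x = [v for v in values if v is not None]
    n = len(x)
    mean = statistics.fmean(x)
    # "inclusive" quartiles match R's default quantile type 7.
    q1, median, q3 = statistics.quantiles(x, n=4, method="inclusive")
    # Moment-based skewness: near 0 for a symmetric distribution.
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return {
        "n": n,
        "missing": len(values) - n,
        "mean": mean,
        "sd": statistics.stdev(x),  # sample SD, n - 1 denominator
        "min": min(x),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(x),
        "skewness": m3 / m2 ** 1.5 if m2 > 0 else 0.0,
    }


print(summarize([48.1, 52.3, 50.7, None, 49.9, 61.5, 51.2]))
```

A large gap between the mean and the median, or a skewness far from 0, is a quick hint that the column has a long tail or an outlier worth inspecting in Step 3.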





——————————– 🚦 Step 3: Visualize & Detect Outliers ——————————–

This step includes three tabs — Histogram, Boxplot, and Studentized Residual Plot — that help detect and visualize unusual patterns or outlier values.


📉 Tab 1: Histogram

🎯 What You Can Do:

🖱️ How to Use:

  1. Select a trait.
  2. Optionally enable “Flag SD-based Outliers” and set the bin size if needed.
  3. Compare Raw vs Filtered histograms.
    • The filtered histogram excludes flagged outliers.
  4. Below the plots, see a summary table of outlier rows.
  5. Use the download buttons to save:
    • Raw Plot, Filtered Plot, or Filtered CSV.

[Image: Histogram tab]
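One common way to implement an SD-based rule is to flag any value more than k standard deviations from the column mean, then build the filtered histogram from the remaining values. The sketch below assumes k = 3 and the sample SD. The tutorial does not state which threshold the Histogram tab uses, and `flag_sd_outliers` is a hypothetical name.

```python
import statistics


def flag_sd_outliers(values, k=3.0):
    """Split values with a mean +/- k*SD rule.

    Returns (kept, flagged), where flagged holds (index, value) pairs.
    k = 3 is an assumed default, not necessarily the dashboard's.
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    lower, upper = mean - k * sd, mean + k * sd
    kept = [v for v in values if lower <= v <= upper]
    flagged = [(i, v) for i, v in enumerate(values) if not lower <= v <= upper]
    return kept, flagged


plot_yield = [50, 51, 49, 50, 52, 48, 50, 51, 49, 50,
              51, 49, 50, 52, 48, 50, 51, 49, 50, 90]
kept, flagged = flag_sd_outliers(plot_yield)
print(flagged)  # [(19, 90)]
# `kept` would feed the filtered histogram; `plot_yield` the raw one.
```

Be careful with small samples. The outlier itself inflates the SD, and with n observations no value can lie more than (n - 1)/sqrt(n) sample SDs from the mean. With 10 observations that limit is about 2.85, so a 3-SD rule can never fire.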


📦 Tab 2: Boxplot

🎯 What You Can Do:

🖱️ How to Use:

  1. Select a trait.
  2. Optionally enable “Flag IQR-based Outliers”.
  3. Compare Raw vs Filtered boxplots.
  4. Below the plots, see a summary table of outlier rows.
  5. Customize plot aesthetics if desired.
  6. Download plots and cleaned data.

[Image: Boxplot tab]
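Boxplot whiskers conventionally follow Tukey's fences. Values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are drawn as individual points. The sketch below assumes the IQR-based flag follows this convention with the usual 1.5 multiplier. The tutorial does not state the app's multiplier or quartile method, and `flag_iqr_outliers` is a hypothetical name.

```python
import statistics


def flag_iqr_outliers(values, multiplier=1.5):
    """Tukey's fences: flag values outside [Q1 - m*IQR, Q3 + m*IQR].

    Returns (kept, flagged, (lower, upper)); flagged holds (index, value) pairs.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - multiplier * iqr, q3 + multiplier * iqr
    kept = [v for v in values if lower <= v <= upper]
    flagged = [(i, v) for i, v in enumerate(values) if v < lower or v > upper]
    return kept, flagged, (lower, upper)


kept, flagged, fences = flag_iqr_outliers([10, 12, 11, 13, 12, 11, 30])
print(fences)   # (8.75, 14.75)
print(flagged)  # [(6, 30)]
```

Quartiles are less sensitive to a single extreme value than the mean and SD are. That is why the IQR rule still catches the 30 here in only seven observations.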


📈 Tab 3: Studentized Residual Plot

🎯 What You Can Do:

🖱️ How to Use:

  1. Choose a Response Variable (e.g., yield).
  2. Select Predictors (e.g., plot, block, entry).
  3. Set a threshold (default = 4).
  4. View the Raw vs Filtered studentized residual plots.
    • Red dots = outliers
  5. Below the plots, see a summary table of outlier rows.
  6. Export plot or cleaned dataset.

[Image: Studentized residual plot tab]
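Studentized residuals rescale each residual by an estimate of its own standard error. A threshold such as 4 therefore reads roughly as "4 standard errors away from the fitted model." The sketch below computes externally studentized residuals, the quantity R's `rstudent()` returns, for a single numeric predictor. It uses the leave-one-out identity, so no refitting is needed.

This is a simplification. The dashboard accepts several predictors such as plot, block, and entry, and the tutorial does not say whether it uses internally or externally studentized residuals. `studentized_residuals` and `flag_by_threshold` are hypothetical names, and comparing the absolute value against the threshold is an assumption.

```python
import math


def studentized_residuals(x, y):
    """Externally studentized residuals for the simple fit y = a + b*x."""
    n, p = len(x), 2  # p = fitted parameters (intercept and slope)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    t = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - xbar) ** 2 / sxx  # leverage of observation i
        # Residual variance with observation i left out (no refit needed).
        s2_del = (sse - e * e / (1 - h)) / (n - p - 1)
        if s2_del > 0:
            t.append(e / math.sqrt(s2_del * (1 - h)))
        else:  # the other points fit exactly; treat as unbounded
            t.append(math.copysign(math.inf, e))
    return t


def flag_by_threshold(t_values, threshold=4.0):
    """Indices whose absolute studentized residual exceeds the threshold."""
    return [i for i, t in enumerate(t_values) if abs(t) > threshold]


x = list(range(1, 11))
noise = [0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.2, -0.2, 0.1, -0.1]
y = [2 * xi + ei for xi, ei in zip(x, noise)]
y[4] += 10  # inject one gross error
t = studentized_residuals(x, y)
print(flag_by_threshold(t))  # [4]
```

Leaving observation i out of its own variance estimate is what makes a single gross error stand out sharply. With it included, the error inflates the very SD it is being compared against.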





——————————– Step 4: Pairwise & Overall Relationships ——————————–


We use two tabs here: Scatter Plot (pairwise) and Correlation Heatmap (overall).

🔹 Tab 1: Scatter Plot

🎯 What You Can Do:

🖱️ How to Use:

  1. From the left panel, select a Response Variable (e.g., yield).
  2. Select a Predictor Variable (e.g., plot).
  3. (Optional) Check “Add Linear Model” to overlay a regression line.
  4. (Optional) Add a Correlation Ellipse to visualize spread.
  5. Compare Raw vs Filtered scatterplots.
  6. Review model results under each plot.

📘 Interpretation Example:

[Image: Scatter plot tab]
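The “Add Linear Model” option overlays a least-squares line. The sketch below fits that line by ordinary least squares and reports the slope, intercept, Pearson r, and R², which equals r² for a single predictor. It runs the fit once on all rows and once with flagged rows removed, mirroring the Raw vs Filtered comparison. `fit_line` and the flagged index set are hypothetical, and the app's own model summary may report different statistics.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    r = sxy / (sxx * syy) ** 0.5  # Pearson correlation
    return {"intercept": ybar - slope * xbar, "slope": slope,
            "r": r, "r_squared": r * r}  # R^2 = r^2 with one predictor


x = [1, 2, 3, 4, 5, 6]
y = [3, 5, 7, 9, 11, 40]
flagged = {5}  # hypothetical: row indices flagged by an earlier outlier check
raw = fit_line(x, y)
filtered = fit_line([v for i, v in enumerate(x) if i not in flagged],
                    [v for i, v in enumerate(y) if i not in flagged])
print(raw)
print(filtered)  # {'intercept': 1.0, 'slope': 2.0, 'r': 1.0, 'r_squared': 1.0}
```

Comparing the two fits shows how much a single flagged row pulls the slope and lowers R².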



🔹 Tab 2: Correlation Heatmap

🎯 What You Can Do:

🖱️ How to Use:

  1. Select multiple variables in the Select Variables panel.
  2. Choose a correlation method:
    • Pearson: strength of linear association
    • Spearman: rank-based; captures monotonic association
    • Kendall: based on concordant vs. discordant pairs of ranks
  3. (Optional) Enable outlier detection with an SD threshold.
  4. Compare Raw vs Filtered heatmaps.
  5. Hover over heatmap cells to see exact correlation values.
  6. Download the plot or the filtered dataset.

📘 Interpretation Example:

[Image: Correlation heatmap tab]
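The three correlation methods differ in what they compare. The sketch below implements each one in plain Python:

- Pearson on the raw values.
- Spearman as Pearson on average ranks, so tied values share a rank.
- Kendall as tau-b, a tie-adjusted variant. Which Kendall variant the app uses is an assumption.

It then builds a name-keyed matrix like the one a heatmap displays. The function names are hypothetical, and this is not the dashboard's implementation. The pairwise Kendall loop is O(n²), which is fine for illustration but slow on large trials.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation: strength of linear association."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b: concordant minus discordant pairs, adjusted for ties."""
    concordant = discordant = untied_x = untied_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        untied_x += dx != 0
        untied_y += dy != 0
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)


METHODS = {"pearson": pearson, "spearman": spearman, "kendall": kendall_tau_b}


def correlation_matrix(columns, method="pearson"):
    """columns maps variable name -> values; returns {(a, b): coefficient}."""
    fn = METHODS[method]
    return {(a, b): fn(columns[a], columns[b]) for a in columns for b in columns}


traits = {"x": [1, 2, 3, 4, 5], "y": [1, 4, 9, 16, 25]}  # monotonic, not linear
for method in METHODS:
    # pearson 0.981, spearman 1.0, kendall 1.0
    print(method, round(correlation_matrix(traits, method)[("x", "y")], 3))
```

The example shows why the choice matters. The relationship is perfectly monotonic, so both rank-based methods report 1.0. Pearson falls below 1 because the relationship is curved rather than linear.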





——————————📑 Step 5: QA/QC Report ——————————

This final step brings everything together into a downloadable report.

🎯 What You Can Do:

🖱️ How to Use:

  1. In the Mirror Settings panel, check which sections you want to include.
    • Example: Histogram + Boxplot + Residuals.
  2. The report preview updates live under “Everything currently visible across tabs”.
  3. Click “Download HTML” to export the full report.

📘 Why It Matters:

[Image: QA/QC report]





—————————— 🎉 Wrapping Up ——————————

Congratulations — you’ve just completed the QaqC Dashboard tutorial! 🚀

By now, you should be able to:

💡 Why This Matters

Every cleaned dataset you generate through QaqC is more:

🚀 Next Steps



——————————📬 Support & Contact ——————————

If you run into issues while using the QaqC Dashboard, please don’t hesitate to reach out. We’re here to help!

Gurminder Singh (Developer / Tutorial Author) 📧 g.singh@ndsu.edu

Richard Horsley (Department Head, Project Lead) 📧 richard.horsley@ndsu.edu

Ana Heilman-Morales (Director, NDSU Agricultural Data Analytics, Project Lead) 📧 ana.heilman.morales@ndsu.edu

NDSU Big Data Team (Technical Support) 📧 ndsu.bigdata@ndsu.edu

💡 Tip: When emailing, please include a brief description of the problem and, if possible, a screenshot of the error or the dataset structure you are working with. This helps us respond more effectively.