Package 'Fluidigm'

Title: Handling Fluidigm Data
Description: Designed to streamline the process of analyzing genotyping data from Fluidigm machines, this package offers a suite of tools for data handling and analysis. It includes functions for converting Fluidigm data to the format used by 'PLINK', estimating errors, calculating pairwise similarities, determining pairwise similarity loci, and generating a similarity matrix.
Authors: Daniel Fischer [aut, cre], Helena Johansson [ctb, fnd], Robert Ekblom [aut]
Maintainer: Daniel Fischer <[email protected]>
License: GPL-3
Version: 0.2
Built: 2025-01-07 03:59:35 UTC
Source: https://github.com/fischuu/fluidigm

Run plink to Calculate Pairwise Similarities

Description

This function serves as a wrapper to the 'PLINK' software, which is a free, open-source whole genome association analysis toolset. It specifically uses 'PLINK' to calculate pairwise similarities between genotypes.

Usage

calculatePairwiseSimilarities(
  file,
  db = NA,
  map = NA,
  out = NA,
  sexing = TRUE,
  verbose = TRUE,
  verbosity = 1
)

Arguments

file

A string representing the path to the filtered ped/map file pair (without ped/map file extension).

db

A string representing the path to an existing genotype database. If not provided, the function will proceed with the existing data.

map

A string representing the filepath to PlateDnoY.map file. If not provided, the function will use the map file with the same name as the ped file.

out

A string representing the path to the output. If not provided, the output will be written to a file with the same name as the input file, appended with "_oDB".

sexing

A logical value indicating whether the function should try to perform sexing. Default is TRUE.

verbose

A logical value indicating whether the output should be verbose. Default is TRUE.

verbosity

An integer representing the level of verbosity. Set to a higher number for more detailed output. Default is 1.

Details

The function first checks the input parameters and sets default values if necessary. It then constructs and executes a PLINK command to merge the genotype output with the existing genotype database if one is provided. Finally, it calculates pairwise similarities for all samples (and database individuals) using another PLINK command. If the 'sexing' parameter is set to TRUE, the function will also attempt to determine the sex of the individuals.
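The two-step flow described above (optional merge with a database, then the similarity calculation) can be sketched as follows. This is a minimal illustration, not the package's implementation: `build_plink_commands` is a hypothetical helper that only assembles command strings, the database is assumed to be a ped/map pair, and although the flags shown are standard PLINK 1.x options, the calls the package actually issues may differ.

```r
# Hypothetical helper: builds PLINK command strings but does not run them.
build_plink_commands <- function(file, db = NA, map = NA, out = NA) {
  # Documented defaults: the map file shares the ped file's name,
  # and the output name is the input name with "_oDB" appended.
  if (is.na(map)) map <- paste0(file, ".map")
  if (is.na(out)) out <- paste0(file, "_oDB")

  input <- paste("--ped", paste0(file, ".ped"), "--map", map)
  cmds <- character(0)

  if (!is.na(db)) {
    # Step 1 (only with a database): merge new genotypes with the database.
    # Assumption: the database is stored as <db>.ped / <db>.map.
    cmds <- c(cmds, paste("plink", input,
                          "--merge", paste0(db, ".ped"), paste0(db, ".map"),
                          "--recode --out", out))
    input <- paste("--file", out)
  }

  # Step 2: pairwise identity-by-state similarities for all samples.
  c(cmds, paste("plink", input, "--cluster --matrix --out", out))
}

cmds <- build_plink_commands("example_data.csv.GOOD", db = "genotypeDB")
print(cmds)
```

Keeping the command construction separate from its execution makes it easy to inspect what would be run before handing the strings to `system()`.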

Value

A list containing the following elements:

  • gensim: a matrix indicating if genotypes are called correctly for replicates and/or if genotypes are missing.

  • summs: a matrix with summary statistics.

References

  • Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ & Sham PC (2007) PLINK: a toolset for whole-genome association and population-based linkage analysis. American Journal of Human Genetics, 81.

  • Purcell, Shaun. PLINK. https://zzz.bwh.harvard.edu/plink/

Examples

## Not run: 

  outdir <- tempdir()

  calculatePairwiseSimilarities(file=file.path(outdir, "example_data.csv.GOOD"))

## End(Not run)

Estimate Errors in 'PLINK' ped files

Description

This function processes the 'PLINK' ped files and estimates errors. It also performs sex assignment and species marker analysis if required.

Usage

estimateErrors(
  file,
  outdir = NA,
  db = NA,
  appendSamplesToDB = FALSE,
  keep.rep = 1,
  y.marker = NA,
  x.marker = NA,
  sp.marker = NA,
  plots = TRUE,
  neg_controls = NA,
  allele_error = 5,
  marker_dropout = 15,
  no_marker = 50,
  male.y = 3,
  male.hetX = 0,
  female.y = 0,
  female.Xtot = 8,
  female.hetXtot = 3,
  warning.noYtot = 2,
  warning.noHetXtot = 3,
  sexing = FALSE,
  verbose = TRUE,
  verbosity = 1
)

Arguments

file

A string. Path to the ped input file.

outdir

A string specifying the output folder. If left empty, the original folder path of the input file will be used.

db

A string. Name of the used database. Default is NA.

appendSamplesToDB

A logical. Should new samples be added to database? Default is FALSE.

keep.rep

A numeric. Number of replicates to keep. Default is 1.

y.marker

A vector. Y markers for sexing. Default is NA.

x.marker

A vector. X markers for sexing. Default is NA.

sp.marker

A vector. Markers used for species-identification. Default is NA.

plots

A logical. Should plots be created? Default is TRUE.

neg_controls

A vector. Names of negative controls. Default is NA.

allele_error

A numeric. Threshold for RERUN on Allele errors. Default is 5.

marker_dropout

A numeric. Threshold for RERUN on Marker dropout. Default is 15.

no_marker

A numeric. Number of markers. Default is 50.

male.y

A numeric. Threshold for sexing, male y-chromosome markers. Default is 3.

male.hetX

A numeric. Threshold for sexing, heterozygote x-chr markers. Default is 0.

female.y

A numeric. Threshold for sexing, female y-chromosome markers. Default is 0.

female.Xtot

A numeric. Threshold for sexing, total female x-chr markers. Default is 8.

female.hetXtot

A numeric. Threshold for sexing, heterozygote x-chr markers. Default is 3.

warning.noYtot

A numeric. Threshold for sexing, when should warning be triggered. Default is 2.

warning.noHetXtot

A numeric. Threshold for sexing, when should warning be triggered. Default is 3.

sexing

A logical. Should sexing be performed? Default is FALSE.

verbose

A logical or numeric. Should the output be verbose? Default is TRUE.

verbosity

A numeric. Level of verbosity, set to higher number for more details. Default is 1.
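To make the sexing thresholds more concrete, the sketch below shows one plausible way such thresholds could be combined into a decision. It is an assumption-heavy illustration: `assign_sex` is a hypothetical function, the direction of every comparison is a guess, and the rule the package applies may be different. Only the default values are taken from the Usage section.

```r
# Hypothetical decision rule; comparison directions are assumptions.
# y_called: number of Y markers with a genotype call
# x_called: number of X markers with a genotype call
# x_het:    number of heterozygous X markers
assign_sex <- function(y_called, x_called, x_het,
                       male.y = 3, male.hetX = 0,
                       female.y = 0, female.Xtot = 8, female.hetXtot = 3) {
  # Assumed male rule: enough Y calls and no more het X markers than allowed.
  if (y_called >= male.y && x_het <= male.hetX) {
    return("Male")
  }
  # Assumed female rule: no Y calls, enough X calls, enough het X markers.
  if (y_called <= female.y && x_called >= female.Xtot && x_het >= female.hetXtot) {
    return("Female")
  }
  "Uncertain"
}

assign_sex(y_called = 3, x_called = 6, x_het = 0)  # "Male"
assign_sex(y_called = 0, x_called = 8, x_het = 4)  # "Female"
assign_sex(y_called = 1, x_called = 8, x_het = 1)  # "Uncertain"
```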

Details

This function processes the PLINK ped files and estimates errors. It checks if the first and second run of each sample have the same genotype and if both replicates are identical. It also performs sex assignment based on the provided Y and X markers. If species marker is provided, it performs species marker analysis. The function creates a consensus PED for all "GOOD" samples and exports it along with a .map file without the Y-markers. It also creates a database file, if provided.
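The replicate comparison described above can be illustrated with a small, self-contained sketch. `compare_replicates` is a hypothetical helper, not part of the package; it assumes genotypes are stored as strings such as "A G", that "0 0" codes a missing call, and that a consensus takes the non-missing call when only one replicate failed. Allele order is compared literally here, so "A G" and "G A" would count as different.

```r
# Hypothetical helper: compare two replicate genotype vectors marker by marker.
compare_replicates <- function(rep1, rep2, missing = "0 0") {
  stopifnot(length(rep1) == length(rep2))

  # A marker counts as a dropout if either replicate has no call.
  dropped <- rep1 == missing | rep2 == missing
  # A marker counts as a mismatch if both replicates are called but disagree.
  mismatch <- !dropped & rep1 != rep2

  # Assumed consensus: disagreement becomes missing; otherwise keep the
  # call that is available (identical calls stay unchanged).
  consensus <- ifelse(mismatch, missing,
                      ifelse(rep1 == missing, rep2, rep1))

  list(n_dropout = sum(dropped),
       n_mismatch = sum(mismatch),
       consensus = consensus)
}

rep1 <- c("A A", "A G", "0 0", "C C")
rep2 <- c("A A", "G G", "T T", "C C")
res <- compare_replicates(rep1, rep2)
print(res)
```

Counts like these are the kind of quantity that thresholds such as `allele_error` and `marker_dropout` could be compared against, although the exact units used by the package are not stated here.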

Value

A list containing the following elements:

  • gensim: a matrix indicating if genotypes are called correctly for replicates and/or if genotypes are missing.

  • summs: a matrix with summary statistics.

Examples

## Not run: 
    # outdir is here the output directory from the fluidigm2PLINK function

    # Estimate the errors without sexing (the default)
    estimateErrors(file=file.path(outdir, "example_data.csv.ped"),
                   keep.rep = 2)

    # Estimate the errors and apply sexing with y and x markers defined
    estimateErrors(file=file.path(outdir, "example_data.csv.ped"),
                   keep.rep = 2,
                   sexing=TRUE,
                   y.marker = c("Y_scaffoldY158711_762",
                                "Y_scaffoldY42647_3017",
                                "Y_scaffoldY42656_3986"),
                   x.marker = c("X_scaffold11905_7659",
                                "X_scaffold17088_4621",
                                "X_scaffold1915_14108",
                                "X_scaffold4825_648",
                                "X_scaffold5374_1437",
                                "X_scaffold10171:3154"))

## End(Not run)

Fluidigm

Description

Designed to streamline the process of analyzing genotyping data from Fluidigm machines, this package offers a suite of tools for data handling and analysis. It includes functions for converting Fluidigm data to the format used by 'PLINK', estimating errors, calculating pairwise similarities, determining pairwise similarity loci, and generating a similarity matrix.

Title

Comprehensive Analysis of Fluidigm Genotyping Data

Details

The package provides a comprehensive analysis pipeline for Fluidigm genotyping data. It starts by converting the raw data from the Fluidigm machine into a format that can be used with the 'PLINK' software. It then estimates errors in the data, calculates pairwise similarities between genotypes, determines pairwise similarity loci, and generates a similarity matrix. The package is designed to make it easier and more efficient for researchers to extract meaningful insights from their genotyping studies.

References

  • Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ & Sham PC (2007) PLINK: a toolset for whole-genome association and population-based linkage analysis. American Journal of Human Genetics, 81.

  • Purcell, Shaun. PLINK. https://zzz.bwh.harvard.edu/plink/

Author(s)

Maintainer: Daniel Fischer [email protected]

Authors:

  • Robert Ekblom

Other contributors:

  • Helena Johansson [contributor, funder]


Run the Fluidigm Analysis Script Together

Description

This function serves as a wrapper for the entire analysis pipeline. It takes a Fluidigm input file and performs several operations including conversion to 'PLINK' format, error estimation, calculation of pairwise similarities, determination of pairwise similarity loci, and calculation of the similarity matrix.

Usage

fluidigmAnalysisWrapper(
  file,
  out = NA,
  outdir = NA,
  db = NA,
  appendSamplesToDB = FALSE,
  map = NA,
  keep.rep = 1,
  neg_controls = NA,
  y.marker = NA,
  x.marker = NA,
  sp.marker = NA,
  plots = TRUE,
  allele_error = 5,
  marker_dropout = 15,
  no_marker = 50,
  male.y = 3,
  male.hetX = 0,
  female.y = 0,
  female.Xtot = 8,
  female.hetXtot = 3,
  warning.noYtot = 2,
  warning.noHetXtot = 3,
  rearrange = TRUE,
  group = NA,
  fixNames = TRUE,
  sexing = TRUE,
  similarity = 0.85,
  verbose = TRUE,
  verbosity = 1,
  missing.geno = "0 0",
  overwrite = FALSE
)

Arguments

file

A string specifying the path to the Fluidigm input file.

out

A string specifying the output file name. If left empty, the original basename of the input file will be used.

outdir

A string specifying the output folder. If left empty, the original folder path of the input file will be used.

db

A string specifying the filepath to the database file. If not provided, the function will proceed with the existing data.

appendSamplesToDB

A logical indicating whether new samples should be added to the database. Default is FALSE.

map

A string specifying the filepath to the PlateDnoY.map file. If not provided, the function will use the map file with the same name as the ped file.

keep.rep

A numeric value indicating the number of replicates to keep. Default is 1.

neg_controls

A vector specifying the names of negative controls. Default is NA.

y.marker

A vector specifying the Y markers for sexing. Default is NA.

x.marker

A vector specifying the X markers for sexing. Default is NA.

sp.marker

A vector specifying the markers used for species identification. Default is NA.

plots

A logical indicating whether plots should be created. Default is TRUE.

allele_error

A numeric value specifying the threshold for RERUN on Allele errors. Default is 5.

marker_dropout

A numeric value specifying the threshold for RERUN on Marker dropout. Default is 15.

no_marker

A numeric value specifying the number of markers. Default is 50.

male.y

A numeric value specifying the threshold for sexing, male y-chromosome markers. Default is 3.

male.hetX

A numeric value specifying the threshold for sexing, heterozygote x-chr markers. Default is 0.

female.y

A numeric value specifying the threshold for sexing, female y-chromosome markers. Default is 0.

female.Xtot

A numeric value specifying the threshold for sexing, total female x-chr markers. Default is 8.

female.hetXtot

A numeric value specifying the threshold for sexing, heterozygote x-chr markers. Default is 3.

warning.noYtot

A numeric value specifying the threshold for sexing, when should warning be triggered. Default is 2.

warning.noHetXtot

A numeric value specifying the threshold for sexing, when should warning be triggered. Default is 3.

rearrange

A logical indicating whether the ped/map output should be rearranged in the order of the provided map file. Default is TRUE.

group

A string specifying the sample identifier for statistics. Default is NA.

fixNames

A logical indicating whether whitespace should be automatically removed from sample names. Default is TRUE.

sexing

A logical indicating whether sexing should be performed. Default is TRUE.

similarity

Similarity threshold. Default: 0.85.

verbose

A logical or numerical value indicating whether the output should be verbose. Default is TRUE.

verbosity

A numerical value indicating the level of verbosity. Set to a higher number for more details. Default is 1.

missing.geno

A character string specifying how missing values should be coded. Default is "0 0".

overwrite

A logical indicating whether the original map file should be overwritten. Default is FALSE.

Details

The function first checks the input parameters and sets default values if necessary. It then runs the following functions in order:

  • fluidigm2PLINK: Converts the Fluidigm data to PLINK format.

  • estimateErrors: Estimates errors in the PLINK ped files.

  • calculatePairwiseSimilarities: Calculates pairwise similarities between genotypes.

  • getPairwiseSimilarityLoci: Determines the loci of pairwise similarities.

  • similarityMatrix: Calculates the similarity matrix.

The function prints a completion message when all operations are done.
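How the individual stages hand files to each other can be sketched as below. The file suffixes (".ped", ".GOOD") follow the examples in this manual; `stage_paths` is a hypothetical helper, and the arguments shown for `fluidigm2PLINK` are assumptions, since its signature is not documented in this section. The package calls are left as comments because they require the package and 'PLINK' to be installed.

```r
# Hypothetical helper: derive the intermediate file paths used between stages.
stage_paths <- function(file, outdir = NA) {
  # Documented default: use the folder of the input file.
  if (is.na(outdir)) outdir <- dirname(file)
  base <- file.path(outdir, basename(file))
  list(ped  = paste0(base, ".ped"),   # output of the conversion step
       good = paste0(base, ".GOOD"))  # prefix of the filtered ped/map pair
}

p <- stage_paths("data/example_data.csv", outdir = "results")
print(p)

# With the package loaded, the stages would then be chained roughly as:
# fluidigm2PLINK(file = "data/example_data.csv", outdir = "results")  # arguments assumed
# estimateErrors(file = p$ped)
# calculatePairwiseSimilarities(file = p$good)
# getPairwiseSimilarityLoci(file = p$good)
# similarityMatrix(file = p$good)
```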

Value

A list containing the following elements:

  • gensim: a matrix indicating if genotypes are called correctly for replicates and/or if genotypes are missing.

  • summs: a matrix with summary statistics.

References

  • Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P, de Bakker PIW, Daly MJ & Sham PC (2007) PLINK: a toolset for whole-genome association and population-based linkage analysis. American Journal of Human Genetics, 81.

  • Purcell, Shaun. PLINK. https://zzz.bwh.harvard.edu/plink/

Examples

## Not run: 
  fluidigmAnalysisWrapper(file="path/to/your/file.csv", map="path/to/your/mapfile.map")

## End(Not run)

Get Pairwise Similarity Loci

Description

This function is a wrapper around a Perl script that determines the loci of pairwise similarities. It is designed to work with data from the 'PLINK' software, which is commonly used in bioinformatics for whole genome association analysis.

Usage

getPairwiseSimilarityLoci(file, verbose = TRUE, verbosity = 1)

Arguments

file

A string. This is the path to the previously created PLINK ped file (with .ped extension). The ped file contains genotype information in a format that can be used for further analysis.

verbose

A logical. If TRUE, the function will print detailed messages during its execution to help you understand what it's doing at each step. Default is TRUE.

verbosity

An integer. This parameter controls the level of verbosity. The higher the number, the more detailed the messages. Default is 1.

Details

This function first checks the input parameters. It then constructs the output file names and the command to run the Perl script. The command is executed using the system function. If the 'verbose' parameter is set to TRUE, the function will print a message when it has finished running.

The Perl script was written by Doug Scofield; see References.
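Wrapping an external script from R generally follows the pattern sketched below: check that the interpreter is available, quote the arguments, then execute. `build_script_call` is a hypothetical helper, and the argument layout passed to the script (script path followed by the ped file) is an assumption, not the documented interface of the script or of this function.

```r
# Hypothetical helper: assemble, but do not execute, a call to a Perl script.
build_script_call <- function(script, ped) {
  list(command = "perl",
       # shQuote() protects paths that contain spaces or shell metacharacters.
       args = c(shQuote(script), shQuote(ped)))
}

# Sys.which() returns an empty string when the program is not on the PATH.
perl_available <- unname(nzchar(Sys.which("perl")))

call <- build_script_call("plink-pairwise-loci.pl", "example_data.csv.ped")
print(call)

# To actually run it (only if perl_available is TRUE and the files exist):
# system2(call$command, call$args)
```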

Value

This function does not return a value in the R environment. Instead, it creates an output file with the '.pairs' extension in the same directory as the input file. This output file contains the results of the pairwise similarity loci analysis.

References

The original code this function is based on can be found on GitHub: https://github.com/douglasgscofield/bioinfo/blob/main/scripts/plink-pairwise-loci.pl

Examples

## Not run: 
  outdir <- tempdir()
  getPairwiseSimilarityLoci(file = file.path(outdir, "example_data.csv.GOOD"))

## End(Not run)

Calculate the similarity matrix

Description

Performs pairwise similarity analysis on genotypic data.

Usage

similarityMatrix(
  file = NA,
  mibs.file = NA,
  pairs.file = NA,
  ped.file = NA,
  group = NA,
  plots = TRUE,
  similarity = 0.85,
  verbose = TRUE,
  verbosity = 1
)

Arguments

file

Input file path. Default: NA.

mibs.file

MIBS input file path. Default: NA.

pairs.file

PAIRS input file path. Default: NA.

ped.file

PED input file path. Default: NA.

group

Sample identifier for statistics. Default: NA.

plots

Should plots be created? Default: TRUE.

similarity

Similarity threshold. Default: 0.85.

verbose

Should output be verbose? Default: TRUE.

verbosity

Verbosity level. Default: 1.

Details

Reads genotype data, performs pairwise similarity calculations, generates plots, and outputs data for further analysis.
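The role of the similarity threshold can be illustrated with a small sketch. It assumes a symmetric similarity matrix with sample names as row and column names, and that pairs at or above the threshold are the ones of interest. `similar_pairs` is a hypothetical helper, and the matrix below is made-up illustration data, not package output.

```r
# Hypothetical helper: list sample pairs whose similarity reaches the threshold.
similar_pairs <- function(sim, threshold = 0.85) {
  # Only look at the upper triangle so each pair is reported once
  # and the diagonal (self-similarity) is ignored.
  idx <- which(upper.tri(sim) & sim >= threshold, arr.ind = TRUE)
  data.frame(sample1 = rownames(sim)[idx[, "row"]],
             sample2 = colnames(sim)[idx[, "col"]],
             similarity = sim[idx],
             stringsAsFactors = FALSE)
}

ids <- c("S1", "S2", "S3")
sim <- matrix(c(1.00, 0.95, 0.60,
                0.95, 1.00, 0.70,
                0.60, 0.70, 1.00),
              nrow = 3, byrow = TRUE, dimnames = list(ids, ids))

pairs_found <- similar_pairs(sim)
print(pairs_found)
```

With the default threshold of 0.85, only the S1/S2 pair is reported for this toy matrix.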

Value

Does not return a value. Creates output files in the same directory as the input files.

Examples

## Not run: 
  outdir <- tempdir()
  similarityMatrix(file = file.path(outdir, "example_data.csv.GOOD"))

## End(Not run)