Package 'GenomicTools.fileHandler'

Title: File Handlers for Genomic Data Analysis
Description: A collection of I/O tools for handling the most commonly used genomic datafiles, like fasta/-q, bed, gff, gtf, ped/map and vcf.
Authors: Daniel Fischer
Maintainer: Daniel Fischer <[email protected]>
License: GPL (>= 2)
Version: 0.2.2
Built: 2024-10-25 03:46:33 UTC
Source: https://github.com/fischuu/genomictools.filehandler

R Package To Handle Files From Genomic Data

GenomicTools.fileHandler is a loose collection of I/O functions needed in genomic data analysis.

Description

Package: GenomicTools.fileHandler
Type: Package
Version: 0.2
Date: 2024-01-16
License: GPL
LazyLoad: yes

Author(s)

Daniel Fischer

Maintainer: Daniel Fischer <[email protected]>


Example Gene Annotation in Bed-Format

Description

This file contains some example lines to represent a typical bed file that can be used to try the corresponding functions.

Format

A file with three columns: Chr, Start and End.

Details

The file is located in the /extdata folder of the package and is accessible after installation via system.file("extdata","example.bed", package="GenomicTools.fileHandler")

Author(s)

Daniel Fischer


Example Sequencing Reads in fasta-Format

Description

This file contains some example reads to represent a typical fasta file that can be used to try the corresponding functions.

Details

The file is located in the /extdata folder of the package and is accessible after installation via system.file("extdata","example.fasta", package="GenomicTools.fileHandler")

Author(s)

Daniel Fischer


Example Sequencing Reads in fastq-Format

Description

This file contains some example reads to represent a typical fastq file that can be used to try the corresponding functions.

Details

The file is located in the /extdata folder of the package and is accessible after installation via system.file("extdata","example.fastq", package="GenomicTools.fileHandler")

Author(s)

Daniel Fischer


Example Gene Annotation in gff-Format

Description

This file contains some example gene annotations to represent a typical gff file that can be used to try the corresponding functions.

Details

The file is located in the /extdata folder of the package and is accessible after installation via system.file("extdata","example.gff", package="GenomicTools.fileHandler")

Author(s)

Daniel Fischer


Example Gene Annotation in gtf-Format

Description

This file contains some example gene annotations to represent a typical gtf file that can be used to try the corresponding functions.

Details

The file is located in the /extdata folder of the package and is accessible after installation via system.file("extdata","example.gtf", package="GenomicTools.fileHandler")

Author(s)

Daniel Fischer


Example Variant data in ped/map-Format

Description

This file contains some example variants to represent a typical ped/map file pair that can be used to try the corresponding functions.

Details

The files are located in the /extdata folder of the package and are accessible after installation via system.file("extdata","example.ped", package="GenomicTools.fileHandler") and system.file("extdata","example.map", package="GenomicTools.fileHandler")

Author(s)

Daniel Fischer


Example Variant data in vcf-Format

Description

This file contains some example variants to represent a typical vcf file that can be used to try the corresponding functions.

Details

The file is located in the /extdata folder of the package and is accessible after installation via system.file("extdata","example.vcf", package="GenomicTools.fileHandler")

Author(s)

Daniel Fischer


Example Gene Annotation in zipped gtf-Format

Description

This file contains some example gene annotations to represent a typical zipped gtf file that can be used to try the corresponding functions.

Details

The file is located in the /extdata folder of the package and is accessible after installation via system.file("extdata","example2.gtf.gz", package="GenomicTools.fileHandler")

Author(s)

Daniel Fischer


Exporting a Bed File.

Description

This function exports a standard bed file.

Usage

exportBed(x, file = NULL, header = FALSE)

Arguments

x

data.frame

file

Character, specifies filename/path

header

Logical, shall a header be written

Details

This function exports a data.frame to a standard bed file. If no file name is given, the variable name will be used instead.

Value

A bed file

Author(s)

Daniel Fischer

Examples

novelBed <- data.frame(Chr=c(11,18,3),
                      Start=c(72554673, 62550696, 18148822),
                      End=c(72555273, 62551296, 18149422),
                      Gene=c("LOC1", "LOC2", "LOC3"))

# Create a temporary file in which the output of the function is stored
myfile <- file.path(tempdir(), "myLocs.bed")

exportBed(novelBed, file=myfile)
exportBed(novelBed, file=myfile, header=TRUE)

Exporting a Fasta File.

Description

This function exports a standard fasta file.

Usage

exportFA(fa, file = NULL)

Arguments

fa

fasta object

file

Character, specifies filename/path

Details

This function exports a fasta object to a standard fasta file. If no file name is given, the variable name will be used instead.

Value

A fasta file

Author(s)

Daniel Fischer

Examples

# Define the location of the example file
fpath <- system.file("extdata","example.fasta", package="GenomicTools.fileHandler")
# Import the example fasta file
fastaFile <- importFA(file=fpath)
newFasta <- fastaFile[1:5]

myfile <- file.path(tempdir(), "myLocs.fa")

exportFA(newFasta, file=myfile)

Exporting a GTF File.

Description

This function exports a standard gtf file.

Usage

exportGTF(x, file)

Arguments

x

gtf-object

file

Character, specifies filename/path

Details

This function exports a gtf-object to a standard gtf file.

Value

A gtf file

Author(s)

Daniel Fischer


Importing a Bed File.

Description

This function imports a standard bed file

Usage

importBed(file, header = FALSE, sep = "\t")

Arguments

file

Specifies the filename/path

header

Logical, is a header present

sep

Column separator

Details

This function imports a standard bed file into a data.frame. It is basically a convenience wrapper around read.table. However, if no header line is given, this function automatically assigns the column names as they are given in the bed specification on the Ensembl page here: https://www.ensembl.org/info/website/upload/bed.html
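
The automatic column naming described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's code: the helper read_bed_sketch is hypothetical, and the column names are the field names of the BED specification; the names that importBed actually assigns may differ.

```r
# Hypothetical helper (not part of GenomicTools.fileHandler): read a
# header-less bed file and name its columns after the BED specification.
read_bed_sketch <- function(file, sep = "\t") {
  # Field names as listed in the BED specification (the first three are mandatory)
  bedNames <- c("chrom", "chromStart", "chromEnd", "name", "score", "strand",
                "thickStart", "thickEnd", "itemRgb", "blockCount",
                "blockSizes", "blockStarts")
  out <- read.table(file, header = FALSE, sep = sep, stringsAsFactors = FALSE)
  # Name only as many columns as both the file and the specification provide
  n <- min(ncol(out), length(bedNames))
  colnames(out)[seq_len(n)] <- bedNames[seq_len(n)]
  out
}

# Usage on a small temporary file with four columns
tmp <- tempfile(fileext = ".bed")
writeLines(c("11\t72554673\t72555273\tLOC1",
             "18\t62550696\t62551296\tLOC2"), tmp)
bedSketch <- read_bed_sketch(tmp)
colnames(bedSketch)
```

Naming only the columns that are present keeps the sketch usable for the minimal three-column layout as well as for extended bed files.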

Value

A data.frame

Author(s)

Daniel Fischer

See Also

exportBed, read.table

Examples

# Define the location of the example file
fpath <- system.file("extdata","example.bed", package="GenomicTools.fileHandler")
# Import the example bed file
bedFile <- importBed(file=fpath)

Import a Tab Delimited Blast Output File

Description

This function imports a tab delimited blast output.

Usage

importBlastTab(file)

Arguments

file

Filename

Details

This function imports a tab delimited blast output file. Currently, it is the same as read.table.

Value

A data.frame

Author(s)

Daniel Fischer


Importing a Fasta File.

Description

This function imports a standard fasta file

Usage

importFA(file, verbose = FALSE)

Arguments

file

Specifies the filename/path

verbose

Logical, verbose function output

Details

This function imports a standard fasta file. It does not matter whether identifier and sequence lines alternate strictly or not, as the rows starting with '>' are used as identifiers.

The example file was downloaded from here and was then further truncated and transformed to fasta format:
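
The role of the '>' rows can be illustrated with a minimal base R sketch. The helper parse_fasta_sketch is hypothetical and is not the package's implementation; it only shows how a sequence that spans several lines can be assigned to the preceding identifier row.

```r
# Hypothetical helper (not part of GenomicTools.fileHandler): rows starting
# with '>' open a new record; all following rows up to the next '>' are
# concatenated into one sequence.
parse_fasta_sketch <- function(lines) {
  lines <- lines[nzchar(lines)]                 # drop empty lines
  isHeader <- startsWith(lines, ">")
  recordId <- cumsum(isHeader)                  # record number of each line
  ids <- sub("^>", "", lines[isHeader])
  # Group the sequence lines by record; records without sequence stay empty
  seqParts <- split(lines[!isHeader],
                    factor(recordId[!isHeader], levels = seq_along(ids)))
  seqs <- vapply(seqParts, paste, character(1), collapse = "")
  names(seqs) <- ids
  seqs
}

# Usage: the first record spans two sequence lines, the second only one
exampleLines <- c(">read1", "ACGT", "TTGA", ">read2", "GGCC")
faSketch <- parse_fasta_sketch(exampleLines)
faSketch
```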

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/sequence_read/

Value

An object of class fa containing the sequences. The names correspond to the sequence names given in the fasta file.

Author(s)

Daniel Fischer

See Also

print.fa, summary.fa

Examples

# Define the location of the example file
fpath <- system.file("extdata","example.fasta", package="GenomicTools.fileHandler")
# Import the example fasta file
fastaFile <- importFA(file=fpath)

Import from FeatureCounts

Description

This function imports the output from FeatureCounts

Usage

importFeatureCounts(file, skip = 0, headerLine = 2)

Arguments

file

Character, file name

skip

Number of lines to skip from txt file

headerLine

Line number that contains the header information

Details

FeatureCounts produces two files: the txt file that contains the expression values and the summary file that contains all the information about the mapping statistics. This function imports both and stores them in a corresponding list.
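
To make the structure of the returned list more concrete, the following base R sketch splits made-up featureCounts-style content into the three parts named under Value. It is an assumption-laden illustration: the example content is invented, and how importFeatureCounts combines skip and headerLine internally is not described in this manual.

```r
# Made-up content mimicking the two featureCounts output files
countsTxt <- c("# Program:featureCounts; Command:(made-up example)",
               "Geneid\tChr\tStart\tEnd\tStrand\tLength\tsampleA.bam",
               "gene1\t1\t100\t500\t+\t401\t25",
               "gene2\t1\t900\t1500\t-\t601\t7")
summaryTxt <- c("Status\tsampleA.bam",
                "Assigned\t32",
                "Unassigned_NoFeatures\t5")

# The header sits on line 2, so every line before it is skipped
headerLine <- 2
counts <- read.table(text = countsTxt, header = TRUE, sep = "\t",
                     skip = headerLine - 1, stringsAsFactors = FALSE)

# The first six columns describe the genes, the remaining ones hold the counts
fcSketch <- list(
  expValues = counts[, -(1:6), drop = FALSE],
  geneInfo  = counts[, 1:6],
  summary   = read.table(text = summaryTxt, header = TRUE, sep = "\t",
                         stringsAsFactors = FALSE)
)
str(fcSketch, max.level = 1)
```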

Value

A list with expValues, geneInfo and summary

Author(s)

Daniel Fischer

Examples

# Define the location of the example file
fpath <- system.file("extdata","featureCountsExample.txt", package="GenomicTools.fileHandler")
# Import the example featureCounts file
fcFile <- importFeatureCounts(file=fpath)

Importing a Fastq File.

Description

This function imports a standard fastq file

Usage

importFQ(file)

Arguments

file

Specifies the filename/path

Details

This function imports a standard fastq file that consists of blocks of four lines per entry.
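
The four-line block structure can be sketched in base R as follows. The helper parse_fastq_sketch is hypothetical and not taken from the package; it assumes that every entry occupies exactly four lines (identifier, sequence, separator, quality string).

```r
# Hypothetical helper (not part of GenomicTools.fileHandler): split fastq
# lines into blocks of four and keep the sequence and quality rows.
parse_fastq_sketch <- function(lines) {
  stopifnot(length(lines) > 0, length(lines) %% 4 == 0)
  first <- seq(1, length(lines), by = 4)        # first line of each block
  ids <- sub("^@", "", lines[first])            # identifier without leading '@'
  list(seq  = setNames(lines[first + 1], ids),  # line 2 of each block
       qual = setNames(lines[first + 3], ids))  # line 4 of each block
}

# Usage with two made-up reads
exampleLines <- c("@read1", "ACGT", "+", "IIII",
                  "@read2", "GGCA", "+", "FFF#")
fqSketch <- parse_fastq_sketch(exampleLines)
fqSketch$seq
```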

Value

An object of class fq containing the sequences and the quality measure. The names correspond to the sequence names given in the fastq file.

Author(s)

Daniel Fischer

See Also

print.fq, summary.fq

Examples

# Define the location of the example file
fpath <- system.file("extdata","example.fastq", package="GenomicTools.fileHandler")
# Import the example fastq file
fastqFile <- importFQ(file=fpath)

importGFF

Description

Import a GFF file

Usage

importGFF(
  file,
  skip = "auto",
  nrow = -1,
  use.data.table = TRUE,
  level = "gene",
  features = NULL,
  num.features = c("FPKM", "TPM"),
  print.features = FALSE,
  merge.feature = NULL,
  merge.all = TRUE,
  class.names = NULL,
  verbose = TRUE
)

Arguments

file

file or folder

skip

numeric, lines to skip

nrow

numeric, lines to read

use.data.table

logical

level

Character, read level, default: "gene"

features

features to import

num.features

names of the numeric features

print.features

Logical, print available features

merge.feature

Character, merge multiple samples to dataset

merge.all

Logical, shall all samples be merged together

class.names

Definition of class names in V9

verbose

Logical, verbose function output

Details

This function imports a standard gff file.

Value

A gff object

Author(s)

Daniel Fischer

Examples

# Define the location of the example file
fpath <- system.file("extdata","example.gff", package="GenomicTools.fileHandler")
# Import the example gff file
importGFF(fpath)

importGFF3

Description

Import a GFF3 file

Usage

importGFF3(gff, chromosomes)

Arguments

gff

file or folder

chromosomes

The chromosome to import

Details

This function imports a standard gff3 file.

Value

A gff object

Author(s)

Daniel Fischer


Import a GTF File

Description

This function imports a gtf file.

Usage

importGTF(
  file,
  skip = "auto",
  nrow = -1,
  use.data.table = TRUE,
  level = "gene",
  features = NULL,
  num.features = c("FPKM", "TPM"),
  print.features = FALSE,
  merge.feature = NULL,
  merge.all = TRUE,
  class.names = NULL,
  verbose = TRUE
)

Arguments

file

file or folder

skip

numeric, lines to skip

nrow

numeric, lines to read

use.data.table

logical

level

Character, read level, default: "gene"

features

features to import

num.features

names of the numeric features

print.features

Logical, print available features

merge.feature

Character, merge multiple samples to dataset

merge.all

Logical, shall all samples be merged

class.names

Vector with class names

verbose

Logical, verbose function output

Details

This function imports a gtf file. The names of the features to be imported are defined in features; several features are provided as a vector. A list of available features can be printed by setting print.features=TRUE.

The skip option allows skipping a given number of rows. The default, however, is auto; in that case, all rows that start with the # symbol are skipped.

If a set of expression values given in gtf format should be imported and merged into a single data table, the feature to be used for merging can be provided to the merge.feature option. In that case, the function expects a folder in file; it imports all gtf files located in that folder and merges them according to the merge.feature option. With the option class.names, a vector of prefixes for the merged features can be provided. If this is left empty, the filenames of the gtf files are used instead (without the gtf extension).

By default, the function imports all features in column 9 as character strings. However, for common labels (FPKM and TPM) the class type is set automatically to numeric. Additional numeric feature names can be defined with the num.features option.
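
The handling of comment rows and of the features in column 9 can be illustrated with a small base R sketch. It is not the package's implementation: the helper get_feature_sketch and the example lines are made up, and the sketch assumes the usual key "value"; layout of column 9.

```r
# Made-up gtf content with one comment row
gtfLines <- c('#!genome-build made-up example',
              '1\tsketch\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; FPKM "1.5";',
              '1\tsketch\ttranscript\t2000\t2600\t.\t-\t.\tgene_id "G2"; transcript_id "T2"; FPKM "0.25";')

# Hypothetical helper: extract the value of one feature (key) from column 9
get_feature_sketch <- function(attr, key) {
  pattern <- paste0('(^|;)[[:space:]]*', key, ' "?([^";]*)"?')
  hit <- regmatches(attr, regexec(pattern, attr))
  # Each hit holds the full match and two groups; the second group is the value
  vapply(hit, function(m) if (length(m) == 3) m[3] else NA_character_,
         character(1))
}

# Mimic skip = "auto": drop all rows that start with '#'
body <- gtfLines[!startsWith(gtfLines, "#")]
attr9 <- vapply(strsplit(body, "\t", fixed = TRUE), `[`, character(1), 9)

gtfSketch <- data.frame(gene_id = get_feature_sketch(attr9, "gene_id"),
                        FPKM    = get_feature_sketch(attr9, "FPKM"),
                        stringsAsFactors = FALSE)

# Features are strings at first; the numeric ones are converted explicitly
numFeatures <- c("FPKM", "TPM")
for (f in intersect(numFeatures, names(gtfSketch))) {
  gtfSketch[[f]] <- as.numeric(gtfSketch[[f]])
}
str(gtfSketch)
```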

Value

A gtf object

Author(s)

Daniel Fischer

Examples

# Define the location of the example file
fpath <- system.file("extdata","example.gtf", package="GenomicTools.fileHandler")
# Same file, but this time as gzipped version
fpath.gz <- system.file("extdata","example2.gtf.gz", package="GenomicTools.fileHandler")

# Import the example gtf file
importGTF(fpath, level="transcript", features=c("gene_id","FPKM"))

## Not run: 
# For this you need to have zcat installed (should be standard on a Linux system)
importGTF(fpath.gz, level="transcript", features=c("gene_id","FPKM"))

## End(Not run)

importPED

Description

Import a PED/MAP file pair

Usage

importPED(
  file,
  n,
  snps = NULL,
  which,
  split = "\t| +",
  sep = ".",
  na.strings = "0",
  lex.order = FALSE,
  verbose = TRUE
)

Arguments

file

ped filename

n

Number of samples to read

snps

map filename

which

Names of SNPs to import

split

Columns separator in ped file

sep

Character that separates Alleles

na.strings

Definition for missing values

lex.order

Logical, lexicographical order

verbose

Logical, verbose output

Details

This function is to a large extent taken from snpStat::read.pedmap, but here the data.table::fread function is used internally, which results in much faster file processing.

To import the data, the ped file can be provided to the file option and the map file to the snps option. If no option is given to snps and the file option is provided without any file extension, then the ped/map extensions are added automatically.
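
The extension rule from the previous paragraph could look roughly like this in base R. The helper resolve_pedmap_sketch is hypothetical; it only mirrors the behaviour as it is described here and is not the package's code.

```r
# Hypothetical helper (not part of GenomicTools.fileHandler): if no map file
# is given and 'file' carries no extension, append .ped and .map to it.
resolve_pedmap_sketch <- function(file, snps = NULL) {
  hasExtension <- nzchar(tools::file_ext(file))
  if (is.null(snps) && !hasExtension) {
    snps <- paste0(file, ".map")
    file <- paste0(file, ".ped")
  }
  list(ped = file, map = snps)
}

# Usage: a common prefix is expanded, explicit file names stay untouched
pathsFromPrefix <- resolve_pedmap_sketch("study/example")
pathsExplicit <- resolve_pedmap_sketch("study/example.ped", "study/example.map")
pathsFromPrefix
```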

Value

A pedMap object

Author(s)

Daniel Fischer

Examples

# Define the location of the example files
pedPath <- system.file("extdata","example.ped", package="GenomicTools.fileHandler")
mapPath <- system.file("extdata","example.map", package="GenomicTools.fileHandler")
# Import the example ped/map files
importPED(file=pedPath, snps=mapPath)

importSTARLog

Description

Import the Log-File from STAR

Usage

importSTARLog(
  dir,
  recursive = TRUE,
  log = FALSE,
  finalLog = TRUE,
  verbose = TRUE
)

Arguments

dir

The directory name

recursive

Logical, check for sub-directories

log

Logical, import also log file

finalLog

Logical, import also final_log file

verbose

Logical, talkative function feedback

Details

This function imports the Log file from STAR

Value

A data frame

Author(s)

Daniel Fischer


importVCF

Description

Import a VCF file

Usage

importVCF(
  file,
  na.seq = "./.",
  simplify = TRUE,
  getInfo = FALSE,
  formatFields = NULL
)

Arguments

file

The file name

na.seq

The missing value definition

simplify

Logical

getInfo

Logical

formatFields

Vector with names

Details

This function imports a VCF file.

If the logical flag 'phased' is set to TRUE, the genotypes are expected to be in the format 0|0; otherwise they are expected to be like 0/1. If the flag simplify is set, genotypes like 0/2 or 1/2 are set to the 0,1,2 coding and multi-alternatives are ignored.

If, in addition to the genotype information, you would like to extract other data from the FORMAT field of the vcf file, you can specify their names in the formatFields option. Currently, it only accepts a single value.
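
One possible reading of the 0,1,2 coding is sketched below in base R. The helper simplify_gt_sketch is hypothetical and rests on an assumption: every allele other than the reference allele 0 is counted as an alternative allele, so that different alternative alleles are not distinguished. How importVCF treats multi-allelic genotypes in detail may differ.

```r
# Hypothetical helper (not part of GenomicTools.fileHandler): recode
# genotype strings to the number of non-reference alleles (0, 1 or 2).
simplify_gt_sketch <- function(gt, na.seq = "./.") {
  out <- rep(NA_integer_, length(gt))
  ok <- !is.na(gt) & gt != na.seq
  # Split on both separators: '/' (unphased) and '|' (phased)
  alleles <- strsplit(gt[ok], "[/|]")
  out[ok] <- vapply(alleles, function(a) {
    if (any(a == ".")) NA_integer_ else sum(a != "0")
  }, integer(1))
  out
}

# Usage with biallelic, missing and multi-allelic genotypes
gtSketch <- simplify_gt_sketch(c("0/0", "0/1", "1/1", "./.", "0/2", "1/2"))
gtSketch
```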

The example file was downloaded from here:

ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/pilot_data/release/2010_07/exon/snps/

Value

A vcf object

Author(s)

Daniel Fischer

Examples

# Define the location of the example file
fpath <- system.file("extdata","example.vcf", package="GenomicTools.fileHandler")
# Import the example vcf file
importVCF(fpath)

importXML

Description

Import a Blast XML file

Usage

importXML(folder, seqNames = NULL, which = NULL, idTH = 0.8, verbose = TRUE)

Arguments

folder

Character, folder path

seqNames

Names of sequences

which

Which sequences to import

idTH

Use the threshold as cut-off

verbose

Logical, verbose output

Details

This function imports XML files as provided as Blast output. It is mainly aimed at importing the output of the hoardeR package.

Value

An XML object

Author(s)

Daniel Fischer


plotTotalReads

Description

Plot the total reads

Usage

plotTotalReads(STARLog)

Arguments

STARLog

A STARLog object

Details

This function plots the total reads from a STARLog object.

Part of the diagnostic plot series for the STARLog. The function also accepts a list of STARLogs and then creates comparative boxplots.

Value

A plot

Author(s)

Daniel Fischer


plotUniquelyMappedReads

Description

Plot the uniquely mapped reads

Usage

plotUniquelyMappedReads(STARLog)

Arguments

STARLog

A STARLog object

Details

This function plots the percentage of uniquely mapped reads from a STARLog object.

Part of the diagnostic plot series for the STARLog. The function also accepts a list of STARLogs and then creates comparative boxplots.

Value

A plot

Author(s)

Daniel Fischer


prereadGTF

Description

Prereads a gtf file and prints its features for the import step.

Usage

prereadGTF(file, nrow = 1000, skip = "auto")

Arguments

file

Filename

nrow

Number of rows to read

skip

Rows to skip from top

Details

This function reads in a gtf file and prints its features for the import step.

By default this function only imports the first 1000 rows, in case all rows should be imported set nrow=-1.

The number of rows to skip at the beginning can be adjusted with the skip option. The default is auto, so that the function identifies the correct number of header rows itself. Hence, this option should only be changed if there is a good reason.
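
What listing the available features amounts to can be sketched in base R. The helper list_features_sketch and the example lines are made up and do not reproduce the package's code; the sketch assumes that the entries in column 9 are separated by semicolons and that no value contains a semicolon itself.

```r
# Hypothetical helper (not part of GenomicTools.fileHandler): collect the
# distinct feature names found in column 9 of the first 'nrow' data rows.
list_features_sketch <- function(lines, nrow = 1000) {
  body <- lines[!startsWith(lines, "#")]        # mimic skip = "auto"
  if (nrow >= 0) body <- head(body, nrow)       # nrow = -1 keeps all rows
  attr9 <- vapply(strsplit(body, "\t", fixed = TRUE), `[`, character(1), 9)
  entries <- trimws(unlist(strsplit(attr9, ";", fixed = TRUE)))
  entries <- entries[nzchar(entries)]
  # The feature name is everything before the first whitespace of an entry
  unique(sub("[[:space:]].*$", "", entries))
}

# Usage with two made-up rows that carry partly different features
gtfLines <- c('#!genome-build made-up example',
              '1\tsketch\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; FPKM "1.5";',
              '1\tsketch\tgene\t2000\t2600\t.\t-\t.\tgene_id "G2"; TPM "3";')
featureSketch <- list_features_sketch(gtfLines)
featureSketch
```

Reading only the first rows keeps such a scan cheap, at the price that features occurring only further down in the file are missed.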

Value

A list of available features

Author(s)

Daniel Fischer


Print a bed Object

Description

Prints a bed object.

Usage

## S3 method for class 'bed'
print(x, n = 6, ...)

Arguments

x

Object of class bed.

n

Number of lines to print

...

Additional parameters

Details

The print function displays a bed object

Author(s)

Daniel Fischer


Print a fa Object

Description

Prints a fa object.

Usage

## S3 method for class 'fa'
print(x, n = 2, seq.out = 50, ...)

Arguments

x

Object of class fa.

n

Number of sequences to display

seq.out

Length of the subsequence to display

...

Additional parameters

Details

The print function displays a fa object

Author(s)

Daniel Fischer


Print a featureCounts Object

Description

Prints a featureCounts object.

Usage

## S3 method for class 'featureCounts'
print(x, ...)

Arguments

x

Object of class featureCounts.

...

Additional parameters

Details

The print function displays a featureCounts object

Author(s)

Daniel Fischer


Print a fq Object

Description

Prints a fq object.

Usage

## S3 method for class 'fq'
print(x, n = 2, seq.out = 50, print.qual = TRUE, ...)

Arguments

x

Object of class fq.

n

Number of sequences to display

seq.out

Length of the subsequence to display

print.qual

Logical, shall the quality measures also be printed

...

Additional parameters

Details

The print function displays a fq object

Author(s)

Daniel Fischer


Print a gtf Object

Description

Prints a gtf object.

Usage

## S3 method for class 'gtf'
print(x, n = 6, ...)

Arguments

x

Object of class gtf.

n

Number of lines to print

...

Additional parameters

Details

The print function displays a gtf object

Author(s)

Daniel Fischer


Print a pedMap Object

Description

Prints a pedMap object.

Usage

## S3 method for class 'pedMap'
print(x, n = 6, m = 6, ...)

Arguments

x

Object of class pedMap.

n

Number of samples to display

m

Number of columns to display

...

Additional parameters

Details

The print function displays a pedMap object

Author(s)

Daniel Fischer


Print a vcf Object

Description

Prints a vcf object.

Usage

## S3 method for class 'vcf'
print(x, n = 6, m = 6, fullHeader = FALSE, ...)

Arguments

x

Object of class vcf.

n

Number of samples to display

m

Number of columns to display

fullHeader

Logical, shall the whole header be printed

...

Additional parameters

Details

The print function displays a vcf object

Author(s)

Daniel Fischer


Summary of a bed Object

Description

Summarizes a bed object.

Usage

## S3 method for class 'bed'
summary(object, ...)

Arguments

object

Object of class bed.

...

Additional parameters

Details

The summary function displays an informative summary of a bed object

Author(s)

Daniel Fischer


Summary of a fa Object

Description

Summarizes a fa object.

Usage

## S3 method for class 'fa'
summary(object, ...)

Arguments

object

Object of class fa.

...

Additional parameters

Details

The summary function displays an informative summary of a fa object

Author(s)

Daniel Fischer


Summary of a featureCounts Object

Description

Summarizes a featureCounts object.

Usage

## S3 method for class 'featureCounts'
summary(object, ...)

Arguments

object

Object of class featureCounts.

...

Additional parameters

Details

The summary function displays an informative summary of a featureCounts object

Author(s)

Daniel Fischer


Summary of a fq Object

Description

Summarizes a fq object.

Usage

## S3 method for class 'fq'
summary(object, ...)

Arguments

object

Object of class fq.

...

Additional parameters

Details

The summary function displays an informative summary of a fq object

Author(s)

Daniel Fischer


Summary of a gtf Object

Description

Summarizes a gtf object.

Usage

## S3 method for class 'gtf'
summary(object, ...)

Arguments

object

Object of class gtf.

...

Additional parameters

Details

The summary function displays an informative summary of a gtf object

Author(s)

Daniel Fischer


Summary of a STARLog Object

Description

Summarizes a STARLog object.

Usage

## S3 method for class 'STARLog'
summary(object, ...)

Arguments

object

Object of class STARLog.

...

Additional parameters

Details

The summary function displays an informative summary of a STARLog object

Author(s)

Daniel Fischer