Title: | File Handlers for Genomic Data Analysis |
---|---|
Description: | A collection of I/O tools for handling the most commonly used genomic datafiles, like fasta/-q, bed, gff, gtf, ped/map and vcf. |
Authors: | Daniel Fischer |
Maintainer: | Daniel Fischer <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.2.2 |
Built: | 2024-10-25 03:46:33 UTC |
Source: | https://github.com/fischuu/genomictools.filehandler |
Package: | GenomicTools.fileHandler |
Type: | Package |
Version: | 0.2 |
Date: | 2024-01-16 |
License: | GPL |
LazyLoad: | yes |
Daniel Fischer
Maintainer: Daniel Fischer <[email protected]>
This file contains some example lines to represent a typical bed file that can be used to try the corresponding functions.
A file with three columns: Chr, Start and End.
The file is located in the /extdata folder of the package and is accessible after installation via
system.file("extdata","example.bed", package="GenomicTools.fileHandler")
Daniel Fischer
This file contains some example reads to represent a typical fasta file that can be used to try the corresponding functions.
The file is located in the /extdata folder of the package and is accessible after installation via
system.file("extdata","example.fasta", package="GenomicTools.fileHandler")
Daniel Fischer
This file contains some example reads to represent a typical fastq file that can be used to try the corresponding functions.
The file is located in the /extdata folder of the package and is accessible after installation via
system.file("extdata","example.fastq", package="GenomicTools.fileHandler")
Daniel Fischer
This file contains some example gene annotations to represent a typical gff file that can be used to try the corresponding functions.
The file is located in the /extdata folder of the package and is accessible after installation via
system.file("extdata","example.gff", package="GenomicTools.fileHandler")
Daniel Fischer
This file contains some example gene annotations to represent a typical gtf file that can be used to try the corresponding functions.
The file is located in the /extdata folder of the package and is accessible after installation via
system.file("extdata","example.gtf", package="GenomicTools.fileHandler")
Daniel Fischer
This file contains some example variants to represent a typical ped/map file pair that can be used to try the corresponding functions.
The file is located in the /extdata folder of the package and is accessible after installation via
system.file("extdata","example.ped", package="GenomicTools.fileHandler")
Daniel Fischer
This file contains some example variants to represent a typical vcf file that can be used to try the corresponding functions.
The file is located in the /extdata folder of the package and is accessible after installation via
system.file("extdata","example.vcf", package="GenomicTools.fileHandler")
Daniel Fischer
This file contains some example gene annotations to represent a typical zipped gtf file that can be used to try the corresponding functions.
The file is located in the /extdata folder of the package and is accessible after installation via
system.file("extdata","example2.gtf.gz", package="GenomicTools.fileHandler")
Daniel Fischer
This function exports a standard bed file.
exportBed(x, file = NULL, header = FALSE)
x |
data.frame |
file |
Character, specifies filename/path |
header |
Logical, shall a header be written |
This function exports a data.frame to a standard bed file. If no file name is given, the variable name will be used instead.
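The fallback to the variable name can be pictured with a small base R sketch. This is not the package's implementation: the helper name `write_bed_sketch` and the `.bed` suffix appended to the variable name are assumptions made for illustration only.

```r
# Hypothetical helper illustrating the described behaviour (not package code)
write_bed_sketch <- function(x, file = NULL, header = FALSE) {
  if (is.null(file)) {
    # Assumed fallback: use the name of the variable passed as x plus ".bed"
    file <- paste0(deparse(substitute(x)), ".bed")
  }
  # Tab-separated, unquoted, no row names; column names only if header = TRUE
  write.table(x, file = file, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = header)
  invisible(file)
}

novelBed <- data.frame(Chr = c(11, 18, 3),
                       Start = c(72554673, 62550696, 18148822),
                       End = c(72555273, 62551296, 18149422))
out <- write_bed_sketch(novelBed, file = file.path(tempdir(), "sketch.bed"))
readLines(out)
```

Calling the sketch without a file argument would write to the current working directory, which is why the example passes an explicit path in tempdir().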
A bed file
Daniel Fischer
novelBed <- data.frame(Chr=c(11,18,3),
                       Start=c(72554673, 62550696, 18148822),
                       End=c(72555273, 62551296, 18149422),
                       Gene=c("LOC1", "LOC2", "LOC3"))
# Create a temporary file to where the output of the function is stored
myfile <- file.path(tempdir(), "myLocs.bed")
exportBed(novelBed, file=myfile)
exportBed(novelBed, file=myfile, header=TRUE)
This function exports a standard fasta file.
exportFA(fa, file = NULL)
fa |
fasta object |
file |
Character, specifies filename/path |
This function exports a fasta object to a standard fasta file. If no file name is given, the variable name will be used instead.
A fasta file
Daniel Fischer
# Define here the location on HDD for the example file
fpath <- system.file("extdata","example.fasta", package="GenomicTools.fileHandler")
# Import the example fasta file
fastaFile <- importFA(file=fpath)
newFasta <- fastaFile[1:5]
myfile <- file.path(tempdir(), "myLocs.fa")
exportFA(newFasta, file=myfile)
This function exports a standard gtf file.
exportGTF(x, file)
x |
gtf-object |
file |
Character, specifies filename/path |
This function exports a gtf-object to a standard gtf file.
A gtf file
Daniel Fischer
This function imports a standard bed file
importBed(file, header = FALSE, sep = "\t")
file |
Specifies the filename/path |
header |
Logical, is a header present |
sep |
Column separator |
This function imports a standard bed file into a data.frame. It is basically a convenience wrapper around read.table. However, if no header line is given, this function automatically assigns the column names as they are given in the bed specification on the Ensembl page here: https://www.ensembl.org/info/website/upload/bed.html
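As a rough illustration of such a wrapper, the following base R sketch reads a headerless bed file and names its columns. The helper `read_bed_sketch` is hypothetical, and the field names are taken from the bed specification; the exact names that importBed assigns may differ.

```r
# Field names as listed in the bed specification (assumption: the package
# may use different labels for the same columns)
bed_fields <- c("chrom", "chromStart", "chromEnd", "name", "score", "strand",
                "thickStart", "thickEnd", "itemRgb", "blockCount",
                "blockSizes", "blockStarts")

# Hypothetical helper, not the package implementation
read_bed_sketch <- function(file, header = FALSE, sep = "\t") {
  out <- read.table(file, header = header, sep = sep, stringsAsFactors = FALSE)
  if (!header) {
    # Name only as many columns as the file actually contains
    colnames(out) <- bed_fields[seq_len(ncol(out))]
  }
  out
}

tmp <- tempfile(fileext = ".bed")
writeLines(c("11\t72554673\t72555273", "18\t62550696\t62551296"), tmp)
bed <- read_bed_sketch(tmp)
colnames(bed)
```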
A data.frame
Daniel Fischer
[exportBed], [read.table]
# Define here the location on HDD for the example file
fpath <- system.file("extdata","example.bed", package="GenomicTools.fileHandler")
# Import the example bed file
bedFile <- importBed(file=fpath)
This function imports a tab delimited blast output.
importBlastTab(file)
file |
Filename |
This function imports a tab delimited blast output file, currently the same as read.table
A data.frame
Daniel Fischer
This function imports a standard fasta file
importFA(file, verbose = FALSE)
file |
Specifies the filename/path |
verbose |
Logical, verbose function output |
This function imports a standard fasta file. It does not matter whether identifier and sequence lines strictly alternate, as the rows starting with '>' are used as identifiers.
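The identifier rule can be sketched in base R as follows. The helper `read_fasta_sketch` is hypothetical and only illustrates the idea that every line starting with '>' opens a new record; it assumes the file begins with an identifier line and returns a plain named character vector rather than an object of class fa.

```r
# Hypothetical helper, not the package implementation
read_fasta_sketch <- function(file) {
  lines <- readLines(file)
  lines <- lines[nzchar(lines)]          # ignore empty lines
  is_id <- startsWith(lines, ">")
  # Each identifier line opens a new record; later lines belong to that record
  record <- cumsum(is_id)
  seqs <- tapply(lines[!is_id], record[!is_id], paste, collapse = "")
  ids <- sub("^>", "", lines[is_id])
  setNames(as.character(seqs), ids[as.integer(names(seqs))])
}

tmp <- tempfile(fileext = ".fasta")
# The first sequence spans two lines, the second only one
writeLines(c(">seq1", "ACGT", "TTGA", ">seq2", "GGCC"), tmp)
fa <- read_fasta_sketch(tmp)
fa
```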
The example file was downloaded from the address below and was then truncated and converted to fasta format:
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/HG00096/sequence_read/
An object of class fa containing the sequences. The names correspond to the sequence names given in the fasta file.
Daniel Fischer
print.fa, summary.fa
# Define here the location on HDD for the example file
fpath <- system.file("extdata","example.fasta", package="GenomicTools.fileHandler")
# Import the example fasta file
fastaFile <- importFA(file=fpath)
This function imports the output from FeatureCounts
importFeatureCounts(file, skip = 0, headerLine = 2)
file |
Character, file name |
skip |
Number of lines to skip from txt file |
headerLine |
Linenumber that contains the header information |
FeatureCounts produces two files: the txt file that contains the expression values and the summary file that contains all the information about the mapping statistics. This function imports both and stores them in a corresponding list.
A list with expValues, geneInfo and summary
Daniel Fischer
# Define here the location on HDD for the example file
fpath <- system.file("extdata","featureCountsExample.txt", package="GenomicTools.fileHandler")
# Import the example featureCounts file
fcFile <- importFeatureCounts(file=fpath)
This function imports a standard fastq file
importFQ(file)
file |
Specifies the filename/path |
This function imports a standard fastq file that consists of blocks of four lines per entry.
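The four-line block layout (identifier, sequence, separator, quality) can be sketched in base R like this. The helper `read_fastq_sketch` is hypothetical; it returns a plain list rather than an object of class fq and assumes a non-empty, well-formed file.

```r
# Hypothetical helper, not the package implementation
read_fastq_sketch <- function(file) {
  lines <- readLines(file)
  # Every entry must occupy exactly four lines
  stopifnot(length(lines) > 0, length(lines) %% 4 == 0)
  idx <- seq(1, length(lines), by = 4)    # positions of the identifier lines
  ids <- sub("^@", "", lines[idx])
  list(seq  = setNames(lines[idx + 1], ids),   # line 2 of each block
       qual = setNames(lines[idx + 3], ids))   # line 4 of each block
}

tmp <- tempfile(fileext = ".fastq")
writeLines(c("@r1", "ACGT", "+", "IIII",
             "@r2", "GGCA", "+", "FFFF"), tmp)
fq <- read_fastq_sketch(tmp)
fq$seq
```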
An object of class fq containing the sequences and the quality measures. The names correspond to the sequence names given in the fastq file.
Daniel Fischer
print.fq, summary.fq
# Define here the location on HDD for the example file
fpath <- system.file("extdata","example.fastq", package="GenomicTools.fileHandler")
# Import the example fastq file
fastqFile <- importFQ(file=fpath)
Import a GFF file
importGFF(
  file,
  skip = "auto",
  nrow = -1,
  use.data.table = TRUE,
  level = "gene",
  features = NULL,
  num.features = c("FPKM", "TPM"),
  print.features = FALSE,
  merge.feature = NULL,
  merge.all = TRUE,
  class.names = NULL,
  verbose = TRUE
)
file |
file or folder |
skip |
numeric, lines to skip |
nrow |
numeric, lines to read |
use.data.table |
logical |
level |
Character, read level, default: "gene" |
features |
features to import |
num.features |
names of the numeric features |
print.features |
Logical, print available features |
merge.feature |
Character, merge multiple samples to dataset |
merge.all |
Logical, shall all samples be merged together |
class.names |
Definition of class names in V9 |
verbose |
Logical, verbose function output |
This function imports a standard gff file.
A gff object
Daniel Fischer
# Define here the location on HDD for the example file
fpath <- system.file("extdata","example.gff", package="GenomicTools.fileHandler")
# Import the example gff file
importGFF(fpath)
Import a GFF3 file
importGFF3(gff, chromosomes)
gff |
file or folder |
chromosomes |
The chromosome to import |
This function imports a standard gff3 file.
A gff object
Daniel Fischer
This function imports a gtf file.
importGTF(
  file,
  skip = "auto",
  nrow = -1,
  use.data.table = TRUE,
  level = "gene",
  features = NULL,
  num.features = c("FPKM", "TPM"),
  print.features = FALSE,
  merge.feature = NULL,
  merge.all = TRUE,
  class.names = NULL,
  verbose = TRUE
)
file |
file or folder |
skip |
numeric, lines to skip |
nrow |
numeric, lines to read |
use.data.table |
logical |
level |
Character, read level, default: "gene" |
features |
features to import |
num.features |
names of the numeric features |
print.features |
Logical, print available features |
merge.feature |
Character, merge multiple samples to dataset |
merge.all |
Logical, shall all samples be merged |
class.names |
Vector with class names |
verbose |
Logical, verbose function output |
This function imports a gtf file. The names of the features to be imported are defined in features; several features are provided as a vector. A list of available features can be printed by setting print.features=TRUE.
The skip option allows skipping a given number of rows; the default, however, is auto. In that case, all rows that start with the # symbol are skipped.
In case a set of expression values given in gtf format should be imported and merged into a single data table, the feature that should be used for merging can be provided to the merge.feature option. In that case the function expects a folder in file; it imports all gtfs located in that folder and merges them according to the merge.feature option.
With the option class.names a vector of prefixes for the merged features can be provided. If this is left empty, the filenames of the gtfs are used instead (without the gtf extension).
By default the function imports all features in column 9 as character strings. However, for common labels (FPKM and TPM) the class type is set automatically to numeric. Additional numeric feature names can be defined with the num.features option.
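The skipping of # rows and the handling of column 9 can be pictured with the following base R sketch. The helper `read_gtf_sketch` is hypothetical and much simpler than importGTF: it keeps only two of the fixed columns, ignores level, and does no merging. The attribute layout `key "value";` is the usual gtf convention and is assumed here.

```r
# Hypothetical helper, not the package implementation
read_gtf_sketch <- function(file, features, num.features = c("FPKM", "TPM")) {
  lines <- readLines(file)
  # Mimics skip = "auto": drop all rows starting with '#'
  lines <- lines[!startsWith(lines, "#")]
  fields <- strsplit(lines, "\t", fixed = TRUE)
  out <- data.frame(seqname = vapply(fields, `[`, character(1), 1),
                    feature = vapply(fields, `[`, character(1), 3),
                    stringsAsFactors = FALSE)
  attrs <- vapply(fields, `[`, character(1), 9)   # column 9 holds the attributes
  for (f in features) {
    # Attributes are assumed to look like: gene_id "G1"; FPKM "1.5";
    pattern <- paste0('.*\\b', f, ' "([^"]*)".*')
    value <- ifelse(grepl(pattern, attrs, perl = TRUE),
                    sub(pattern, "\\1", attrs, perl = TRUE),
                    NA_character_)
    # Strings by default, numeric for the features named in num.features
    out[[f]] <- if (f %in% num.features) as.numeric(value) else value
  }
  out
}

tmp <- tempfile(fileext = ".gtf")
writeLines(c(
  "# header comment",
  'chr1\tStringTie\ttranscript\t100\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; FPKM "1.5";',
  'chr2\tStringTie\ttranscript\t200\t900\t.\t-\t.\tgene_id "G2"; transcript_id "T2"; FPKM "0.25";'
), tmp)
gtf <- read_gtf_sketch(tmp, features = c("gene_id", "FPKM"))
str(gtf)
```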
A gtf object
Daniel Fischer
# Define here the location on HDD for the example file
fpath <- system.file("extdata","example.gtf", package="GenomicTools.fileHandler")
# Same file, but this time as gzipped version
fpath.gz <- system.file("extdata","example2.gtf.gz", package="GenomicTools.fileHandler")
# Import the example gtf file
importGTF(fpath, level="transcript", features=c("gene_id","FPKM"))
## Not run:
# For this you need to have zcat installed (should be standard on a Linux system)
importGTF(fpath.gz, level="transcript", features=c("gene_id","FPKM"))
## End(Not run)
Import a PED/MAP file pair
importPED(
  file,
  n,
  snps = NULL,
  which,
  split = "\t| +",
  sep = ".",
  na.strings = "0",
  lex.order = FALSE,
  verbose = TRUE
)
file |
ped filename |
n |
Number of samples to read |
snps |
map filename |
which |
Names of SNPs to import |
split |
Columns separator in ped file |
sep |
Character that separates Alleles |
na.strings |
Definition for missing values |
lex.order |
Logical, lexicographical order |
verbose |
Logical, verbose output |
This function is to a large extent taken from snpStat::read.pedmap, but internally the data.table::fread function is used, which results in much faster file processing.
To import the data, the ped file can be provided to the file option and the map file to the snps option. If no option is given to snps and the file option is provided without any file extension, then the ped/map extensions are automatically added.
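How the two file names might be resolved can be sketched in base R. The helper `resolve_pedmap_sketch` is hypothetical and only illustrates the extension rule described above; the actual logic inside importPED may differ.

```r
# Hypothetical helper, not the package implementation
resolve_pedmap_sketch <- function(file, snps = NULL) {
  has_ext <- nzchar(tools::file_ext(file))
  if (is.null(snps) && !has_ext) {
    # No map file and no extension given: derive both names from the stem
    snps <- paste0(file, ".map")
    file <- paste0(file, ".ped")
  }
  list(ped = file, map = snps)
}

# Stem only: both extensions are added
resolve_pedmap_sketch("study/example")
# Both files given explicitly: names are left untouched
resolve_pedmap_sketch("study/example.ped", snps = "study/example.map")
```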
A pedMap object
Daniel Fischer
# Define here the location on HDD for the example file
pedPath <- system.file("extdata","example.ped", package="GenomicTools.fileHandler")
mapPath <- system.file("extdata","example.map", package="GenomicTools.fileHandler")
# Import the example ped/map files
importPED(file=pedPath, snps=mapPath)
Import the Log-File from STAR
importSTARLog(dir, recursive = TRUE, log = FALSE, finalLog = TRUE, verbose = TRUE)
dir |
The directory name |
recursive |
Logical, check for sub-directories |
log |
Logical, also import the log file |
finalLog |
Logical, also import the final_log file |
verbose |
Logical, verbose function feedback |
This function imports the Log file from STAR
A data frame
Daniel Fischer
Import a VCF file
importVCF(file, na.seq = "./.", simplify = TRUE, getInfo = FALSE, formatFields = NULL)
file |
The file name |
na.seq |
The missing value definition |
simplify |
Logical |
getInfo |
Logical |
formatFields |
Vector with names |
This function imports a VCF file.
In case the logical flag 'phased' is set to TRUE, the genotypes are expected to be in the format 0|0; otherwise they are expected to be like 0/1. If the flag simplify is set, genotypes like 0/2 or 1/2 are converted to the 0, 1, 2 coding and multi-alternatives are ignored.
If you would like to extract, in addition to the genotype information, any other data from the FORMAT field of the vcf file, you can specify the name in the formatFields option. Currently, it only accepts a single value.
The example file was downloaded from here:
ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/pilot_data/release/2010_07/exon/snps/
A vcf object
Daniel Fischer
# Define here the location on HDD for the example file
fpath <- system.file("extdata","example.vcf", package="GenomicTools.fileHandler")
# Import the example vcf file
importVCF(fpath)
Import a Blast XML file
importXML(folder, seqNames = NULL, which = NULL, idTH = 0.8, verbose = TRUE)
folder |
Character, folder path |
seqNames |
Names of sequences |
which |
Which sequences to import |
idTH |
Use the threshold as cut-off |
verbose |
Logical, verbose output |
This function imports XML files as provided as Blast output. It is mainly aimed at importing the output of the hoardeR package.
An XML object
Daniel Fischer
Plot the total reads
plotTotalReads(STARLog)
STARLog |
A STARLog object |
This function plots the total reads from a STARLog object
Part of the diagnostic plot series for the STARLog. The function also accepts a list of STARLogs and then creates comparative boxplots.
A plot
Daniel Fischer
Plot the uniquely mapped reads
plotUniquelyMappedReads(STARLog)
STARLog |
A STARLog object |
This function plots the percentage of uniquely mapped reads from a STARLog object
Part of the diagnostic plot series for the STARLog. The function also accepts a list of STARLogs and then creates comparative boxplots.
A plot
Daniel Fischer
Prereads a gtf file and prints its features to prepare the import.
prereadGTF(file, nrow = 1000, skip = "auto")
file |
Filename |
nrow |
Number of rows to read |
skip |
Rows to skip from top |
This function reads in a gtf file and prints its features for the import step.
By default this function only imports the first 1000 rows; in case all rows should be imported, set nrow=-1.
The number of rows to skip at the beginning can be adjusted with the skip option. The default is auto, so that the function identifies the correct number of header rows itself. Hence, this option should only be changed if there is a good reason.
A list of available features
Daniel Fischer
Prints a bed object.
## S3 method for class 'bed'
print(x, n = 6, ...)
x |
Object of class bed |
n |
Number of lines to print |
... |
Additional parameters |
The print function displays a bed object
Daniel Fischer
Prints a fa object.
## S3 method for class 'fa'
print(x, n = 2, seq.out = 50, ...)
x |
Object of class fa |
n |
Number of sequences to display |
seq.out |
Length of the subsequence to display |
... |
Additional parameters |
The print function displays a fa object
Daniel Fischer
Prints a featureCounts object.
## S3 method for class 'featureCounts'
print(x, ...)
x |
Object of class featureCounts |
... |
Additional parameters |
The print function displays a featureCounts object
Daniel Fischer
Prints a fq object.
## S3 method for class 'fq'
print(x, n = 2, seq.out = 50, print.qual = TRUE, ...)
x |
Object of class fq |
n |
Number of sequences to display |
seq.out |
Length of the subsequence to display |
print.qual |
Logical, shall the quality measures also be printed |
... |
Additional parameters |
The print function displays a fq object
Daniel Fischer
Prints a gtf object.
## S3 method for class 'gtf'
print(x, n = 6, ...)
x |
Object of class gtf |
n |
Number of lines to print |
... |
Additional parameters |
The print function displays a gtf object
Daniel Fischer
Prints a pedMap object.
## S3 method for class 'pedMap'
print(x, n = 6, m = 6, ...)
x |
Object of class pedMap |
n |
Number of samples to display |
m |
Number of columns to display |
... |
Additional parameters |
The print function displays a pedMap object
Daniel Fischer
Prints a vcf object.
## S3 method for class 'vcf'
print(x, n = 6, m = 6, fullHeader = FALSE, ...)
x |
Object of class vcf |
n |
Number of samples to display |
m |
Number of columns to display |
fullHeader |
Logical, shall the whole header be printed |
... |
Additional parameters |
The print function displays a vcf object
Daniel Fischer
Summarizes a bed object.
## S3 method for class 'bed'
summary(object, ...)
object |
Object of class bed |
... |
Additional parameters |
The summary function displays an informative summary of a bed object
Daniel Fischer
Summarizes a fa object.
## S3 method for class 'fa'
summary(object, ...)
object |
Object of class fa |
... |
Additional parameters |
The summary function displays an informative summary of a fa object
Daniel Fischer
Summarizes a featureCounts object.
## S3 method for class 'featureCounts'
summary(object, ...)
object |
Object of class featureCounts |
... |
Additional parameters |
The summary function displays an informative summary of a featureCounts object
Daniel Fischer
Summarizes a fq object.
## S3 method for class 'fq'
summary(object, ...)
object |
Object of class fq |
... |
Additional parameters |
The summary function displays an informative summary of a fq object
Daniel Fischer
Summarizes a gtf object.
## S3 method for class 'gtf'
summary(object, ...)
object |
Object of class gtf |
... |
Additional parameters |
The summary function displays an informative summary of a gtf object
Daniel Fischer
Summarizes a STARLog object.
## S3 method for class 'STARLog'
summary(object, ...)
object |
Object of class STARLog |
... |
Additional parameters |
The summary function displays an informative summary of a STARLog object
Daniel Fischer