Package 'hoardeR' reference manual

Title:	Collect and Retrieve Annotation Data for Various Genomic Data Using Different Webservices
Description:	Cross-species identification of novel gene candidates using the NCBI web service is provided. Further, sets of miRNA target genes can be identified by using the targetscan.org API.
Authors:	Daniel Fischer [aut, cre], Anu Sironen [aut]
Maintainer:	Daniel Fischer <[email protected]>
License:	GPL (>= 2)
Version:	0.10
Built:	2025-02-25 05:17:55 UTC
Source:	https://github.com/fischuu/hoarder

Collect and Retrieve Annotation Data for Various Genomic Data Using Different Web Services.

Description

The hoardeR package is designed for collecting, retrieving and transforming data from various sources. The current main focus is on setting up a connection to the NCBI Blast service. Also, the gene information for Ensembl Genes can be retrieved from NCBI. Methods for visualizing the results are also provided. The latest developer version of the package can be downloaded from

https://github.com/fischuu/hoardeR

Details

Package:	hoardeR
Type:	Package
Version:	0.10
Date:	2024-04-08
License:	GPL
LazyLoad:	yes

Author(s)

Daniel Fischer, Anu Sironen

Maintainer: Daniel Fischer <[email protected]>

Sending Genomic Sequences to NCBI Blast service

Description

This function sends genomic sequences to the NCBI Blast service.

Usage

  blastSeq(seq, n_blast=20, delay_req=10, delay_rid=60, email=NULL, 
           xmlFolder=NULL, logFolder=NULL, keepInMemory=FALSE,
           database="refseq_genomes", verbose=TRUE, createLog=TRUE)
blastSeq(seq, n_blast=20, delay_req=10, delay_rid=60, email=NULL, 
           xmlFolder=NULL, logFolder=NULL, keepInMemory=FALSE,
           database="refseq_genomes", verbose=TRUE, createLog=TRUE)

Arguments

`seq`	The fasta sequence that should be blasted (`String`).
`n_blast`	Amount of parallel blast requests, in case `seq` is a vector.
`delay_req`	Seconds between the single Blast requests.
`delay_rid`	Seconds between the single result requests.
`email`	User email, required information from NCBI (`String`).
`xmlFolder`	Path to the result folder.
`logFolder`	Path to the log folder.
`keepInMemory`	Logical, shall the results be kept in the memory.
`database`	The NCBI database to use.
`verbose`	Shall the program give extensive feedback.
`createLog`	Create log files, needed for continuing a crashed program.

Details

This function sends fasta sequences to the NCBI blast service. The defaults for the delays are required by NCBI and must not be smaller than the default values. Also, NCBI asks the user to provide an email address.

The input seq can be a vector of strings. In that case the sequences are one after another processed. The option n_blast sets then the upper threshold of how many blast requests are send to the NCBI Blast service at a time and kept running there parallel. It is here in the users obligation not to misuse the service with too many parallel requests.

The xmlFolder parameter specifies the folder to where the XML results will be stored. In case the folder does not exist, R will create it.

In case the option keepInMemory is set to TRUE the Blast results will be kept in memory, otherwise they will be just written to the HDD. Especially if many sequences are send to the blast service it is recommended to drop the result from the memory, meaning to set the option keepinMemory=FALSE. The option keepinMemory=TRUE is currently still under development and should not be used.

If log files should be written (createLog=TRUE) a log path should be given in logPath. However, if a xmlPath is given and the option createLog=TRUE is set, then the log folder will be automatically created in the parental folder of the xmlFolder and is called logs.

Value

An xml file that contains the the NCBI result.

Author(s)

Daniel Fischer

Examples

## Not run: 
blastSeq("ACGTGCATCGACTAGCTACGACTACGACTATC", email="[email protected]")

## End(Not run)
## Not run: 
blastSeq("ACGTGCATCGACTAGCTACGACTACGACTATC", email="[email protected]")

## End(Not run)

Calculation of the coverage density

Description

Calculates the coverage density.

Usage

  coverageDensity(folder, chr=c(1:22,"X","Y","MT"), chr.length=NULL,
                  posneg=FALSE, verbose=TRUE, use.sqrt=FALSE,
                  kernel.package="slideWindowSum",
                  step.size=50000, window.size=100000, bw=100)
coverageDensity(folder, chr=c(1:22,"X","Y","MT"), chr.length=NULL,
                  posneg=FALSE, verbose=TRUE, use.sqrt=FALSE,
                  kernel.package="slideWindowSum",
                  step.size=50000, window.size=100000, bw=100)

Arguments

`folder`	folder with bam files
`chr`	Chromosome names to be plotted.
`chr.length`	Length of chromosome
`posneg`	Logical, plot pos and neg strand
`verbose`	Logical, verbose output
`use.sqrt`	Logical, apply sqrt transformation
`kernel.package`	Class of kernel smoother
`step.size`	Step size in bases
`window.size`	Window size in bases
`bw`	Bandwidth parameter

Details

This function calculates the coverage of bam-files

Author(s)

Daniel Fischer

Search in the `species`' Object.

Description

This function output rows from the species object that contain a certain string.

Usage

  findSpecies(string)
findSpecies(string)

Arguments

string

Search string.

Details

This function output rows from the species object that contain a certain string. It uses the grepl function to identify the corresponding rows.

Value

A data.frame.

Author(s)

Daniel Fischer

Examples

findSpecies("cattle")
findSpecies("cattle")

Downloading or Importing of Annotation Data

Description

This function downloads (if needed) the annotation file from a given species from NCBI and loads it into the namespace.

Usage

  getAnnotation(species=NULL, assembly=NULL, annotationFolder=NULL, 
                type="gff3", verbose)
getAnnotation(species=NULL, assembly=NULL, annotationFolder=NULL, 
                type="gff3", verbose)

Arguments

`species`	The scientific name of the species (`String`).
`assembly`	The NCBI assembly version.
`annotationFolder`	The folder where the file will be stored.
`type`	The file extension/format of the annotation file.
`verbose`	Logical, if function gives feedback.

Details

This function downloads for a given species the annotation file, as provided from NCBI. The main parameters basically define the URL, where the file is located. The file is then downloaded into the folder, provided in annotationFolder and then imported to the namespace.

If a file has been downloaded previously, it will be loaded directly from that folder. In case the user wants to use an annotation that is not provided by NCBI, the corresponding files can also be placed into the same folder, following the naming scheme as suggested from the function and the function will load it from there.

Value

A data.table with the annotation information.

Author(s)

Daniel Fischer

Examples

## Not run: 
susScrofa <- getAnnotation(species = "Sus scrofa", 
                           annotationFolder="/home/user/annotation")
                           
homoSapiens <- getAnnotation(species = "Homo sapiens", 
                             annotationFolder="/home/user/annotation")

## End(Not run)
## Not run: 
susScrofa <- getAnnotation(species = "Sus scrofa", 
                           annotationFolder="/home/user/annotation")
                           
homoSapiens <- getAnnotation(species = "Homo sapiens", 
                             annotationFolder="/home/user/annotation")

## End(Not run)

Retrieve Gene Information From the NCBI Database.

Description

This function retrieves for a given Ensembl Number the corresponding information from the NCBI database.

Usage

  getEnsgInfo(ensg)
getEnsgInfo(ensg)

Arguments

ensg

Ensembl ID (String).

Details

This function retrieves for a given Ensembl Number the corresponding information from the NCBI database. The object ensg can also be a vector of Ensembl IDs.

Value

A matrix with information.

Author(s)

Daniel Fischer

Examples

## Not run: 
ensg <- c("ENSG00000174482", "ENSG00000113494")
getEnsgInfo(ensg)

## End(Not run)
## Not run: 
ensg <- c("ENSG00000174482", "ENSG00000113494")
getEnsgInfo(ensg)

## End(Not run)

Get fasta information based on locations in bed-format

Description

For a given fasta and a bed file this function can extract the nucleotide sequences and stores them as fasta file.

Usage

  getFastaFromBed(bed, species=NULL, assembly = NULL, fastaFolder=NULL,
                  verbose=TRUE, export=NULL, fileName=NULL)
getFastaFromBed(bed, species=NULL, assembly = NULL, fastaFolder=NULL,
                  verbose=TRUE, export=NULL, fileName=NULL)

Arguments

`bed`	The location in bed format, see details.
`species`	Define the species.
`assembly`	Assembly identifier.
`fastaFolder`	Location of the fasta files.
`verbose`	Logical, should informative status updates be given.
`export`	Foldername.
`fileName`	Filename to store the FA object.

Details

Function expects as an input a data.frame in bed format. This means, the first column should contain the chromosome, the second the start-coordinates, the third the end-coordinates. The forth column contains the ID of the loci.

If a standard species is used (as defined in the species data frame), the function automatically downloads the required files from NCBI, takes the loci and extracts then the nucleotide sequences from it. If the corresponding assemly is not available from NCBI an own fasta file can be provided. For that the fa-file needs to be in the fastaFolder and follow the same naming system as the NCBI files are labelled. In that case, the function suggests the correct filename for an unknown assembly.

The export function, specifies then a folder to where the fasta file should be stored. If no filename is provided, the filename is then the object name passed to the bed function.

Value

An fa object containing the nucleotide sequences in fasta format.

Author(s)

Daniel Fischer

Examples

## Not run: 

myBed <- data.frame(chr=c(1,2),
                    start=c(235265,12356742),
                    end=c(435265,12386742),
                    gene=c("LOC1", "LOC2"))

myFA <- getFastaFromBed(myBed, species="Homo sapiens", fastaFolder="/home/user/fasta/", export=TRUE)

## End(Not run)
## Not run: 

myBed <- data.frame(chr=c(1,2),
                    start=c(235265,12356742),
                    end=c(435265,12386742),
                    gene=c("LOC1", "LOC2"))

myFA <- getFastaFromBed(myBed, species="Homo sapiens", fastaFolder="/home/user/fasta/", export=TRUE)

## End(Not run)

Extracting Gene Locations

Description

This function extracts the gene locations from an imported gtf file.

Usage

  getGeneLocation(gtf)
getGeneLocation(gtf)

Arguments

gtf

An imported gtf object.

Details

This function extracts the information from an imported gtf object.

Value

A matrix.

Author(s)

Daniel Fischer

Examples

## Not run: 
getGeneLocation(gtf)

## End(Not run)
## Not run: 
getGeneLocation(gtf)

## End(Not run)

Extracting a gene sequence from NCBI database.

Description

This function retrieves a gene sequence from the NCBI database.

Usage

  getGeneSeq(chr, start, end, organism)
getGeneSeq(chr, start, end, organism)

Arguments

`chr`	Chromosome number, numeric/string
`start`	Start position, numeric
`end`	End position, numeric
`organism`	Name of the organism, string

Details

Extracting a gene sequence from NCBI database. For a list of available organism, visit

http://genome.ucsc.edu/cgi-bin/das/dsn. All id="." field are available.

Value

A string that contains the genomic sequence.

Author(s)

Daniel Fischer

Examples

## Not run: 
# Extracting for Sus Scrofa, build version 3:
getGeneSeq(1,2134,14532,"susScr3")
getGeneSeq(10,1233312,1233350,"hg38")

## End(Not run)
## Not run: 
# Extracting for Sus Scrofa, build version 3:
getGeneSeq(1,2134,14532,"susScr3")
getGeneSeq(10,1233312,1233350,"hg38")

## End(Not run)

Extracts a sequence from the NCBI webpage

Description

Retrieve a sequence from the NCBI webpage

Usage

  getSequenceFromNCBI(id, file=NULL)
getSequenceFromNCBI(id, file=NULL)

Arguments

`id`	The gene identifier
`file`	File name to where the sequence shall be stored

Details

This function extracts the sequence for a given identifer and then stores, if requested the sequence to the HDD.

Author(s)

Daniel Fischer

Intersect XML object with annotation object

Description

For a annotation object this function intersects the loci of it with the output of the tableSpecies function.

Usage

  intersectXMLAnnot(tabSpecies, annot, level="gene", flanking=NULL)
intersectXMLAnnot(tabSpecies, annot, level="gene", flanking=NULL)

Arguments

`tabSpecies`	The table with locations from `tableSpecies`.
`annot`	The annotation object.
`level`	The level of intersection.
`flanking`	Allowed flanking space for intersection.

Details

Function expects as an input table from tableSpecies with the option locations=TRUE. Further, it needs an annotation object, as provided by the getAnnotation function. With that it intersects then the loci on the level as specified in level. Currently only "gene" is supported.

The flanking option allows for flanking space up- and down-stream of the genes. This is especially then useful if the novel gene candidates are in the extension of known genes (e.g. responsible for regulation or if they are novel exons.)

Value

A table with intersection loci.

Author(s)

Daniel Fischer

Examples

## Not run: 

pigHits <- tableSpecies(xmls, species="Sus scrofa", locations = TRUE)
ssannot <- getAnnotation(species = "Sus scrofa", annotationFolder="/home/user/annotation")
pigInter <- list()
for(i in 1:nrow(pigHits)){
   pigInter[[i]] <- intersectXMLAnnot(pigHits[i,], ssannot)
}

## End(Not run)

## Not run: 

pigHits <- tableSpecies(xmls, species="Sus scrofa", locations = TRUE)
ssannot <- getAnnotation(species = "Sus scrofa", annotationFolder="/home/user/annotation")
pigInter <- list()
for(i in 1:nrow(pigHits)){
   pigInter[[i]] <- intersectXMLAnnot(pigHits[i,], ssannot)
}

## End(Not run)

Plots a coverage density object

Description

Plots a coverage density object.

Usage

  plotCoverage(x, use.sqrt=TRUE)
plotCoverage(x, use.sqrt=TRUE)

Arguments

`x`	A coverage density object
`use.sqrt`	Logical, use sqrt scale?

Details

This function plots the coverage of bam-files

Author(s)

Daniel Fischer

Visualization of a cross-species hit

Description

For each cross-species hit the function plots the similarity within that area together with an optional annotation and coverage track.

Usage

  plotHit(hits, flanking=1, window=NULL, annot=TRUE, coverage=FALSE,
          smoothPara=NULL, diagonal=0.25, verbose=TRUE, output=FALSE,
          hitSpecies=NULL, hitSpeciesAssembly=NULL, origSpecies=NULL,
          origSpeciesAssembly=NULL, fastaFolder=NULL, origAnnot=NULL,
          hitAnnot=NULL, nTick=5, which=NULL, figureFolder=NULL,
          figurePrefix=NULL, indexOffset=0, bamFolder=NULL, bamFiles=NULL,
          groupIndex=NULL, groupColor=NULL, countWindow=NULL)
plotHit(hits, flanking=1, window=NULL, annot=TRUE, coverage=FALSE,
          smoothPara=NULL, diagonal=0.25, verbose=TRUE, output=FALSE,
          hitSpecies=NULL, hitSpeciesAssembly=NULL, origSpecies=NULL,
          origSpeciesAssembly=NULL, fastaFolder=NULL, origAnnot=NULL,
          hitAnnot=NULL, nTick=5, which=NULL, figureFolder=NULL,
          figurePrefix=NULL, indexOffset=0, bamFolder=NULL, bamFiles=NULL,
          groupIndex=NULL, groupColor=NULL, countWindow=NULL)

Arguments

`hits`	The hit object to be plotted.
`flanking`	Allowed flanking site in Mb.
`window`	Moving window size of similarity measure.
`annot`	Logical, add annotation track
`coverage`	Logical, add coverage track
`smoothPara`	Smoothing parameter for coverage
`diagonal`	Threshold for allowed diagonal similarity
`verbose`	Logical, shall the function give status updates
`output`	Logical, shall numerical results be given
`hitSpecies`	Scientific identifier of the hit species.
`hitSpeciesAssembly`	Version of the hit species assembly
`origSpecies`	Scientific name of the original species
`origSpeciesAssembly`	Version of the original species
`fastaFolder`	Location of the fasta files
`origAnnot`	Annotation object of the original species
`hitAnnot`	Annotation object of the hit species
`nTick`	Number of ticks on the annotation track
`which`	Which hits should be plotted
`figureFolder`	Folder where Figures should be stored
`figurePrefix`	Prefix of the figure filenames
`indexOffset`	Offset of the running index of the filenames
`bamFolder`	Folder with the bam-files
`bamFiles`	Filenames of the bam-files
`groupIndex`	Index of subgroups in the bamfiles
`groupColor`	Vector with colors, one for each subgroup
`countWindow`	Window size to count the reads from bam-files.

Details

This function is the workhorse of hoardeR and visualizes the findings of the blast and intersection runs. It is really flexibel to handle the hits and hence there are many different options. The required options are hits, hitSpecies, origSpecies and fastaFolder.

The hit object is an object as provided by intersectXMLAnnot and contains all intersections of interest (=intersections that are in close proximity of a gene in the hit species). Naturally the hit and the original species have to be specified as well as the folder, where the required fasta files are stored, or to where they should be downloaded. If the species are the default species from Ensembl (as can be seen in the data.frame species), the annotation and assembly will be automatically downloaded to the specified location on the harddrive. Changes from that version can be adjusted with the the hitSpeciesAssembly and origSpeciesAssembly options, but the filenames have still to match the convention, as they are provided by NCBI.

If in addition to the similarity also a coverage track should be added, the option coverage has to be set to TRUE. The option smoothPara sets then the level of smoothing of the coverage. By default no smoothing will be applied.

In case an annotation track is requested (annot=TRUE), the annotation objects need to be provided to the origAnnot and hitAnnot options.

The option diagonal defines the minimum level of similarity so that a (diagonal) match will be plotted. The colors are then towards green for total similarity and towards red for total disagree, based on a nucleotide mismatch matrix.

If the option verbose=TRUE is set, the function gives a verbose output while running. Further, if output=TRUE then, in addition to the figure also a data.frame with the numerical results is provided.

In case that hits contains more than one hit, the plotHit function plots for each hit a figure. In that case a folder should be provided to where the figures should be stored, this can be done with the figureFolder and figurePrefix options. In case only asserted hits of hits shall be plotted, they can be selected with the which option.

The function can also plot a coverage track over the similarity. For that, the option coverage=TRUE has to be set and a folder that contains the necessary bam-files has to be specified in bamFolder. By default all bam files in that folder are used, if only a subset is requested, the filenames can be specified in bamFiles. In case several bam-files are given, the average coverage at each loci is used. Further, if the data contains subgroups (e.g. case/control), the vector groupIndex gives the group labels. Naturally its length should be similar to bamFiles (or similar to the total amount of files in the bam-folder). In case that more than one group is plotted in the coverage track, their colors can be defined in groupColor. Of course, this vector has to be as long as the number of groups are defined. The option countWindow controls the moving window length in which the number of counts is calculated. The default is the same length as the hit.

Value

Optional, a table with intersection loci.

Author(s)

Daniel Fischer

Examples

## Not run: 
pigInter.flank <- list()
for(i in 1:nrow(pigHits)){
   pigInter.flank[[i]] <- intersectXMLAnnot(pigHits[i,], ssannot, flanking=100)
}
# Basic usage:
plotHit(hits=pigInter.flank,
        flanking=100,
        hitSpecies = "Sus scrofa",
        origSpecies = "Bos taurus",
        fastaFolder = "/home/user/fasta/",
        figureFolder = "/home/user/figures/") 

# Annotation tracks added:
plotHit(hits=pigInter.flank,
        flanking=100,
        hitSpecies = "Sus scrofa",
        origSpecies = "Bos taurus",
        fastaFolder = "/home/user/fasta/",
        figureFolder = "/home/user/figures/",
        origAnnot=btannot,
        hitAnnot=ssannot)
        
# Annotation and coverage added:
plotHit(hits=pigInter.flank,
        flanking=100,
        hitSpecies = "Sus scrofa",
        origSpecies = "Bos taurus",
        fastaFolder = "/home/daniel/fasta/",
        figureFolder = "/home/user/figures/",
        origAnnot=btannot,
        hitAnnot=ssannot
        coverage=TRUE,
        bamFolder = "/home/users/bams/") 

## End(Not run)
## Not run: 
pigInter.flank <- list()
for(i in 1:nrow(pigHits)){
   pigInter.flank[[i]] <- intersectXMLAnnot(pigHits[i,], ssannot, flanking=100)
}
# Basic usage:
plotHit(hits=pigInter.flank,
        flanking=100,
        hitSpecies = "Sus scrofa",
        origSpecies = "Bos taurus",
        fastaFolder = "/home/user/fasta/",
        figureFolder = "/home/user/figures/") 

# Annotation tracks added:
plotHit(hits=pigInter.flank,
        flanking=100,
        hitSpecies = "Sus scrofa",
        origSpecies = "Bos taurus",
        fastaFolder = "/home/user/fasta/",
        figureFolder = "/home/user/figures/",
        origAnnot=btannot,
        hitAnnot=ssannot)
        
# Annotation and coverage added:
plotHit(hits=pigInter.flank,
        flanking=100,
        hitSpecies = "Sus scrofa",
        origSpecies = "Bos taurus",
        fastaFolder = "/home/daniel/fasta/",
        figureFolder = "/home/user/figures/",
        origAnnot=btannot,
        hitAnnot=ssannot
        coverage=TRUE,
        bamFolder = "/home/users/bams/") 

## End(Not run)

Print an fa Object

Description

Prints an fa object.

Usage

 ## S3 method for class 'fa'
print(x, n=2, seq.out=50, ...)
## S3 method for class 'fa'
print(x, n=2, seq.out=50, ...)

Arguments

`x`	Object of class `fa`.
`n`	Amount of elements to be displayed, numeric.
`seq.out`	Length of each element to be displayed, numeric..
`...`	Additional parameters.

Details

The print function displays an fa object. By default just the first two elements with their first 50 bases are displayed. To display the full sequence, set seq.out=NULL.

Author(s)

Daniel Fischer

Available species at NCBI

Description

This is a list of all organisms/species that are provided by NCBI and hence could end up in the Blast run. Further, it defines the default versions of the assuemblies that will be downloaded if no further version is specified in plotHit, getAnnotation or getFastaFromBed.

Format

A data frame with 348 species.

Source

As downloaded on 05.10.2016 from

ftp://ftp.ncbi.nlm.nih.gov/genomes/

Examples


data(species)
summary(species) 

data(species)
summary(species)

Rewrite the Dose File from a Beagle Output

Description

This function takes a Dose Beagle output and rewrites the output.

Usage

  subDose(file=NULL, vmmk=NULL, out=NULL, removeInsertions=TRUE, verbose=TRUE)
subDose(file=NULL, vmmk=NULL, out=NULL, removeInsertions=TRUE, verbose=TRUE)

Arguments

`file`	Location of the original Beagle file (`String`).
`vmmk`	Location of the Variant Map Master key (`String`).
`out`	Name and location of the output file (`String`).
`verbose`	The function gives feedback.
`removeInsertions`	All Indels will be removed..

Details

This function takes a Beagle Dose file and rewrites the alleles from numerical to character, based on the information provided in a variant map master key.

Value

A rewritten beagle phased file.

Author(s)

Daniel Fischer

Rewrite the Gprobs File from a Beagle Output

Description

This function takes a Gprobs Beagle output and rewrites the output.

Usage

  subGprobs(file=NULL, vmmk=NULL, out=NULL, chunkSize=100000, removeInsertions=TRUE,
             verbose = TRUE, writeOut=TRUE)
subGprobs(file=NULL, vmmk=NULL, out=NULL, chunkSize=100000, removeInsertions=TRUE,
             verbose = TRUE, writeOut=TRUE)

Arguments

`file`	Location of the original Beagle file (`String`).
`vmmk`	Location of the Variant Map Master key (`String`).
`out`	Name and location of the output file (`String`).
`chunkSize`	For large Beagle files, the chunk size.
`removeInsertions`	All Indels will be removed.
`verbose`	The function gives feedback.
`writeOut`	Logical, write the output back to the HDD.

Details

This function takes a Beagle Gprobs file and rewrites the alleles from numerical to character, based on the information provided in a variant map master key. For larger files the function can process the rewriting in chunks in order to save memory.

Value

A rewritten beagle Gprobs file.

Author(s)

Daniel Fischer

Rewrite the Phased File from a Beagle Output

Description

This function takes a phased Beagle output and rewrites the output.

Usage

  subPhased(file=NULL, vmmk = NULL, out=NULL, chunkSize=100000, verbose=TRUE,
            removeInsertions=TRUE)
subPhased(file=NULL, vmmk = NULL, out=NULL, chunkSize=100000, verbose=TRUE,
            removeInsertions=TRUE)

Arguments

`file`	Location of the original Beagle file (`String`).
`vmmk`	Location of the Variant Map Master key (`String`).
`out`	Name and location of the output file (`String`).
`chunkSize`	For large Beagle files, the chunk size.
`verbose`	The function gives feedback.
`removeInsertions`	All Indels will be removed.

Details

This function takes a Beagle phased file and rewrites the alleles from numerical to character, based on the information provided in a variant map master key. For larger files the function can process the rewriting in chunks in order to save memory.

Value

A rewritten beagle phased file.

Author(s)

Daniel Fischer

Summarize an fa Object

Description

Summarizes and prints an fa object in an informative way.

Usage

 ## S3 method for class 'fa'
summary(object, ...)
## S3 method for class 'fa'
summary(object, ...)

Arguments

`object`	Object of class `fa`.
`...`	Additional parameters.

Details

Summary for a fa object, providing the amount of sequences, the minimum and maximum length as well as the average length.

Author(s)

Daniel Fischer

Tables the species in xml file

Description

Tables the species in xml file

Usage

  tableSpecies(xml, species=NULL, type="chr", minOutput=TRUE, exclude="",
               locations=FALSE)
tableSpecies(xml, species=NULL, type="chr", minOutput=TRUE, exclude="",
               locations=FALSE)

Arguments

`xml`	The xml file.
`species`	Restrict species to a certain set.
`type`	Filter option.
`minOutput`	Logical, should the output be minimal.
`exclude`	Names of species to exclude.
`locations`	Logical, shall the hit locations be given as well.

Details

Function provides a table of identified species. This table can e.g. be put into the barplot function to visualize the findings.

Further, if the option locations is set to TRUE the function not only tables the species, but also the individual locations of the hits. This output is required for the further steps. Hence, this function plays a important role in the identification pipeline.

Be default the option type="chr" is set so that only hits in species will full genomes will be reported. Further, the species names are intersected with the species data frame and only those that appear there are reported.

Value

A table with the species from the XML file

Author(s)

Daniel Fischer

Examples

## Not run: 
tableSpecies(xmls)
pigHits <- tableSpecies(xmls, species="Sus scrofa", locations = TRUE)

## End(Not run)
## Not run: 
tableSpecies(xmls)
pigHits <- tableSpecies(xmls, species="Sus scrofa", locations = TRUE)

## End(Not run)

Retrieving miRNA target information from targetscan.org

Description

This function requests from the webpage targetscan.org the stored information for mirnas.

Usage

   targetScan(mirna=NULL, species=NULL, release="7.1", maxOut=NULL)
targetScan(mirna=NULL, species=NULL, release="7.1", maxOut=NULL)

Arguments

`mirna`	The name of the mirna (`String`).
`species`	The species identifier, see details (`String`).
`release`	The release version of targetscan.org.
`maxOut`	The amount of target genes, default (`NULL`) is all.

Details

This function sends a miRNA name to the targetscan.org webpage, retrieves the information and gives it back as a data.frame. Options for species are "Human", "Mouse", "Rat", "Chimpanzee", "Rhesus", "Cow", "Dog", "Opossum", "Chicken", "Frog".

Value

A data.frame with the following columns

`Ortholog`	The ortholog name of the target gene.
`geneName`	The long description of the target gene.
`consSites`	The total number of conserved sites.
`poorlySites`	The total number of poorly conserved sites.

Author(s)

Daniel Fischer

References

V. Agarwal, G. Bell, J.Nam, et al. (2015): Predicting effective microRNA target sites in mammalian mRNAs. eLife, 4, pages 1-38, doi: 10.7554/eLife.05005

Examples

## Not run: 
targetScan(mirna="miR-9-5p", species="Cow", maxOut=5)

## End(Not run)
## Not run: 
targetScan(mirna="miR-9-5p", species="Cow", maxOut=5)

## End(Not run)

Package 'hoardeR'

Help Index

Collect and Retrieve Annotation Data for Various Genomic Data Using Different Web Services.

Description

Details

Author(s)

Sending Genomic Sequences to NCBI Blast service

Description

Usage

Arguments

Details

Value

Author(s)

Examples

Calculation of the coverage density

Description

Usage

Arguments

Details

Author(s)

Search in the species' Object.

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

Downloading or Importing of Annotation Data

Description

Usage

Arguments

Details

Value

Author(s)

Examples

Retrieve Gene Information From the NCBI Database.

Description

Usage

Arguments

Details

Value

Author(s)

Examples

Get fasta information based on locations in bed-format

Description

Usage

Arguments

Details

Value

Author(s)

Examples

Extracting Gene Locations

Description

Usage

Arguments

Details

Value

Author(s)

Examples

Extracting a gene sequence from NCBI database.

Description

Usage

Arguments

Details

Value

Author(s)

Examples

Extracts a sequence from the NCBI webpage

Description

Usage

Arguments

Details

Author(s)

Intersect XML object with annotation object

Description

Usage

Arguments

Details

Search in the `species`' Object.