R/load_data.R
extract_variants_from_vcf.Rd
Aaron - Need to describe the difference between ID and name in the header, and rename in terms of naming the sample. Need to describe the differences between the multiallelic choices. Also need to describe the automatic error fixing.
extract_variants_from_vcf(
  vcf,
  id = NULL,
  rename = NULL,
  sample_field = NULL,
  filter = TRUE,
  multiallele = c("expand", "exclude"),
  extra_fields = NULL
)
vcf | Location of vcf file |
---|---|
id | ID of the sample to select from VCF. If |
rename | Rename the sample to this value when extracting variants.
If |
sample_field | Some algorithms save the name of the
sample in the ##SAMPLE portion of the header in the VCF (e.g.
##SAMPLE=<ID=TUMOR,SampleName=TCGA-01-0001>). If the ID is specified via the
|
filter | Exclude variants that do not have a |
multiallele | A multiallelic site is one where multiple alternative
alleles are listed in the same row of the VCF. One of |
extra_fields | Optionally extract additional fields from the |
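To make the two multiallele choices concrete, here is a minimal base R sketch on a toy table. It assumes that "expand" means one output row per alternative allele and that "exclude" means dropping multiallelic rows entirely. The column names (chr, start, ref, alt) and the splitting logic are illustrative assumptions, not the package's actual implementation.

```r
# Toy variant table; column names are hypothetical, chosen for illustration only.
variants <- data.frame(
  chr = c("chr1", "chr1", "chr2"),
  start = c(100L, 250L, 300L),
  ref = c("A", "G", "C"),
  alt = c("T", "A,C", "G"),  # second row lists two alternative alleles
  stringsAsFactors = FALSE
)

# Split each comma-separated ALT value into its individual alleles.
alt_list <- strsplit(variants$alt, ",", fixed = TRUE)
n_alt <- lengths(alt_list)

# "expand"-style handling (assumed): one row per alternative allele.
expanded <- variants[rep(seq_len(nrow(variants)), times = n_alt), ]
expanded$alt <- unlist(alt_list)
rownames(expanded) <- NULL

# "exclude"-style handling (assumed): drop rows with more than one alternative allele.
excluded <- variants[n_alt == 1L, ]
rownames(excluded) <- NULL
```

In this sketch the multiallelic row at chr1:250 becomes two rows (G>A and G>C) in expanded and is absent from excluded.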
Returns a data.table of variants from a vcf
vcf_file <- system.file("extdata", "public_LUAD_TCGA-97-7938.vcf", package = "musicatk")
library(VariantAnnotation)
vcf <- readVcf(vcf_file)
variants <- extract_variants_from_vcf(vcf = vcf)