This function creates a musica object from a variant table or matrix. The musica class stores variant information, variant-level annotations, sample-level annotations, and count tables, and is used as input to the mutational signature discovery and prediction algorithms. The input variant table or matrix must have columns for chromosome, start position, end position, reference allele, alternate allele, and sample name. The column names in the variant table can be mapped using the `chromosome_col`, `start_col`, `end_col`, `ref_col`, `alt_col`, and `sample_col` parameters.
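To illustrate what the column-mapping parameters do, the sketch below checks that the six required columns exist under the user-supplied names and relabels them with the default names (`chr`, `start`, `end`, `ref`, `alt`, `sample`). The helper `map_variant_columns()` is hypothetical and is not part of musicatk; it only mirrors the documented requirement and is not how `create_musica_from_variants()` handles columns internally. The MAF-style column names in the usage lines are used purely for illustration.

```r
# Hypothetical helper (NOT part of musicatk): verify that the required
# variant columns exist under user-supplied names, then relabel them with
# the default names used by create_musica_from_variants().
map_variant_columns <- function(x,
                                chromosome_col = "chr",
                                start_col = "start",
                                end_col = "end",
                                ref_col = "ref",
                                alt_col = "alt",
                                sample_col = "sample") {
  x <- as.data.frame(x)
  user_cols <- c(chromosome_col, start_col, end_col, ref_col, alt_col, sample_col)
  std_cols <- c("chr", "start", "end", "ref", "alt", "sample")
  missing_cols <- setdiff(user_cols, colnames(x))
  if (length(missing_cols) > 0) {
    stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
  }
  # Keep only the required columns, in a fixed order, under standard names
  out <- x[, user_cols, drop = FALSE]
  colnames(out) <- std_cols
  out
}

# Illustrative input with MAF-style column names
v <- data.frame(Chromosome = "chr1", Start_Position = 100, End_Position = 100,
                Reference_Allele = "C", Tumor_Seq_Allele2 = "T",
                Tumor_Sample_Barcode = "S1", stringsAsFactors = FALSE)
mapped <- map_variant_columns(v,
                              chromosome_col = "Chromosome",
                              start_col = "Start_Position",
                              end_col = "End_Position",
                              ref_col = "Reference_Allele",
                              alt_col = "Tumor_Seq_Allele2",
                              sample_col = "Tumor_Sample_Barcode")
```

With the real function, the same names would simply be passed as the `*_col` arguments shown under Usage.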
Usage
create_musica_from_variants(
x,
genome,
check_ref_chromosomes = TRUE,
check_ref_bases = TRUE,
chromosome_col = "chr",
start_col = "start",
end_col = "end",
ref_col = "ref",
alt_col = "alt",
sample_col = "sample",
extra_fields = NULL,
standardize_indels = TRUE,
convert_dbs = TRUE,
verbose = TRUE
)
Arguments
- x
A data.table, matrix, or data.frame that contains columns with the variant information.
- genome
A BSgenome object indicating which genome reference the variants and their coordinates were derived from.
- check_ref_chromosomes
Whether to peform a check to ensure that the chromosomes in the
variant
object match the reference chromosomes in thegenome
object. If there are mismatches, this may cause errors in downstream generation of count tables. If mismatches occur, an attept to be automatically fix these with theseqlevelsStyle
function will be made. DefaultTRUE
.- check_ref_bases
Whether to check if the reference bases in the
variant
object match the reference bases in thegenome
object. DefaultTRUE
.- chromosome_col
The name of the column that contains the chromosome reference for each variant. Default
"chr"
.- start_col
The name of the column that contains the start position for each variant. Default
"start"
.- end_col
The name of the column that contains the end position for each variant. Default
"end"
.- ref_col
The name of the column that contains the reference base(s) for each variant. Default
"ref"
.- alt_col
The name of the column that contains the alternative base(s) for each variant. Default
"alt"
.- sample_col
The name of the column that contains the sample id for each variant. Default
"sample"
.- extra_fields
Which additional fields to extract and include in the musica object. Default
NULL
.- standardize_indels
Flag to convert indel style (e.g. `C > CAT` becomes `- > AT` and `GCACA > G` becomes `CACA > -`)
- convert_dbs
Flag to convert adjacent SBS into DBS (original SBS are removed)
- verbose
Whether to print status messages during error checking. Default
TRUE
.
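The `standardize_indels` and `convert_dbs` flags can be illustrated with two small base-R sketches. Both helpers, `standardize_indel()` and `merge_adjacent_sbs()`, are hypothetical and are not musicatk's implementation. The first only reproduces the allele rewriting shown in the argument description (drop the shared leading base, write an empty allele as `-`); it does not adjust start or end coordinates. The second assumes a data.frame with exactly the columns `chr`, `start`, `end`, `ref`, `alt`, and `sample`, and merges a pair of SBS at consecutive positions in the same sample and chromosome into one DBS, removing the two SBS rows. Runs of three or more adjacent SBS are not handled specially.

```r
# Hypothetical sketch (NOT musicatk's code): rewrite indel alleles by
# dropping the shared leading base and writing an empty allele as "-".
# Start/end coordinates are deliberately not adjusted here.
standardize_indel <- function(ref, alt) {
  stopifnot(length(ref) == length(alt))
  for (i in seq_along(ref)) {
    r <- ref[i]
    a <- alt[i]
    is_indel <- nchar(r) != nchar(a) && nchar(r) > 0 && nchar(a) > 0
    if (is_indel && substr(r, 1, 1) == substr(a, 1, 1)) {
      r <- substring(r, 2)
      a <- substring(a, 2)
    }
    ref[i] <- if (nchar(r) == 0) "-" else r
    alt[i] <- if (nchar(a) == 0) "-" else a
  }
  data.frame(ref = ref, alt = alt, stringsAsFactors = FALSE)
}

# Hypothetical sketch (NOT musicatk's code): merge pairs of SBS at
# consecutive positions (same sample and chromosome) into one DBS.
# Assumes v has exactly the columns chr, start, end, ref, alt, sample.
merge_adjacent_sbs <- function(v) {
  v <- v[order(v$sample, v$chr, v$start), ]
  is_sbs <- nchar(v$ref) == 1 & nchar(v$alt) == 1 & v$ref != "-" & v$alt != "-"
  used <- rep(FALSE, nrow(v))
  dbs <- list()
  i <- 1
  while (i < nrow(v)) {
    j <- i + 1
    if (is_sbs[i] && is_sbs[j] &&
        v$sample[i] == v$sample[j] && v$chr[i] == v$chr[j] &&
        v$start[j] == v$start[i] + 1) {
      dbs[[length(dbs) + 1]] <- data.frame(
        chr = v$chr[i], start = v$start[i], end = v$start[j],
        ref = paste0(v$ref[i], v$ref[j]), alt = paste0(v$alt[i], v$alt[j]),
        sample = v$sample[i], stringsAsFactors = FALSE)
      used[c(i, j)] <- TRUE
      i <- i + 2  # skip the second SBS of the merged pair
    } else {
      i <- i + 1
    }
  }
  # Remaining (unmerged) variants followed by the new DBS rows
  rbind(v[!used, ], do.call(rbind, dbs))
}

# The two indel examples from the argument description
standardize_indel(c("C", "GCACA"), c("CAT", "G"))
```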
Examples
maf_file <- system.file("extdata", "public_TCGA.LUSC.maf",
package = "musicatk"
)
variants <- extract_variants_from_maf_file(maf_file)
g <- select_genome("38")
musica <- create_musica_from_variants(x = variants, genome = g)
#> Checking that chromosomes in the 'variant' object match chromosomes in the 'genome' object.
#> Checking that the reference bases in the 'variant' object match the reference bases in the 'genome' object.
#> Standardizing INS/DEL style
#> Converting adjacent SBS into DBS
#> 4 SBS converted to DBS