HY's Farm

[BioInfomatics] BUSCO Installation & Usage


BUSCO v3 Installation & Usage

BUSCO v3 is a tool providing quantitative measures for the assessment of genome assembly, gene set, and transcriptome completeness, based on evolutionarily-informed expectations of gene content from near-universal single-copy orthologs selected from OrthoDB v9.

BUSCO also provides user guide : BUSCO_v3_userguide.pdf

cite : https://busco.ezlab.org

Installation

I executed BUSCO v3 on Ubuntu Server 16.04 LTS. (Amazon AWS)

  1. git clone https://gitlab.com/ezlab/busco.git
  2. move directory to busco.
  3. sudo python setup.py install.

Installation of BUSCO only is finished.

But, it needs additional software to use BUSCO v3 :( (not described in detail in user guide)

Those are NCBI BLAST+ , HMMER , Augustus.

FPTraffic

Especially, installation of Augustus is complex. here is a list of dependency to Augustus.

bamtools, samtools, bcftools, htslib , boost, zlib1g

And, each tool also have a dependency.

bamtools - zlib

htslib - zlib, liblzma, libbzip2

samtools - libncurses5-dev libncursesw5-dev

After installing all dependency to Augustus, should change path of software.

Augustus/auxprogs/bam2hints/Makefile

Augustus/auxprogs/filterBam/src/Makefile

Augustus/auxprogs/bam2wig/Makefile

Then, you can run Augustus perfectly.

I'll post easy-to-use install script soon.

Input

Organism Image Macaca nemestrina (pig-tailed macaque) Macaca nemestrina overview

Download

Usage

python BUSCO.py -i [SEQUENCE_FILE] -l [LINEAGE] -o [OUTPUT_NAME] -m [MODE] -sp [SPECIES] (-c [CPU count])

 -i FASTA FILE, --in 			FASTA FILE
 -l LINEAGE,    --lineage_path  Specify location of the BUSCO lineage data to be used.
 -o OUTPUT,     --out OUTPUT
 -m MODE,       --mode MODE     Specify which BUSCO analysis mode to run.
                                There are three valid modes:
                                - geno or genome, for genome assemblies 
                                - tran or transcriptome, for transcriptome assemblies 
                                - prot or proteins, for annotated gene sets (protein) 
 -sp SPECIES,   --species SPECIES  
                                Name of existing Augustus species gene finding 
                                parameters. See Augustus documentation for available 
                                options.
 -c N,          --cpu N         Specify the number (N=integer) of threads/cores to use.

Result

​ short_summary_result of BUSCO execution

# BUSCO version is: 3.1.0
# The lineage dataset is: mammalia_odb9 (Creation date: 2016-02-13, number of species: 50, number of BUSCOs: 4104)
# To reproduce this run: python busco-master/scripts/run_BUSCO.py -i 
  data/GCF_000956065.1_Mnem_1.0_genomic.fna -o result -l mammalia_odb9/ 
  -m genome -c 16 -sp human

# Summarized benchmarking in BUSCO notation for file data/GCF_000956065.1_Mnem_1.0_genomic.fna
# BUSCO was run in mode: genome

        C:94.7%[S:93.7%,D:1.0%],F:2.1%,M:3.2%,n:4104

        3889    Complete BUSCOs (C)
        3847    Complete and single-copy BUSCOs (S)
        42      Complete and duplicated BUSCOs (D)
        88      Fragmented BUSCOs (F)
        127     Missing BUSCOs (M)
        4104    Total BUSCO groups searched

​ full_table_result of BUSCO execution

# Busco id      Status          Contig        Start           End          Score   Length
EOG090A0FFC   Complete      NW_012014355.1  10381323        10382532        144.0   83
EOG090A0FG2   Complete      NW_012011723.1  598963  642561  190.4   92
EOG090A0FM8   Complete      NW_012012355.1  18069359        18078634        177.3   84
EOG090A0FQ0   Complete      NW_012011278.1  5427743 5447851 159.0   77
EOG090A0FRR   Complete      NW_012012000.1  1219700 1231323 223.9   104
EOG090A0FSN   Complete      NW_012010800.1  1324877 1365806 212.0   99
EOG090A0FSO   Duplicated    NW_012014022.1  22365084        22381425        168.5   89
EOG090A0FSO   Duplicated    NW_012018132.1  13035685        13051851        146.6   88
EOG090A0FTC   Complete      NW_012011100.1  1044303 1045003 187.3   80
EOG090A0FUE   Complete      NW_012012244.1  17031932        17032222        147.5   78
EOG090A0FUU   Missing
EOG090A0FUY   Complete      NW_012010745.1  1404622 1453453 475.2   296
EOG090A0FVS   Complete      NW_012017243.1  22347808        22367367        147.7   75
EOG090A0FW2   Complete      NW_012018354.1  12533257        12538646        170.3   96
EOG090A0FXI   Complete      NW_012013244.1  14331782        14338455        221.1   104
EOG090A0FY5   Complete      NW_012012799.1  6543474 6553096 160.1   101
EOG090A0GAQ   Complete      NW_012010800.1  36211955        36232182        331.4   251


Similar Posts

Content