Genome-wide microsatellite characteristics of five human Plasmodium species, focusing on Plasmodium malariae and P. ovale curtisi.

Microsatellites can be utilized to explore genotypes, population structure, and other genomic features of eukaryotes. Systematic characterization of microsatellites has not been a focus for several species of Plasmodium, including P. malariae and P. ovale, as the majority of malaria elimination programs are focused on P. falciparum and to a lesser extent P. vivax. Here, five human malaria species (P. falciparum, P. vivax, P. malariae, P. ovale curtisi, and P. knowlesi) were investigated with the aim of conducting in-depth categorization of microsatellites for P. malariae and P. ovale curtisi. Investigation of reference genomes for microsatellites with unit motifs of 1-10 base pairs indicates high diversity among the five Plasmodium species. Plasmodium malariae, with the largest genome size, displays the second highest microsatellite density (1421 No./Mbp; 5% coverage) next to P. falciparum (3634 No./Mbp; 12% coverage). The lowest microsatellite density was observed in P. vivax (773 No./Mbp; 2% coverage). A, AT, and AAT are the most commonly repeated motifs in the Plasmodium species. For P. malariae and P. ovale curtisi, microsatellite-related sequences are observed in approximately 18-29% of coding sequences (CDS). Lysine, asparagine, and glutamic acid are the amino acids most frequently encoded by microsatellite-related CDS. The majority of these CDS could be related to the gene ontology terms "cell parts," "binding," "developmental processes," and "metabolic processes." The present study provides a comprehensive overview of microsatellite distribution and can assist in the planning and development of potentially useful genetic tools for further investigation of P. malariae and P. ovale curtisi epidemiology.


Introduction
Recent advancements in gene sequencing technologies and the increasing availability of online genomic resources have made it possible to computationally explore genomic features of an organism that were previously inaccessible [18]. Molecular genetics and polymorphism studies involving microsatellites are among the key beneficiaries of such technological advancement. Microsatellites are short tandem repeats of DNA usually consisting of 1-10 base pair (bp) unit nucleotide motifs. Such microsatellites are known to be formed due to mispairing, improper alignment, and strand-slippage events [13,22,25]. Microsatellites with unit motifs of 2-3 bp are often designated as short tandem repeats, simple sequence repeats, and simple sequence length polymorphisms [44,64]. Moreover, these microsatellites can be characterized as (i) perfect repeats containing only pure motifs with 100% identical copies and constituting only one motif type; (ii) imperfect repeats containing motifs with mutations such as insertions, deletions, or substitution; and (iii) compound microsatellites containing stretches of two or more different repeat motifs [8]. These short tandem repeats of DNA can be highly polymorphic and are widely distributed throughout the genome of eukaryotic cells. Microsatellites are usually abundant in non-coding regions of the genome and can be targeted to produce polymerase chain reaction (PCR) products as markers to identify genetic diversity among a population [13,25]. Currently, microsatellite markers are implemented across wide fields of biology, including gene linkage, genotyping, forensics, kinship relationships, phylogenetic analysis, and others [54]. Conventional procedures for microsatellite studies consist of in vitro microsatellite motif cloning, which is screening of cloned libraries for restricted motif types with often limited prior knowledge of microsatellite categorization and distribution. 
Such protocols are expensive and time-consuming, suffer from low modularity, and are prone to experimental errors. In contrast, recent years have witnessed a stark increase in the use of in silico tools for the analysis of microsatellites utilizing publicly available genomic databases [55]. Moreover, entire online platforms dedicated to particular groups of organisms (e.g., PlasmoDB and VivaxGen) and genome projects are significantly enhancing the range and accuracy of in silico analysis approaches [4,62].
Decades-long malaria intervention strategies have significantly reduced the number of malaria cases and fatalities worldwide. The Greater Mekong Subregion (GMS) has achieved significant progress in reducing the disease burden to meet its target of malaria elimination by 2015-2030. The GMS countries achieved a 54% reduction in the incidence of malaria cases between 2012 and 2015, and the death rate fell by 84% over the same period [66]. The actual drug efficacy and burden of non-falciparum malaria still remain unclear due to the lack of sufficient epidemiological tools to investigate these parasite variants.
Instead of targeting all Plasmodium species, most malaria elimination programs are predominantly directed toward P. falciparum and to a lesser extent P. vivax [33]. Non-P. falciparum malaria, mainly by P. malariae and P. ovale, still presents a major challenge for malaria eradication [32,68]. Plasmodium malariae infects humans and causes fever. These infections are usually asymptomatic with low parasitemia but may cause chronic anemia and nephrotic syndrome [12,17,29,41]. Plasmodium ovale can be subcategorized into two distinct species, P. ovale curtisi and P. ovale wallikeri, which only differ by small genetic variations and a shorter latency period in P. ovale wallikeri [39]. These sympatric-occurring P. ovale subspecies are generally indistinguishable morphologically. Infections by either of these P. ovale subspecies present with mild fever and are currently treated with the conventional antimalarial drug chloroquine [40]. Plasmodium ovale can undergo the hypnozoite stage, which is a dormant stage in the liver. This enables concealment from diagnosis, and reactivation may occur weeks, months, or even years after the initial infection, leading to disease relapse [21,48]. This parasite is endemic throughout parts of Asia, Africa, South America, and the Western Pacific [29,43,50,69].
Recent epidemiological studies conducted in Cameroon [47] and Equatorial Guinea [53] have revealed the presence of over 12% P. malariae followed by 1-6% P. ovale-positive samples in parts of Uganda and Bioko [43]. Separate studies conducted in Tanzania indicate persistent transmission of P. malariae and P. ovale in an area of declining P. falciparum [68]. These findings collectively signal the significant presence of these parasites and reveal an epidemiologic knowledge gap between them and other well-studied Plasmodium species [51].
The scarcity of genetic markers available for P. malariae and P. ovale curtisi compared to P. falciparum [2,16,20,57] and P. vivax [19,24,37] reflects the low emphasis being given to these parasites. Microsatellite-based schemes would greatly facilitate population genetics and therapeutic studies in P. malariae and P. ovale curtisi. The large genome size (~29 Mbp) and high AT content (~75%) make microsatellite-based genotyping markers a suitable means for investigating the epidemiology and population genetics of these Plasmodium species [4]. This study aims to present a comprehensive categorization of the microsatellite distribution of major human malaria-causing Plasmodium species with a focus on P. malariae and P. ovale curtisi, which may also contribute to the development of additional genotyping markers for these parasites.

Sequencing data
This study is a review and bioinformatics analysis of microsatellites in five human malaria-causing Plasmodium species based on whole genome sequencing data available in the PlasmoDB database. The whole genome sequences of P. malariae UG01, P. falciparum 3D7, P. vivax SAL-1, P. ovale curtisi GH01, and P. knowlesi STRAIN-H were downloaded from the PlasmoDB webserver (http://plasmodb.org/common/downloads/release-36/) [4]. Plasmodium ovale wallikeri was not included in the analysis due to its close genetic relatedness to P. ovale curtisi and lack of a standardized reference genome in the PlasmoDB database [4,43]. Nucleotide sequences of all predicted and known coding sequence (CDS) regions for each Plasmodium species were obtained using the PlasmoDB webserver's built-in gene resource download tools. The Plasmodium strains with maximum known genes and transcripts in the PlasmoDB webserver were selected for each species under evaluation. The total number of nucleotide base pairs scanned for microsatellites and whole genome GC% content of each organism are listed in Table 1. For P. malariae and P. ovale curtisi, sets of 6573 and 7162 sequences, respectively, representing ≥98% of the total available CDS (known and predicted proteins) from the whole-genome sequence were included for evaluation.

Microsatellite analysis
Identification and categorization of perfect and imperfect microsatellites were performed with the highly accurate tandem repeat search tool Phobos version 3.3.11 (http://www.ruhr-uni-bochum.de/ecoevo/cm/cm_phobos.htm) [35,56]. The total GC% content and basic genomic statistics for each parasite sample were calculated with the python script multifastats.py (https://github.com/davidrequena/multifastats/blob/master/multifastats.py). The detection criteria for tandem repeats were restricted to perfect and imperfect repeats with unit motifs of 1-10 bp and a minimum threshold repeat number of 14, 7, 5, 4, 4, 4, 4, 4, 4, and 4 for mono-, di-, tri-, tetra-, penta-, hexa-, hepta-, octa-, nona-, and deca-nucleotide microsatellites, respectively. For protein sequences, microsatellite GC content and tandemly repeated residues with a minimum of four repeats and a maximum unit motif length of three amino acids were considered for evaluation using OSTRFPD [34]. Analysis of tandemly repeated amino acid sequences for P. malariae and P. ovale curtisi included the entire set of CDS available in the PlasmoDB online database.
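The repeat-number thresholds above can be illustrated with a minimal Python sketch. This is not Phobos and does not reproduce its scoring or its handling of imperfect repeats; it only finds perfect repeats with a regular expression. The function names and the toy sequence are assumptions made for illustration.

```python
import re

# Minimum repeat numbers for unit motif lengths 1-10 bp, as listed in the text.
MIN_REPEATS = {1: 14, 2: 7, 3: 5, 4: 4, 5: 4, 6: 4, 7: 4, 8: 4, 9: 4, 10: 4}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    return motif not in (motif + motif)[1:-1]


def find_perfect_microsatellites(seq):
    """Return sorted (start, end, motif, repeat_count) tuples for perfect repeats.

    Illustrative helper (hypothetical name); coordinates are 0-based, end-exclusive.
    """
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # One captured unit followed by at least (min_rep - 1) identical copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if not is_primitive(motif):
                continue  # e.g. skip 'AA' runs, which are counted as motif 'A'
            count = (m.end() - m.start()) // unit_len
            hits.append((m.start(), m.end(), motif, count))
    return sorted(hits)


# Toy sequence (not taken from any Plasmodium genome).
toy = "GC" + "A" * 14 + "GC" + "AT" * 7 + "GG" + "AAT" * 5 + "C" + "AT" * 3
for start, end, motif, count in find_perfect_microsatellites(toy):
    print(start, end, motif, count)
```

The trailing (AT)3 stretch in the toy sequence is not reported because it falls below the di-nucleotide threshold of seven repeats.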

Heatmaps and genomic visualization of microsatellites
Clusters of microsatellites present in CDS regions of each chromosome were visualized as heatmaps using a python script based on the seaborn library (https://seaborn.pydata.org), with the Euclidean distance metric and complete linkage as clustering parameters. Scatter plots with Spearman's correlation coefficients were generated with the same seaborn-based script to assess correlations between unit motif length and frequency of microsatellites. Statistical significance was defined as p-values < 0.05. Circos version 0.67-7 (http://circos.ca/) was used to visualize the genome-wide distribution of microsatellites.
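As a rough illustration of the correlation step, Spearman's coefficient can be computed as the Pearson correlation of average ranks. The sketch below uses only the Python standard library; it is not the script used in the study, the function names are assumptions, and the counts are invented for illustration rather than taken from Table 2.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical microsatellite counts per unit motif length (illustration only).
unit_lengths = [1, 2, 3, 4, 5]
counts = [50, 40, 10, 20, 5]
print(round(spearman(unit_lengths, counts), 2))
```

A p-value would still be needed to apply the 0.05 significance cut-off; that step is omitted here.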

Gene ontology analysis
The Gene ontology (GO) terms associated with the cellular components, molecular functions, and biological processes for the microsatellite-associated protein sequences were computed by the deep neural-net-based hierarchical biological sequence classifier "SECLAF" using its default parameters trained on the UniProtKB GO database [58]. Microsatellite-associated proteins with the highest values of predicted GO terms each exceeding the 0.95 threshold score were included for analysis.
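The score cut-off described above can be sketched as a simple filter. The dictionary layout, protein identifiers, and scores below are assumptions made for illustration and do not represent the actual output format of SECLAF.

```python
def top_go_terms(predictions, threshold=0.95):
    """Keep, per protein, the highest-scoring GO term if it exceeds the threshold.

    `predictions` maps a protein ID to a {GO term: score} dictionary
    (hypothetical layout, not the classifier's native output).
    """
    kept = {}
    for protein_id, scores in predictions.items():
        if not scores:
            continue
        term, score = max(scores.items(), key=lambda item: item[1])
        if score > threshold:
            kept[protein_id] = (term, score)
    return kept


# Invented example records (identifiers and scores are placeholders).
example = {
    "protein_A": {"binding": 0.99, "protein binding": 0.97},
    "protein_B": {"metabolic process": 0.80},
    "protein_C": {},
}
print(top_go_terms(example))
```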

Results
Abundance, distribution, and diversity of microsatellites in five Plasmodium species
The analysis was conducted for both perfect and imperfect microsatellites with repeat numbers of 14, 7, 5, 4, 4, 4, 4, 4, 4, and 4 for 1-10 bp unit motif lengths, respectively, to minimize selection of nominal functional repeats related to the extreme AT-richness of some Plasmodium genomes. The genome-wide numbers of microsatellites identified among the five Plasmodium species were highly variable (84,786-20,875), along with the genomic coverage ranging from 11.56% to 2.19% (Table 1). The largest difference in both microsatellite density and coverage was observed between P. falciparum and P. vivax with densities of 3269.46 ... (Table 1). Microsatellite density in P. malariae was 1.63-fold higher than that in P. ovale curtisi (Table 1).
There was a high degree of diversity among the Plasmodium species in unit motif lengths of microsatellites across the respective genomes (Table 2). Plasmodium falciparum had the highest numbers and densities of mono- to penta-nucleotide long motif repeats, whereas P. vivax showed the lowest (Table 2). Plasmodium malariae showed the highest number of repeats for hexa-, octa-, and deca-nucleotide long motif repeats totaling 1648, 1353, and 1221, respectively. However, because of its large genome size, this did not translate to the highest repeat density or coverage of these motifs (Table 2). In general, the mono-, di-, and tri-nucleotide motifs were most abundant and collectively they accounted for approximately 80.0-90.0% of the total genome-wide unit motif length in all Plasmodium species. The highest proportions of mono-, di-, tri-, and tetra-nucleotides were observed in P. vivax (74.46%), P. falciparum (41.78%), P. falciparum (8.19%), and P. falciparum (7.34%), respectively (Table 2). Plasmodium malariae was found to harbor the highest percentage of hexa-nucleotide long motifs (3.45%) with a corresponding microsatellite density of 49.02 No./Mbp (Table 2). Interestingly, P. malariae had 3-fold higher microsatellite density compared to that of P. ovale curtisi for the unit motif lengths two, four, seven, and nine (Table 2). Nonetheless, P. falciparum, P. malariae, and P. ovale showed a clear negative correlation between unit motif length (1-10 bp) and the frequency of microsatellite occurrence (Spearman's R ≤ −0.85, p ≤ 1.6e-03). The negative correlation was also present for P. vivax (Spearman's R = −0.33, p = 0.35) and P. knowlesi (Spearman's R = −0.67, p = 0.033), albeit to a weaker degree (Fig. 1).
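The density and coverage figures reported here can be read as simple ratios. The sketch below assumes that density is the number of microsatellites per million base pairs scanned and that coverage is the percentage of scanned bases lying inside microsatellites; the function names and the example inputs are illustrative assumptions, not values from Table 1.

```python
def density_per_mbp(repeat_count, genome_bp):
    """Number of microsatellites per mega base pair (No./Mbp)."""
    return repeat_count / (genome_bp / 1_000_000)


def coverage_percent(repeat_bp, genome_bp):
    """Percentage of the scanned sequence covered by microsatellites."""
    return 100.0 * repeat_bp / genome_bp


def fold_difference(density_a, density_b):
    """How many times higher density_a is than density_b."""
    return density_a / density_b


# Invented inputs: 3000 repeats spanning 100 kbp in a 2 Mbp sequence.
print(density_per_mbp(3000, 2_000_000))
print(coverage_percent(100_000, 2_000_000))
```

The same ratio applies to between-species comparisons, for example the rounded P. falciparum and P. vivax densities quoted in the abstract (3634 and 773 No./Mbp).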

A, AT, and AAT as the most dominant microsatellite motifs in Plasmodium genomes
The microsatellite motif sequences show high diversity among the different Plasmodium species (Supplementary Table 1), although motif type A was repeated most frequently in all species of Plasmodium except P. falciparum, where AT (41.62%) was the most common repeat. In contrast, AG was the least frequently occurring di-nucleotide motif in all Plasmodium species under investigation (Supplementary Table 1). The motifs A, AT, and AAT collectively accounted for more than 70% of all repeats in the studied species. Motifs containing only C and G were relatively rare (<10%) for mono- and di-nucleotide repeats. Only P. vivax harbored frequently repeated tri-nucleotide motifs AGG with a microsatellite GC content > 50%. In P. malariae, the number of mononucleotide motif repeats A was more than twice the number of di-nucleotide AT motif repeats (Supplementary Table 2).
The CDS microsatellite density distribution in protein-coding regions of Plasmodium analyzed for individual chromosomes was computed as heatmaps (Fig. 2). The highest microsatellite density was on chromosome 6 of P. falciparum (464.30 No./Mbp), and the lowest density was on chromosome 11 of P. knowlesi (43.70 No./Mbp) (Fig. 2).
Diversity of tandemly repeated amino acids in microsatellite-associated gene products of P. malariae and P. ovale curtisi and their ontology annotations
Motif-wise distributions of microsatellites in the P. malariae and P. ovale curtisi chromosomes were investigated further. The mono-, di-, and tri-nucleotide motifs accounted on average for 50, 25, and 5% in P. malariae and 75, 9, and 2% in P. ovale curtisi of total unit motif repeats, respectively (Supplementary Table 2). The collective contribution of tri- to deca-nucleotide motifs was 10% of the total repeats in both species (Supplementary Table 2). On average, the chromosomal microsatellite densities for P. malariae and P. ovale curtisi were 1568.22 ± 140.48 and 1203.04 ± 82.28 No./Mbp, respectively. For P. malariae, the A, AT, and AAT unit repeat motifs were the most frequent, constituting 58, 28, and 3% of total chromosomal DNA microsatellites, respectively (Fig. 3a). The total microsatellite densities in non-coding regions were 7.87-10.01-fold higher than in the CDS regions (Fig. 3b). Aggregate GC content of CDS-associated microsatellites was approximately 2-fold higher (21.0%) compared to that in the chromosomal region (Fig. 3c). Evaluation of the amino acid repeats in the annotated and predicted proteins available for P. malariae showed that lysine (34.12%) and asparagine (29.58%) were the most common amino acid repeats, corresponding to the most commonly observed tri-nucleotide repeats AAA (61%) and AAT (4%) (Fig. 3a and d). For P. ovale curtisi, the A, AT, and AAAAT repeat motifs were most frequent in aggregate chromosomal DNA, constituting 84%, 9%, and 2%, respectively (Fig. 4a). The microsatellite densities in non-coding regions were 7.77-12.20-fold higher than in CDS regions (Fig. 4b). Aggregate GC content of CDS-associated microsatellites was approximately 3-fold higher (30.0%) compared to that of the chromosomal region (Fig. 4c). Evaluation of the amino acid repeats indicated that lysine (34.12%) and asparagine (29.58%) were the most common amino acid repeats, corresponding to the most commonly observed tri-nucleotide repeats AAA (61%) and AAT (4%) (Fig. 4d).
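The correspondence between tri-nucleotide repeats and amino acid repeats (AAA encoding lysine, AAT encoding asparagine) can be checked with a short sketch based on the standard genetic code. It also scans a protein for tandem residue repeats using the criteria given in the Methods (at least four repeats, unit length of up to three residues). This is not OSTRFPD; the function names and the example protein string are assumptions for illustration.

```python
import re
from itertools import product

# Standard genetic code in the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna):
    """Translate DNA in frame 0, ignoring any incomplete trailing codon."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def aa_tandem_repeats(protein, min_repeats=4, max_unit=3):
    """Return (unit, repeat_count) for tandem residue repeats, shortest unit first."""
    pattern = re.compile(r"([A-Z]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    return [(m.group(1), len(m.group(0)) // len(m.group(1)))
            for m in pattern.finditer(protein)]


print(translate("AAA" * 5))  # in-frame AAA codons encode lysine (K)
print(translate("AAT" * 5))  # in-frame AAT codons encode asparagine (N)
print(aa_tandem_repeats("MSKKKKKTNKNKNKNKE"))  # invented protein fragment
```

Note that the encoded residue depends on the reading frame: the same AAT repeat read one base later yields ATA codons (isoleucine).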
For investigating functional annotation of the microsatellite-associated CDS distribution, the SECLAF classifier was trained for over 980 GO classes with an area under the curve (AUC) of 99.45% [28], providing a rough estimate of GO for largely unclassified microsatellite-linked proteins of P. malariae and P. ovale curtisi. For P. malariae and P. ovale curtisi, only 2054 and 1555 sequences were assigned to specific gene names and descriptions, respectively. In total, 1919 and 1271 distinct CDS were found to contain at least one microsatellite for P. malariae and P. ovale curtisi, respectively. For P. malariae, three major GO categories, cellular component, molecular function, and biological process, were assigned to 229, 810, and 874 microsatellite-associated proteins, respectively. Within the categories, the top three GO terms collectively represent at least 25% of the total GO diversity. Regarding microsatellite-associated proteins under the "cellular component" GO category, the GO terms "cell parts," "protein-containing complex," and "intracellular part" collectively constituted over 40% of the total ontologies (Fig. 5a). The three major GO terms with regard to molecular function for microsatellite-associated proteins were "binding," "protein binding," and "translation regulatory activity" (Fig. 5b). The three major GO terms constituting "biological process" were "metabolic process," "reproduction," and "organic substance metabolic process" (Fig. 5c). For P. ovale curtisi, three major GO categories, cellular component, molecular function, and biological process, were assigned to 513, 146, and 446 microsatellite-associated proteins, respectively. In each category, the top three GO terms collectively represented at least 30% of the total GO diversity. GO categorized under "cellular component" displayed "cell parts," "intracellular part," and "cytoplasmic part" as the three major GO terms that collectively constituted over 40% of the total ontologies (Fig. 5a).
The three major GO terms with regard to molecular function for microsatellite-associated proteins were "binding," "protein binding," and "cell adhesion mediator activity" (Fig. 5b). The three major GO terms constituting "biological process" were "response to stimulus," "developmental process," and "immune system process" (Fig. 5c).
Microsatellite distribution map for P. malariae and P. ovale curtisi
A graphical representation (Fig. 6) comprising the entire known chromosomal DNA of P. malariae and P. ovale curtisi shows a relatively homogeneous distribution of microsatellites on each chromosome. For P. malariae and P. ovale, the density of genomic microsatellites in non-CDS sequences is ≥10-fold greater than in CDS regions (Figs. 3b and 5). Most of the microsatellites with smaller unit motifs (<4 bp) were homogeneously distributed, whereas microsatellites with longer unit motifs (>4 bp) appeared to be more concentrated toward the middle region of the chromosome (Fig. 6, Supplementary Figures 1 and 2). In general, 1-3 unit motif microsatellites had high densities (Supplementary Figure 3). The genome-wide microsatellites with longer repeats, which appeared as peaks in the line chart within the map, were found to be more frequent around the middle region of most chromosomes (Fig. 6).

Discussion
In the absence of a standardized classification metric, the present study provides a comprehensive categorization and distribution of microsatellites in five human malaria-causing Plasmodium species. Strand slippage mutations, improper pairing, and host-parasite adaptation history may have contributed to the wide variation in microsatellite density and GC% content among the Plasmodium species. At present, aside from the distinct individual genomic nature of each Plasmodium species and wide variation in GC% content, there is no concrete evidence to justify or speculate on the observed variations in microsatellite motifs among the species. Previous studies have suggested microsatellites as a major contributor to genetic diversity in P. falciparum and P. vivax populations [45,59,61]. Such high densities of short tandem repeats not only increase nucleotide polymorphisms but also complicate genetic analysis [5,59,61]. Earlier studies without the availability of the highly accurate Phobos search tool have suggested that a microsatellite may occur for every 2-3 kbp in the P. falciparum genome [1]. The present survey of the P. falciparum genome shows comparatively high microsatellite density, totaling 3633.76 No./Mbp, which corresponds to approximately six microsatellites per 2 kbp genome length. The use of more sophisticated tools to evaluate perfect/imperfect microsatellites and the high quality of genome sequence might have contributed to the observed difference in current estimates. Microsatellites have been reported in over 9% of ORFs for the extremely AT-rich (~80%) P. falciparum genome, which is in agreement with the current observation [59]. Likewise, a similar trend was visible for P. malariae and P. ovale curtisi, indicating substantial microsatellite-associated heterozygosity and its potential exploitation to assess genomic diversity. All Plasmodium species displayed a logarithmic decline in the frequency of microsatellites with increasing unit motif length.
This is likely to be related to the mechanisms of DNA slippage and DNA mismatch repair, which result in a greater likelihood of generating shorter AT-rich simple motifs [14,28,59]. The estimated slippage mutation rate within microsatellites has been suggested to increase exponentially as the length of the repeat motif increases [28]. This phenomenon is reflected by the observed high percentage of short perfect 1-3 unit motif microsatellites (≥60%) in all species of Plasmodium under investigation. Thus, these factors result in a higher microsatellite mutation rate compared to the single point mutation rates. Escherichia coli [30], mice [10], humans [65], and P. falciparum [2] are reported to have microsatellite mutation rates of approximately 1 × 10⁻², 10⁻³ to 10⁻⁴, 1 × 10⁻³, and 6.95 × 10⁻⁵ to 3.7 × 10⁻⁴ /locus/replication, respectively, which are all higher than the single point mutation rates in these organisms. The lack of analyses of Hardy-Weinberg equilibrium and linkage disequilibrium is among the few limitations of the present study, as only the standard whole genome sequence of each Plasmodium species was investigated. Plasmodium malariae and P. ovale curtisi have over 1000 known microsatellite-related CDS constituting at least 10% of the entire genome, which are mainly distributed across 14 chromosomes [4]. Microsatellite instability in these CDS could promote protein domain duplication and production of homo-peptide tracts and interfere with transcript splicing, leading to disorders and disease [42,67]. However, natural selection tends to favor suppression of tandem repeats in CDS compared to that in non-coding regions [38,60]. The extreme AT richness in P. falciparum has been reported to contribute to systematic mutational bias, resulting in abnormally high microstructural plasticity; however, this has not yet been assessed for P. malariae [23]. Nonetheless, microsatellite-associated polymorphisms may facilitate the adaptability of P. malariae in primate hosts, including South American primates and chimpanzees [49]. AT-richness of the P. malariae and P. ovale curtisi genomes was in accordance with the high AT content (89%) of CDS-associated microsatellite sequences. Additionally, hydrophilic amino acids such as lysine and asparagine were among the most commonly repeated amino acid motifs observed, which is consistent with the natural bias towards hydrophilic amino acids in proteins [26]. Lysine-rich short tandemly repeated sequences have been observed in different protozoal parasites, including P. falciparum and Leishmania major. These parasites are suggested to generate such amino acid sequences de novo to modulate host protein targeting efficiency [11,36]. Because microsatellite instability in CDS could increase the chance of forming mutant proteins, the study of GO terms for such CDS-associated proteins should not be ignored. Although GO analysis of microsatellite-related CDS has been fairly limited to UniProtKB-based interpretation of known protein sequences, an overview of annotation for cellular components, molecular functions, and biological processes may be useful for high-level interpretation of microsatellite-associated proteins. In P. malariae and P. ovale curtisi, the majority of ontological terms for proteins associated with microsatellite-containing CDS were assigned to "cell parts," "binding," "metabolic process," and "response to stimuli," which are often linked to cellular integrity, adaptation, and survival of pathogens [3,6,15]. The relationship between microsatellite content and plasticity of these ontologies is an interesting area for further study.
Figure 6. Genome-wide representation of microsatellite distribution map for Plasmodium malariae UG01 and P. ovale curtisi GH01. The features of the microsatellite distribution map for (a) P. malariae UG01 and (b) P. ovale curtisi GH01, from the outermost to the innermost ring, can be interpreted as: chromosomes 1-14 (I-XIV); scatter plot for genomic microsatellite distribution based on unit motif length, which corresponds to the height of a spot from the base of its ring; line plot with peaks indicating regions with long repeat length; heatmap corresponding to the aggregate genomic microsatellites; scatter plot for microsatellites present in protein-coding regions; and heatmap for the aggregate microsatellites present in protein-coding regions of the genome. Each unit difference in the outermost ring represents a chromosomal length of 1 mega base pair. Spots and regions in the scatter plots and heatmaps may appear overlapped due to high density but are physically apart in sequence.
An important aspect of the genome-wide microsatellite mapping in this study is to facilitate development of genotyping markers. Unlike SNP and DNA barcoding, microsatellite markers can be evaluated for the entire genome of an organism [7,52]. Although SNPs in merozoite surface proteins (msp) 1 and msp2 have been used to investigate polymorphisms in Plasmodium species, the pressure from the selective host immune system often reduces polymorphisms and subsequently lowers the application of such protein gene-based markers [9,31]. In contrast, the majority of microsatellites are in noncoding regions that can achieve a high degree of polymorphism, making it a suitable marker for discriminating variants within a population and drug efficacy studies [27,64]. Mass drug administration programs expose a parasite to antimalarials, which further suggests the urgency in investigating genetic diversity within P. malariae and P. ovale curtisi [63]. Although identification of genotyping markers for these parasites is beyond the scope of the current investigation, this study contributes the first comprehensive knowledge on genome-wide features and 1-10 bp unit motif microsatellite diversity for P. malariae and P. ovale curtisi. Utilization of these computational outcomes could assist in the identification of novel microsatellite markers for haplotype clustering, population differentiation, and linkage disequilibrium, as demonstrated by the success of such in silico-based studies in the past for P. falciparum, P. vivax, and Leishmania panamensis [2,24,46].
In conclusion, this study presents the first comprehensive categorization of mono- to deca-nucleotide microsatellites in five human malaria-causing Plasmodium species. The results indicate high diversity in the CDS and genomic microsatellite distribution across all investigated species of Plasmodium. In P. malariae and P. ovale curtisi, the high density of microsatellite distribution observed warrants further in-depth investigation to identify potential genotyping markers for epidemiological studies.

Supplementary materials
Supplementary material is available at https://www.parasitejournal.org/10.1051/parasite/2020034/olm
Supplementary Table 1. Categorization of the most frequently repeated microsatellite motifs in genomes of Plasmodium species.
Supplementary Table 2. Diversity and motif length-wise distribution of chromosomal microsatellite in P. malariae UG01.
Supplementary Table 3. Diversity and motif length-wise distribution of chromosomal microsatellite in P. ovale curtisi.
Supplementary Figure 3. Representative chart for the density of microsatellite distribution in Plasmodium malariae UG01 (left) and P. ovale curtisi GH01 (right) chromosomes. Overall representation chart of (A) the microsatellite density distribution (B) gradient-wise distribution of the core regions with maximum microsatellite densities for chromosome 8 in P. malariae.

Authors' contribution statement
VBM and MI designed the study. VBM performed the data analysis and wrote the first draft of the manuscript. SN assisted in part of data analysis. NJW and AMD assisted in logistic support and manuscript preparation. All authors read and approved the final manuscript.