De novo transcriptome assembly and identification of G-Protein-Coupled-Receptors (GPCRs) in two species of monogenean parasites of fish

Genomic resources for Platyhelminthes of the class Monogenea are scarce, despite the diversity of these parasites, some species of which are highly pathogenic to their fish hosts. This work aimed to generate de novo-assembled transcriptomes of two monogenean species, Scutogyrus longicornis (Dactylogyridae) and Rhabdosynochus viridisi (Diplectanidae), providing a protocol for cDNA library preparation with low input samples used in single cell transcriptomics. This allowed us to work with sub-microgram amounts of total RNA with success. These transcriptomes consist of 25,696 and 47,187 putative proteins, respectively, which were further annotated according to the Swiss-Prot, Pfam, GO, KEGG, and COG databases. The completeness values of these transcriptomes evaluated with BUSCO against Metazoa databases were 54.1% and 73%, respectively, which is in the range of other monogenean species. Among the annotations, a large number of terms related to G-protein-coupled receptors (GPCRs) were found. We identified 109 GPCR-like sequences in R. viridisi, and 102 in S. longicornis, including family members specific for Platyhelminthes. Rhodopsin was the largest family according to GRAFS classification. Two putative melatonin receptors found in S. longicornis represent the first record of this group of proteins in parasitic Platyhelminthes. Forty GPCRs of R. viridisi and 32 of S. longicornis that were absent in Vertebrata might be potential drug targets. The present study provides the first publicly available transcriptomes for monogeneans of the subclass Monopisthocotylea, which can serve as useful genomic datasets for functional genomic research of this important group of parasites.


Introduction
Monogenea is one of three classes of parasitic Platyhelminthes. The other two are Trematoda and Cestoda. All these are united under the monophyletic Neodermata, whereas the free-living platyhelminths, commonly termed planarians or turbellarians, are distributed in several other subtaxa [1]. There are two subclasses of monogeneans, Monopisthocotylea and Polyopisthocotylea, whose members are commonly ectoparasites of freshwater and marine fishes, but there are also members infecting internal organs of aquatic or semi-aquatic tetrapods. Between 3000 and 4000 species of monogeneans have been described [23], some of which have acquired economic relevance owing to their negative impact on finfish aquaculture [88]. Therefore, there is growing interest in increasing our understanding of the molecular mechanisms involved in host-parasite interactions.
A modern understanding of biology has come to rely on comparative approaches using genomic resources. In the case of trematodes and cestodes, genomes and transcriptomes have provided new and valuable insights into anthelmintic resistance and host-parasite interactions [79]. Unfortunately, the same level of progress has not been made with monogeneans. To date, the genomes of only three monogenean species (Gyrodactylus bullatarudis, G. salaris, and Protopolystoma xenopodis) have been published [21,41,61]. Transcriptomic data are scarcer still: the recently reported transcriptome of Eudiplozoon nipponicum, belonging to Polyopisthocotylea [104], is the sole example. This could be partly due to technical challenges, such as the small size of the organism (in the range of hundreds of micrometers) and the difficulty to obtain sufficient numbers of live individuals to purify the required amount of DNA or RNA. For instance, to assemble the genomes of Gyrodactylus, which has body lengths of 0.5-1 mm, Hahn et al. [41] used approximately 15,000 individuals and Konczal et al. [61] used 2000-3000 individuals. For the transcriptome of E. nipponicum, which has a much larger body than Gyrodactylus, Vorel et al. [104] required only 10 individuals.
The genes encoding G-protein-coupled receptors (GPCRs) are the largest family of genes in animal genomes. In fact, genomic studies on G. bullatarudis revealed that among duplicated genes, the most abundant group of Gene Ontology (GO) terms relate to GPCR signaling pathways [61]. GPCRs are evolutionarily conserved seven-transmembrane (TM) proteins with immense structural and functional diversity. Upon activation by various extracellular signals, they mediate many biological processes, including reproduction, locomotion and behavior [59,69]. Some GPCRs are specific to platyhelminths and, therefore, their synthetic ligands have the potential to be parasite-selective anthelmintics [81]. Research in this area is of considerable significance because GPCRs are attractive and well-established drug targets. Of note, there are 475 drugs (~34% of all drugs approved by the FDA) that act on 108 unique GPCR targets [42].
In order to contribute to the genomics of monogeneans, the present work provides the de novo transcriptome analyses of two monogenean species, Scutogyrus longicornis (Dactylogyridae family) and Rhabdosynochus viridisi (Diplectanidae family), the first publicly available transcriptomes from Monopisthocotylea. Furthermore, the present work provides the first description of GPCRs in monogeneans. Dactylogyridae is the most diverse group of monogeneans in freshwater fishes. The dactylogyrid S. longicornis has mainly been observed in the freshwater Nile tilapia Oreochromis niloticus and other fish hosts such as O. mortimeri, O. aureus, and Sarotherodon galilaeus from different regions of the world [112]. Although this species has not been reported as pathogenic, S. longicornis and other dactylogyrids (Cichlidogyrus spp.) are frequently found in farmed tilapia [2,52]. Members of the Diplectanidae family are commonly found to infect marine fishes, and some species are considered serious pathogens of farmed fish [3,25]. In particular, R. viridisi is considered a threat to the production of the Pacific white snook, Centropomus viridis [17,84]. The present study provides a comparative genomics framework for Diplectanidae monogeneans that should facilitate further studies aimed at testing evolutionary hypotheses in this underrepresented phylum.

Parasite material
Adult individuals of S. longicornis were collected from the gills of tilapia (Oreochromis niloticus) cultured on a fish farm in the state of Sinaloa, northwestern Mexico. Adult individuals of R. viridisi were collected from the gills of snooks (C. viridis) reared in the marine fish hatchery at the Centro de Investigación en Alimentación y Desarrollo, Mazatlán, Sinaloa. The methods used for parasite sampling and species identification have already been described in detail by Caña-Bozada et al. [17,18].

RNA amplification and sequencing
RNA was extracted from 10 and 40 individuals of S. longicornis and R. viridisi, respectively. The methods used for RNA extraction, amplification, and sequencing are described in detail by Caña-Bozada et al. [17,18]. The BioSamples accession numbers for R. viridisi and S. longicornis are SAMN17210467 and SAMN14607874, respectively. Briefly, for each species, individuals were rinsed in pure water, pooled, homogenized in 500 lL of 1Â RNAshield (Zymo Research, Irvine, CA, USA), and vortexed with 100 mg of 200 lm glass with beads. RNA was extracted from pooled homogenates using a Quick-DNA/RNA Miniprep Kit (Zymo Research, Irvine, CA, USA), according to the manufacturer's instructions. The obtained RNA fractions were below the limit of quantification (<1 ng/lL) when measured in a Qubit fluorometer using the HS dsDNA kit (Life Technologies, Carlsbad, CA, USA). Therefore, to obtain double-stranded full-length cDNA, 3 lL of purified total RNA was used for reamplification using a SMART-Seq v4 Ultra Low Input RNA Kit for sequencing (Takara Bio USA, Inc., San Jose, CA, USA), according to the manufacturer's protocol. In addition, the cDNA was enriched over 12 cycles of PCR and then ultrasonically sheared into 300-500 bp fragments using a Covaris S220 ultrasonicator (Covaris, Woburn, MA, USA). The genomic libraries with multiplex adapters were prepared using a TruSeq Nano DNA Sample Preparation Kit (Illumina, San Diego, CA, USA), and run on a NextSeq500 Illumina platform (2 Â 75 paired-end chemistry).
Each assembly was aligned using BLASTx [15] against a database of protein sequences from bacteria, virus and fungi (downloaded from UniProt/Swiss-Prot; [4]), and reference proteins from the monogeneans Gyrodactylus salaris and E. nipponicum, the cestode Hymenolepis microstoma, the trematode Schistosoma mansoni, and the turbellarian Schmidtea mediterranea [46,104] (https://github.com/jirivorel/Eudiplozoon-nipponicum-transcriptome-secretome). The contigs that were the best hits (e-values < 1e À5 ) with bacteria, virus or fungi were considered contaminant sequences and were discarded from the assemblies. Finally, to eliminate host contaminant sequences, each assembly was aligned using BLASTx against a database that included predicted proteins of the fish hosts C. viridis (NCBI SRA: SRP165941) and O. niloticus (NCBI Assembly: GCA_001858045.2), as well as the proteins of the aforementioned Platyhelminthes. The contigs that were best hits (e-values < 1e À20 ) with fish sequences were discarded from the assemblies. The filtered transcriptomes were used to predict ORFs and putative proteins.
The putative proteins were filtered to eliminate possible contamination of bacteria, virus, fungi, and fish sequences. To this end, the predicted ORFs were compared against the EggNOG database [49] using the TRAPID tool [102]. EggNOG is a database containing information about orthologous relationships of prokaryotic, eukaryotic, and viral genomes. Hits with e-values < 1e À5 were considered significant. Some ORFs showed significant similarity with bacteria and fish sequences. In the case of R. viridisi, the bacterial ORFs belonged to Vibrio spp. Therefore, decontamination was performed: using BLASTp (e-values < 1e À5 ), the putative proteins of R. viridisi were aligned against proteins of Vibrio sinaloensis, V. harveyi, V. brasiliensis, V. tubiashii (NCBI Assembly IDs GCA_000189275.2, GCA_001558435.2, GCA_000189255.1, GCA_000772105.1, respectively), C. viridis, and the platyhelminths mentioned in the previous section. The putative proteins of S. longicornis were aligned using BLASTp (e-values < 1e À5 ) against protein sequences of O. niloticus and platyhelminths, and proteins of bacteria downloaded from the UniProt/Swiss-Prot database. Protein sequences for which the best hits were contaminant sequences were discarded.
In addition, stricter filtering was performed by aligning the translated proteins against the UniRef90 database using Diamond [12] with default parameters. The sequences that were best hits with Protostomia (taxid: 33317) were retrieved, and the remaining sequences were considered contaminants.

Functional annotation
SignalP 4.1 [92] and TMHMM 2.0 [63] were used to predict signal peptides and TM domains in the putative proteins. Annotation of signal peptides, TM domains, and Swiss-Prot and Pfam data were loaded into Trinotate 3.1.1 (http:// trinotate.github.io) [11]. Kyoto Encyclopedia of Genes and Genomes (KEGG) [54], Clusters of Orthologous Groups (COG) [101], and GO [35] IDs were retrieved. To avoid redundancy, in the Results section we show only the annotation of the longest protein for each gene, which was extracted using the script "get_longest_isoform_seq_per_trinity_gene.pl" in Trinity. Thus, each protein was considered a collection of expressed sequences of a putative gene [83]. In addition, the number of phenotypes (across worm model organisms) and UniProt keywords were determined with dcGO Predictor [31] using the Swiss-Prot IDs.

Transcriptome quality assessment
Analysis of the completeness of assembly was performed using Benchmarking Universal Single-Copy Orthologs (BUSCO) v. 3.0.2 [106], using the core metazoan dataset, which contains 978 genes. The completeness of assembly was also verified with the TRAPID tool (conservation threshold = 0.9) using the core gene families of metazoans from the EggNOG database. Finally, we checked again for contamination using the EggNOG database and the TRAPID tool, as described above. As no reference genomes of R. viridisi and S. longicornis were available, BUSCO analyses were also performed on publicly available genomes or transcriptomes from other monogenean species (G. salaris, E. nipponicum, and P. xenopodis), and from other widely studied species of platyhelminths (S. mansoni, H. microstoma, Macrostomum lignano, and S. mediterranea). The obtained completeness values were used as references to evaluate the completeness of assembly for R. viridisi and S. longicornis.

G-protein-coupled receptors GPCR identification
To identify GPCRs, proteins were scanned against conserved domains of the different GPCR families obtained from the Pfam database (PF00001, PF00002, PF00003, PF01534, V. Caña-Bozada et al.: Parasite 2022, 29, 51 PF10320, PF10324, and PF10328), using HMMER 3.1b1 [32] with e-values < 1e À5 . Then, TMHMM was used to detect TM domains; proteins with 3-15 TM domains were retained. Subsequently, the TM proteins were aligned using BLASTp (e-values < 1e À4 ) against annotated proteins from databases specific to platyhelminth species. To avoid overrepresentation of genes by their number of isoforms, the longest isoform for each gene was extracted using the Trinity helper script "get_longest_isoform_seq_per_trinity_gene.pl". For comparative purposes, these analyses were also performed for the monogeneans G. bullatarudis, G. salaris, E. nipponicum, and P. xenopodis; the cestodes T. asiatica and E. multilocularis; the trematodes Schistosoma japonicum, S. mansoni, and F. hepatica; and the turbellarians S. mediterranea and Bothrioplana semperi. The reference proteins were obtained from the WormBase ParaSite database, except for E. nipponicum and B. semperi. The proteins of E. nipponicum were downloaded from https://github.com/jirivorel/Eudiplozoon-nipponicumtranscriptome-secretome [104]. The proteins of B. semperi were obtained using TransDecoder, with the options "-retain_ pfam_hits" and "-retain_blastp_hits" from the assembly provided by Laumer et al. [67]. In addition, to remove possible remaining contaminant sequences and non-GPCR sequences, the GPCRs of R. viridisi and S. longicornis were aligned against the NCBI nonredundant protein database, including and excluding the phylum Platyhelminthes. For an initial annotation, the sequences of each predicted GPCR of R. viridisi and S. longicornis were aligned against reference sequences of GPCRs of Caenorhabditis elegans, Bombyx mori, Homo sapiens, Lottia gigantea, Drosophila melanogaster, Daphnia pulex, Dictyostelium discoideum, Platynereis dumerilii, Anopheles gambiae, Lymnaea stagnalis, S. mediterranea, and S. mansoni, using BLASTp. We classified GPCRs according to the GRAFS (glutamate, rhodopsin, adhesion, frizzled, secretin) system of classification [33], which groups receptors with shared evolutionary ancestry present in the Bilateria [111]. The rhodopsin family was further classified into the a subfamily, which contains amine, opsin-like, and melatonin receptors; and the b subfamily, which contains peptide and peptide hormone receptors.
To identify lineage-specific GPCRs, the predicted GPCRs of R. viridisi and S. longicornis were aligned against the NCBI nonredundant protein databases and limited by the option "Organism", which included the sister classes Cestoda (taxid: 6199) and Trematoda (taxid: 6178), the basal class Rhabditophora (taxid: 147100), and the taxa Lophotrochozoa (taxid: 1206795; exclude: Platyhelminthes), Spiralia (taxid: 2697495; exclude: Lophotrochozoa), Protostomia (taxid: 33317; exclude: Spiralia), Bilateria (taxid: 33213; exclude: Protostomia), and Vertebrata (taxid: 7742). Predicted GPCRs with e-values > 1e À5 were considered to be specific for Monogenea, and were represented by heatmaps using the ggtree library in R (version 4.0.4). According to Kerfeld and Scott [60] "sequences with a recent shared ancestry will have a high degree of similarity; their alignments will have many identical residues, few substitutions and gaps, and tiny e-values. Conversely, sequences with an ancient common ancestor will be deeply divergent, with few shared sequence identities, many gaps, and larger e-values." Thus, considering that hit sequences can be interpreted as sequences sharing evolutionary history, the log 10 -transformed e-values were correlated between different taxa using a Spearman analysis (Supplementary File S1). Correlation values close to 1 might indicate that monogenean GPCRs have similar evolutionary changes between the two respective taxa.

Phylogenetic analyses
To refine the GPCR annotation process for R. viridisi and S. longicornis, phylogenetic trees were constructed for each GPCR family with sequences used in the initial annotation. To this end, the proteins were aligned using MAFFT 7.31 [55] with the parameter "-genafpair" (E-INS-i), which is particularly useful for aligning proteins with conserved regions. The proteins belonging to the rhodopsin family, given their high number, were additionally trimmed with Trimal [19], using the parameter "-gappyout", and their alignments refined in MUSCLE 3.8.31 [29]. The phylogenetic analysis was conducted with IQ-TREE [86], which performs a first step for the selection of the evolutionary model using the ModelFinder program [53] and subsequently the construction of the phylogenetic tree. The trees were constructed using 1000 replicates of the approximate likelihood ratio test, which is similar to the Shimodaira-Hasegawa test. Trees were visualized and annotated with FigTree 1.4.2 (available from http://tree.bio.ed.ac.uk/software/figtree/) and the ggtree library [110] using R. Proteins yielding contradictory results in different analyses were designated unclassified. Rhodopsin-family proteins that formed non-concordant groups were designated orphans. Information about the sequences used for the phylogenetic analysis of each GPCR family is presented in the Mendeley Data repository (DOI: https://doi.org/10.17632/ 2wvnwn4d7p.1), under the folder "phylogeny_secuences".

Results
Raw sequencing reads of R. viridisi and S. longicornis were deposited in the NCBI Sequence Read Archive (SRA) database under accession numbers SRR16891876 and SRR16889716, respectively (BioProjects PRJNA689569 and PRJNA625740). The de novo transcriptome assemblies and conceptual translations of R. viridisi and S. longicornis are available in the Mendeley Data repository (DOI:10.17632/2wvnwn4d7p.1) under the folder "assemblies_CDS_and_ORFs".

RNA samples and sequencing
A total of 271,701,941 and 80,972,372 raw paired-end reads of 75 bp were generated from R. viridisi and S. longicornis RNA samples, respectively. After removing low-quality reads (Q scores < 30) and adapters, 263,800,117 and 76,472,690 high-quality reads were obtained for the assemblies of R. viridisi and S. longicornis, respectively.

De novo transcriptome assembly
For R. viridisi, a total of 392,252 contigs were assembled with an average coverage of 37.33Â (SD = 172.693 bp), N50 of 3567 bp, and GC content of 45.72% (Table 1)

ORF prediction
We generated 64,637 ORFs for R. viridisi and 29,315 for S. longicornis ( Table 1). The alignment of putative proteins of R. viridisi against Vibrio spp. and C. viridis sequences filtered out 9513 proteins ( Table 2). The alignment of putative proteins of S. longicornis against bacteria and tilapia sequences filtered out 1165 proteins (Table 2). Finally, after alignment against sequences from Protostomia, 47,187 R. viridisi proteins encoded by 23,857 genes, and 25,696 S. longicornis proteins encoded by 12,020 genes, were retained. Information on the hits obtained from TRAPID is shown in Table 3.

Functional annotation
Although all the putative proteins were annotated, we present results for the longest, nonredundant proteins to avoid overrepresentation of sequences. Complete results are shown in Supplementary Tables S1-S3. After removing redundant sequences (isoforms), the representative proteins were reduced in number from 47,187 to 23,857 in R. viridisi, and from 25,696 to 12,020 in S. longicornis. For R. viridisi, 3214, 2761, 2798, 2114, and 1205 protein sequences were aligned to the Swiss-Prot, Pfam, GO, KEGG, and COG databases, respectively. In addition, 6824 proteins were predicted to contain a TM domain, and 2823 a signal peptide. For S. longicornis, 5849, 5422, 5647, 4132, and 2074 proteins were aligned to the Swiss-Prot, Pfam, GO, KEGG, and COG  databases, respectively. In addition, 2579 proteins were predicted to contain a TM region, and 1098 a signal peptide. A general overview of the functional and ontological representation is presented in Supplementary File S1 and Supplementary Figs. S1-S4.

G-protein-coupled receptors
GO terms relating to GPCRs were among the most abundant in both Biological Process and Molecular Function terms (Supplementary Figs. S1, S2). We identified 109 GPCR-like sequences in R. viridisi and 102 in S. longicornis, of which at least 40% contained seven-TM domains. An analysis of all sequences with at least three domains revealed that the putative GPCRs comprised one glutamate, 99 rhodopsin, two adhesion, six frizzled, and one secretin in R. viridisi; and three glutamate, 94 rhodopsin, one adhesion, three frizzled, and one secretin in S. longicornis. Fifteen and nine putative GPCRs presented signal peptides in R. viridisi and S. longicornis, respectively; mainly, these were proteins with seven-TM domains. Information about GPCR identification in R. viridisi and S. longicornis is presented in Supplementary Figs. S1, S2. In addition, for comparative purposes, we predicted the GPCRs of other species of monogeneans, trematodes, cestodes, and turbellarians. Ninety GPCRs were identified in G. bullatarudis, 98 in G. salaris, 85 in P. xenopodis, 23 in E. nipponicum, 105 in F. hepatica, 117 in S. japonicum, 118 in S. mansoni, 70 in E. multilocularis, 79 in T. asiatica, 67 in B. semperi, and 384 in S. mediterranea (Table 4). Information about GPCR identification in these platyhelminths is provided in Supplementary Table S5.
The alignments of monogenean GPCRs against those of Trematoda, Cestoda, and Rhabditophora gave lower e-values than with other taxa. According to the Spearman analysis, the e-values were highly correlated between non-platyhelminth taxa (r > 0.834, p < 0.001) but were less correlated between platyhelminth taxa (r < 0.677, p < 0.001) ( Fig. 2 and Supplementary  Table S5).
Nine and four putative proteins (peptide and orphan) of R. viridisi and S. longicornis, respectively were found to be specific to the Platyhelminthes (e-values ! 1e À5 ). Of these, six peptides and one orphan receptor were specific to parasitic taxa. Five peptide receptors of R. viridisi and one orphan receptor of S. longicornis were specific to Monogenea. We also found that of the R. viridisi and S. longicornis GPCRs, 23 were specific to Lophotrochozoa, 31 to Spiralia, and 64 to Protostomia. Thirty-seven (37) R. viridisi and 27 S. longicornis proteins (peptide and orphan) were absent in Bilateria; 40 and 32 from the respective species were absent in Vertebrata.
Rhabdosynochus viridisi and S. longicornis shared 100 and 99 GPCRs, respectively with Trematoda, 96 and 92 with Cestoda, 85 and 88 with free-living Platyhelminthes, and 98 and 97 with Lophotrochozoa. Furthermore, we found that four proteins (peptide and adhesion) of R. viridisi and S. longicornis were absent in parasitic platyhelminths but present in their freeliving counterparts.

Rhodopsin family
Rhodopsins formed the largest family of the GRAFS classifications in all the species (Table 4). We identified 99 and 94 rhodopsin members in R. viridisi and S. longicornis, respectively. Thus, rhodopsin represented >90% of the putative GPCR proteins. The phylogenetic analysis revealed two subfamilies (a and b) and four subgroups (amine, opsin, peptide and melatonin receptors) of rhodopsins ( Supplementary  Figs. S5 and S6). In R. viridisi, the a subfamily comprised 15 amine receptors, three opsins and one melatonin; the b subfamily contained 72 peptide receptors. In S. longicornis, the a subfamily comprised 14 amine receptors, nine opsins and two melatonins; the b subfamily contained 62 peptide receptors. Eight receptors of R. viridisi and seven of S. longicornis could not be grouped into any subfamily and were considered orphan receptors. Nucleotide-activated receptors (c subfamily) and olfactory receptors (d subfamily) were absent in both species.
A separate phylogenetic analysis was performed for amine and peptide receptors and included reference proteins from other organisms ( Fig. 3; Supplementary Figs. S7 and S8). The analysis of the amine receptor group showed that proteins of both R. viridisi and S. longicornis clustered with octopamine (OCT1/2/3 of Drosophila melanogaster), dopamine (DOP4 of C. elegans), G-protein-coupled acetylcholine receptor (GAR1/ 2 of C. elegans, SmGAR of S. mansoni), tyramine (TYR2 of C. elegans), and serotonin (SmSER5HT-1 of S. mansoni, SER1 and SER4 of C. elegans); and a protein of S. longicornis clustered with dopamine (SmD2 of S. mansoni) (Fig. 3).

Adhesion and secretin families
Phylogenetic analysis of the adhesion and secretin families showed that R. viridisi and S. longicornis have non-orthologous receptors in humans. Nonetheless, the GPCRs were classified into adhesion and secretin receptors, with two adhesion receptors in R. viridisi and one in S. longicornis, and one secretin in each species (Fig. 4).

Glutamate family
The glutamate family included one protein of R. viridisi and three of S. longicornis, which were clustered with metabotropic glutamate receptors (mGluRs) and mGluR-like proteins of H. sapiens, D. melanogaster, and A. gambiae. One protein of S. longicornis grouped with GABA B of H. sapiens, D. melanogaster, and D. discoideum, and another clustered with a mGluR of S. mediterranea (SmGluR) (Fig. 5).

Frizzled family
We identified nine frizzled family receptors (six in R. viridisi, three in S. longicornis), seven of which presented signal peptides. The phylogenetic analysis clustered these proteins with fzd2a/b of C. elegans and D. melanogaster, and fzf5/8 of H. sapiens (Fig. 6). Other clusters were formed by one frizzled protein from R. viridisi and S. longicornis and fzd1/2/3/6/7 of C. elegans, H. sapiens, and D. melanogaster. The remainder of the proteins showed discordance between phylogenetic analysis and alignment with BLAST, and they were therefore not classified.

Discussion
The present study provides the first publicly available transcriptomes for monogeneans of the subclass Monopisthocotylea. These parasites are typically small: adult worms of R. viridisi and S. longicornis have a body length of approximately 400 lm. Thus, one of the main challenges of the present study was to obtain enough genomic material for sequencing. This difficulty might explain why genomic data for monogeneans are scarce. Another challenge was the filtering-out of contaminant sequences that belong mainly to bacteria and fish. Contaminant sequences were removed using multiple filtering steps that involved the use of tools such as TRAPID. This program validated our decontamination strategy, because most ORFs matched with sequences of S. mansoni. Similar results were observed when we used TRAPID on protein-coding sequences of G. salaris, E. nipponicum, P. xenopodis, S. mansoni, H. microstoma, M. lignano, and S. mediterranea. It is important to note that our filtering strategy included data from the monogeneans G. salaris, P. xenopodis, and E. nipponicum, which allowed us to reduce the incidence of false negatives. Most of the putative proteins of R. viridisi and S. longicornis have not been characterized previously, which is consistent with the transcriptome of E. nipponicum [104]. Among the few annotated proteins, we found abundant terms related to GPCRs, as well as terms related to other membrane proteins, proteases, and kinases.
The quality of our assemblies can be considered adequate, because for non-model organisms the reported completeness is typically between 50% and 95%, whereas for model organisms it is higher than 95% [98]. In addition, we verified that the completeness of the transcriptomes or publicly available genomes of other monogeneans was between 57.3% and 85.8%, which is a similar range to that for R. viridisi and S. longicornis.

G-protein-coupled receptors
Phylogenetic analysis and inspection of each sequence allowed us to predict and classify the GPCRs of R. viridisi and S. longicornis with a high degree of confidence. Monogenean GPCRs presented higher sequence similarity with other parasitic Platyhelminthes, mainly trematodes, than with other taxa. However, most receptors are present in non-platyhelminth ancestral taxa. This is consistent with the findings of Koziol et al. [62], who argued that most GPCRs in one taxon are orthologs of GPCRs from ancestral taxa. According to McVeigh et al. [82], the majority of GPCRs are expressed and found in the RNA-Seq datasets. For instance, with the support of RNA-Seq data, Campos et al. [16] identified 65 of the 70 identified GPCRs in the genome of Schistosoma haematobium, and 56 of the 68 in the genome of S. mansoni. In the present study, RNA-Seq was performed on adult parasites. Therefore, the identification of GPCRs represents an approximation of the GPCRs present in the genomes of monogeneans. In addition, the numbers of putative GPCRs in R. viridisi and S. longicornis are comparable to those predicted for the other monopisthocotylean monogeneans and trematodes that we analyzed (Table 4). Konczal et al. [61] also found that GPCR signaling pathways were among the most represented terms in the monogenean G. bullatarudis. In general, GPCRs act in neuromuscular signaling, chemosensation, and development [9,76]. However, to the best of our admittedly limited knowledge on GPCRs in flatworms, only 11 flatworm receptors have been deorphanized and the functionality of even fewer has been determined [81].
By aligning GPCRs of monogeneans with other platyhelminth and non-platyhelminth taxa, a higher correlation was expected between e-values of Platyhelminthes; however, the correlation was higher between non-Platyhelminthes. This might indicate that the GPCRs are more diverse within Platyhelminthes. That is consistent with the work of Koziol et al. [62], who reported that most GPCRs from parasitic flatworms are quite diverse in comparison with those in other bilaterians. Such divergence might pose a problem for classifying these GPCRs [111]. Therefore, in addition to robust classification studies, experimental validation through the identification of ligands is recommended. However, in silico analysis can help us further understand the evolution of these receptors.
The dominance of rhodopsin receptors found in the present study, including members specific to monogeneans or Platyhelminthes, is consistent with findings in species from different phyla [33,82,111,113]. McVeigh et al. [82] noted that the rhodopsin family includes highly diverse flatworm-specific receptors. In parasitic platyhelminths, these receptors might have key roles, such as in determining virulence and in host finding [71,111]. Rhodopsin receptors are stimulated by light, odorant molecules, and neurotransmitters, and perform functions in vision and olfaction [71]. The rhodopsin family was mostly represented by the b subfamily, predominantly peptide receptors. Some ligands of peptide receptors are neuropeptides intercellular signaling molecules that act as neurotransmitters, neuromodulators, or neurohormones [30]. Neuropeptide signaling systems in flatworms are associated with locomotion, reproduction, feeding, and larval host finding [78,80]. The two putative melatonin receptors found in S. longicornis represent the first record of this group of proteins in parasitic platyhelminths. These receptors were orthologous to the melatonin receptors of S. mediterranea. In free-living freshwater flatworms, melatonin receptors have been implicated in the control of biological rhythms, with neoblast proliferation processes, and posttraumatic regeneration [85,109]. It is possible that melatonin receptors regulate egg-laying rhythms in monogeneans, which leads to increased egg production in periods of darkness [44,75].  There were some members of the glutamate family in R. viridisi and S. longicornis. This family is associated with the modulation of glutamate responses for a variety of central nervous system functions [8]. In vertebrates, the glutamate GPCRs comprise subgroups with extremely divergent physiological roles owing to expansion during vertebrate evolution. However, some of these subgroups are missing in invertebrates [8]. The mGluR and GABA receptors are two of the few subgroups to be retained in invertebrates and in most bilateral species [8]. The Platyhelminthes have mGluR; however, GABA receptors have yet to be reported [82,111]. Similar to other platyhelminths, we found mGluR in R. viridisi and S. longicornis. Although parasitic platyhelminths seem to have lost receptors from their free-living ancestor, we found one probable GABA receptor in S. longicornis. It is expected that further studies will confirm the presence of GABA receptors in monogeneans.
Furthermore, adhesion and secretin GPCRs were found in R. viridisi and S. longicornis. The adhesion family has protein domains in the extracellular region that participate in cell-cell interactions and are present in almost every organ system with physiological roles in development, immunity, reproduction, Figure 2. Spearman analysis. e-values were highly correlated between non-platyhelminth taxa (r > 0.834, p < 0.001), whereas e-values between platyhelminth taxa were less correlated (r < 0.677, p < 0.001). epithelial and neuronal function, and tumorigenesis [65]. Although in vertebrates this family has the second most members, in invertebrates, especially parasitic platyhelminths, it has few members [111]. The secretin GPCRs mediate important hormonal functions through binding those hormones, and it has been suggested that they originate from the adhesion GPCRs [87], owing to the similarity of sequence and the large number of proteins in this family. However, we found a similarly low number of secretin and adhesion GPCR sequences in Platyhelminthes, which indicates that adhesion GPCRs were lost as this taxonomic group evolved.
Frizzled receptors in R. viridisi and S. longicornis were not classified at the subfamily level. Pathways involving frizzled receptors have important roles in adult tissues and developing embryos in the regulation of cell polarity and formation of neural synapses; specifically, they interact with Wnt proteins and other ligands [47]. In comparison with Platyhelminthes of the classes Cestoda and Rhabditophora, in monogeneans no smoothened receptors (SMOs) were found [82]. SMO is a key signal transducer in the hedgehog (Hh) pathway, which is important in development [5]. Although SMO is present in both invertebrates and vertebrates, its domains are divergent between lineages, which might explain the differences in SMO signaling in different organisms [6].

GPCR drug targets
The 40 GPCRs of R. viridisi and the 32 of S. longicornis that were absent in Vertebrata could be considered potential drug targets. Given their role as mediators of signal transduction involving a range of neurotransmitters, GPCRs have been proposed as drug targets in parasitic platyhelminths [81]. We assumed that some functions of the monogenean GPCRs are similar to those found in other helminths. The three Figure 3. Phylogenetic analysis of Rhabdosynochus viridisi and Scutogyrus longicornis amine-subfamily GPCRs. The midpoint-rooted phylogenetic tree was constructed using 1000 replicates of the approximate likelihood ratio test (similar to the Shimodaira-Hasegawa test). The LG + F + G4 model was implemented. The colored boxes indicate the log 10 -transformed e-values obtained from the alignment of the R. viridisi and S. longicornis sequences against sequences of different taxa using the NCBI database (Tre, Trematoda; Ces, Cestoda; Rha, Rhabditophora; Lop, Lophotrochozoa; Spi, Spiralia; Pro, Protostomia; Bil, Bilateria; Ver, Vertebrata); the species to which the sequence belongs (R. viridisi or S. longicornis); and the type of receptor inferred from phylogenetic similarity to reference sequences. Only nodes with bootstrap support greater than 80 are shown. monogenean GPCRs clustered with NPYR-1 of S. mediterranea might be involved in sexual maturation and germ cell differentiation [96]. The monogenean GPCR that was orthologous to the dopamine receptor SmD2 of S. mansoni might be a component of the neuromuscular system [100]. The monogenean GPCRs similar to SmSER5HT-1, expressed in nerve tissue, and SmGAR of S. mansoni, and also those similar to DOP-1 of C. elegans, might be involved in motility [20,74,91]. The monogenean GPCRs clustered with octamine in D. melanogaster, as well as those clustered with SER-1 of C. elegans might participate in the activation of multiple signaling pathways to induce egg laying [20,68,72]. The potential of GPCRs as drug targets has been shown in other parasites. For instance, Santos et al. [97] showed that knockout of the GPCR-like PfSR25 increases the susceptibility of malaria parasites to antiparasitic compounds. In the future, in the context of drug repurposing, it would be necessary to examine the role of GPCRs in the susceptibility of monogeneans to existing drugs.

Other membrane proteins
In addition to GPCRs, we revealed the important presence of putative proteins related to membrane trafficking and transport. These proteins are necessary for exocytosis, endocytosis, endosome-lysosome transport, endosome-Golgi transport, endoplasmic reticulum-Golgi transport, autophagy, and so forth. In parasitic platyhelminths, these functions are important for host-parasite interactions as well as xenobiotic detoxification. For instance, the excretory/secretory proteins released by parasites facilitate feeding and modulation of host immune responses [28,71]. TM proteins such as the ABC transporters play a principal role in the export of a wide spectrum of different substrates. It has been suggested that in helminths, several ABC transporters are involved in drug resistance and detoxification processes to facilitate survival in the host [64].
Innexins were among the most abundant proteins in R. viridisi and S. longicornis. Innexins are integral membrane The midpointrooted phylogenetic tree was constructed using 1000 replicates of the approximate likelihood ratio test (similar to the Shimodaira-Hasegawa test). The VT+R5 model was implemented. The colored boxes indicate the log 10 -transformed e-values obtained from the alignment of the R. viridisi and S. longicornis sequences against sequences of different taxa using the NCBI database (Tre, Trematoda; Ces, Cestoda; Rha, Rhabditophora; Lop, Lophotrochozoa; Spi, Spiralia; Pro, Protostomia; Bil, Bilateria; Ver, Vertebrata); the species to which the sequence belongs (R. viridisi or S. longicornis); and the type of receptor inferred from phylogenetic similarity to reference sequences. Only nodes with bootstrap support greater than 80 are shown.
proteins that participate in cellular communication in invertebrates [24]. These proteins are bifunctional, that is, they can form gap junctions and unpaired membrane channels (innexons). Gap junctions allow the diffusion of second messengers and other ions and small molecules between two adjacent cells, whereas innexons allow the exchange of ions and metabolic and signaling molecules between the cell interior and the extracellular milieu [24,39]. The number of innexins found herein is similar with other species of platyhelminths: 24 in Taenia solium, 25 in S. mansoni, 24 in S. japonicum, 17 in H. microstoma, 19 in E. multilocularis, and 20 in Echinococcus granulosus [103]. This diversity of innexins reflects their involvement in several biological processes, such as morphogenesis, neurogenesis, behavior, memory, and the immune response [39]. Some gap junction proteins might be lineage-specific. For example, CX39.4 is a teleost lineage-specific gap junction protein involved in the formation of skin patterns [105]. Oviedo et al. [90] reported the involvement of specific innexins in tissue regeneration in planarians. Thus, the innexins found in monogeneans might become the focus for experiments aimed at discovering lineage-specific functions.
The ion channel pathways, principally those involving voltage-gated channels, were highly represented in both monogeneans studied. Ion channels are multimeric complexes of membrane proteins involved in the passive diffusion of ions across biological membranes. These proteins perform key functions in the nervous system [48]. Platyhelminths have relatively well-developed neuromuscular systems, which coordinate many activities essential for parasite survival, such as motility, feeding, excretion, and egg laying [94,107]. The voltage-gated potassium (Kv) and calcium (Cav) channels are well represented in the genomes of flatworms [107]. In schistosomes, activated Cav channels initiate muscle contraction and are associated with synaptic transmission, enzyme activity, and gene expression. It is possible that these proteins perform similar functions in monogeneans.

Proteases
Cysteine cathepsins, especially cathepsins L, were annotated in S. longicornis and R. viridisi. It is thought that cathepsins L perform functions associated with the modulation or impairment of host immune responses, given their ability to degrade IgG and host Toll-like receptors [14,22]. However, proteases in general have been scarcely studied in monogeneans [93]. Recently, Jedličková et al. [51] reported evidence for the involvement of a variety of cathepsins L in several processes requiring proteolysis in E. nipponicum. Furthermore, these authors observed that some cathepsins L of this monogenean can cleave immunoglobulins in vitro, which might form part of the mechanism of host immune evasion. Other parasitic platyhelminths have also been found to possess a variety of cathepsins L. For instance, Robinson et al. [95] suggested that the expansion of the cathepsin L family in F. hepatica is related to its ability to infect and adapt to new hosts. A better understanding of the biology of monogeneans, including their interactions with hosts, will likely require rigorous molecular and biochemical characterization of their cysteine cathepsins. The midpoint-rooted phylogenetic tree was constructed using 1000 replicates of the approximate likelihood ratio test (similar to the Shimodaira-Hasegawa test). The LG+F+R5 model was implemented. The colored boxes indicate the log 10 -transformed e-values obtained from the alignment of the R. viridisi and S. longicornis sequences against sequences of different taxa using the NCBI database (Tre, Trematoda; Ces, Cestoda; Rha, Rhabditophora; Lop, Lophotrochozoa; Spi, Spiralia; Pro, Protostomia; Bil, Bilateria; Ver, Vertebrata); the species to which the sequence belongs (R. viridisi or S. longicornis); and the type of receptor inferred from phylogenetic similarity to reference sequences. Only nodes with bootstrap support greater than 80 are shown.

Kinases
We found high representation of pathways involving kinase activity, mainly that of serine/threonine kinases. Kinases mediate most of the signal transduction occurring in cells and thus control many cellular processes such as metabolism, transcription, cell cycle progression, cytoskeletal rearrangement and cell movement, apoptosis, and differentiation [77]. In helminths, kinases, such as the serine/threonine kinases, have functions associated with growth and development [27]. Nonetheless, this group of proteins have scarcely been studied in platyhelminths. Evidence that polo-like kinases participate in the regulation of the cell cycle in both mitosis and meiosis in S. mansoni [26] indicates a role at the schistosomula stage and association of these kinases with rapid growth and body remodeling [38]. Members of the protein kinase C (PKC) family possibly have a central function in secretion and larval transformation, and influence the host's response to the parasite by directly regulating interactions with host cells [7]. Furthermore, the differential expression of kinases throughout the development of S. mansoni and F. hepatica suggests they perform important functions [38,45]. Kinases are well-known drug targets in humans [89] and are now being investigated as targets for new anthelmintics [36].

Phenotype
The proteins of R. viridisi and S. longicornis participate mainly in the following phenotypes: negative chemotaxis variant and chemotaxis, protein phosphorylation, egg-laying, and drug-induced gene expression variants. Being ectoparasites, monogeneans are directly affected by environmental changes, so these parasites must respond effectively to these changes to survive. In the context of medicine, the drug-induced gene expression phenotype might be interesting to study in monogeneans because changes in genes or in gene expression in response to anthelmintics can lead to drug resistance and survival of the parasite [50]. The development of drug resistance has already been observed in helminths, including monogeneans [13,108]. According to WormBase, the chemotaxis variant is related to the direct movement of an animal in response to chemical repellents with pathways that involve GPCRs and kinase, and calcium channel activity. In monogeneans, chemotaxis is essential for host finding and egg hatching [56,58]. In C. elegans, changes in chemical signals corresponding to food levels and population density in the environment are either directly received by GPCRs or interact with GPCR pathways, and can thus regulate the stage of animal development [73,99]. The egg-laying variant is associated with variations in the stage at which eggs are laid, egg-laying cycle, number of eggs laid, or egg laying in response to stimuli. This phenotype involves cytoskeletal proteins. Monogeneans are highly prolific, and they are able to maintain egg production in a wide range of water temperatures [43,57]. Studying the proteins involved in egg laying in monogeneans might be useful to better understand infection dynamics in the context of climate change.
Acknowledgements. This research was funded by the National  . Phylogenetic analysis of Rhabdosynochus viridisi and Scutogyrus longicornis frizzled-subfamily GPCRs. The midpoint-rooted phylogenetic tree was constructed using 1000 replicates of the approximate likelihood ratio test (similar to the Shimodaira-Hasegawa test). The WAG + F + R5 model was implemented. The colored boxes indicate the log 10 -transformed e-values obtained from the alignment of the R. viridisi and S. longicornis sequences against sequences of different taxa using the NCBI database (Tre, Trematoda; Ces, Cestoda; Rha, Rhabditophora; Lop, Lophotrochozoa; Spi, Spiralia; Pro, Protostomia; Bil, Bilateria; Ver, Vertebrata); the species to which the sequence belongs (R. viridisi or S. longicornis); and the type of receptor inferred from phylogenetic similarity to reference sequences. Only nodes with bootstrap support greater than 80 are shown.
Supplementary file 1: Detailed description of the functional annotation process, retrieval of lineage-specific GPCR, and code for the different pipelines used for assembly, annotation, read mapping and quantitative evaluation of the obtained transcriptome using BUSCO.