Molecular phylogeny of 42 species of Culicoides (Diptera, Ceratopogonidae) from three continents

The genus Culicoides includes vectors of important animal diseases such as bluetongue and Schmallenberg virus (BTV and SBV). This genus includes 1300 species classified in 32 subgenera and 38 unclassified species. However, the phylogenetic relationships between different subgenera of Culicoides have never been studied. Phylogenetic analyses of 42 species belonging to 12 subgenera and 8 ungrouped species of genus Culicoides from Ecuador, France, Gabon, Madagascar and Tunisia were carried out using two molecular markers (28S rDNA D1 and D2 domains and COI mtDNA). Sequences were subjected to non-probabilistic (maximum parsimony) and probabilistic (Bayesian inference (BI)) approaches. The subgenera Monoculicoides, Culicoides, Haematomyidium, Hoffmania, Remmia and Avaritia (including the main vectors of bluetongue disease) were monophyletic, whereas the subgenus Oecacta was paraphyletic. Our study validates the subgenus Remmia (= Schultzei group) as a valid subgenus, outside of the subgenus Oecacta. In Europe, Culicoides obsoletus, Culicoides scoticus and Culicoides chiopterus should be part of the Obsoletus complex whereas Culicoides dewulfi should be excluded from this complex. Our study suggests that the current Culicoides classification needs to be revisited with modern tools.


Introduction
Biting midges of the genus Culicoides Latreille 1809 (Diptera: Ceratopogonidae) are among the world's smallest haematophagous flies, measuring from 1 to 3 mm, and are described worldwide, except in Antarctica and New Zealand [45]. They are mainly known as vectors of bluetongue virus (BTV), Schmallenberg virus (SBV) and Oropouche virus (OROV) [12].
Currently, approximately 1300 living and 42 fossil species of Culicoides have been described worldwide. Their classification includes 32 subgenera [9] and 38 groups although 13% of occurring species remain ungrouped [11]. This classification is exclusively typological, based on common morphological similarities (e.g. characteristics of reproductive organs, wings, antennae and palps), without any phylogenetic considerations. As most species feature spotted wings, the accurate identification of adults is largely based on subtle variations in size, shape and position of spots that form wing patterns [61][62][63].
Since the recent European bluetongue epizootic outbreak, there has been growing interest in DNA barcoding of Culicoides based on the mitochondrial DNA (mtDNA) cytochrome oxidase I (COI) gene, ribosomal (rDNA) regions such as internal transcribed spacer 1 (ITS1) and internal transcribed spacer 2 (ITS2), and the nuclear CAD gene [29]. The rise of DNA barcoding and the lack of taxonomic experts thus enabled COI sequencing to become a tool for rapid identification of Culicoides species [1].
The lack of phylogenetic data about Culicoides does not allow hypotheses about the vector competence for diseases caused by different Culicoides-borne viruses. Due to the wide distribution and the great economic importance of veterinary diseases transmitted by biting midges, it seems important to build a modern classification of these insects based on phylogenetic studies to help in epidemiological analyses.
In this study, we carried out a phylogenetic analysis of 42 Culicoides species from Europe, America and Africa (including Madagascar) using specimens available in our laboratory. Our sampling included major proven vectors of diseases (i.e. subgenera Avaritia, Culicoides, Haematomydium, and Schultzei group). In each case, the mtDNA COI and the D1 and D2 regions of the 28S rDNA were analysed. The latter regions were chosen based on the fact that they appear to contain major phylogenetic information at the considered taxonomic level [18,30,51] especially for Culicoides [27,28,58].
Specimen identification was performed after mounting the head, wings and spermathecae on microscope slides, leaving the thorax and legs for subsequent DNA extraction [2]. Consequently, we were unable to identify C. fulvithorax and C. ochrothorax without their thorax that includes their discriminant character. Moreover, the accurate identification of females of some closely related specimens, such as C. cataneii and C. gejgelensis, was not possible [36]. Two specimens from Gabon, belonging to subgenus Avaritia, present new morphological characters compared with currently known species; hereafter we will refer to these specimens as Culicoides sp. At least two specimens of each species were sequenced, except for 13 species from which only one specimen was available (  [11]. Species distribution included: (i) Ecuadorian specimens (12 species) assigned to subgenera Anilomyia, Haematomyidium and Hoffmania and the unclassified groups Carpenteri group, Fluvialis group and Leoni group; (ii) French specimens (11 species) assigned to subgenera Avaritia, Culicoides, Monoculicoides, Oecacta and Wirthomyia; (iii) four Gabonese specimens assigned to subgenera: Avaritia, Meijerehelea and Trithecoides; (iv) Malagasy specimens (7 species) assigned to subgenera Avaritia, Remmia and to the Milnei group and (v) Tunisian specimens (8 species) assigned to six subgenera (Avaritia, Beltranmyia, Culicoides, Monoculicoides, Oecacta, Remmia) and 1 species was C. paolae (incertae sedis).
Amplification conditions for D1-D2 were: initial denaturation step at 94°C for 3 min followed by 35 cycles of denaturation at 94°C for 30 s, annealing at 58°C for 90 s and extension at 68°C for 60 s followed by a final extension at 68°C for 10 min. For COI amplification, conditions included: (1) initial denaturation step at 95°C for 15 min, then 5 cycles at 95°C for 40 s, 45°C for 40 s, 72°C for 1 min, were followed by 45 cycles at 95°C for 40 s, 50°C for 40 s, 72°C for 1 min and a final extension step at 72°C for 20 min for C1J1718/C1N2191, and (2) initial denaturation step at 94°C for 3 min, 5 cycles of denaturation at 94°C for 30 s, annealing at 45°C for 90 s and extension at 68°C for 60 s were followed by 35 cycles of denaturation at 94°C for 30 s, annealing at 51°C for 90 s, and extension at 68°C for 60 s and a final extension at 68°C for 10 min for LepF1/LepR. Amplicons were analysed by electrophoresis in 1.5% agarose gel stained with 0.1% ethidium bromide. All sequences obtained are available in GenBank (Table 1).
Maximum parsimony analysis was carried out with PAUP* 4.0b10 [59] by selecting the heuristic search option with tree bisection reconnection branch swapping (TBR) and 1000 random sequence addition (RSA). All sites were equally weighed but a step matrix (ponderation TS/TV = 2) was applied. Sequences were edited and aligned manually using Se-Al [52]. The insertion of several interlocked gap zones was therefore necessary to align sequences. Sequence alignment was performed respecting the criteria defined by [6]: (1) to minimise the number of inferred mutations (number of steps); (2) to prefer substitution to insertion-deletion, and (3) to prefer transitions to transversions because they have a higher probability of occurrence.
A total of 423 bp and 613 bp were analysed for COI and D1-D2, respectively.
For the model-based approach, the best-fit model of nucleotide substitutions was computed with jModelTest v2.1.4 [16] using the Akaike Information Criterion (AIC). The Hasegawa, Kishino and Yano (HKY) +I+C model was indicated as the best-fit model for the mitochondrial COI gene. The general time reversible (GTR) +I+C model was indicated as the best-fit model for both D1-D2 and concatenated markers (rDNA marker D1D2 and COI). Bayesian analyses were carried out using MrBayes 3.1.2 [54] with 4,000,000 generations, 10,000 of the saved trees were discarded, and the 30,000 remaining were used to construct the resulting BI tree. The robustness of tree nodes was assessed by clade posterior probability values (CPP).
A first maximum parsimony analysis on COI sequences showed trees of 2786 steps with a consistency index (CI of 0.225) and a retention index (RI of 0.443). The codon position 3 in the COI gene was found to have saturated transition information as compared to position 1+2 (data not shown). Therefore, we decided to remove the codon position 3. A new analysis was performed with COI codon position 1+2 including 10,000,000 generations. As many as 25,000 of the saved trees were discarded. Both COI and D1-D2 sequences were analysed independently using the BI and MP approaches and a concatenated fragment using the BI approach.

Results
A first run of amplification was carried out using the C1J1718/C1N2191 primers. Pseudogenes were amplified for six specimens from Ecuador (C. castillae, C. hylas, C. pseudoheliconiae, C. guttatus, C. tetrathyris and C. diabolicus) and one specimen from Gabon (C. distinctipennis). Consequently, the COI of these specimens was tentatively amplified and sequenced using the LepF1/LepR primers. Finally, the COI of all these specimens was obtained, except for Culicoides castillae and C. tetrathyris.
For the parsimony analysis, the sequences for COI with 127 variable characters, of which 97 were parsimony-informative were analysed. The most parsimonious trees obtained were 557 steps long. With D1D2 sequences, 203 variable characters were found, out of which 169 were parsimony-informative. The most parsimonious trees obtained were 639 steps long. The ribosomal gene had a much higher proportion of parsimony-informative sites than the mitochondrial gene ( Table 2).
Topologies of the trees obtained by MP and BI are presented in the Appendices. Our main findings were that: (i) the MP COI tree (data not shown) is unusable; (ii) the MP D1D2 tree (Appendix 1) shows that the subgenera Monoculicoides, Culicoides, Haematomyidium and Remmia were monophyletic, whereas the subgenera Hoffmania and Avaritia were paraphyletic; (iii) the BI COI tree (Appendix 2) shows that the subgenera Avaritia, Culicoides, Hoffmania, Monoculicoides and Remmia were monophyletic, whereas the subgenus Oecacta was paraphyletic, and (iv) the BI D1D2 tree (Appendix 3) shows that the subgenera Culicoides, Hoffmania, Monoculicoides and Remmia were monophyletic, whereas the subgenus Oecacta was paraphyletic.

Results of the concatenated Bayesian inference analysis of
Culicoides relationships are shown in Figure 5 and presented below.
According to the combined data analysis, the genus Culicoides was clearly monophyletic.
The subgenus Culicoides was also monophyletic: Tunisian specimens of C. newsteadi were grouped in one clade (CPP = 100) with French specimens of C. punctatus, C. lupicaris and C. impuctatus.
The subgenus Hoffmania (species from Ecuador) displayed two clusters with C. batesi and C. guttatus as the sister group of the Hylas group. The tree shows that the subgenus Hoffmania was monophyletic (CPP = 100), and in the position of sister species of the Milnei group and the subgenera Trithecoides, Anilomyia and Avaritia.
The subgenus Avaritia was also monophyletic (CPP = 100). C. chiopterus, C. obsoletus and C. scoticus were grouped together as a clade, the sister group of all other members of the subgenus Avaritia (CPP = 78). C. dewulfi was shown to be closely related to the new Culicoides species from Gabon (CPP = 92). The analysis also revealed that the Imicola group was monophyletic. C. imicola s. st. is the sister species of C. miombo (CPP = 100) and C. kibatiensis is the sister species of them (CPP = 86).
Bayesian analysis showed that the subgenus Oecacta was paraphyletic: C. circumscriptus -subgenus Beltranmyia and Culicoides distinctipennis -subgenus Meijerehela are included within the members of this subgenus, and C. sahariensis is separated from the other members of subgenus Oecata.
The monophyly of the Fluvialis group (CPP = 74) is discussed due to the inclusion in this clade of the incertae sedis C. paolae. The subgenus Haematomyodium is the sister group of the Fluvialis group (BI, CPP = 91).

Discussion
The present study is, to our knowledge, the first systematic molecular analysis carried out on the genus Culicoides at a large taxonomic level, not focusing on closely related species (42 species belonging to 12 subgenera and unclassified groups collected in Afrotropical, Neotropical and Palaearctic areas). The data, based on COI and D1-D2 sequences, were subjected to a range of MP and BI analyses in order to explore the phylogenetic signal. To date, other molecular phylogenies, based mainly on COI sequences, were commonly restricted to a single subgenus or group located in Afrotropical or Palaearctic areas [29], except for one containing 37 Palaearctic species representing 10 subgenera [1]. Phylogenies studies based on ITS1 and ITS2 have also been reported on 25 French species [49] and 9 Italian species [26], respectively.
Pseudogenes are homologous sequences arising from currently or evolutionarily active genes that have lost their ability to function as a result of disrupted transcription or translation. They may contain stop codons, repetitive elements, have frame shifts and/or lack of transcription. However, they might retain gene-like features [65]. Pseudogenes have been identified in the mitochondrial genome of insects, also widely used in phylogenetic studies, with the risk of obtaining erroneous results during phylogenetic reconstruction [34]. To our knowledge, this is the first report of pseudogenes in Culicoides.
The subgenera Anilomyia, Beltranmyia, Meijerehelea, Trithecoides, Wirthomyia and some groups are only represented by a single species. Consequently, further studies are required to discuss their monophylies.  We demonstrate here the monophyly of the subgenus Monoculicoides that is in agreement with previous studies (ITS1: [49]; COI: [1,55]).
Monophyly of the subgenus Avaritia is clearly supported by the present study, which corroborates previous results obtained from both morphological [46] and molecular data based on COI [15] and ITS1 [32,49]. However, the paraphyly of the subgenus Avaritia has also been reported by analysis of COI [1,50,55] and ITS2 [32] sequences, taking into account the phylogenetic position of C. dewulfi outside this subgenus.
Most authors erroneously included C. dewulfi in the obsoletus complex or the obsoletus group [25,41]. Our results clearly suggest that C. dewulfi does not belong to this group ( Fig. 1) as previously emphasised by different molecular markers [25,28,43,56]. Based on morphological, morphometrical and molecular data [2,23,28,46,47], C. obsoletus and C. scoticus are two closely related species belonging to the Obsoletus complex [5,28,60]. According to [42], the Imicola group in the Afrotropical region includes C. imicola and C. miombo. C. kibatiensis could present similar characters to C. trifasciellus, and subgroup trifasciellus is distinct from Culicoides imicola and the Imicola group [24]. C. trifasciellus belongs to the Orientalis group of the Afrotropical region [41] and its taxonomic position with C. kibatiensis is not resolved. Our results show that C. imicola is a cryptic species of C. miombo, C. kibatiensis being the sister species of the clade formed by these two species. Future studies are needed to carry out investigations at a subgeneric scale in order to determine the species status of the most important vector subgenera. Interestingly, the specimens collected in Gabon belong to the subgenus Avaritia and show unique morphological characters and nucleotide substitutions (COI). Therefore, a morphological description of this new species of Culicoides is in progress.
Our study validates the subgenus Remmia (= Schultzei group) as a valid subgenus, outside of the subgenus Oecacta [3,11], and includes C. kingi [4,14]. The status of the Schultzei group and its subgeneric affiliation has been disputed. It is sometimes included in the subgenus Remmia Glukhova [11,19,43], the latter considered by some authors as a junior synonym of the subgenus Oecacta (Poey) [14,64] or sometimes unclassified [62]. Unlike many authors, we believe the subgenus Oecacta includes only the Furens group and/or perhaps the Schultzei group, if the first group is excluded from the latter as previously suggested [3,14]. Future integrative taxonomy studies [27][28][29] should define this group with more precision, especially its relationships with the subgenus Oecacta.
C. paolae from Tunisia is included in the Fluvialis group constituted by New World species in our sampling (C. castillae, C. fluvialis, C. tetrathyris and C. tetrathyris like). The hypothesis that Culicoides paolae could be a synonym of the Central American species C. jamaicensis seems fair. Indeed, C. paolae and C. jamaicensis present huge morphological similarities [8,44], thus raising the possibility that it was introduced into the Mediterranean Region at the time of Columbus, and was only discovered 500 years later and named C. paolae [44].
The subgenus Hoffmania is monophyletic with two clades (Hylas and Guttatus groups). To our knowledge, no recent data exist in the literature. This subgenus includes 82 extant species from South America and Asia. The wide geographic distribution warrants further studies of the subgenus Hoffmania using phylogenetic and integrative approaches at several scales.
Palaearctic, Neotropical and Afrotropical Culicoides are mixed in our cladogram (Fig. 5) and there is no geographic clustering, indicating that the palaeobiogeography of the genus Culicoides does not follow the generalised tracks [37]. Consequently, Culicoides settled in different areas by wind [45], animal carriage [39] or by human activities [44]. For example, C. imicola specimens collected in Laos, Thailand, Vietnam and Reunion Island are of African origin [40]. In a Culicoides catalogue [11], 42 fossils were recorded from ambers from the Dominican Republic, the USA, Canada, Germany, Poland, the Baltic area and Russia, suggesting a Laurasian origin of the genus.
In conclusion, this study showed that the subgenera Monoculicoides, Culicoides, Haematomyidium, Hoffmania and Avaritia (including the main vectors of bluetongue disease) are monophyletic, whereas the subgenus Oecacta is paraphyletic. As proposed by Harrup et al. [29], a cladistic reinterpretation of the subgeneric classification of Culicoides,