Inference of Population Splits and Mixtures from Genome-Wide Allele Frequency Data
Abstract
Many aspects of the historical relationships between populations in a
species are reflected in genetic data. Inferring these relationships
from genetic data, however, remains a challenging task. In this paper,
we present a statistical model for inferring the patterns of population
splits and mixtures in multiple populations. In our model, the sampled
populations in a species are related to their common ancestor through a
graph of ancestral populations. Using genome-wide allele frequency data
and a Gaussian approximation to genetic drift, we infer the structure of
this graph. We applied this method to a set of 55 human populations and
a set of 82 dog breeds and wild canids. In both species, we show that a
simple bifurcating tree does not fully describe the data; in contrast,
we infer many migration events. While some of the migration events that
we find have been detected previously, many have not. For example, in
the human data, we infer that Cambodians trace approximately 16% of
their ancestry to a population ancestral to other extant East Asian
populations. In the dog data, we infer that both the boxer and basenji
trace a considerable fraction of their ancestry (9% and 25%,
respectively) to wolves subsequent to domestication and that East Asian
toy breeds (the Shih Tzu and the Pekingese) result from admixture
between modern toy breeds and “ancient” Asian breeds. Software
implementing the model described here, called TreeMix, is available at
http://treemix.googlecode.com.
Abstract
Phylogenies of highly genetically variable viruses such as HIV-1 are potentially
informative of epidemiological dynamics. Several studies have
demonstrated the presence of clusters of highly related HIV-1 sequences,
particularly among recently HIV-infected individuals, which have been
used to argue for a high transmission rate during acute infection. Using
a large set of HIV-1 subtype B pol sequences collected from men who
have sex with men, we demonstrate that virus from recent infections tends
to be phylogenetically clustered at a greater rate than virus from
patients with chronic infection (‘excess clustering’) and also tends to
cluster with other recent HIV infections rather than chronic,
established infections (‘excess co-clustering’), consistent with
previous reports. To determine the role that a higher infectivity during
acute infection may play in excess clustering and co-clustering, we
developed a simple model of HIV infection that incorporates an early
period of intensified transmission, and explicitly considers the
dynamics of phylogenetic clusters alongside the dynamics of acutely and
chronically infected cases. We explored the potential for clustering
statistics to be used for inference of acute stage transmission rates
and found that no single statistic explains much of the variance in the
parameters controlling acute stage transmission rates. We demonstrate
that high transmission rates during the acute stage are not the main
cause of excess clustering of virus from patients with early/acute
infection compared to chronic infection, which may simply reflect the
shorter time since transmission in acute infection. Higher transmission
during acute infection can result in excess co-clustering of sequences,
while the extent of clustering observed is most sensitive to the
fraction of infections sampled.