Register or Login To Download This Patent As A PDF
United States Patent Application 
20160300036

Kind Code

A1

RAMAZZOTTI; DANIELE
; et al.

October 13, 2016

METHODS, COMPUTERACCESSIBLE MEDIUM AND SYSTEMS TO MODEL DISEASE
PROGRESSION USING BIOMEDICAL DATA FROM MULTIPLE PATIENTS
Abstract
An exemplary embodiment of system, method and computeraccessible medium
can be provided to reconstruct models based on the probabilistic notion
of causation, which can differ fundamentally from that can be based on
correlation. A general reconstruction setting can be complicated by the
presence of noise in the data, owing to the intrinsic variability of
biological processes as well as experimental or measurement errors. To
gain immunity to noise in the reconstruction performance, it is possible
to use a shrinkage estimator. On synthetic data, the exemplary procedure
can outperform currently known procedures and, for some real cancer
datasets, there are biologically significant differences revealed by the
exemplary reconstructed progressions. The exemplary system, method and
computer accessible medium can be efficient even with a relatively low
number of samples and its performance quickly converges to its asymptote
as the number of samples increases.
Inventors: 
RAMAZZOTTI; DANIELE; (Muggio MB, IT)
; CARAVAGNA; GIULIO; (Viareggio LU, IT)
; OLDE LOOHUIS; LOES; (New York, NY)
; GRAUDENZI; ALEX; (Modena MO, IT)
; KORSUNCKY; IIYA; (New York, NY)
; MAURI; GIANCARLO; (Varedo MB, IT)
; CH; MARCO; (Lugano, CH)
; MISHRA; BHUBANESWAR; (Great Neck, NY)

Applicant:  Name  City  State  Country  Type  NEW YORK UNIVERSITY
UNIVERSITA DEGLI STUDI DI MILANO  BICOCCA  New York
Milano  NY  US
IT   
Family ID:

1000002022552

Appl. No.:

15/032903

Filed:

October 28, 2014 
PCT Filed:

October 28, 2014 
PCT NO:

PCT/US14/62688 
371 Date:

April 28, 2016 
Related U.S. Patent Documents
        
 Application Number  Filing Date  Patent Number 

 61896566  Oct 28, 2013  
 62038697  Aug 18, 2014  
 62040802  Aug 22, 2014  

Current U.S. Class: 
1/1 
Current CPC Class: 
G06F 19/3437 20130101 
International Class: 
G06F 19/00 20060101 G06F019/00 
Claims
1. A nontransitory computeraccessible medium having stored thereon
computerexecutable instructions for generating a model of progression of
at least one disease using biomedical data of at least one patient,
wherein, when a computer arrangement executes the instructions, the
computer arrangement is configured to perform procedures comprising:
obtaining the biomedical data; and generating the model of progression,
which includes at least one of (i) states of the at least one disease or
(ii) transitions among the states, based on the obtained biomedical data.
2. The computeraccessible medium of claim 1, wherein the model of
progression further includes a progression graph.
3. The computeraccessible medium of claim 2, wherein the progression
graph is based on a causal graph.
4. The computeraccessible medium of claim 2, wherein the model of
progression further includes at least one of a directed acyclic graph
(DAG), a disconnected DAG, a tree or a forest.
5. The computeraccessible medium of claim 4, wherein nodes of the DAG
are atomic events and edges represent a progression between the atomic
events.
6. The computeraccessible medium of claim 1, wherein the model of
progression is further based on a noise model.
7. The computeraccessible medium of claim 6, wherein the noise model
includes a biological noise model.
8. The computeraccessible medium of claim 7, wherein the computer
arrangement is further configured to use the biological noise model to
distinguish spurious causes from genuine causes.
9. The computeraccessible medium of claim 6, wherein the noise model
includes an experimental noise model.
10. The computeraccessible medium of claim 6, wherein the noise model
includes an experimental noise model and a biological noise model.
11. The computeraccessible medium of claim 1, wherein the biomedical
data includes at least one of genomics, transcriptomics, epigeneomics or
imaging data.
12. The computeraccessible medium of claim 1, wherein the biomedical
data includes information pertaining to at least one of at least one
normal cell, at least one tumor cell, cellfree circulating DNA or at
least one circulating tumor cell.
13. The computeraccessible medium of claim 1, wherein the computer
arrangement is further configured to determine the states of the disease
by at least one of genomics, transcriptomics or epigeneomics mutational
profiles.
14. The computeraccessible medium of claim 1, wherein the computer
arrangement is further configured to determine transitions of the states
by a causality relationship whose strength is estimated by
probabilityraising by at least one unbiased estimator.
15. The computeraccessible medium of claim 14, wherein the unbiased
estimator includes at least one shrinkage estimator.
16. The computeraccessible medium of claim 15, wherein the shrinkage
estimator is a measure of causation among any pair of events atomic
events.
17. The computeraccessible medium of claim 1, wherein the at least one
disease includes cancer.
18. The computeraccessible medium of claim 1, wherein the computer
arrangement is further configured to (i) receive further biomedical data
related to at least one further patient, and (ii) generate information
about the at least one further patient based on the model of progression
and the further biomedical data.
19. The computeraccessible medium of claim 18, wherein the information
includes a classification of at least one further disease of the at least
one further patient.
20. A method for modeling a progression of at least one disease using
biomedical data for one or more patients, comprising: (a) obtaining the
biomedical data; and (b) using a computer hardware arrangement,
generating the model of progression, which includes at least one of (i)
states of the disease or (ii) transitions among the states, based on the
obtained biomedical data.
2138. (canceled)
39. A system for modeling a progression of at least one disease using
biomedical data for one or more patients, comprising: a computer hardware
arrangement configured to: (a) obtaining the biomedical data; and (b)
using a computer hardware arrangement, generating the model of
progression, which includes at least one of (i) states of the disease or
(ii) transitions among the states, based on the obtained biomedical data.
4057. (canceled)
Description
CROSSREFERENCE TO RELATED APPLICATIONS
[0001] This application relates to and claims priority from U.S. Patent
Application No. 61/896,566, filed on Oct. 28, 2013, U.S. Patent
Application No. 62/038,697, filed on Aug. 18, 2014, and U.S. Patent
Application No. 62/040,802, filed on Aug. 22, 2014, the entire
disclosures of which are incorporated herein by reference.
FIELD OF THE DISCLOSURE
[0002] The present disclosure relates generally to cancer progression
models, and more specifically, to exemplary embodiments of an exemplary
system, method and computeraccessible medium for a determination of
cancer progression models, which can include noise and/or biological
noise and/or can use biological data from multiple patients.
BACKGROUND INFORMATION
[0003] Cancer is a disease of evolution. Its initiation and progression
can be caused by dynamic somatic alterations to the genome manifested as
point mutations, structural alterations of the genome, DNA methylation
and histone medication changes. (See e.g., Reference 15). These genomic
alterations can be generated by random processes, and since individual
tumor cells compete for space and resources, the test variants can be
naturally selected for. For example, if through some mutations of a cell
acquires the ability to ignore antigrowth signals from the body, this
cell may thrive and divide and its progeny may eventually dominate part
of the tumor. This clonal expansion can be seen as a discrete state of
the cancer's progression, marked by the acquisition of a genetic event,
or a set of events. Cancer progression can then be thought of as a
sequence of these discrete progression steps, where the tumor acquires
certain distinct properties at each state. Different progression
sequences can be used, although some can be more common than others, and
not every order can be viable. (See, e.g., References 14 and 25).
[0004] In the last two decades, many specific genes and genetic mechanisms
have been identified that are involved in different types of cancer (see,
e.g. References 3, 19 and 31), and targeted therapies that aim to affect
the product of these genes are developed at a fast pace. (See, e.g.,
Reference 25). However, unfortunately, the causal and temporal relations
among the genetic events driving cancer progression remain largely
elusive. The main reason for this state of affairs can be that
information revealed in the data can usually be obtained only at one, or
a few points, in time, rather than over the course of the disease.
Extracting this dynamic information from the available static, or
crosssectional data can be a challenge, and the combination of
mathematical, statistical and computational techniques can be needed to
decipher the complex dynamics. The results of the research addressing
these issues will have important repercussions for disease diagnosis and
prognosis, and therapy.
[0005] In recent years, several methods that aim to extract progression
models from crosssectional data have been developed; starting from the
seminal work on singlepathmodels (see, e.g., Reference 32), up to
several models of oncogenetic trees (see, e.g., References 2, 4 and 4),
probabilistic networks (see, e.g., Reference 17) and conjunctive Bayesian
networks (see, e.g., References 1 and 11). Some of these models, use
correlation to identify relations among genetic events. (See e.g.,
References 2, 4 and 5). These techniques reconstruct tree models of
progression as independent acyclic paths with branches and no
consequences. More complex models (see e.g., References 1 and 11),
extract topologies such as direct acyclic graphs. However, in these
cases, other constraints on the joint occurrence of events can be
imposed.
[0006] Accordingly, there is a need to address and/or solve at least some
of the deficiencies described herein above.
SUMMARY OF EXEMPLARY EMBODIMENTS
[0007] To that end, an exemplary system, method and computeraccessible
medium for generating a model of progression a disease(s) using
biomedical data of a patient(s) can be provided. Such exemplary system,
method and computeraccessible medium can be used to, for example, obtain
the biomedical data, and generate the model of progression which includes
(i) states of the disease or (ii) transitions among the states based on
the obtained biomedical data. The model of progression can include a
progression graph. The progression graph can be based on a causal graph.
The model of progression can include a directed acyclic graph, where
nodes of the DAG can be atomic events and edges represent a progression
between the atomic events. The model of progression can be further based
on a noise model, which can include a biological noise model, an
experimental noise model or a combination thereof. The biological noise
model can be used to distinguish spurious causes from genuine causes.
[0008] In some exemplary embodiments of the present disclosure, the
biomedical data can include genomics, transcriptomics, epigeneomics or
imaging data and/or can include information pertaining to a normal
cell(s), a tumor cell(s), cellfree circulating DNA or a circulating
tumor cell(s). The states of the disease can be determined by genomics,
transcriptomics or epigeneomics mutational profiles, and/or by a
causality relationship whose strength is estimated by probabilityraising
by an unbiased estimator(s). The unbiased estimator can include a
shrinkage estimator(s), which can be a measure of causation among any
pair of events atomic events.
[0009] In certain exemplary embodiments of the present disclosure, the
disease can include cancer. Further biomedical data related to a further
patient(s) can be received, and information about the further patient can
be generated based on the model of progression and the further biomedical
data. The information can be a classification of a further disease(s) of
the further patient(s).
[0010] These and other objects, features and advantages of the exemplary
embodiments of the present disclosure will become apparent upon reading
the following detailed description of the exemplary embodiments of the
present disclosure, when taken in conjunction with the appended claims.
BRIEF DESCRIPTION OF THE DRAWINGS
[0011] Further objects, features and advantages of the present disclosure
will become apparent from the following detailed description taken in
conjunction with the accompanying Figures showing illustrative
embodiments of the present disclosure, in which:
[0012] FIG. 1 is a graph of an exemplary shrinkage coefficient according
to an exemplary embodiment of the present disclosure;
[0013] FIGS. 2A and 2B are graphs of optimal .lamda. datasets of different
sizes according to an exemplary embodiment of the present disclosure;
[0014] FIGS. 3A and 3B are graphs illustrating an exemplary comparison of
noisefree synthetic data according to an exemplary embodiment of the
present disclosure;
[0015] FIGS. 4A and 4B are diagrams of a set of exemplary reconstructed
trees according to an exemplary embodiment of the present disclosure;
[0016] FIGS. 5A and 5B are graphs illustrating an exemplary reconstruction
with noisy synthetic data and .lamda.=1/2 according to an exemplary
embodiment of the present disclosure;
[0017] FIGS. 6A and 6B are graphs of an exemplary oncotree reconstruction
of an ovarian cancer progression according to an exemplary embodiment of
the present disclosure;
[0018] FIGS. 7A and 7B are charts illustrating the estimated confidence
for ovarian cancer progression according to an exemplary embodiment of
the present disclosure;
[0019] FIGS. 8A and 8B are graphs illustrating the exemplary
reconstruction with the noisy synthetic data where .lamda.=0 according to
an exemplary embodiment of the present disclosure;
[0020] FIG. 9 is an illustration of an exemplary block diagram of an
exemplary system in accordance with certain exemplary embodiments of the
present disclosure;
[0021] FIGS. 10A and 10B are diagrams of examples of screeningoff and
background context according to an exemplary embodiment of the present
disclosure;
[0022] FIG. 11 is a diagram of exemplary properties and/or procedures
according to an exemplary embodiment of the present disclosure;
[0023] FIGS. 12A and 12B are diagrams of exemplary singlecause topology
according to an exemplary embodiment of the present disclosure;
[0024] FIGS. 13A and 13B are diagrams of exemplary conjunctivecause
topology according to an exemplary embodiment of the present disclosure;
[0025] FIG. 14 is a diagram of caveats in inferring synthetic lethality
relations according to an exemplary embodiment of the present disclosure;
[0026] FIG. 15 is a diagram of an exemplary pipeline and/or procedure for
a exemplary CAncer PRogression Inference ("CAPRI") according to an
exemplary embodiment of the present disclosure;
[0027] FIG. 16 is a set of diagrams and reconstruction trees for DAGs for
small exemplary data sets according to an exemplary embodiment of the
present disclosure;
[0028] FIG. 17 is a set of diagrams and reconstruction trees and forests
for small exemplary data sets according to an exemplary embodiment of the
present disclosure;
[0029] FIG. 18 is a set of diagrams illustrating exemplary conjunctive
causal claims according to an exemplary embodiment of the present
disclosure;
[0030] FIG. 19 is a set of graphs illustrating comparisons with
conventional progression models;
[0031] FIG. 20 is a further set of graphs illustrating comparisons with
conventional progression models;
[0032] FIG. 21 is an even further set of graphs illustrating comparisons
with conventional progression models;
[0033] FIG. 22 is a set of charts illustrating the exemplary
reconstruction of disjunctive causal claims with no hypothesis according
to an exemplary embodiment of the present disclosure;
[0034] FIG. 23 is a set of diagrams illustrating the exemplary
reconstruction of synthetic lethality with hypotheses according to an
exemplary embodiment of the present disclosure;
[0035] FIG. 24 is a diagram illustrating exemplary progression models
according to an exemplary embodiment of the present disclosure;
[0036] FIG. 25 is a diagram illustrating further exemplary progression
models according to an exemplary embodiment of the present disclosure;
[0037] FIG. 26 is a diagram illustrating exemplary CAPRI procedures
according to an exemplary embodiment of the present disclosure;
[0038] FIG. 27 is a set of exemplary diagrams and charts illustrating the
accuracy and performance of the exemplary CAPRI according to an exemplary
embodiment of the present disclosure;
[0039] FIGS. 28A28D is a diagram illustrating an exemplary procedure
according to an exemplary embodiment of the present disclosure;
[0040] FIGS. 29A29D are graphs illustrating exemplary tests of the
exemplary Polaris procedure according to an exemplary embodiment of the
present disclosure;
[0041] FIG. 30 is a diagram of an exemplary Polaris mode according to an
exemplary embodiment of the present disclosure;
[0042] FIGS. 31A31D are graphs illustrating exemplary performance results
for Polaris, BIC, and clairvoyant DiProg on CMPNs;
[0043] FIGS. 32A32D are exemplary graphs illustrating further exemplary
performance results for Polaris, BIC, and clairvoyant DiProg on DMPNs;
[0044] FIGS. 33A33D are exemplary graphs illustrating yet further
exemplary experimental performance results for Polaris, BIC, clairvoyant
DiProg on XMPNs;
[0045] FIGS. 34A34F are exemplary graphs illustrating an exemplary
.alpha.filter rejects hypotheses prior to optimization of the score
according to an exemplary embodiment of the present disclosure; and
[0046] FIG. 35 is a flow diagram of an exemplary method for generating a
model of progression of a disease according to an exemplary embodiment of
the present disclosure.
[0047] Throughout the drawings, the same reference numerals and
characters, unless otherwise stated, are used to denote like features,
elements, components or portions of the illustrated embodiments.
Moreover, while the present disclosure will now be described in detail
with reference to the figures, it is done so in connection with the
illustrative embodiments and is not limited by the particular embodiments
illustrated in the figures and appended claims.
DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS
[0048] The exemplary biological notion of causality can be based on the
notions of Darwinian evolution, in that it can be about an ensemble of
entities (e.g., population of cells, organisms, etc.). Within this
ensemble, a causal event, (e.g., c) in a member entity may result in
variations (e.g., changes in genotypic frequencies); such variations can
be exhibited in the phenotypic variations within the population, which
can be subject to Darwinian positive, and subsequently, Malthusian
negative selections, and sets the stage for a new effect event (e.g., e)
to be selected, should it occur next; it can then be concluded that "ce".
[0049] While there could be other meaningful extensions of this exemplary
framework (see, e.g., Reference 3), it can be sufficient to describe the
causality relations implicit in the somatic evolution responsible for
tumor progression. Via its very statistical nature, those relations that
only reflect "Typelevel Causality", but ignore "Tokenlevel Causality",
can be captured. Thus, while it can be estimated, for a population of
cancer patients of a particular kind (e.g., atypical Chronic Myeloid
Leukemia, aCML, patients) whether and with what probability a mutation,
such as SETBP1, would cause certain other mutations, such as ASXL1 single
nucleotide variants or indel to occur, it will remain silent as to
whether a particular ASXL1 mutation in a particular patient was caused by
an earlier SETBP1 mutation.
[0050] Exemplary Hume's regularity theory: The modern study of causation
begins with the Scottish philosopher David Hume (17111776). According to
Hume, a theory of causation could be defined axiomatically, using the
following ingredients: temporal priority, implying that causes can be
invariably followed by their effects (see, e.g., Reference 4), augmented
by various constraints, such as contiguity, constant conjunction1, etc.
Theories of this kind, that try to analyze causation in terms of
invariable patterns of succession, have been referred to as regularity
theories of causation.
[0051] The notion of causation has spawned far too many variants, and has
been a source of acerbic debates. All these theories present wellknown
limitations and confusion, but have led to a small number of modern
versions of commonly accepted, at least among the philosophers,
frameworks. For example, the theory discussed and studied by Suppes,
which is one of the most prominent causation theories, whose axioms can
be expressible in probabilistic propositional modal logics, and amenable
to algorithmic analysis. It can be the framework upon which the exemplary
our analyses and procedures can be built.
[0052] Regularity theories can have many limitation, which are described
below. Imperfect regularities: In general, it cannot be stated that
causes can be invariably (e.g., without fail) followed by their effects.
For example, while it can be stated state that "smoking can be a cause of
lung cancer", there can still be some smokers who do not develop lung
cancer.
[0053] Situations such as these can be referred to as imperfect
regularities, and could arise for many different reasons. One of these
which can be a very common situation in the context of cancer can involve
the heterogeneity of the situations in which a cause resides. For
example, some smokers may have a genetic susceptibility to lung cancer,
while others do not. Moreover, some nonsmokers may be exposed to other
carcinogens, while others may not be. Thus, the fact that not all smokers
develop lung cancer can be explained in these terms.
[0054] Irrelevance:
[0055] An event that can invariably be followed by another can be
irrelevant to it. (See, e.g., Reference 5). Salt that has been hexed by a
sorcerer invariably dissolves when placed in water, but hexing does not
cause the salt to dissolve. In fact, hexing can be irrelevant for this
outcome. Probabilistic theories of causation capture exactly this
situation by requiring that causes alter the probabilities of their
effects.
[0056] Asymmetry:
[0057] If it can be claimed that an event c causes another event e, then,
typically, it can be anticipated to claim that e does not cause c, which
would naturally follow from a strict temporalpriorityconstraint (e.g.,
cause precedes effect temporally). In the context of the preceding
example, smoking causes lung cancer, but lung cancer does not cause one
to smoke.
[0058] Spurious Regularities:
[0059] Consider a situation, not very uncommon, where a unique cause can
be regularly followed by two or more effects. As an example, suppose that
one observes the height of the column of mercury in a particular
barometer dropping below a certain level. Shortly afterwards, because of
the drop in atmospheric pressure, (the unobserved cause for falling
barometer), a storm occurs. In this settings, a regularity theory could
claim that the drop of the mercury column causes the storm when, indeed,
it may only be correlated to it. Following common terminologies, such
situations can be due to spurious correlations. There now exists an
extensive literature discussing such subtleties that can be important in
understanding the philosophical foundations of causality theory. (See,
e.g., Reference 2).
[0060] The following exemplary notation will be used below. Atomic events,
in general, are denoted by small Roman letters, such as a, b, c, . . . ;
when it can be clear from the context that the event in the model can be,
in fact, a genomic mutational event, it can be referred to directly using
the standard biological nomenclature, for example, BRCA1, BRCA2, etc.
Formulas over events will be mostly denoted by Greek letters, and their
logical connectives with the usual "and" (), "or" () and "negation" (.)
symbols. Standard operations on sets will be used as well.
[0061] Which quantity can be referred to, can be made clear from the
context. In the following, P(x) can denote the probability of x; P(xy),
the joint probability of x and y, which can be naturally extended to the
notation P(xy.sub.1. . . y.sub.u) for an arbitrary arity; and P(xy), the
conditional probability of x given y. For example, x and y can be
formulas over events.
[0062] As with causal structures, ce, where c and e can be events being
modeled, in order to denote the causal relation "c causes e". The
notation to .phi.e can be generalized with the meaning generalized
mutatis mutandis.
[0063] According to an exemplary embodiment of the present disclosure, an
exemplary framework to reconstruct cumulative progressive phenomena, such
as cancer progression, can be provided. The exemplary method, procedure,
system, etc. can be based on a notion of probabilistic causation, which
can be more suitable than correlation to infer causal structures. More
specifically, it can be possible to utilize the notion of causation. (See
e.g., Reference 30). Its basic intuition can be as follows: event a
causes event b if (i) a occurs before b and (ii) the occurrence of a
raises the probability of observing event b. Probabilistic notions of
causation have been used in biomedical applications before (e.g., to find
driver genes from CNV data (see e.g., Reference 20), and to extract
causes from biological time series data (see e.g., Reference 22)),
however, it can be believed that there was a lack of any inference of
progression models in the absence of direct temporal information.
[0064] With the problem setting (see e.g., Reference 4), according to an
exemplary embodiment of the present disclosure, an exemplary technique
can be utilized to infer probabilistic progression trees from
crosssectional data. It can be assumed that the input can be a set of
preselected genetic events such that the presence or the absence of each
event can be recorded for each sample. Using the notion of probabilistic
causation described herein, it can be possible to infer a tree whose
induced distribution best describes causal structure implicit in the
data.
[0065] The problem can be complicated by the presence of noise, such as
the one provided by the intrinsic variability of biological processes
(e.g., genetic heterogeneity) and experimental or measurement errors. To
deal with this, it can be possible to utilize a shrinkage estimator to
measure causation among any pair of events under consideration. (See,
e.g., Reference 9). The intuition of this type of estimators can be to
improve a raw estimate .alpha. (e.g., in this case probability raising)
with a correction factor .beta. (e.g., in this case a measure of temporal
distance among events); a generic shrinkage estimator can be defined as
.theta.=(1.lamda.).alpha.(x)+.lamda..beta.(x), where
0.ltoreq..lamda..ltoreq.1 can be the shrinkage coefficient, x can be the
input data and .theta. can be the estimates that should be evaluated.
.theta. can be arbitrarily shrunk towards .alpha. or .beta. by varying
.lamda.. The estimator can be biased. The power of shrinkage lies in the
possibility of determining an optimal value for .lamda. to balance the
effect of the correction factor on the raw model estimate. This approach
can be effective to regularize illposed inference problems, and
sometimes the optimal .lamda. can be determined analytically. (See, e.g.,
References 10 and 31).
[0066] According to an exemplary embodiment of the present disclosure,
however, the performance of interest can be that of the reconstruction
technique, rather than that of the estimator, usually measured as mean
squared error. Thus, it is possible to numerically estimate the global
optimal coefficient value for the reconstruction performance. Based on
synthetic data, the methods, systems and computeraccessible medium
according to the exemplary embodiments of the present disclosure can
outperform the existing tree reconstruction algorithm. (See e.g.,
Reference 4). For example, the exemplary shrinkage estimator, according
to an exemplary embodiment of the present disclosure, can provide, on
average, an increased robustness to noisy data which can ensure that it
can outperform the standard correlationbased procedure. (See e.g., id.).
Further, the exemplary method, according to an exemplary embodiment of
the present disclosure, can operate in an efficient way with a relatively
low number of samples, and its performance can quickly converge to its
asymptote as the number of samples increases. This exemplary outcome can
indicate an applicability of the exemplary procedure with relatively
small datasets without compromising its efficiency.
[0067] To that end, exemplary methods, computeraccessible medium, and
systems, according to exemplary embodiments of the present disclosure,
can be provided for modeling progression of a disease using patients'
biomedical data, for example, genomics data from patients' tumor and
normal cells to model progression of cancer, can be provided. Existing
techniques to reconstruct models of progression for accumulative
processes such as cancer, generally seek to estimate causation by
combining correlation, and a frequentist notion of temporal priority. The
exemplary methods, computeraccessible medium and systems can provide an
exemplary framework to reconstruct such models based on the probabilistic
notion of causation, which can differ fundamentally from that based on
correlation. The exemplary embodiments of the methods,
computeraccessible medium and systems can address the reconstruction
problem in a general setting, which can be complicated by the presence of
noise in the data, owing to the intrinsic variability of biological
processes as well as experimental or measurement errors. To gain immunity
to noise in the reconstruction performance, the exemplary methods,
computeraccessible medium and systems can utilize a shrinkage estimator.
The exemplary methods, computeraccessible medium and systems can be
efficient even with a relatively low number of samples and its
performance can quickly converge to its asymptote as the number of
samples increases.
Exemplary Problem Setting
[0068] An exemplary setup of the reconstruction problem can be as follows.
Assuming that a set G of n mutations (e.g., events, in probabilistic
terminology) and m samples can be provided, it can be possible to
represent a crosssectional dataset as an m.times.n binary matrix. In
this exemplary matrix, an entry (k, l)=1 if the mutation can be observed
in sample k, and 0 otherwise. It should be reemphasized that such a
dataset may not provide explicit information of time. The problem to be
solved can be that of extracting a set of edges E yielding a progression
tree T=(G U {o}, E,o) from this matrix. To be more precise, it can be
possible to reconstruct a proper rooted tree that can satisfy: (i) each
node has at most one incoming edge in E, (ii) there may be no incoming
edge to the root (iii) there may be no cycles. The root of T can be
modeled using a (e.g., special) event o G, so to extract, in principle,
heterogeneous progression paths, for example, forests. Each progression
tree can subsume a distribution of observing a subset of mutations in a
cancer sample.
[0069] Definition 1 (TreeInduced Distribution).
[0070] Let T be a tree and .alpha.:E>[0, 1] a labeling function
denoting the independent probability of each edge, T can generate a
distribution where the probability of observing a sample with the set of
alterations
G * G is ( G * ) = e .dielect cons.
E ' .alpha. ( e ) ( u , v ) .dielect cons.
E u .dielect cons. G * , v G [ 1  .alpha. ( u
, v ) ] ( 1 ) ##EQU00001##
where E'CE can be the set of edges connecting the root o to the events in
G*.
[0071] The exemplary temporal priority principle states that all causes
should precede their effects. (See e.g., Reference 28). This exemplary
distribution subsumes the following temporal priority: for any oriented
edge (a.fwdarw.b) a sample can contain alteration b with probability
P(a)P(b) which, roughly speaking, means that the probability of observing
a can be greater than the probability of observing b.
[0072] The notion of treeinduced distribution can be used to state an
important aspect which can make the reconstruction problem more
difficult. The input data can be, for example, a set of samples
generated, preferably, from an unknown distribution induced by an unknown
tree that can be intended to be reconstructed. However, in some cases, it
can be possible that no tree exists whose induced distribution generates
exactly those data. When this happens, the set of observed samples can
slightly diverge from any treeinduced distribution. To model these
situations, a notion of noise can be introduced, which can depend on the
context in which data can be gathered.
Exemplary Oncotree Approach
[0073] Previous method described how to extract progression trees, named
"oncotrees", initially applied to static CNV data. (See e.g., Reference
4). In these exemplary trees, nodes can represent CNV events and edges
correspond to possible progressions from one event to the next.
[0074] The reconstruction problem can be exactly as described above, and
each tree can be rooted in the special event o. The choice of which edge
to include in a tree can be based on the exemplary estimator
w a .fwdarw. b = log [ ( a ) ( a ) + ( b )
( a , b ) ( a ) ( b ) ] ( 2 )
##EQU00002##
which can assign, to each edge a.fwdarw.b, a weight accounting for both
the relative and joint frequencies of the events, thus measuring
correlation. The exemplary estimator can be evaluated after including o
to each sample of the dataset. In this definition the rightmost term can
be the likelihood ratio (e.g., symmetric) for a and b occurring together,
while the leftmost can be the asymmetric temporal priority measured by
rate of occurrence. This implicit form of timing can assume that, if a
can occur more often than b, then it likely can occur earlier, thus
satisfying the inequality
( a ) ( a ) + ( b ) > ( b ) ( a )
+ ( b ) . ##EQU00003##
[0075] An exemplary oncotree can be the rooted tree whose total weight
(e.g., sum of all the weights of the edges) can be maximized, and can be
reconstructed in 0(G.sup.2) steps using Edmond's seminal result. (See
e.g., Reference 6). By the exemplary construction, the resulting
exemplary graph can be a proper tree rooted in o: each event can occur
only once, confluences can be absent, for example, any event can be
caused by at most one other event. The branching trees method has been
used to derive progression trees for various cancer datasets (see, e.g.,
References 18, 21 and 27), and even though several extensions of the
method exist (see, e.g., References 2 and 5), it is one of the most used
methods to reconstruct trees and forests.
Exemplary Probabilistic Approach to Causation
[0076] Before introducing the notion of causation, upon which the
exemplary procedure can be based, the approach to probabilistic causation
is described. An extensive discussion on this topic, its properties and
its problems has been previously discussed (See e.g., References 16 and
30)
[0077] Exemplary Definition 2 (Probabilistic Causation (See, e.g.,
Reference 30)).
[0078] For any two events c and e, occurring respectively at times t.sub.e
and t.sub.e, under the assumptions that 0<P(c),P(e)<1, the event c
causes the event e if it occurs before the effect and the cause raises
the probability of the effect, for example:
t.sub.c<t.sub.e and (ec)>(ec). (3)
[0079] The exemplary input of the exemplary procedure can include
crosssectional data and no information about the timings to may be
available. Thus the probability raising ("PR") property can be considered
as a notion of causation, for example, P(e c)>P(e I6'). Provided
below is a review some exemplary properties of the PR.
[0080] Exemplary Proposition 1 (Dependency):
[0081] Whenever the PR holds between two events a and b, then the events
can be statistically dependent in a positive sense, that can be, for
example:
(ba)>(b )(a,b)>(a)(b). (4)
[0082] This property, as well as Property 2, is a wellknown fact of the
PR. Notice that the opposite implication holds as well. When the events a
and b can still be dependent but in a negative sense, for example, P(a,
b)<P(a)P(b), the PR does not hold, for example, P(ba)<P(b ).
[0083] It can be possible to use the exemplary PR to determine whether a
specific a pair of events a and b satisfy a causation relation. Thus
facilitating the conclusion when the event a causes the event b, and a
can be placed before b in the progression tree. Unfortunately, it may not
be possible to simply state that a causes b when P(ba)<P(b ) since,
although PR can be known not to be symmetric, it holds.
[0084] Exemplary Proposition 2 (Mutual PR):
(ba)>(b )(ab)>(ab). Proposition 2 (Mutual PR).
For example, if a raises the probability of observing b, then b raises
the probability of observing a too.
[0085] Nevertheless, to determine causes and effects among the genetic
events, it can be possible use the confidence degree of probability
rising to decide the direction of the causation relationship between
pairs of events. In other words, if a raises the probability of b more
than the other way around, then a can be a more likely cause of b than b
of a. As discussed, the PR may not be symmetric, and the direction of
probability rising can depend on the relative frequencies of the events.
[0086] Exemplary Proposition 3 (Probability Raising and Temporal
Priority):
[0087] For any two events a and b such that the probability raising
P(ab)>P(a) holds, the following can be provided
( a ) > ( b ) .revreaction. ( b a )
( b a _ ) > ( a b ) ( a b _ ) .
( 5 ) ##EQU00004##
[0088] For example, according to the PR model, given that the PR holds
between two events, a raises the probability of b more than b raises the
probability of a, if a can be observed more frequently than b. The ratio
can be used to assess the PR inequality. An exemplary proof of this
proposition can be found in the Appendix. From this exemplary result, it
follows that if the timing of an event can be measured by the rate of its
occurrence (e.g., P(a)>P(b) can imply that a happens before b), this
exemplary notion of PR subsumes the same notion of temporal priority
induced by a tree. Further, this can be the temporal priority made
explicit in the coefficients of Desper's. Given these exemplary results,
it can be possible to define the following notion of causation.
[0089] Exemplary Definition 3.
[0090] For example, a causes b if a can be a probability raiser of b, and
it occurs more frequently:
(ba)>(band >(b).
[0091] Further, it is possible to utilize the conditions for the exemplary
PR to be computable: every mutation a should be observed with probability
strictly 0<P(a)<1. Moreover, each pair of mutations (a, b) can be
reviewed to be distinguishable in terms of PR, that can be P(a b)<1 or
P(b I<1 similarly to the above condition. Any nondistinguishable pair
of events can be merged as a single composite event. From now on, it can
be assumed that these conditions have been verified.
Extracting Progression Trees with Probability Raising and Shrinkage
Estimator
[0092] The exemplary procedure can be similar to Desper's algorithm, with
one of the differences being an alternative weight function based on a
shrinkage estimator.
[0093] Definition 4 (Shrinkage Estimator):
[0094] It is possible to define the shrinkage estimator m.sub.a.fwdarw.b
of the confidence in the causation relationship from a to b as
m a .fwdarw. b = ( 1  .lamda. ) .alpha. a .fwdarw. b
+ .lamda..beta. a .fwdarw. b , where 0 .ltoreq.
.lamda. .ltoreq. 1 and ( 6 ) .alpha. a .fwdarw. b
= ( b a )  ( b a _ ) ( b a ) + (
b a _ ) .beta. a .fwdarw. b = ( a , b )
 ( a ) ( b ) ( a , b ) + ( a ) ( b )
. ( 7 ) ##EQU00005##
[0095] This exemplary estimator can combine a normalized version of the
PR, the raw model estimate .alpha., with the correction factor .beta..
The exemplary shrinkage can improve the performance of the overall
reconstruction process, not limited to the performance of the exemplary
estimator itself. For example, m can induce an ordering to the events
reflecting the confidence for their causation. However, this exemplary
framework may not imply any performance bound for the, for example, mean
squared error of m. The exemplary shrinkage estimator can be an effective
way to get such an ordering when data is noisy. In the exemplary system,
method and computeraccessible medium a pairwise matrix version of the
estimator can be used.
TABLEUS00001
Algorithm 1 Treealike reconstruction with shrinkage estimator
1: consider a set of genetic events G = {g.sub.1, . . . , g.sub.n} plus a
special
event .diamond., added to each sample of the dataset;
2. define a n .times. n matrix M where each entry contains the shrinkage
estimator
m i > j = ( 1  .lamda. ) ( j  i )  (
j  i _ ) ( j  i ) + ( j  i _ ) +
.lamda. ( i , j )  ( i ) ( j )
( i , j ) + ( i ) ( j ) ##EQU00006##
according to the observed probability of the events i and j;
3: [PR causation] define a tree T = (G .orgate. {.diamond.}, E, .diamond.)
where (i .fwdarw. j) .epsilon.
E for i, j .epsilon. G if and only if:
m.sub.i.fwdarw.j > 0 and m.sub.i.fwdarw.j > m.sub.j.fwdarw.i and
.Ainverted.i' .epsilon. G, m.sub.i,j > m.sub.i',j.
4: [Correlation filter] define G.sub.j = {g.sub.i .epsilon. G  (i) >
(j)}, replace edge
(i .fwdarw. j) .epsilon. E with edge (.diamond. .fwdarw. j) if, for all
g.sub.w .epsilon. G.sub.j, it holds
1 1 + ( j ) > ( w ) ( w ) + ( j
) ( w , j ) ( w ) ( j ) .
##EQU00007##
[0096] Exemplary Raw Estimator and the Correction Factor.
[0097] By considering, for example, only the exemplary raw estimator
.alpha., it can be possible to include an edge (a.fwdarw.b) in the tree
consistently in terms of (i) Definition 3 and (ii) if .alpha. can be the
best probability raiser for b. When P(a)=P(b), the events a and b can be
indistinguishable in terms of temporal priority. Thus .alpha. may not be
sufficient to decide their causal relation, if any. This intrinsic
ambiguity becomes unlikely when .beta. can be introduced, even if it can
still be possible.
[0098] This exemplary formulation of .alpha. can be a monotonic normalized
version of the PR ratio.
[0099] Proposition 4 (Monnotonic normalization) For any two events a and b
we have
( a ) > ( b ) .revreaction. ( b a )
( b a _ ) > ( a b ) ( a b _ )
.revreaction. .alpha. a .fwdarw. b > .alpha. b .fwdarw. a .
( 8 ) ##EQU00008##
[0100] This exemplary raw model estimator can satisfy
 1 .ltoreq. .alpha. a .fwdarw. b , .alpha. b .fwdarw. a
.ltoreq. 1 ##EQU00009##
and can have the following meaning: when it tends to 1, the pair of
events can appear disjointly (e.g., they can show an anticausation
pattern in the observations), when it tends to 0, no causation or
anticausation can be inferred, and the two events can be statistically
independent. And when it tends to 1, causation relationship between the
two events can be robust. Therefore, a can provide a quantification of
the degree of confidence for a given causation relationship, with respect
to probability rising.
[0101] However, .alpha. does not provide a general criterion to
disambiguate among groups of candidate parents of a given node. A
specific case can be shown in which .alpha. may not be a sufficient
estimator. For instance, a set of three events can be provided that can
be involved in a causal linear path: a.fwdarw.b.fwdarw.c. In this case,
when evaluating the candidate parents a and b for c, the following can be
provided: a.sub.a.fwdarw.c=a.sub.b.fwdarw.c=1. Accordingly, it can be
possible to infer that t.sub.a<t.sub.c and t.sub.b<t.sub.c, for
example, a partial ordering, which may not help to disentangle the
relation among a and b with respect to c.
[0102] In this exemplary case, the .beta. coefficient can be used to
determine which of the two candidate parents can occur earlier. In
general, such a correction factor can provide information on the temporal
distance between events, in terms of statistical dependency. In other
words, the higher the coefficient, the closer two events can be. The
exemplary shrinkage estimator m can then result in a shrinkable
combination of the raw PR estimator .alpha. and of the .beta. correction
factor, which can satisfy an important property.
[0103] Exemplary Proposition 5 (Coherence in Dependency and Temporal
Priority):
[0104] The .beta. correction factor can be symmetrical, and can subsume
the same notion of dependency of the raw estimator .alpha., that can be,
for example:
(a,b)>(a)(b).alpha..sub.a.fwdarw.b>0.beta..sub.a.fwdarw.b>0 and
.beta..sub.a.fwdarw.b=.beta..sub.b.fwdarw.a. (9)
Thus, the correction factor can respect the temporal priority induced by
the raw estimator .alpha..
[0105] The Correlation Filter.
[0106] Following Desper's approach, it can be possible to add a root o
with P(o)=1 to separate different progression paths, which can then be
represented as different subtrees rooted in o. the exemplary system,
method and computeraccessible medium initially builds a unique tree by
using m. Then, the correlationalike weight between any node j and o can
be computed as, for example:
( .diamond. ) ( .diamond. ) + ( j ) (
.diamond. , j ) ( .diamond. ) ( j ) = 1 1 + ( j
) . ##EQU00010##
[0107] If this quantity can be greater than the weight of j with each
upstream connected element i, it can be possible to substitute the edge
(i j) with the edge (o>j). It can then be possible to use a
correlation filter because it would make no sense to ask whether o was a
probability raiser for j, besides the technical fact that a may not be
defined for events of probability 1. For example, this exemplary filter
can imply a nonnegative threshold for the shrinkage estimator, when a
cause can be valid.
[0108] Exemplary Theorem 1 (Independent Progressions).
[0109] Let G*={al, . . . , ak}C G a set of k events which can be candidate
causes of some b G*, for example, P(a.sub.i)>P(b) and
m.sub.ai.fwdarw.b>0 for any a.sub.i. There exist
1<.gamma.<1/P(a.sub.i) and S>0 such that b determines an
independent progression tree in the reconstructed forest, for example,
the edge o b can be picked by the exemplary system, method and
computeraccessible medium, if for any a.sub.i
(a.sub.i,b)<.gamma.[(a.sub.i)(b)]+.delta.. (10)
[0110] The proof of this Theorem can be found in the enclosed Appendix.
What can be indicated by this exemplary theorem can be that, by examining
the level of statistical dependency of each pair of events, it can be
possible to determine how many trees can compose the reconstructed
forest. Further, it can suggest that the exemplary system, method and
computeraccessible medium can be defined by first processing the
correlation filter, and then using m to build the independent progression
trees in the forest. Thus, the exemplary procedure/algorithm can be used
to reconstruct welldefined trees in the sense that no transitive
connections and no cycles can appear.
[0111] Exemplary Theorem 2 (Procedure Correctness).
[0112] The exemplary system, method and computeraccessible medium can
reconstruct a welldefined tree T without disconnected components,
transitive connections and cycles.
[0113] The proof of this Theorem follows immediately from Proposition 3
and can be found in the enclosed Appendix.
Exemplary Performance of Procedure and Estimation of Optimal Shrinkage
Coefficient
[0114] Synthetic data can be used to evaluate the performance of the
exemplary system, method and computeraccessible medium as a function of
the shrinkage coefficient A. Many distinct synthetic datasets were
created for this. The procedure performance was measured in terms of Tree
Edit Distance ("TED") (see, e.g., Reference 35). For example, the
exemplary minimumcost sequence of node edit operations (e.g.,
relabeling, deletion and insertion) that transforms the reconstructed
trees into the ones generating the data.
Exemplary Synthetic Data Generation
[0115] Synthetic datasets were generated by sampling from various random
trees, constrained to have depth log(JG1), since wide branches can be
hard to reconstruct than straight paths. Unless differently specified, in
all the experiments, 100 distinct random trees, or forests, accordingly
to the test to perform of 20 events each were used. This is a fairly
reasonable number of events, and can be in line with the usual size of
reconstructed trees. (See, e.g., References 13, 24, 26 and 29). The
scalability of the reconstruction performance was tested against the
number of samples by letting IGI range from 50 to 250, with a step of 50,
and by replicating 10 independent datasets for each parameters setting.
[0116] A form of noise was included in generating the datasets, so to
account for (i) the realistic presence of biological noise (e.g., the one
provided by bystander mutations, genetic heterogeneity, etc.) and (ii)
experimental or measurement errors. A noise parameter 0<v<1 can
denote the probability that any event assumed a random value (e.g., with
uniform probability), after sampling from the treeinduced distribution'.
This can introduce both false negatives and false positives in the
datasets. Algorithmically, this exemplary process can generate, on
average, G .nu./2 random entries in each sample (e.g. with v=0.1 there
can be, on average, one error per sample). It can be possible to assess
whether these noisy samples can mislead the reconstruction process, even
for low values of .nu..
[0117] In what follows, it can be possible to refer to datasets generated
with v>0 as noisy synthetic dataset. In the exemplary experiments,
.nu. can be discretized by 0.025, (e.g., about 2.5% noise).
Exemplary Optimal Shrinkage Coefficient
[0118] Given that the events can be dependent on the topology to
reconstruct, it was not possible to determine analytically an optimal
value for the shrinkage. The exemplary assumption that noise can be
uniformly distributed among the events may appear simplistic. In fact
some events may be more robust, or easy to measure, than others. For
example, works more sophisticated noise distributions can be considered
coefficient by using, for example, the standard results in shrinkage
statistics. (See e.g., Reference 9). Therefore, an empirical estimation
of the optimal A can be used, both in the case of trees and forests.
[0119] As shown in FIG. 1, the variation in the performance of the
exemplary system, method and computeraccessible medium the exemplary
system, method and computeraccessible medium as a function of A can be
indicated, for example, in the specific case of datasets with 150 samples
generated on tree topologies. The exemplary optimal value (e.g., lowest
TED) for noisefree datasets (e.g., v=0) can be obtained for
.lamda..fwdarw.0, whereas for the noisy datasets a series of Ushaped
curves can indicate a unique optimum value for .lamda..fwdarw.1/2, with
respect to all the levels of noise. Identical results can be obtained
when dealing with forests. Further, exemplary experiments show that the
estimation of the optimal .lamda. may not be dependent on the number of
samples in the datasets. (See FIG. 2). An exemplary analysis was limited
to datasets with the typical sample size that can be characteristic of
data currently available. In other words, if the noisefree case can be
considered, the best performance can be obtained by, for example,
shrinking m to the PR model raw estimate .alpha., for example:
m a .fwdarw. b .apprxeq. .lamda. .fwdarw. 0 .alpha. a
.fwdarw. b . ( 11 ) ##EQU00011##
which can be obtained by setting .lamda. to a very small value, e.g.
10.sup.2, in order to consider the contribution of the correction factor
too. Conversely, when considering the case of v>0, the best
performance can be obtained by averaging the shrinkage effect, as for
example:
m a .fwdarw. b = .lamda. = 1 / 2 .alpha. a .fwdarw.
b 2 + .beta. a .fwdarw. b 2 . ( 12 ) ##EQU00012##
[0120] These exemplary results can indicate that, in general, a unique
optimal value for the shrinkage coefficient can be determined.
[0121] FIGS. 2A and 2B shown an optimal .lamda. with datasets of different
sizes similar to FIG. 1, with sample sizes of 50 and 250 respectively.
The estimation of the optimal shrinkage coefficient .lamda. is
irrespective of sample size.
Exemplary Performance of Procedure Compared to Oncotrees
[0122] As shown in exemplary graphs of FIGS. 3A and 3B, the performance of
the exemplary system, method and computeraccessible medium (element 305)
can be compared with oncotrees (element 310), for the case of noisefree
synthetic data. In this exemplary case, the optimal shrinkage coefficient
was used in equation (11): .lamda..fwdarw.0. FIGS. 4A and 4B show
diagrams of an example of a reconstructed tree where, for the noisefree
case, whereas the exemplary system, method and computeraccessible medium
can infer the correct tree while oncotrees mislead a causation relation.
Examples of reconstruction from a dataset by the Target tree (see FIG.
4A, where numbers represent the probability of observing a mutation while
generating sample), with v=0. The oncotree (shown in FIG. 4B) misleads
the correct causal relation for the doublecircled mutation. It evaluates
w=0 for the real causal edge and w=0.14 for the wrong one. The exemplary
system, method and computeraccessible medium, according to an exemplary
embodiment of the present disclosure, can infer the correct tree.
[0123] In general, the TED of the exemplary system, method and
computeraccessible medium can be, on average, bounded above by the TED
of the oncotrees, both in the case of trees (see FIG. 3A) and forests
(see FIG. 3B). For trees, with a low number of samples (e.g., 50) the
average TED of the exemplary system, method and computeraccessible
medium can be around 7, whereas for Desper's technique can be around 13.
The exemplary performance of both procedures can improve as long as the
number of samples can be increased: the exemplary system, method and
computeraccessible medium has the best performance (e.g., TED.apprxeq.0)
with 250 samples, while oncotrees have TED around 6. When forests can be
considered, the difference between the performances of the procedures can
slightly reduce, but also in this case the exemplary system, method and
computeraccessible medium clearly outperforms branching trees.
[0124] The exemplary improvement due to the increase in the sample set
size tends toward a plateau, and the initial TED for the exemplary
estimator is close to the plateau value. Thus, this can indicate that the
exemplary system, method and computeraccessible medium has good
performance with few samples. This can be an important result,
particularly considering the scarcity of available biological data.
[0125] In the exemplary graphs of FIGS. 5A and 5B, the comparison is
extended to noisy datasets. In this exemplary case, the optimal shrinkage
coefficient in equation (12): .lamda..fwdarw.1/2. can be used. The
exemplary results can confirm what can be observed in the case of
noisefree data, as exemplary the exemplary system, method and
computeraccessible medium outperforms Desper's branching trees up to
v=0.15 (e.g., v=0.1), for all the sizes of the sample sets. (See e.g.,
element 505). The bar plots represent the percentage of times the best
performance is achieved at v=0.
Exemplary Performance on Cancer Datasets
[0126] The exemplary results above indicate that the exemplary method,
according to an exemplary embodiment of the present disclosure,
outperforms oncotrees. The exemplary procedure was tested on a real
dataset of cancer patients.
[0127] To test the exemplary reconstruction procedure on a real dataset,
it was applied to the ovarian cancer dataset made available within the
oncotree package. (See, e.g., Reference 4). The data was collected
through the public platform SKY/MFISH (see, e.g., Reference 23), which
can be used to facilitate investigators to share and compare molecular
cytogenetic data. The data was obtained by using the Comparative Genomic
Hybridization technique ("CGH") on samples from papillary serous
cystadenocarcinoma of the ovary. This exemplary procedure uses
fluorescent staining to detect CNV data at the resolution of chromosome
arms. This type of analysis can be performed at a higher resolution,
making this dataset rather outdated. Nevertheless, it can still serve as
a good testcase for the exemplary approach. The seven most commonly
occurring events can be selected from the approximate 87 samples, and the
set of events can be the following gains and losses on chromosomes arms
G={8q+, 3q+, 1q+, 5q, 4q, 8p, Xp}, where for example, 4q can denote a
deletion of the q arm of the 4th chromosome.
[0128] In the exemplary diagrams of FIGS. 6A and 6B, the trees
reconstructed by the two approaches can be compared. The exemplary
procedure, according to the exemplary embodiment of the present
disclosure, can differ from Desper's in terms of how it predicts by
predicting the causal sequence of alterations [0129] 8q+ 8p >Xp.
[0130] For example, all the samples in the dataset can be generated by the
distribution induced by the recovered tree. Thus facilitating the
consideration of this exemplary dataset as noisefree; algorithmically,
this facilitates the use of the exemplary estimator for
.lamda..fwdarw.0).
[0131] While, a biological interpretation for this result is not provided,
it is known that common cancer genes reside in these regions (e.g., the
tumor suppressor gene PDGFR on 5q, and the oncogene MYC on 8q, and loss
of heterozygosity on the short arm of chromosome 8 can be very common.
Recently, evidence has been reported that location 8p contains many
cooperating cancer genes. (See, e.g., Reference 34).
[0132] To assign a confidence level to these inferences, both parametric
and nonparametric bootstrapping methods can be applied to the exemplary
results. (See, e.g., Reference 7). For example, these tests consists of
using the reconstructed trees (in the parametric case), or the
probability observed in the dataset (in the nonparametric case) to
generate new synthetic datasets, and then reconstruct the progressions
again. (See, e.g., Reference 8). The confidence can be given by the
number of times the trees shown in FIGS. 6A and 6B are reconstructed from
the generated data. A similar approach can be used to estimate the
confidence of every edge separately. For oncotrees the exact tree can be
obtained about 83 times out of about 1000 nonparametric resamples, so
its estimated confidence can be about 8.3%. For the exemplary procedure,
according to the exemplary embodiment of the present disclosure, the
confidence can be about 8.6%. In the nonparametric case, the confidence
of oncotrees can be about 17% while that of the exemplary procedure can
be much higher, for example, 32%. For the nonparametric case, an
exemplary edge confidence is shown in the exemplary tables of FIGS. 7A
and 7B. For example, the exemplary procedure, according to an exemplary
embodiment of the present disclosure, can reconstruct the inference 8q+
8p with high confidence (confidence of about 62%, and 26% for 5q 8p),
while the confidence of the edge 8q+5q can be only 39%, almost the same
as 8p 8q+ (confidence of about 40%). FIGS. 7A and 7B show the frequency
of edge occurrences in the nonparametric bootstrap test, for the trees
shown in FIGS. 6A and 6B. Element 705 represents <0.4%, element 710
represents 0.4%0.8%, and element 715 represents >0.8%. Bold entries
are the edges received using the exemplary system, method and
computeraccessible medium.
[0133] FIGS. 8A and 8B illustrate exemplary graphs providing an exemplary
reconstruction with noisy synthetic data and .lamda.>0. The exemplary
settings of the exemplary experiments are the same as those used for FIG.
5, but the estimator is shrunk to .alpha. by .lamda.>0 (e.g.,
.lamda.=0.01). For example, in element 805, the performance of the
exemplary system, method and computeraccessible medium can converge with
the Exemplary Desper's for v approximately equal to 0.01. Thus it is
faster than the case where .lamda. is approximately equal to 1/2).
[0134] Exemplary Analysis of Other Datasets.
[0135] The differences between the reconstructed trees can also be based
on datasets of gastrointestinal and oral cancer. (See, e.g., References
13 and 26). In the case of gastrointestinal stromal cancer, among the 13
CGH events considered (see e.g., Reference 13), for gains on 5p, 5q and
8q, losses on 14q, 1p, 15q, 13q, 21q, 22q, 9p, 9q, 10q and 6q, the
branching trees can identify the path progression as, for example:
1p 15q +13q +21q while the exemplary system, method and
computeraccessible medium can reconstruct the branch as, for example:
1p >15q 1p >13q >21q
[0136] In the case of oral cancer, among the 12 CGH events considered for
gains on 8q, 9q, 11q, 20q, 17p, 7p, 5p, 20p and 18p, losses on 3p, 8p and
18q, the reconstructed trees differ since oncotrees identifies the path
as, for example:
8q+ >20q+ >20p+
[0137] These examples show that the exemplary the exemplary system, method
and computeraccessible medium can provide important differences in the
reconstruction compared to the branching trees.
Exemplary Discussion
[0138] As described herein, an exemplary framework for the reconstruction
of the causal topologies, according to an exemplary embodiment of the
present disclosure, has been described that can provide extensive
guidance on a cumulative progressive phenomena, based on the probability
raising notion of probabilistic causation. Besides such a probabilistic
notion, the use of an exemplary shrinkage estimator has been discussed
for efficiently unraveling ambiguous causal relations, often present when
data can be noisy. Indeed, an effective exemplary procedure can be
described for the reconstruction of a tree or, in general, forest models
of progression which can combine, for the first time, probabilistic
causation and shrinkage estimation.
[0139] Such exemplary procedure was compared with a standard approach
based on correlation, to show that that the exemplary method can
outperform the state of the art on synthetic data, also exhibiting a
noteworthy efficiency with relatively small datasets. Furthermore, the
exemplary procedure has been tested on lowresolution chromosomal
alteration cancer data. This exemplary analysis can indicate that the
exemplary procedure, system and computer accessible medium, according to
an exemplary embodiment of the present disclosure, can infer, with high
confidence, exemplary causal relationships which would remain
unpredictable by basic correlationbased techniques. Even if the cancer
data that used can be coarsegrained, and does not account for, for
example, smallscale mutations and epigenetic information, this exemplary
procedure can be applied to data at any resolution. In fact, it can
require an input set of samples containing some alterations (mutations in
the case of cancer), supposed to be involved in a certain causal process.
The exemplary results of the exemplary procedure can be used not only to
describe the progression of the process, but also to classify. In the
case of cancer, for instance, this genomelevel classifier could be used
to group patients and to set up a genomespecific therapy design.
[0140] Further complex models of progression can be inferred with
probability raising, for example, directed acyclic graphs. (See, e.g.,
References 1, 11 and 12). These exemplary models, rather than trees, can
explain the common phenomenon of preferential progression paths in the
target process via, for example, confluence among events. In the case of
cancer, for example, these models can be more suitable than trees to
describe the accumulation of mutations.
[0141] Further, the exemplary shrinkage estimator itself can be modified
by, for example, introducing, different correction factors. In addition,
regardless of the correction factor, an analytical formulation of the
optimal shrinkage coefficient can be provided with the hypotheses which
can apply to the exemplary problem setting. (See e.g., References 10 and
31).
Exemplary Simplified Framework
[0142] The currently existing literature lacks a framework readily
applicable to the problem of reconstructing cancer progression, as
governed by somatic evolution; however, each theory has ingredients that
can be highly promising and relevant to the problem.
[0143] Each of the existing theories faces various difficulties, which can
be rooted primarily in the attempt to construct a framework in its full
generality: each theory aims to be both necessary and sufficient for any
causal claim, in any context. In contrast, the exemplary system, method,
and computeraccessible medium, according to an exemplary embodiment of
the present disclosure simplifies the problem by breaking the task into
two: first, defining a framework for Suppes' prima facie notion, though
it admits some spurious causes and then dealing with spuriousness by
using a combination of tools, for example, Bayesian, empirical Bayesian,
regularization. The framework can be based on a set of conditions that
can be necessary even though not sufficient for a causal claim, and can
be used to refine a prima facie cause to either a genuine or a spurious
cause (e.g., or even ambiguous ones, to be treated as plausible
hypotheses which can be refuted/validated by other means).
[0144] Statement of Assumptions.
[0145] Along with the described interpretation of causality, throughout
this document, the following exemplary simplifying assumptions can be
made: [0146] (i) All causes involved in cancer can be expressed by
monotonic Boolean formulas. For example, all causes can be positive and
can be expressed in CNF where all literals occur only positively. The
size of the formula and each clause therein can be bounded by small
constants. [0147] (ii) All events can be persistent. For example, once a
mutation has occurred, it cannot disappear. Hence, situations where
P(ec)>P(ec) are not modelled. [0148] (iii) Closed world. All the
events which can be causally relevant for the progression can be
observable and the observation can significantly describe the progressive
phenomenon. [0149] (iv) Relevance to the progression. All the events have
probability strictly in the real open interval (e.g., 0, 1), for example,
it can be possible to asses if they can be relevant to the progression.
[0150] (v) Distinguishability. No two events appear equivalent, for
example, they can neither be both observed nor both missing
simultaneously.
Exemplary Learning of Bayesian Networks ("BN"s)
[0151] A BN can be a statistical model that succinctly represents a joint
distribution over n variables, and encodes it in a direct acyclic graph
over n nodes (e.g., one per variable). In BNs, the full joint
distribution can be written as a product of conditional distributions on
each variable. An edge between two nodes, A and B, can denote statistical
dependence, P(A B).noteq.P(A)P(B), no matter on which other variables can
be conditioned on (e.g., for any other set of variables C it holds
P(ABC).noteq.P(AC)P(BC). In such an exemplary graph, the set of
variables connected to a node X can determine its set of "parent" nodes
.pi.(X). Note that a node cannot be both ancestor and descendant of
another node, as this would cause a directed cycle.
[0152] The joint distribution over all the variables can be written as
.parallel..sub.xP(X.pi.(X)). If a node has no incoming edges (e.g., no
parents), its marginal probability can be P(X). Thus, to compute the
probability of any combination of values over the variables, the
conditional probabilities of each variable given its parents can be
parameterized. If the variables can be binary, the number of parameters
in each conditional probability table can be locally of exponential size
(namely, 2.sup.n(X)1). Thus, the total number of parameters needed to
compute the full joint distribution can be of size .SIGMA..sub.x
sin(X).sub.2, which can be considerably less than 2.sup.n1.
[0153] A property of the graph structure can be, for each variable, a set
of nodes called the Markov blanket which can be defined so that,
conditioned on it, this variable can be independent of all other
variables in the system. It can be proven that for any BN, the Markov
blanket can consist of a node's parents, children as well as the parents
of the children.
[0154] The usage of the symmetrical notion of conditional dependence can
introduce important limitations of structure learning in BNs. In fact,
note that edges A.fwdarw.B and B.fwdarw.A denote equivalent dependence
between A and B. Thus distinct graphs can model the exact same set of
independence and conditional independence relations. This yields the
notion of Markov equivalence class as a partially directed acyclic graph,
in which the edges that can take either orientation can be left
undirected. A theorem proves that two BNs can be Markov equivalent when
they have the same skeleton and the same vstructures; the former being
the set of edges, ignoring their direction (e.g., A.fwdarw.B and
B.fwdarw.A constitute a unique edge in the skeleton) and the latter being
all the edge structures in which a variable has at least two parents, but
those do not share an edge (e.g., A.fwdarw.B.rarw.C). (See, e.g.,
Reference 52).
[0155] BNs have an interesting relation to canonical Boolean logical
operators , and .sym. formulas over variables. These formulas, which can
be "deterministic" in principle, in BNs, can be naturally softened into
probabilistic relations to facilitate some degree of uncertainty or
noise. This probabilistic approach to modeling logic can facilitate
representation of qualitative relationships among variables in a way that
can be inherently robust to small perturbations by noise. For instance,
the phrase "in order to hear music when listening to an mp3, it can be
necessary and sufficient that the power be on and the headphones be
plugged in" can be represented by a probabilistic conjunctive formulation
that relates power, headphones and music, in which the probability that
music can be audible depends only on whether power and headphones can be
present. On the other hand, there can be a small probability that the
music will still not play (e.g., perhaps no songs were loaded on to the
device) even if both power and headphones are on, and there can be small
probability that music can be heard even without power or headphone.
[0156] Note that the subset of networks that have discrete random
variables that may be visible can only be considered. Networks with
latent and continuous variables present their own challenges, although
they share most of the mathematical foundations discussed here.
Exemplary Approaches to Learn the Structure of a BN
[0157] Classically, there have been two families of methods aimed at
learning the structure of a BN from data. The methods belonging to the
first family seek to explicitly capture all the conditional independence
relations encoded in the edges, and are referred to as constraint based
approaches. The second family, that of score based approaches, seeks to
choose a model that maximizes the likelihood of the data given the model.
Since both the exemplary approaches can lead to intractability (e.g.,
NPhardness) (see, e.g., References 53 and 54), computing and verifying
an optimal solution can be impractical and, therefore, heuristic
procedures have to be used, which only sometimes guarantee optimality. A
third class of learning procedures that takes advantage of specialized
logical relations have been introduced below. Below is a description of
other exemplary approaches. After the exemplary approach is introduced,
it can be compared with that of all the techniques described.
Exemplary Constraint Based Approaches
[0158] An intuitive explanation of several common procedures used for
structure discovery can be presented by explicitly considering
conditional independence relations between variables.
[0159] The basic idea behind all procedures can be to build a graph
structure reflecting the independence relations in the observed data,
thus matching as closely as possible the empirical distribution. The
difficulty in this exemplary approach can be in the number of conditional
pairwise independence tests that a procedure would have to perform to
test all possible relations. This can be exponential, which can be
conditioned on a power set when testing for the conditional independence
between two variables. This inherent intractability benefits from the
introduction of approximations.
[0160] In this exemplary case, two (or more or less) exemplary constraint
based procedures, the PC procedure (see, e.g., Reference 55) and the
Incremental Association Markov Blanket ("IAMB") can be focused on, (see,
e.g., Reference 56), because of their proven efficiency and widespread
usage. In particular, the PC procedure can solve the aforementioned
approximation problem by conditioning on incrementally larger sets of
variables, such that most sets of variables will never have to be tested.
Whereas the IAMB first computes the Markov blanket of all the variables
and conditions only on members of the blankets.
[0161] The PC procedure (see, e.g., Reference 55) begins with a fully
connected graph and, on the basis of pairwise independence tests,
iteratively removes all the extraneous edges. It can be based on the idea
that if a separating set exists that makes two variables independent, the
edge between them can be removed. To avoid an exhaustive search of
separating sets, these can be ordered to find the correct ones early in
the search. Once a separating set can be found, the search for that pair
can end. The exemplary PC procedure can order separating sets of
increasing size l starting from 0, the empty set, and incrementing until
l=n2. The exemplary procedure stops when every variable has fewer than
l1 neighbors, since it can be proven that all valid sets must have
already been chosen. During the computation, the larger the value of l
can be, the larger number of separating sets must be considered. However,
by the time l gets too large, the number of nodes with degree l or higher
must have dwindled considerably. Thus, in practice, only a small subset
of all the possible separating sets can need to be considered.
[0162] A distinct type of constraint based learning procedures uses the
Markov blankets to restrict the subset of variables to test for
independence. Thus, when this knowledge can be available in advance, a
conditioning on all possible variables does not have to be tested. A
widely used, and efficient, procedure for Markov blanket discovery can be
IAMB. In it, for each variable X, a hypothesis set can be tracked. The
goal can be for H(X) to equal the Markov blanket of X, B(X), at the end
of the exemplary procedure. IAMB can consist of a forward and a backward
phase. During the forward phase, it can add all possible variables into
H(X) that could be in B(X). In the backward phase, it can eliminate all
the false positive variables from the hypotheses set, leaving the true
B(X). The forward phase can begin with an empty H(X) for each X.
Iteratively, variables with a strong association with X (e.g.,
conditioned on all the variables in H(X)) can be added to the hypotheses
set. This association can be measured by a variety of nonnegative
functions, such as mutual information. As H(X) grows large enough to
include B(X), the other variables in the network will have very little
association with X, conditioned on H(X). At this point, the forward phase
can be complete. The backward phase can start with H(X) that contains
B(X) and false positives, which can have little conditional association,
while true positives can associate strongly. Using this exemplary test,
the backward phase can remove the false positives iteratively until all
but the true positives can be eliminated.
Exemplary Score Based Approaches
[0163] This exemplary approach to structural learning seeks to maximize
the likelihood of a set of observed data. Since it can be assumed that
the data can be independent and identically distributed, the likelihood
of the data (.cndot.) can be simply the product of the probability of
each observation. That can be, for example:
L ( D ) = P ( d ) ##EQU00013##
for a set of observations D. Since it can be beneficial to infer a model
that best explains the observed data, the likelihood of observing the
data given a specific model can be defined as, for example:
LL ( , ) = d .dielect cons. D P ( d )
##EQU00014##
[0164] The actual likelihood may not be used in practice, as this quantity
can become very small, and impossible to represent in a computer.
Instead, the logarithm of the likelihood can be used for three reasons.
First, the log(.) function can be monotonic. Second, the values that the
loglikelihood takes do not cause the same numerical problems that
likelihood does. Third, it can be easy to compute because the log of a
product can be the sum of the logs (e.g., log(xy)=log x+log y), and the
likelihood for a Bayesian network can be a product of simple terms.
[0165] For example, there can be a problem in learning the network
structure by maximizing loglikelihood alone. In particular, for any
arbitrary set of data, the most likely graph can always be the fully
connected one (e.g., all edges can be present), since adding an edge can
only increase the likelihood of the data. To correct for this phenomenon,
loglikelihood can be supplemented with a regularization term that can
penalize the complexity of the exemplary model. There can be a plethora
of regularization terms, some based on information theory and others on
Bayesian statistics (see, e.g. Reference 57), which can serve to promote
sparsity in the learned graph structure, though different regularization
terms can be better suited for particular applications.
[0166] Additionally, in this exemplary case, a particularly relevant and
known score, the Bayesian Information Criterion ("BIC"), (see, e.g.,
Reference 50) can be described, which will be subsequently compared to
the performance of the exemplary approach.
[0167] BIC uses a score that can consist of a loglikelihood term and a
regularization term depending on a model and data , where, for example:
BIC ( , ) = LL ( , )  log m 2 dim
( ) ( 13 ) ##EQU00015##
[0168] Here, D can denote the data, m can denote the number of samples and
dim() can denote the number of parameters in the model. Because
dim(.cndot.) can depend on the number of parents each node has, it can be
a good metric for model complexity. Moreover, each edge added to tj can
increase model complexity. Thus, the regularization term based on dim(.)
can favor graphs with fewer edges and, more specifically, fewer parents
for each node. The term log m/2 essentially weighs the regularization
term. The effect can be that the higher the weight, the more sparsity
will be favored over "explaining" the data through maximum likelihood.
[0169] The likelihood can be implicitly weighted by the number of data
points, since each point can contribute to the score. As the sample size
can increase, both the weight of the regularization term and the "weight"
of the likelihood can increase. However, the weight of the likelihood can
increase faster than that of the regularization term. Thus, with more
data, likelihood can contribute more to the score, and the observations
can be trusted more, and can have less of a need for regularization.
Statistically speaking, BIC can be a consistent score. (See, e.g.,
Reference 50). In terms of structure learning, this observation can imply
that for sufficiently large sample sizes, the network with the maximum
BIC score can be Iequivalent to the true structure. Consequently, can
contain the same independence relations as those implied by the true
exemplary structure. As the independence relations can be encoded in the
edges of the graph, a Markovequivalent network can be learned, with the
same skeleton and the same vstructures as the true graph, though not
necessarily with the correct orientations for each edge.
Exemplary Learning Logically Constrained Networks
[0170] As discussed herein, it was noted that an important class of BNs
can capture common binary logical operators, such as , , and .sym..
Although the learning procedures mentioned above can be used to infer the
structure of such networks, some exemplary procedures can employ
knowledge of these logical constraints in the learning process.
[0171] A widely used approach to learn a monotonic cancer progression
network with a directed acyclic graph ("DAG") structure and conjunctive
events can be Conjunctive Bayesian Networks (see CBNs, (see, e.g.,
Reference 58)). This exemplary model can be a standard BN over Bernoulli
random variables with the constraint that the probability of a node X
taking the value 1 can be zero if at least one of its parents has value
0. This can define a conjunctive relationship, in that all the parents of
X must be 1 for X to possibly be 1. Thus, this model alone cannot
represent noise, which can be an essential part of any real data. In
response to this shortcoming, hidden CBNs, (see, e.g., Reference 59),
were developed by augmenting the set of variables: to each CBN variable
X, which can capture the "true" state, and can be assigned a
correspondence to a new variable Y that can represent the observed state.
Thus, each new variable Y can take the value of the corresponding
variable X with a high probability, and the opposite value with a low
probability. In this exemplary model, the variables X can be latent. For
example, they may not be present in the observed data, and have to be
inferred from the observed values for the new variables. Learning can be
performed via a maximum likelihood approach, and can be separated into
multiple iterations of two steps. First, the parameters for the current
hypothesized structure can be estimated using the
ExpectationMaximization procedure (see, e.g., Reference 60), and the
likelihood given those parameters can be computed. Second, the structure
can be perturbed using some hill climbing heuristic. A Simulated
Annealing procedure (see, e.g., Reference 61) can be used for this step.
These two steps can be repeated until the score converges. However, the
ExpectationMaximization procedure only guarantees convergence to a
likelihood local maximum and, thus, the overall exemplary procedure may
not be guaranteed to converge to the optimal structure.
[0172] Since CBNs can represent the current benchmark for the
reconstruction of cancer progression models from crosssectional genomic
data, their comparison with the exemplary approach can be informative.
Exemplary A Framework for Prima Facie Causation
[0173] For the sake of clarity, the exemplary procedures can include
successive steps of successively increasing complexity of the causal
formulas; for example, going from singlecause (e.g., "atomic") formulas,
to conjunctive formulas consisting of atomic events to formulas in
Conjunctive Normal Forms ("CNF") (e.g., [(`burning cigarette``dried
wood`)(`lightning``no rain`)`forest fire`]). The causal formulas can be
represented as a directed graph: G=V,E), where the nodes can be the
atomic events, and edges can be between an event that appears positively
as a literal in the formula describing the cause and an event that can be
its effect: .Ainverted.c,e.dielect cons.E, if c can be a literal in
.phi. and .phi.e.
[0174] Throughout the Specification "real world" can refer to the concrete
instance where data can be gathered (e.g., as opposed to the
counterfactual terminology of "possible worlds") and by "topology", a
combination of structural and quantitative probabilistic parameters.
Exemplary SingleCause Prima Facie Topologies
[0175] When at most a single incoming edge can be assigned to each event
(e.g., an event has at most one unique cause in the real world:
.Ainverted..Ebackward.ce), this can be called a causal structure
singlecause prima facie topology, a special and important case of the
most general prima facie topology causal structures. Note that the
general model can be represented as a DAG where each edge can be a prima
facie cause between a parent and its child. In the special case of the
singlecause prima facie topology, the causal graphs can be trees or,
more generally, forests when there can be disconnected components. Thus,
each progression tree subsumes a distribution of observing a subset of
the mutations in a cancer sample (see, e.g., Reference 62).
[0176] The following propositions (e.g., shown in exemplary graphs of
FIGS. 10A and 10B) were shown to hold for singlecause prima facie
topologies, and used to derive an procedure to infer tree (e.g., forests)
models of cancer progression based upon the Suppes definition. (See,
e.g., Reference 62). Examples of screeningoff and of background context
are shown in an exemplary diagram of FIG. 10A, which illustrate an
example of Reichenbach's screeningoff where c can be a genuine cause of
e and a can be a genuine cause of c, and the correlations between a and e
may only just manifestations of these known causal connections, and c can
be a common cause of both a and e. FIG. 10B illustrates an exemplary
diagram of Cartwright's background context.
[0177] FIG. 11 illustrates a diagram of exemplary (e.g., prima facie)
properties. For example, properties of Suppes definition of probabilistic
causation where c can be a prima facie cause of e if the cause can be a
probability raiser of e, and it can occur more frequently.
[0178] Statistical dependence. Whenever the PR holds between two events c
and e, then the events can be statistically dependent in a positive
sense, for example:
P(ec)>P(ec)P(ec)>(e)P(c). (14)
[0179] Mutuality. If c can be a probability raiser for e, then so can be
the converse, for example:
P(ec)>P(ec)P(ce)>P(C )
[0180] Natural ordering. For any two events c and e such that c can be a
probability raiser for e, a "natural" ordering arises to disentangle a
causality relation can be, for example:
P ( c ) > P ( e ) .revreaction. P ( e c )
P ( e c _ ) > P ( e c ) P ( c e _
) . ( 15 ) ##EQU00016##
[0181] Putting together all these exemplary properties, it can be natural
to derive the following equivalent characterization of Suppes Definition:
c can be said to be a prima facie cause of e if c can be a probability
raiser of e, and it occurs more frequently Thus, for example:
ceP(ec)>P(ec)P(c)>P(e). (16)
[0182] The assertion above restates that singlecauses, involving only
persistent events, can lead to a model of real world time (e.g., t.sub.c
and t.sub.e, in Suppes Definition), which can be consistently imputed to
the observed frequencies of events.
[0183] Consequent to this definition, it can be observed that it can be
necessary, but not sufficient to identify the causal real world processes
(e.g., path or branch) and, thus, to solve causality per se. In fact, as
it can be seen in the FIGS. 12A and 12B, arrows 1205 (e.g., consistently
in the real world and in the topology) make this definition necessary,
while arrows 1210 (e.g., spurious, resulting from transitivities, because
of the singlecause hypothesis) render the condition insufficient. Arrows
1210 can be present to indicate potential genuine causes corresponding to
real causes (e.g., which can be the case when observations can be
statistically significant for the real world). Thus, a correct
inferential procedure will have to select real causes among the potential
genuine ones, a subset of prima facie causes.
[0184] As discussed above, spurious causes can manifest through spurious
correlation or chance. In the infinite sample size limit the "law of
large numbers" can eliminate the effect of chance. In other words, with
large enough sample, chance by itself will not suffice to satisfy Suppes
Definition. The former situation for spuriousness can depend on the real
world topology, and can appear under observation like a
primafacie/genuine cause in disguise, even with an infinite sample size
(e.g., edges 1215), for which the "temporal direction" has no causal
interpretation, as it depends on the data and topology). For these
reasons, a singlecause prima facie topology asymptotically will not
contain false negatives (e.g., all real world causes can be in the
topology as Suppes Definition can be necessary) but it might contain,
depending on the real world topology, false positives (e.g., arrows 1210
and edges 1215, as Suppes Definition may not be sufficient).
Exemplary ConjunctiveCause Prima Facie Inference
[0185] A propositional formula composed of conjunctions of a set of
literals can be denoted by, for example: c=c.sub.1 . . . c.sub.n, which
can imply that n events c.sub.1, . . . , c.sub.n have occurred (e.g., in
some unspecified order) so as to collectively cause some effect e (e.g.,
shown as in FIG. 13), and it can be assumed that each c.sub.1
(1.ltoreq..ltoreq.n) can be an atomic event.
[0186] Suppes' notion of probabilistic causation (e.g., Suppes Definition)
can be naturally extended to conjunctive clauses as in the following
definition:
[0187] Definition 5 (Conjunctive Probabilistic Causation).
[0188] For any conjunctive event c=c.sub.1 . . . c.sub.n and e, occurring
respectively at times {t.sub.c.sub.11=1, . . . n} and t.sub.e, under the
mild assumptions that 0<P(c.sub.1),P(e)<1, for any i, the
conjunctive event c can be a prima facie conjunctive cause of e(ce) if
all of its components c.sub.i occur before the effect and their
occurrences collectively raises the probability of the effect as, for
example:
max{t.sub.c.sub.s, . . . t.sub.c.sub.n}<t.sub.e and P(ec)>P(ec)
(17)
[0189] where P(ec)=P(ec.sub.1 . . . c.sub.n) and
P(ec)=P(e)=P(ec.sub.1 . . . c.sub.n).
[0190] This extension follows the semantics of conjunctive connectives,
which states that all causes must occur before the effect, thus
justifying the choice of picking the latest event, in time, prior to e to
generalize Suppes Definition: namely, the max{.cndot.} operation applied
to the causal events. This definition retains the semantics of
singlecause prima facie unchanged, as it can be a special case with c=c
and max{t.sub.c.sub.1}=t.sub.0. Unfortunately, as before, it still has
the same weakness that it can be necessary, but not sufficient, to
identify conjunctivecausal relations, and hence lacks the power to
define causality per se.
[0191] The properties of singlecauses prima facie topologies extend
appropriately to conjunctive topologies.
[0192] Exemplary Proposition 1.
[0193] The properties of statistical dependence, mutuality and natural
ordering for singlecauses can still be valid for conjunctive clauses.
[0194] In this exemplary case, some caution can be exercised in
distinguishing between prima facie single or conjunctive causes. As shown
in FIG. 13, in fact, for a simple conjunctive clause in the real world
(e.g., a and b and c) the following conjunctive clauses
TABLEUS00002
a b d a c d b c d
as well as the single causes ad, bd and cd, can be prima facie. The
single causes can be spurious or transitive, as in FIG. 12. However,
spurious subformulas can be called the conjunctive clauses that can be
syntactically strictly subformulas of abcd, for example, the only
formula it can be beneficial to infer. As in branch processes,
topologydependent spurious causes may appear because of spurious
correlations. These causal relations can include general spurious
formulas constituting of a subformula and any of its parents. Similarly,
spurious causes due to chance can vanish asymptotically as sample size
grows to infinity. In summary, it can be noted that a conjunctive
topology, similarly to the singlecause framework, will not contain false
negatives (e.g., all real world causes in the topology) but it might
contain, depending on the real world topology, false positives (e.g.,
edges 1305, 1310 and 1315 of FIG. 13).
[0195] It can be noted that the total number of potential formulas and
transitivities can be exponential in the size of G=n, which can be, for
example:
i = 1 n  1 ( n  1 1 ) = 2 n  1 1.
##EQU00017##
[0196] This can be a lower bound accounting only for the level of the
connective, and can be expected to grow further when more complex real
world processes can be considered. Finally, as shown in FIGS. 12A and
12B, the number of spurious causes due to topology (e.g., edges 1215),
can be quadratic in the formula size, being, for example
2 ( n  1 2 ) = ( n  1 ) ( n  2 ) .
##EQU00018##
[0197] This complexity hints at the fact that an exhaustive search of all
the possible conjunctive formula may not be feasible, in general.
[0198] In order to generalization to formulas in conjunctive normal form
Next, consider a formula in conjunctive normal form ("CNF"), where, for
example:
.phi.=c.sub.1 . . . c.sub.n,
where each c.sub.i can be a disjunctive clause c.sub.1=c.sub.i,1 . . .
c.sub.i,k over a set of literals, each literal representing an event
(e.g., a Boolean variable) or its negation. By following the same
exemplary approach as used earlier to extend Suppes' Definition from
single to conjunctive clauses, .phi.e.
[0199] Exemplary Definition 6 (e.g., CNF Probabilistic Causation):
[0200] For any CNF formula .phi. and e, occurring respectively at times
t.sub..phi., and t.sub.e, under the mild assumptions that
0<P(.phi.),P(e)<1,.phi. can be a prima facie cause of e if, for
example:
t.sub..phi.<t.sub.e and P(e.phi.)>P(e.phi.). (18)
[0201] As described above, this definition subsumes Definition 5, and can
be necessary, but not sufficient, to identify causal relations, hence
lacking the power to solve causality per se.
[0202] In this exemplary case, the number of prima facie (e.g., including
both genuine and spurious) causes can grow combinatorially much more
rapidly than the simplest case of a unique conjunctive clause. This
situation can be rather alarming, since even the simplest case already
produces an exponentially large set of prima facie causes in terms of the
number of events. In this case, in fact, further causal relations can
emerge as a result of mixing events from all the clauses of .phi.. CNF
formulas follow analogous properties as single and conjunctive
topologies, as shown below.
[0203] The properties of statistical dependence, mutuality and natural
ordering for single and conjunctive prima facie topologies can extend to
CNF formulas mutatis mutandis. For illustrative purposes, consider the
formula (ab)cd, which can be in disjunctive normal form ("DNF"). If, for
example, the claim ad can be evaluated, the background context would be
the atomic event c, being bdependent when a causes d. A symmetric
situation holds, to evaluate bd. In light of this discussion, note that
if the formula to can be converted to its CNF analogue (ac)(bd)d, the
roles of subformulas ae and bc can be interpreted in identifying a
background context, c. It follows that, for any CNF formula, the atomic
events of all the disjunctive clauses in the equivalent DNF formula
provide all the possible background contexts alaCartwright.
[0204] The exemplary system, method, and computeraccessible medium,
according to an exemplary embodiment of the present disclosure, can
include timing in the real world. Consider the CNF formula above, and
denote it as cp, and recall that Definition 6 utilizes
t.sub.o<t.sub.d. One might wonder whether a trivial timeordering
relation exists, whose complexity can be linear with respect to all the
operators in .phi.. Were it so, .phi. can be parsed into its
constituents, and recursively express the temporal relations as a direct
function of those relations that hold for its subformulas.
Unfortunately, this appears not to be the case, except when the
underlying syntax can be restricted to certain specific operators (e.g.,
conjunctions). Thus appropriate care must be taken in implementing a
model of real world time. Thus, an exemplary procedure, working on the
illustrative example of the previous paragraph, cannot conclude any
ordering about and t.sub.d, solely by looking at the observed
probabilities of their atomic events instead it must gather the correct
information for certain subformulas at the level of their connective
(e.g., the V in this case). A general rule that avoids these
difficulties, and devises a correct and efficient timinginference
procedures, can be stated as follows: it can be safe to model
probabilistic causation in terms of whole formulas, while permitting
compositional reasoning over subformulas, only when the syntax can be
restricted to certain Boolean connectives.
Exemplary Inference Procedure
[0205] The exemplary structure of the reconstruction problem can be as
follows. Assume that there is a set G of n mutations (e.g., events, in
probabilistic terminology) and m samples, represented as a
crosssectional dataset, for example, without explicit timing
information, in an m.times.n binary matrix D.dielect
cons.{0,1}.sup.m.times.n in which an entry D.sub.k,l=1 if the mutation l
was observed in sample k, and 0 otherwise. Note that dataset lacking
explicit timing information can typically be, for instance, in cancer
patient data.
[0206] To introduce the exemplary system, method, and computeraccessible
medium, additional notations can be utilized: can denote the universe of
all possible causal claims .phi.e, where .phi. can be a CNF formula over
the events in D (e.g., G.OR right.) and e can be an atomic event. With
.OR right., it all the causal claims whose formulas can be conjunctive
over atomic events may not contain disjunctions. For a general CNF
formula .phi. it can be denoted by chunks (.phi.) its set of disjunctive
clauses. For example, abe.dielect cons. while (a)(cd)ef and chunks
((a)(cd)e)={(a), (cd), e}).
[0207] Inferred Structures.
[0208] The exemplary system, method, and computeraccessible medium,
according to an exemplary embodiment of the present disclosure, can
reconstruct a general DAG from the input data. It can share many
structural and procedureic properties with the Conjunctive Bayesian
Networks approach (see, e.g., Reference 58) especially in the context of
cancer progression models. However, the exemplary system, method, and
computeraccessible medium, according to an exemplary embodiment of the
present disclosure, can face no obstacle in spontaneously inferring from
the input data various substructures of a DAG, for example, forests or,
more specifically, trees although it has no "hardcoded" policies for
doing so. Thus, the exemplary system, method, and computeraccessible
medium, according to an exemplary embodiment of the present disclosure,
can be expected to be applicable in a contextagnostic manner, and can
compete well with other exemplary approaches, which may not be a priori
restricted from having advantageous structural information, (See, e.g.,
References 6265).
[0209] The exemplary DAGs can build on arbitrary CNF formulas, using the
strategy that disjunctive clauses can be first summarized by unique DAG
nodes. As an example, a formula (ab)cd will be modeled with three nodes:
one for (ab), the aggregated disjunction, one for c and one for d. The
reasons disjunctions may not be handled are discussed below.
[0210] In the following, a progression DAG can be denoted as =(N, .pi.)
where N.OR right. can be the set of nodes (e.g., mutations or formulas)
and .pi.:N.fwdarw.(N): can be a function associating to each node j its
parents .pi.(I). This exemplary model can yield the following.
[0211] Exemplary Definition 4 (e.g., DAG Causal Claims).
[0212] A=(N, .pi.) models the causal claims
= j .dielect cons. N { ( c 1 c n )
1 .pi. ( 1 ) = { c 1 , , c n } } ,
##EQU00019##
where c.sub.1 . . . c.sub.n can be a CNF formula and any c.sub.j can
either be a ground event or a disjunction of events.
[0213] Going back to the example above, in the exemplary DAG there can be
.pi.(l)={(ab), c, d} whose underlying causal claim would be (ab)cd.
[0214] Each DAG can be augmented with a labeling function
.varies.:N.fwdarw.[0.1] such that .varies.(1) can be the independent
probability of observing mutation i in a sample, whenever all of its
parent mutations can be observed (e.g., if any). Each DAG can induce a
distribution of observing a subset of events in a set of samples (e.g., a
probability of observing a certain mutational profile, as defined below.
[0215] Exemplary Definition 5 (e.g., DAGInduced Distribution).
[0216] Let be a DAG and .varies.:N.fwdarw.[0,1] a labeling function,
generates a distribution where the probability of observing N*.OR right.N
events can be, for example:
P ( N .rarw. ) = n .dielect cons. N .varies. ( x
) y .dielect cons. N / M * [ 1  .varies. ( y ) ]
( 19 ) ##EQU00020##
whenever .varies..dielect cons., .pi.(x).OR right., and 0 otherwise.
[0217] Notice that this definition, as expected, can be equivalent to the
previouslyused definitions (see, e.g., Reference 58), and can retain a
treeinduced distribution. (See, e.g., References 62, 63 and 65).
Further, notice that a sample which contains an event but not all of its
parents, can have a zero probability, thus subsuming the conjunctive
interpretation of the exemplary DAGs. These types of samples, which can
represent "irregularities" with respect to D, might be generated when
adding false positives/negatives to the sampling strategy. Finally,
because nodes can be disjunctive formulas can extend this exemplary DAG
definition to express causal claims with generic CNF formulas.
[0218] Inference confidence: bootstrap and statistical testing. A
statistical foundation to the exemplary inferences can be provided, which
employ such classical techniques as bootstrap (see, e.g., References 66
and 67), and the MannWhitney U test. (See, e.g., Reference 68).
[0219] In data preprocessing bootstrap with rejection resampling can be
used. This can be used to estimate a distribution of the marginal and
joint probabilities, where for each event: (i) repetitions rows can be
sampled from the input matrix D (e.g., bootstrapped dataset), (ii) the
distributions can be estimated from the observed probabilities, and (iii)
values which do not satisfy 0<P(1)<1 and P(11)<1P(11)<1 can
be rejected, which can be iterated restarting from (i). This can conclude
when there are at least about 100 values.
[0220] Any inequality (e.g., checking temporal priority and probability
raising) can be estimated as follows: the MannWhitney U test with
pvalues set to 0.05 can be performed. This can be a nonparametric test
of the null hypothesis that two populations can be the same against an
alternative hypothesis, and can be especially useful to understand
whether a particular population, for example P(i), tends to assume larger
values than the other, for example, P(j). By employing this exemplary
test, which does not need to assume Gaussian distributions for the
populations, confidence pvalues for both temporal priority and
probability raising can be computed.
[0221] Once a DAG model can be inferred with the exemplary system, method,
and computeraccessible medium, both parametric and nonparametric
bootstrapping methods can be used to assign a confidence level to its
respective claims, and ultimately, to the overall exemplary causal model.
These tests can consist of using the reconstructed model (e.g., in the
parametric case), or the probabilities observed in the dataset (e.g., in
the nonparametric case) to generate new synthetic datasets, which can
then be reused for reconstructing of the progressions (see, e.g.,
Reference 67). The confidence can be given by the number of times the
DAG, or any of its claims can be reconstructed from the generated data.
Exemplary CAPRI: A Hybrid Procedure for General CNF Formulas
[0222] Building upon the framework presented above, the exemplary system,
method, and computeraccessible medium, according to an exemplary
embodiment of the present disclosure, can be used to infer cancer
progression models from crosssectional data. The exemplary procedure can
be hybrid in the sense that it can combine a structurebased approach
(e.g., as of Definition 6) with a likelihoodfit constraint and,
according to its input, can infer causal claims with various logical
expressivity. Its computational complexity, which can be highly dependent
on the expressivity of the claims, as well as its correctness are
discussed below.
[0223] CAncer PRogression Inference (e.g., CAPRI) can utilize its input, a
matrix D and, optionally, a set of k input causal claims
.PHI.)={.phi..sub.1e.sub.1, . . . .phi..sub.ke.sub.k}, where each
.phi..sub.i can be a CNF formula and .phi.e. Here can represent the usual
syntactical ordering relation among atomic events and formulas, for
example, a.OR right.(ab)cd, and can be simply utilized to disallow
malformed input claims, which would vacuously be labeled as prima facie
causality (e.g., as of Definition 6), but would have no real causal
meaning. For example, in the example above, it makes no sense to say that
"a causes (ab)cd." The augmented input .PHI., which can contain claims of
the most complex type CAPRI can infer, can be optional in the sense that,
if .PHI.==0, the exemplary system, method, and computeraccessible medium
can be able to infer "all" conjunctive causal claims over atomic events
(e.g., claims cbce in ), but not general CNF ones.
[0224] CAPRI can begin performing a lifting operation over D, and then
build a DAG D. Lifting operation can evaluate each CNF formula
.phi..sub.i for all input causal claims in .PHI. and its result, a lifted
D, can be an extended input matrix for the exemplary system, method, and
computer accessible medium. As an example, consider a claim
.PHI.={(a)(cd)ef}, the result of lifting for an input matrix D over a, .
. . , f can be
D = [ a b c d e f 1 1 1 1 0 1 0 0 0
0 1 0 1 0 1 0 0 0 1 1 0 1 1 1 1 0
1 1 0 0 ] , D ( .PHI. ) = [ a b c d
e f .PHI. 1 1 1 1 0 1 0 0 0 0 0 1 0 0
1 0 1 0 0 0 0 1 1 0 1 1 1 1 1 0 1
1 0 0 0 ] , ##EQU00021##
since .phi.=(a)(cd)e and, for example, (1 )(1U).ident.U. After the
lifting, can be built by individually including in its set of nodes all
the disjunctive subformulas of such CNF formulas, plus G. In the
preceding example, {(ab), (cd), e} can be nodes in (note that e.dielect
cons.G). Notice that D(.PHI.)=D and N=G if .PHI.=O.
[0225] Subsequently, the parent function (e.g., the edges in ) can be
built by pairwise implementation of exemplary Definition 6, which has
been shown to subsume also Suppes Definition and exemplary Definition 5.
For the sake of simpler exposition, the coefficients .GAMMA..sub.i,j and
.sub.i,j can be used to evaluate temporal priority and probability
raising, respectively, which can be needed to be strictly positive by
exemplary Definition 6. Two cases can be distinguished: (i) when a causal
claim directly involving an atomic event can be evaluated, or (ii) a
chunk of an input formula. When a claim "i causes j" can be evaluated,
and 1.dielect cons.G, it can be beneficial that exemplary Definition 6
can be satisfied. If so i can be a prima facie cause of j and it can be
added to .pi.(). When the same is performed for an input formula .phi.,
if it can be prima facie for an event j, which can add .phi. via all its
constituting chunks to .pi.(l). This can be needed because the DAG can
be built by chunking input formulas, while the lifting operation can be
performed on whole formulas; in reference to the examples above, when
.phi. can be prima facie to f, (ab), (cd) and e to .pi.(f) can be added.
Moreover, since claims with the rightmost part an atomic event can be of
interest, .pi.(l)=O for any .dielect cons.G. In case of the preceding
input, for instance, any incoming edge in (a) and (cd) does not need to
be considered, while edges incoming in e solely from an atomic event can
be considered. As for labeling, note that no label can be assigned to
this kind of nodes. Further, since this construction can be consistent
with the exemplary approach and the conjunctive interpretation of , once
the steps defined in Eqs. 20 and 21 have been performed, can be a prima
facie DAG.
[0226] As prima facie causality can provide only a necessary condition,
filtering out all spurious causes that might have been included in D can
be performed. The underlying intuition can be as follows. For any prima
facie structure, spurious claims can be contribute to reduce the
likelihoodfit relative to true claims, and thus a standard
maximumlikelihood fit can be used to select and prune the prima facie
DAG. Based on all the discussion made above, a regularization term can be
necessary to avoid overfitting. For example, if simple loglikelihood
were used, it can be expected that the best model can actually be the
prima facie structure. For this reason, the regularization score
discussed above can be adopted; namely Bayesian Information Criterion
("BIC"), which can implement Occam's razor by combining loglikelihood
fit with a penalty criterion proportional to the log of the DAG size via
Schwarz Information Criterion. (See, e.g., Reference 69).
[0227] With 4)=0 only conjunctive causal claims in C can be inferred by
the exemplary procedure, since the set of nodes of can be N=G. Analysis
of complexity, correctness and expressivity of CAPRI can now be
presented.
Exemplary Complexity, Correctness and Expressivity of CAPRI
[0228] Exemplary Complexity. The previous sections have stressed the
rapidity with which the set of causal claims (e.g., or formulas) grow for
a given model. Thus making their inference highly intractable. However,
this complexity can be intrinsic to the problem. Or put alternatively, it
can be independent of the underlying theory of causation. Unlike the
heuristic approaches commonly used by many others to infer general causal
claims, the exemplary system, method, and computeraccessible medium,
according to an exemplary embodiment of the present disclosure and
incorporate a twofold approach. To infer simple claims (e.g., single or
conjunctive causes, at most), the exemplary CAPRI's execution can be
selfcontained (e.g., no input besides D can be required) and polynomial
in the size of D. Instead, the number of inferable general causal claims
(e.g., CNF) can be limited, by requiring that they be specified as an
input to the exemplary system, method, and computeraccessible medium,
according to an exemplary embodiment of the present disclosure in .PHI..
In this case the exemplary CAPRI tests, with a polynomial cost, those
claims plus the simple ones, and its complexity spans over many orders of
magnitude according to the structural complexity of the input set .PHI.,
as further elaborated in the following theorem.
[0229] Exemplary Theorem 3 (Asymptotic Complexity):
[0230] Let G=n and D.dielect cons.{0,1}.sup.m.times.n where in
m>>n, and let N the nodes in the DAG returned by CAPRI, the worst
case time and space complexity of building a prima facie topology can be,
ignoring the cost of bootstrap, for example:
TABLEUS00003
Algorithm 1 CAncer PRogression Inference (CAPRI)
1: Input: A set of events G = {g.sub.1, . . . , g.sub.n}, an m .times. n
matrix D .epsilon. {0,1}.sup.m.times.n and k CNF causal
claims .PHI. = {.phi..sub.1 e.sub.1, . . . , .phi..sub.k
e.sub.k}where, for any i, e.sub.i .phi..sub.i and e.sub.i .epsilon. G;
2: [Lifting] Define the lifting of D to D(.PHI.) as the augmented matrix
D ( .PHI. ) = D 1 , 1 D 1 , n .PHI. 1 (
D 1 , . ) .PHI. k ( D 1 , . )
D m , 1 D m , n .PHI. 1 ( D m , . )
.PHI. k ( D m , . ) . ##EQU00022## (9)
by adding a column for each .phi..sub.i c.sub.i .epsilon. .PHI., with
.phi..sub.i evaluated rowbyrow, define the coefficients
.GAMMA..sub.i,j = (i)  (j), and .LAMBDA..sub.i,j = (j  i)  (j 
), (10)
pairwise over D(.PHI.);
3: [DAG structure] Define a DAG = (N, .pi.) where
N = G ( .PHI. i chunks ( .PHI. i ) ) , .pi.
( j G ) = .0. ; ##EQU00023##
.pi.(j .epsilon. G) = {i .epsilon. G  .GAMMA..sub.i,j .LAMBDA..sub.i,j
> 0} .orgate. {chunks (.phi.)  .GAMMA..sub..phi., j
.LAMBDA..sub..phi.,j > 0, .phi. j .epsilon. .PHI.}. (11)
4: [DAG labeling] Define the labeling .alpha. as follows
.alpha. ( j ) = { ( j ) , if .pi. (
j ) = .0. and j .dielect cons. G ; (
j  i 1 i n ) , if .pi. ( j ) = { i
1 , , i n } . ##EQU00024##
5: [Likelihood fit] Filter out all spurious causes from by likelihood
fit with the regularization
BIC score and set .alpha.(j) = 0 for each removed connection.
6: Output: the DAG and .alpha.;
.crclbar.(mn) time and .crclbar.(n.sup.2) space, if .PHI. = ;
.crclbar.(.PHI.mn) time and .crclbar.(.PHI.m) space, if .PHI. .OR
right. and N << m (i.e., there are sufficiently many samples to
characterize the input formulas);
(2.sup.2.sup.n) time and space, if .PHI. =
[0231] As shown above, the procedureic complexity can span over many
orders of magnitude according to the structural complexity of the input
set 1) which can determine the number of nodes in the returned DAG, for
example, N. Hence, aside from the cost of likelihood fit, the cost of
the procedure can be polynomial only if .PHI. can be polynomial in the
number of input samples and atomic events. This observation forewarns one
of the hazard of a brute force approach, which attempts to test all
possible causal claims. Generally speaking, despite the price of possibly
"missing" some real causal claims, one should be able to identify most
relevant causal structures by exploiting domainknowledge, biological
priors, and empirical/statistical estimations in selecting reasonable
input .PHI. (e.g., focusing on certain key drivermutations over the
others). Note that this problem's inherent computational intractability
does not negate the power of the procedureic automation, relative to what
can be achievable with manual analysis.
[0232] Exemplary Correctness and Expressivity.
[0233] Let .OR right. be the set of true causal claims in the real world,
which can be inferred (e.g., in the tests of the exemplary procedure on
synthetic data, W can be known, once a DAG to generate its input data can
be fixed). Here, the relation between and the set of causal claims
retrieved by the exemplary procedure can be investigated as a function of
sample size m and the presence of false positives/negatives which can be
assumed to occur at rates .dielect cons..sub.+ and .dielect
cons..sub..
[0234] Below .SIGMA. denotes the set of causal relations, implicit in the
DAG returned by the exemplary procedure for an input set .PHI. and a
matrix D; this can be written as D(.PHI.).SIGMA.. Such claims can be
evaluated as in exemplary Definition 7. The following can be proved.
[0235] Exemplary Theorem 4 (Soundness and Completeness).
[0236] When the sample size m.fwdarw..infin. and the data can be uniformly
affected by false positives and negatives rates
.epsilon.  ? .epsilon. + .dielect cons. [ 0 , 1 )
, ? indicates text missing or illegible when filed
##EQU00025##
if the input given can be a superset of the true causal claims, then the
exemplary CAPRI can reconstruct exactly the true causal formulas , that
can be, if .OR right..PHI. then D(.PHI.).andgate..PHI..
[0237] Notice that if it could be assumed that .PHI. characterizes well,
then all real causal claims can be in .PHI., and the corollaries below
follow immediately.
[0238] Exemplary Corollary 1 (Exhaustivity).
[0239] Under the hypothesis of the above theorem D().
[0240] Exemplary Corollary 2 (Least Fixed Point).
[0241] can be the lfp of the monotonic transformation as, for example:
.PHI. D ( .PHI. ) .ident. D ( .PHI. .PHI. )
III  W ##EQU00026##
[0242] Since a direct application of this exemplary theorem can incur a
prohibitive computational cost, it only serves to idealize the ultimate
power of the exemplary system, method, and computeraccessible medium.
That can be, the theorem only states that the exemplary CAPRI can be able
to select only the true causal claims asymptotically, as the size of
grows, albeit exponentially. It can also clarify that the exemplary
system, method, and computeraccessible medium, according to an exemplary
embodiment of the present disclosure can "filter out" all the spurious
causal claims (e.g., true negatives), and produce the true positives from
the set of the genuine causal claims more and more reliably as a function
of the computational and data resources.
[0243] Now the exemplary attention can be restricted to conjunctive
clauses in Cfor example, those formulas which can be defined only on
atomic eventsso as to enable a fair comparison. (See, e.g., Reference
58).
[0244] Exemplary Theorem 5 (Inference of Conjunctive Clauses).
[0245] Let .PHI.=O; as before, when the sample size m.fwdarw..infin. and
the data can be uniformly affected by false positives and negatives rates
e.sub.=e+E[0,1), then only conjunctive clauses on atomic events can be
inferred, which can either be true or spurious for general CNF formulas.
That can be: if D(O).SIGMA. then .SIGMA..OR right.. Furthermore, [0246]
1. .SIGMA..andgate. can be true claims and [0247] 2. for any other claim
.varies.e.dielect cons.(.SIGMA./.SIGMA..andgate.) there exist
.beta.e.dielect cons.\ such that .beta. screens off .varies. from e.
[0248] This exemplary theorem states that even if one can be neither
willing to pay the cost of augmenting the input set of formulas nor can a
suitable formula be found to augment, the exemplary system, method, and
computeraccessible medium can still be capable of inferring conjunctive
clauses, whose members can be either genuine or a conjunctive subformula
of a more complex genuine CNF formula .beta. (e.g., regardless of whether
a cause of the second kind can be considered to be spurious).
[0249] An immediate corollary of these two exemplary theorems can be that
the exemplary system, method, and computeraccessible medium works
correctly, when it can be fed with all possible conjunctive formulas.
[0250] Exemplary Corollary 3.
[0251] Under the hypothesis of the above theorems, D(O.SIGMA.D(.SIGMA..
[0252] In practice, though still exponential, the exemplary system,
method, and computeraccessible medium, according to an exemplary
embodiment of the present disclosure, can be less computationally
intensive, when using than with , as it can trade off computational
complexity against expressivity of the inferred causal claims.
[0253] In the context of automatic inference of logical formulas
expressivity of the inferred claims relates to compositional inference.
In particular, it can be easy to see that for a disjunctive formula
c.sub.1 . . . c.sub.n, the following holds where, for example:
c.sub.1 . . . c.sub.ne.Ainverted..sub.cc.sub.le,
which can be the reason why full CNF formulas cannot compositionally
inferred by reasoning over their constituents (e.g., any might not
satisfy the prima facie definition on its own). Thus, the hypothesis set
.PHI. can be relied upon, unless one could assume to know a priori the
formulas and hence the background contexts (e.g., any other c.sub.j, for
.noteq.i), which poses a circularity issue. An instance of this
constraint can be of particular importance with respect to cancer. For
example, in modeling synthetic lethality (see FIG. 14) which can be
expressed as c.sub.1.sym.c.sub.2e where
c.sub.1.sym.c.sub.2=(c.sub.1c.sub.2)(c.sub.1c.sub.2).
[0254] In particular, FIG. 14 illustrates exemplary diagrams providing
caveats in inferring synthetic lethality relations. For a synthetic
lethality causal relation among a and b towards c, if one considers a
dataset of aggregated samples, the risk of misleading the temporal
priority relation among a, b and c can be high. If one were to know, a
priori, that a.RTM. b is part of the claim, one could separate data and
work safely. Unfortunately, being unknown a priori, only domain
knowledge, biological priors or hypothesis testing can be relied upon.
[0255] The exemplary system, method, and computeraccessible medium,
according to an exemplary embodiment of the present disclosure, can be
applied to infer tree or forest models of progression, and can be
evaluated empirically against other approaches in the literature which
can be specifically tailored for tree/forests. (See, e.g., References 62,
63 and 65). All these exemplary approaches can have the same quadratic
complexity (e.g., in the number of events in G and, just as with the
exemplary CAPRI, can be shown to converge asymptotically to the correct
tree, even in the presence of noisy observations. Despite asymptotic
equivalence, the exemplary procedures can differ in performance under
various settings of finite data (e.g., usually, synthetic), as previously
described. The simpler procedure, CAncer PRogression Extraction with
Single Edges (e.g., CAPRESE, (see, e.g., Reference 62), can differ from
CAPRI, as it relies on a score based on probability raising with a
shrinkage estimator, which can intuitively correct for the sample size
and noise. (See, e.g., References 66 and 67).
Exemplary Results: Synthetic Data
[0256] A general pipeline for CAPRI's usage is depicted in a diagram of
FIG. 15. CAPRI can be implemented in the open source R package TRONCO
(e.g., second version, available at standard R repositories). The
pipeline can start with data gathering 1505, either experimentally or via
shared repositories, and genomic analysis to create, for example, somatic
mutation or CopyNumber Variations profiles for each sample. Then, events
can be selected (element 1510) via statistical analysis and biological
priors, to construct a suitable input data matrix D which can satisfy
CAPRI's assumptions. Hypothesis 1515 of any causal claim can then be
generated, based on prior knowledge. CAPRI (element 1520) can then be
executed, which can result in pvalues for temporal priority and
probability raising to be returned, along with the inferred progression
model. Validation (element 1525) concludes the pipeline.
[0257] The performance of all the procedures were assessed with four
different types of topologies: (i) trees, (ii) forests, (iii) DAGs
without disconnected components and (iv) DAGs with disconnected
components. Irrespective of the topology considered, atomic events were
used, which can imply that the kind of causal claims that can be
experimented with, can either be single or conjunctive. Based on
exemplary Corollary 3, it sufficed to run CAPRI with .PHI.=0. This can be
consistent with the fact that the exemplary procedure can infer more
general formulas if an input "set of putative causes, .PHI.=O" can be
given in additiona fact which could have biased the exemplary analysis
in the exemplary favor in the more general situation. For the sake of
completeness, however, specific CNF formulas were also tested, as shown
below.
[0258] Type (iii) topologies can be DAGs constrained to have nodes with a
unique parent; condition (i) further restricts such DAGs to have no
disconnected components, meaning that all nodes can be reachable from a
starting root r. Practically, condition (i) satisfies .pi.(l)=1 for
.noteq.r, and .pi.(r)=O, while in (ii) can be presented. This kind of
topology can be either reconstructed with adhoc procedures (see, e.g.,
References 62, 63 and 65) or general DAGinference techniques. (See,
e.g., References 55, 56, 58, 69 and 70). Type (iiiiv) topologies can be
DAGs which have either a unique starting node r, or a set of independent
subDAGs. Similarly, condition (iii) satisfies .pi.(l).gtoreq.1 for
.noteq.r, and .pi.(r)= , while in (iv) can be facilitated to be present,
as it was in condition (ii). This kind of topology may not be
reconstructed with treespecific procedures, and thus only certain
procedures could be used for comparison. (See e.g., References 55, 56,
58, 69 and 70).
[0259] The selection of these different type of topologies may not be a
mere technical exercise, but rather it can be motivated, in the exemplary
application of primary interest, by heterogeneity of cancer cell types
and possibility of multiple cells of origin. In particular, type (ii)
with respect to (i) and type (iv) with respect to (iii), can be attempts
at modeling independent progressions of a cancer via multiple roots.
Clearly, these variations confound the inference problem further, since
samples generated from such topologies will likely contain sets of
mutations that can be correlated but can be pairwise causally
irrelevanta wellstudied and widely discussed problem. Finally, note
that, to generate synthetic data according to (iiv), the constraints on
.pi.(.cndot.) can be straightforwardly applied to the exemplary, system,
method, and computeraccessible medium.
[0260] Exemplary Generating Synthetic Data.
[0261] Let n be the number of events to include in a DAG and let
p.sub.min=0.05=1p.sub.max, a DAG without disconnected components (e.g.,
an instance of type (iii) topology), maximum depth log n and where each
node has at most w* parents (e.g., .pi.(l)<wfor .noteq.r) can be
generated as follows:
TABLEUS00004
1: pick an event r .epsilon. G as the root of the DAG;
2: assign to each j .noteq. r an integer in the interval [2, [log n] ]
representing its depth in the DAG (e.g., 1 can be reserved for r),
ensure that each level has at least one event;
3: for all events j .noteq. r do
4: let 1 be the level assigned to e;
5: pick .pi.(1) uniformly over (0, w*], and accordingly define .pi.(1)
with events selected among those at which level 1  1 was assigned;
6: end for
7: assign .varies.(r) a random value in the interval
[p.sub.min,p.sub.max];
8: for all events j .noteq. r do
9: let y be a random value in the interval [p.sub.min,p.sub.max], assign
.degree.(f)_y H a(x)
.varies. ( i ) = y x .dielect cons. n ( i )
.varies. ( x ) ##EQU00027##
10: end for
11: return the generated DAG;
[0262] When an instance of type (iv) topology can be generated, the above
exemplary procedure can be repeated to create its constituent DAGs. In
this case, if multiple DAGs can be generated, each one with randomly
sampled n.sub.i events, it can be beneficial that
G = n ? = n . ? indicates text missing
or illegible when filed ##EQU00028##
When instances of type (i) topology can be needed where w*=1, and by
iterating multiple independent sampling instances of type (ii) topology
can be generated. When required DAGs were sampled, these can be used to
generate an instance of the input matrix D for the exemplary
reconstruction procedures.
[0263] To account for noise in the data, a parameter v.dielect cons.(0,1)
can be introduced, which can represent the probability of each entry to
be random in D, thus representing a false positive .dielect cons..sub.+
and a false negative rate .dielect cons..sub. where, for example:
.dielect cons. + = .dielect cons.  = v 2 .
##EQU00029##
[0264] Exemplary Performance Measures.
[0265] Synthetic data was used to evaluate the performance of the
exemplary CAPRI as a function of dataset size, .dielect cons..sub.+ and
.dielect cons..sub..
[0266] In general, since the exemplary interest lies primarily in the
causal structure underlying the progressive phenomenon of cancer
evolution, it can be beneficial to measure the number of genuine claims
inferred (e.g., true positives, TP), and the number of unidentified
spurious causes (e.g., false positives, FP). Similarly, false negative
("FN") can be called a genuine cause that fail to recognize as causal and
true negative, and ("TN") can be a cause correctly identified as
spurious. With these measures we evaluated the rates of precision and
recall as follows:
precision = T P T P + F P '
and recall = T P T P + F N
##EQU00030##
[0267] The overall structural performance was measured in terms of the
Hamming Distance ("HD"), (see, e.g., Reference 71), the minimumcost
sequence of node edit operations (e.g., deletion and insertion) that can
transform the reconstructed topology into the true ones (e.g., those
generating data). This exemplary measure corresponds to just the sum of
false positives and false negative and, for a set of n events, can be
bounded above by n(n1) when the reconstructed topology contains all the
false negatives and positives.
[0268] To estimate reliable statistics, the following exemplary approach
can be used to assess the results. For each type of topology that can be
considered, about 100 distinct progression models can be generated, and
for each value of sample size and noise rate, about 10 datasets from each
topology can be sampled. Thus, every performance entry (e.g., Hamming,
precision or recall) can be the average of about 1000 reconstruction
results. This can be the setting used in most cases, unless differently
specified.
Exemplary Performance with Different Topologies and Small Datasets
[0269] The performance of CAPRI can be estimated for datasets with sizes
that can be likely to be found in currently available cancer databases,
such as The Cancer Genome Atlas, TCGA (see, e.g., Reference 72), for
example, m.apprxeq.250 samples, and about 15 events. The results are
shown in FIG. 16, for topologies (i) and (ii), and FIG. 17, for
topologies (iii) and (iv). There, all the results obtained by running the
procedure with bootstrap resampling are shown, although results without
this preprocessing leave the conclusions unchanged.
[0270] Results suggest a trend that can be expected, which can be that
performance degrades as noise increases and sample size diminishes.
However, it can be particularly interesting to notice that, in various
settings, the exemplary CAPRI almost converges to a perfect score even
with these small datasets. This happens for instance with type (iii)
topologies, where the Hamming distance almost drops to about 0 for
m.gtoreq.150. In general, reconstructing forests can be easier than
trees, when the same number of events n can be considered. This can be a
consequence of the fact that, once n can be fixed, forests can be likely
to have less branches since every tree in the forest has less nodes. When
reconstructing type (iiiiv) topologies, instead, the convergencespeed
of CAPRI to lower Hamming distance can be slower, as one might reasonably
expect. In fact, in those settings the distance never drops below about
3, and more samples would be required to get a perfect score. This can be
considered to be a remarkable result, when compared to the worstcase
Hamming distance value of 1514=210. FIG. 17 also suggest that
disconnected DAGs can be easier to reconstruct than connected ones, when
a fixed number of events can be considered. Similarly to the above, this
could be credited to the fact that the size of the conjunctive claims can
generally be smaller, for fixed n. With respect to the precision and
recall scores, it can be noted that the exemplary CAPRI can be robust to
noise, since the loss in the scorevalues appear nearly unaffected by any
increase in the noise parameter.
Exemplary Comparison with Other Reconstruction Techniques
[0271] For the following exemplary comparison, the following categories
can be used:
[0272] Exemplary Structural approaches include such procedures as
Incremental Association Markov Blanket ("IAMB") and the PC procedure,
both subjected to loglikelihood maximization; (See, e.g., References 55
and 56).
[0273] Exemplary Likelihood: approaches encompass various
maximumlikelihood approaches constrained by either the Bayesian
Dirichlet with likelihood equivalence ("BDE") or the Bayesian Information
Criterion ("BIC") scores; (See, e.g., References 69 and 70).
[0274] Exemplary Hybrid: approaches can be mixed approaches as exemplified
by hidden Conjunctive Bayesian Networks ("CBN"), and Cancer Progression
Inference with Single Edges (e.g., CAPRESE) which can be applied only to
trees and forests. (See, e.g., References 58 and 62).
[0275] For all the exemplary procedures, their standard R implementations
can be used, which can be, for example: for IAMB, BDE and BIC package
bnlearn can be used, for the PC procedure package pcalg can be used, for
CAPRESE TRONCO (e.g., first release) can be used, and for CBN hcbn can
be used. (See, e.g., References 7375). Other exemplary procedures exist,
but those which satisfied at least one of the following exemplary
criteria were selected: they seemed more effective in inferring causal
claims (e.g., IAMB and PC), they regularize the Bayesian overfit (e.g.,
BDE and BIC), they assume a prior (e.g., BDE) or they were developed
specifically for cancer progression inference (e.g., CBN and CAPRESE).
[0276] Notice that all the exemplary procedures capable of inferring
generic DAGs but CAPRESE (see, e.g., Reference 62) were selected, which
can only be applied to infer trees or forests (e.g., type (iii)
topologies). There exist other approaches specifically tailored for such
topologies, (see, e.g., References 63 and 65), however since (see, e.g.,
Reference 62) it can be shown that CAPRESE can be better than other
exemplary approaches. CAPRI can be considered to be in the Hybrid
category, though its performance with all the other approaches was
compared, with the aim of investigating which approach can be more
suitable to reconstruct the topologies defined above.
[0277] The general trend is summarized in graphs of FIG. 18, where these
exemplary procedures were ranked according to the median performance they
achieve, as a function of noise and sample size, and provide the
parameters used for comparison. In FIG. 19, CAPRI can be compared with
the structural approaches (e.g., IAMB and PC). In FIG. 20, it can be
compared with the likelihood approaches (e.g., BIC and BDE) and, in
graphs of FIG. 21, it can be compared with the hybrid ones. Because of
the high computational cost of running CBNs the number of ensembles
performed can be about 100 for CBNs, while it can be about 1000 for all
other exemplary procedures. While this strategy provides less robust
statistics for CBNs (e.g., less "smooth" performance surfaces), it can be
sufficiently accurate to indicate the general comparative trends and
relative performance efficiency.
Exemplary Reconstruction without Hypotheses: Disjunctive Causal Claims
[0278] For example, the exemplary procedure expects, as an input, all the
hypothesized causal claims to infer more expressive logical formulas, for
example, claims with pure CNF formulas or even disjunctive claims over
atomic events. Nonetheless, it can be instructive to investigate its
performance in two specific cases: namely, (i) without hypotheses
(.PHI.=0) and (ii) for datasets sampled from topologies with disjunctive
causal claims.
[0279] To generate the input dataset, the exemplary generative procedure
used for the other tests can be modified to reflect the switch from
conjunctive to disjunctive causal claims. This task can be simple, since
the labeling function a can be changed to account for the probability of
picking any subset of the clauses in the disjunctive claim, and not
picking the others. The exemplary DAGs can be used with about 10 events,
and disjunctive causal claims with at most about 3 atomic events
involved, which can be a reasonable size of a disjunctive claim, given
the events considered. This exemplary setting can generally be harder
than the one shown in FIGS. 1921. Thus the performance can be expected
to be somewhat inferior.
[0280] The exemplary CAPRI can be compared with other exemplary procedures
used so far, the results of which are shown in FIG. 22, where .PHI.=0, as
noted earlier. The graphs confirm the trends suggested by previous
analyses: namely, CAPRI can infer the correct disjunctive claims more
often than the others. Note also that the performance can be measured on
the reconstructed topology only, since, without input hypotheses, the
exemplary procedure can evaluate only conjunctive claims, and does not
facilitate different types of relations (e.g., disjunction) to be
inferred automatically. However, observed performance improvement can be
much lower, and the Hamming distance can fail to rise above about 4.
Furthermore, convergence to optimal performance was not observed for
m.ltoreq.1000, and it appears not to be reachable even for m>>1000
(e.g., at least, when no hypotheses can be used). It can also be possible
that, as n and the number of maximum disjunctive clauses increase, the
result could be an even less satisfactory speed of convergence.
Exemplary Reconstruction with Hypotheses: Synthetic Lethality
[0281] Whether the exemplary CAPRI can infer synthetic lethality
relations, when these can be directly hypothesized in the input set .PHI.
can be considered. This can be confirmed with a test of the simplest
form, for example:
a.sym.bc,
for a set of events G={a, b, c} where progression can be forced from a to
c to be preferential, for example, it appears with about a 0.7
probability while b to c does so with only about a 0.3 probability.
Despite this being the smallest possible causal claim, the goal was to
estimate the probability of such a claim being robustly inferable, when
.PHI.=[a.sym.bc), and its dependence on the sample size and noise. The
performance of all the procedures can be measured, with an input lifted
according to the claim so that all procedures start with the same initial
pieces of information. The performance metric estimates how likely an
edge from ab to c could be found in the reconstructed structures.
[0282] Exemplary results of this exemplary comparison are shown in
exemplary graphs of FIG. 23. For example, the exemplary CAPRI can succeed
in inferring the synthetic lethality relation more than about 93% of the
times, irrespective of the noise and sample size used. In particular,
with m.gtoreq.60, the exemplary procedure can infer the correct claim at
any execution, thus suggesting that the exemplary CAPRI, with the correct
input hypotheses, can infer complicated claims, many of which could have
high biological significance. Naturally, it would be reasonably expected
that the performance of any of these procedures would drop, were the
target relations part of a bigger model.
[0283] Indeed, FIG. 23 illustrate exemplary graphs of the reconstruction
with hypotheses of synthetic lethality. The average probability of
inferring a claim aE b>c (e.g., synthetic lethality) can be seen, when
this is provided in the input set D. Also shown is a probability for
CAPRI, the likelihoodbased algorithms with BIC and BDE scores, and the
structural IAMB and PC procedure. Data can be generated from model 2305
(e.g., unbalanced "exclusive or" with a preferential progression),
samples size ranges from about 30 to about 120, noise rate from about 0%
to about 20% and about 1000 ensembles are generated for each
configuration of noise and sample size. Results suggest that a threshold
level on the number of samples exists such that CAPRI can infer the
correct claim when <1_{a.RTM. b D c}.
[0284] The results of the exemplary reconstruction with other approaches
are shown in exemplary diagrams of FIG. 24, while delineating the
differences in the structures reconstructed by the exemplary CAPRI. FIG.
24 also shows the exemplary reconstruction with the structural procedure
algorithm Incremental Association Markov Blanket with loglikelihood, and
the likelihoodbased procedure with Bayesian Information Criterion score.
For example, only BIC infers the same relations on SETBP1 as those
inferred by CAPRI. Somatic mutations considered here involve the
following genes: SETBP1, NRAS, KRAS, TET2, EZH2, CBL, ASXL1, IDH2, IDH1,
WT1, SUZ, SF3B1, RUNX1, RBBP4, NPM1, JARID 2, JAK2, FLT3, EED, DNMT3A,
Ex23, CEBPA, EPHB3, ETNK1, GATA2, IRAK4, MTA2, CSF3R and KIT. In the plot
we show only those events for which at least a causal claim was inferred.
[0285] FIG. 25 shows a progression model of Copy Number Variants ("CNV''s)
in lung cancer inferred with CAPRI from previouslypublished data.
[0286] As show in exemplary illustrations of FIG. 26, the exemplary CAPRI
procedure can examine cancer patients' genomic data to determine \causal"
relationships among the chromosomal aberrations (mutations, copy number
fluctuations, epigenetic medications, etc.) that modulate the somatic
evolution of a tumor. When CAPRI concludes that aberration a (e.g., EGFR)
2605 causes aberration b (e.g., CDK 2510), it implies that the cells with
amutation initially enjoyed a selective advantage resulting in a clonal
expansion, which in turn created a Malthusian pressure (e.g., a
microenvironment with deregulated glutamine) that allowed for the cells
with bmutations to emerge with higher fitness (e.g., by disabling a G1S
checkpoint). Such causal relations can be succinctly expressed using
Suppes' probabilistic causation, which postulates that if a causes b, in
the sense described here, then a occurs before b (e.g., temporal
priority) and occurrences of a raises the probability of emergence of b
(e.g., probability raising). These properties are checked by the
exemplary CAPRI by combining ideas from model checking and Bayes network
theory, as illustrated in FIG. 26. Since CAPRI uses model checking, it is
capable of also testing complex causal claims: for example, conjunctive
causal claims.
[0287] As shown in exemplary graphs or FIG. 27, CAPRI's accuracy and
performance was calibrated against various competing algorithms via
extensive computer simulation. Hamming distance ("HD"), precision and
recall of CAPRI were assessed with synthetic data generated by DAGs
confluences. A unique progression and number of samples likely to be
found in currently available databases such as TCGA, (e.g., m 250). Lower
values of HD can imply that the exemplary procedure has mislabeled fewer
genuine and spurious causes. Noise can account for both false positives
and negatives. Graph 2705 plot comparison of CAPRI with IAMB, PC, BIC,
BDE, CBN and CAPRESE, and is presented sorted according to the median
performance.
Exemplary Model Description and Structure Learning
Exemplary Bayesian Networks
[0288] BN can be a statistical model that provides a sparse and succinct
representation of a multivariate probability distribution over n random
variables and encodes it into a sparse directed acyclic graph ("DAG"),
G=(V, E) over n=V nodes, one per variable2, and E<<V.sup.2
directed edges. A DAG can consist of a set of nodes (V) and a set of
directed edges (E) between these nodes, such that there may be no
directed cycles between any two nodes. In the exemplary setting, each
node represents a Bernoulli random variable taking values in {0,1}. The
full joint distribution factors as a product of conditional probability
distributions ("CPDs") of each variable, given its parents in the graph.
In a DAG, the set of parents of node Xi consists of all the nodes with
edges that point to Xi and can be written as P a(Xi). CPDs can be
presented in, FIGS. 28A28D, in which show a possible assignment of the
parents and the corresponding probability of the child, which can be, a
Bernoulli random variable .dielect cons.{0,1}, when it takes the value
1.
( x i , , x n ) = X i .dielect cons. V
( X i = x i P a ( X i ) = x P
a ( i ) ) . ( 20 ) ##EQU00031##
[0289] The set of edges E can represent all the conditional independence
relations between the variables. Specifically, an edge between two nodes
Xi and Xj can denote statistical conditional dependence, no matter on
which other variables can be conditioned. Mathematically this means that
for any set of variables S.OR right. V\{Xi, Xj}, it holds that P(Xi,
XjS)/=P(XiS)P(XjS). In the BN, the symmetrical nature of statistical
dependence means that the graphs Xi.fwdarw.Xj and Xi.rarw.Xj encode the
same conditional independence relations. Such graphs can be called
Iequivalent (e.g., independence) and a set of such graphs a Markov
equivalence class. In fact, any graphs that contain the same skeletons
and vstructures can be Markov equivalent. Here, the skeleton can refer
to the undirected set of edges, in which Xi.fwdarw.Xj and Xi.rarw.Xj both
map to XiXj, and a vstructure refers to a node with a set of at least
two parents, in which no pair of parents share an edge. In BN
terminology, a parent with no shared edge can be considered "unwed
parents." For this reason, the vstructure can often be called an
immorality. In other texts, it can be referred to as an unshielded
collider.
Exemplary Monotonic Progression Networks
[0290] A class of Bayesian networks over Bernoulli random variables called
monotonic progression networks ("MPNs") can be defined. (See e.g.,
Reference 86). MPNs formally represent informal and intuitive notions
about the progression of persistent events that accumulate monotonically,
based on the presence of other persistent events. The terms variable and
event can be used interchangeably. The conditions for an event to happen
can be represented in the CPDs of the BN using probabilistic versions of
canonical Boolean operators, namely conjunction ( ), inclusive
disjunction (V), and exclusive disjunction (.sym.), as well as any
combination of propositional logic operators. FIGS. 28A28D show an
example of the CPDs associated with various operators.
[0291] While this exemplary framework can facilitate any formula to define
the conditions of the parent events conducive for the child event to
occur, a simpler design can be chosen to avoid the complexity of the
number of possible logical formulas over a set of parents. Namely, three
types of MPNs can be defined (e.g., a conjunctive MPN ("CMPN"), a
disjunctive MPN ("DMPN"), sometimes referred to as a semimonotonic
progression network ("SMPN") and an exclusive disjunction MPN ("XMPN").
The operator associated with each network type can define the logical
relation among the parents that should hold for the child event to take
place. Arbitrarily complex formulas can still be represented as new
variables, whose parent set can consist of the variables in the formula
and whose value can be determined by the formula itself. This exemplary
design choice assumes that most of the relations in a particular
application fall under one category, while all others can be special
cases that can be accounted for individually. Mathematically, the CPDs
for each of the MPNs are defined below as, for example:
[0292] CMPN:
Pr(X=1.SIGMA.Pa(X)<Pa(X)).ltoreq..dielect cons.,
Pr(X=1.SIGMA.Pa(X)=Pa(X))>.dielect cons..
[0293] DMPN:
Pr(X=1.SIGMA.Pa(X)=0).ltoreq..dielect cons.,
Pr(X=1.SIGMA.Pa(X)>0)>.dielect cons..
[0294] XMPN:
Pr(X=1.SIGMA.Pa(X).noteq.1).ltoreq..dielect cons.,
Pr(X=1.SIGMA.Pa(X)=1)>.dielect cons..
[0295] The inequalities above define the monotonicity constraints specific
to each type of MPN, given a fixed "noise" parameter E. When a particular
event occurs, despite the monotonicity constraint, the sample can be
negative with respect to that event. If the event does not occur or
occurs in compliance with the monotonicity constraint, then it can be a
positive sample of that event. Note that in the case in which E=0, the
monotonicity constraints can be deterministic, and all samples can be
positive. By convention, the rows of a CPD can be referred to as
positive, and negative rows and .theta.+ can refer to the conditional
probability of some positive row I, and .theta. can refer to the
conditional probability of some negative row i.
Exemplary Structure Learning
[0296] Many procedures exist to carry out structure learning of general
Bayesian networks. They usually fall into two families of procedures,
although several hybrid approaches have been recently proposed. (See
e.g., References 83 and 92). The first, constraint based learning,
explicitly tests for pairwise independence of variables conditioned on
the power set of the rest of the variables in the network. The second,
score based learning, constructs a network to maximize the likelihood of
the observed data, with some regularization constraints to avoid
overfitting. Because the data can be assumed to be independent and
identically distributed (e.g., i.i.d), the likelihood of the data can be
the product of the likelihood of each datum, which in turn can be defined
by the factorized joint probability function described above. For
numerical reasons, log likelihood ("LL") can usually be used instead of
likelihood, and thus the likelihood product becomes the log likelihood
sum.
[0297] The latter approach can be built on, specifically relying on the
Bayesian Information Criterion ("BIC") as the regularized likelihood
score. The score can be defined below as, for example:
score BIC ( D , G ) = LL ( D G )  log
M 2 dim ( G ) . ( 21 ) ##EQU00032##
[0298] For example, G can denote the graph (e.g., including both the edges
and CPDs), D can denote the data, M can denote the number of samples, and
dim(G) can denote the number of parameters in the CPDs of G. The number
of parameters in each CPD can grow exponentially with the number of
parents of that node. For the exemplary networks over events, dim(G) for
a single node X can be 2P a(X). Thus, the regularization term dim(G)
can favor nodes with fewer parents or equivalently, graphs with fewer
edges. The coefficient log M/2 essentially weighs the regularization
term, such that the higher the weight, the more sparsity will be favored
over "explaining" the data through maximum likelihood. The likelihood can
be implicitly weighted by the number of data points, since each point
contributes to the score.
[0299] With sample size enlarging, both the weight of the regularization
term and the "weight" of the likelihood can increase. However, the weight
of the likelihood can increase faster than that of the regularization
term. Mathematically, it can be said that the likelihood weight can
increase linearly, while the weight of the regularization term can
increase logarithmically. Thus, with more data, likelihood will
contribute more to the score. Intuitively, with more data, the exemplary
observations can be trusted more, and can have less need for
regularization, although this term never completely vanishes.
[0300] Statistically speaking, BIC can be a consistent score. (See e.g.,
Reference 92). In terms of structure learning, this exemplary property
can imply that for sufficiently large sample sizes, the network with the
maximum BIC score can be Iequivalent to the true structure, G*. From the
above, G can have the same skeleton and vstructures as G*, though
nothing can be guaranteed regarding the orientation of the rest of the
edges. For most graphs, therefore, BIC cannot distinguish among G* plus
all other possible graphs, and thus may not be sufficient for exact
structure learning. In the case of BNs with structured CPDs, such as
MPNs, it can be possible to improve on the performance of BIC. For
example, the BIC score has been modified below to drastically improve
performance in learning the orientations of all edges.
Exemplary Observational Vs. Biological Noise
[0301] The notion of probabilistic logical relations among variables to
represent disease progression has been developed in two families of
models. These two exemplary approaches diverge in the treatment of noise,
or equivalently, in how the model produces negative, or nonmonotonic,
samples. The first approach encodes a notion of experimental, or
observational, noise, in which negative samples can result from incorrect
labeling of the events. (See e.g., References 89 and 96). In the
exemplary system, method, and computeraccessible medium, according to an
exemplary embodiment of the present disclosure, each generated sample can
be initially positive in all variables, and then can have several event
values inverted, with a certain probability. The second approach can
encode biological or causal noise, in which negative samples result from
the activation of events by some noncanonical causes, in the absence of
canonical ones. (See e.g., Reference 86). In exemplary models like these,
the level of noise corresponds to the probability that an event occurs
despite the absence of its parents.
[0302] Observational noise and biological noise have different statistical
properties that affect how the model can be learned. Namely,
observational noise can often be assumed to be unbiased and have a
Gaussian distribution and thus by the strong law of large numbers,
converges to zero for a sufficiently large number of observations. In
contrast, biological noise can be asymmetric, and can persist even with
large sample sizes. One of the key consequences of these differences can
be the following. While the asymptotic marginal probabilities of the
variables can be the same for all levels of noise in the observational
noise model, for biological noise, however, the marginal probabilities
can be very sensitive to the level of noise, irrespective of how large
the sample size can be.
Exemplary Development of Causal Score
[0303] An exemplary score can be presented (e.g., the one used in
Polaris), that can statistically be consistent, like BIC, and can
correctly orient edges based on the monotonicity of the progression
relation, like DiProg, but without knowing the parameter E a priori. The
basic idea behind the score can be a heuristic for the likelihood of each
sample such that the likelihood reflects both the probability of the
sample being generated from its CPD, and the probability that the CPD
obeys the monotonicity constraints of the true model. The latter may not
be computed without knowledge of E, and thus relies on a nonparametric
notion of monotonicity to estimate the underlying CPD. Below, is an
explanation of the development of Polaris and with its philosophical
foundations compared to its asymptotic convergence properties.
Exemplary Suppes Causality
[0304] The score can be modeled after the asymmetrical portion, .alpha.,
of the causal score, presented above. (See e.g., Reference 93). This part
of the score can be based on Suppes's theory of causality for
distinguishing prima facie causes from noncausal correlations. Suppes
stipulates two conditions for event C to cause event E. First, C must
raise the probability of E. In the exemplary statistical model, this
means that P(EC)>P(EC.sup.). Second, C must precede E in time.
Unfortunately, this model, may have no notion of time and may not
directly infer temporal priority.
[0305] However, under the condition that C can be the unique cause of E,
it can be beneficial that C must appear every time E appears but not vice
versa. Therefore, the number of occurrences of C must be larger than that
of E. From this, it can be easy to see that P(C)>P(E). In fact, this
property of temporal priority also holds for conjunctions over several
parents, as E will only appear when all its parents can be present.
[0306] The .alpha. score for a causal relation can be defined as
C .fwdarw. E as ( E C )  ( E C
_ ) ( E C ) + ( E C ) . ##EQU00033##
This definition can be proved to meet both the probability raising and
temporal priority conditions explained above. However, only the tree
structured graphs were considered, in which every node has at most 1
parent and at most 1 negative row in its CPD. (See e.g., Reference 93).
Applied to an MPN, the true .alpha. value for each CPD can be strictly
positive for each edgea consequence of the constraint that
P(EC)>P(EC.sup.) for all MPNs. Thus, when several graphs can be
considered to fit to observed data, an estimated .alpha. with a negative
value (e.g., below a threshold) means that the corresponding CPD breaks
the monotonicity constraint. However, an estimated .alpha. with a
positive value (e.g., above a threshold) puts more faith in the
legitimacy of that CPD. Otherwise, the interpretation of CPD can be
ambiguous. Justified by these intuitive observations, .alpha. can serve
as a faithful proxy for monotonicity in tree structured MPNs. Exemplary
Weighted Likelihood without a Priori Knowledge of Model Parameters
[0307] More general, DAG structured models can be considered in which CPDs
can have more than one negative row. To handle this, a .alpha. score can
be assigned to each row of the CPD, as defined below. A notation of
.alpha.xi can be used to denote the .alpha. value corresponding to row i
of the CPD of variable X. By the exemplary convention, .theta. can
denote the probability of negative row i and .theta.+ the probability of
the one positive row of the CPD of X. This assumption may only be true
for CMPNs. This notation can be extended to DMPNs and XMPNs later.
.alpha. xi = { 1 , for a positive row
; .theta. ^ x +  .theta. ^ xi  .theta. ^ x + +
.theta. ^ xi  , for a negative row .
##EQU00034##
[0308] Thus, as described above, a can now be a heuristic for the
monotonicity of each row of a CPD rather than the CPD as a whole. It
follows that each negative sample has a corresponding .alpha. between 1
and 1. Thus, each negative sample can be weighed by its .alpha. value to
reflect the exemplary belief that its CPD row conforms to the
monotonicity constraints. This strategy leads to CPDs with high
monotonicity to be favored through their samples, whereas CPDs with poor
monotonicity can be penalized through their samples. Moreover, by
handicapping the samples instead of the CPDs directly, rows whose
conditional probabilities were estimated with more samples to have a
larger effect on the score were used. The resulting .alpha.weighted
likelihood score (e.g., score.alpha.WL) for variable X given sample d can
be defined below, where and .theta..sup. +.theta..sup.  can be empirical
estimates of their respective parameters. Note that because of the
indicator function in the exponent of the .alpha. term in the score, only
the .alpha. term of the row that corresponds to the sample can be used to
weigh the likelihood. Specifically, if the sample can be positive, the
likelihood may not be altered, whereas if the sample can be negative, the
likelihood can be penalized in proportion to the .alpha. score for that
sample's corresponding row.
score .alpha. WL ( X : d ) = Pr ( X =
d x P a ( X ) = d P a ( X ) )
i .dielect cons. CPD x .alpha. xi 1 ( d P a
( X ) = C P D x ( i ) ) ##EQU00035##
[0309] The exemplary score used for structure learning can include the BIC
regularization term, so the full combined score for a single variable X
given a datum d is below. The last line defines the composed score for
the all the variables, V, over all the data, D.
score .alpha. WL , BIC ( X : d ) = log
[ Pr ( X = d x P a ( X ) = d P a
( X ) ) i = 1 CPD x .alpha. xi 1 ( d P
a ( X ) = CPD x ( i ) ) ]  log M 2
dim ( X P a ( X ) ) , score .alpha.WL
, BIC ( X : d ) = log [ Pr ( X = d x P a
( X ) = d P a ( X ) ) + i = 1 CPD x
1 ( d P a ( X ) = C P D x ( i
) ) log .alpha. xi ]  log M 2 dim ( X
P a ( X ) ) , score .alpha. WL ,
BIC ( X : d ) = LL ( d x , d P a ( X
) G ) + .alpha. ( X d )  log M 2 dim
( X P a ( X ) ) , and , finally
##EQU00036## score .alpha. WL , BIC ( G
: D ) = LL ( D G ) + d .dielect cons. D X
.dielect cons. V .alpha. ( X d )  log M 2
dim ( G ) . ##EQU00036.2##
[0310] This can be further written as, for example:
.alpha. ( X d ) = i .dielect cons. CPD x 1
( d P a ( x ) = C P D x ( i )
) log .alpha. xi ##EQU00037##
Exemplary Multiplicative Factors
[0311] Asymptotically, the BIC can be known to reconstruct the correct
skeleton and orient edges in immoralities correctly. Since a score to
enhance this result further and orient the remaining edges correctly
without disturbing the correct skeletal structure can be beneficial, a
new weight can be introduced to the whole monotonicity term of the score.
This exemplary weight can be structured to approach zero in the limit, as
the sample size approaches infinity. Thus, for small sample sizes, the
monotonicity component can play a larger role in the overall score. Then,
as the BIC component converges to a more stable structure, the
monotonicity component can choose the exact structure among several
equally likely ones. For these asymptotic results, the simplest weight
can be chosen that can be inversely proportional to the sample size: 1/M.
The final score developed for structure learning of MPNs is below.
score Polaris ( G : D ) = LL ( D G ) + 1 M
d .dielect cons. D X .dielect cons. V .alpha. (
X d )  log M 2 dim ( G ) . ##EQU00038##
[0312] It can be proved mathematically that this score asymptotically
learns the correct exact structure of an MPN under certain
conditionsespecially, conditions enforcing the absence of transitive
edges and a sufficiently low E parameter. In practice, however, it was
found that the exemplary system, method, and computeraccessible medium
according to an exemplary embodiment of the present disclosure, can
converge on the correct structure for graphs with transitive edges and
nonnegligible E values. (See e.g., FIGS. 29A29D).
[0313] Exemplary Definition 8 (e.g., Faithful Temporal Priority):
[0314] In a monotonic progression network G, if there exists a path from
Xj to Xi, then the temporal priority between Xi and Xj can be faithful if
P(Xj)>P(Xi).
[0315] Exemplary Theorem 6 (e.g., Convergence Conditions for Polaris):
[0316] For a sufficiently large sample size, M, under the assumptions of
no transitive edges and faithful temporal priority relations (see e.g.,
Definition 8 above), between nodes and their parents, at least for nodes
that have exactly 1 parent, optimizing Polaris can converge to the exact
structure.
Exemplary Extension to DMPNs and XMPNs.
[0317] The score stated above can work for all three classes of MPNs, with
minor modifications to the definition of .alpha., depending on the
monotonicity constraints. A main difference between CMPNs and the other
two types lies in the fact that each CPD corresponding to a CMPN can have
exactly one positive row. In contrast, the CPDs in DMPNs can have exactly
one negative row, and the CPDs in XMPNs can have multiple positive and
negative rows. (See e.g., FIGS. 28A28D). Specifically, the only negative
row for DMPNs can be the case in which all parent nodes equal zero. For
XMPNs, any row with exactly one parent event equal to one can be a
positive row and all the rest can be negative rows. In order to extend
the definition of a to DMPNs and XMPNs, all events that correspond to the
positive rows of a CPD can be treated as one event. The probability of
this large event can be called .theta.+, just as in the CMPN case, and it
is defined below for both DMPNs and XMPNS.
.theta..sub.DMPN.sup.+(X)=(.SIGMA.Pa(X)>0),
.theta..sub.DMPN.sup.+(X)=(.SIGMA.Pa(X)=1).
Exemplary Temporal Priority in the Presence of Biological Noise
[0318] The .alpha. score for learning models can enforce both probability
raising and, for conjunctive or singleton parent sets, temporal priority.
(See e.g., References 93 and 96). The model of noise considered there has
the property that, for sufficient large sample sizes, by the large of
large numbers, the probability of a negative sample can approach zero.
However, in the exemplary model of noise, .theta.'s can be fixed
parameters and may not approach zero. Thus, temporal priority cannot
always be correctly imputed for all causal relations. That can be,
C.fwdarw.E does not necessary mean that P(C)>P(E).
[0319] Instead, temporal priority can be decided by E, .theta.+ and the
marginal probabilities, as specified in the equation below. Specifically,
high E and correspondingly high .theta., low .theta.+ and close marginal
probabilities can make it easier to reverse the observed temporal
priority.
( X ) = ( P a ( X ) = 1 ) .theta. +
+ i ( 1  ( P a ( X ) = C P
D X ( i ) ) ) .theta. i  . ##EQU00039##
Exemplary MPN Structure Learning
Exemplary Filtering
[0320] Before optimizing the score, there can be certain parent sets may
be eliminated as hypotheses. This preoptimization filtering can be done
for two reasons. First, it can prevent the optimization procedure from
selecting a spurious parent set. Second, it can speed up computation
significantly by not computing the full score for that hypothetical
parent set. The .alpha. score can be used to filter hypotheses, rejecting
those solutions that can create a negative .alpha. for at least one row
of the CPD. This .alpha.filter can be used for all types of MPNs, and
can greatly improve efficiency without eliminating too many true
hypotheses. In fact, it can be proven mathematically that asymptotically,
the .alpha. filter will be free of any mistakes.
Exemplary Lemma 1 (Convergence of AFilter).
[0321] For a sufficiently large sample size, M, the .alpha.filter
produces no false negatives for CMPNs, DMPNs and XMPNs.
Exemplary Optimizing the Score with GOBNILP.
[0322] After pruning the hypothesis space with the .alpha. filter, an
exemplary GOBNILP can be used, a free, publicly available BN structure
learning package, to find the network with the highest Polaris score.
(See e.g., References 80, 85 and 91). Given an upper bound on the maximum
number of parents (e.g., by default 3), GOBNILP can expect as input the
scores for each node given each possible combination of parents. For each
node, the exemplary code produces this information with a depth first
search through the power set of the rest of the nodes in the graph. Any
hypothetical parent set that can be filtered may not simply be included
as a possible solution for that node in the input to GOBNILP.
Further Exemplary Results
Exemplary Performance on Synthetic Data
[0323] Several experiments were conducted to test the performance of the
exemplary Polaris on data generated from synthetic networks, all on ten
variables. The network topologies were generated randomly, and the CPDs
were generated according to the monotonic constraints imposed by the type
of MPN and the value of E. These networks were sampled with different
sample sizes. In all experiments, the performance metrics were measured
over fifty synthetic topologies sampled ten times, for each value of E
and sample size.
[0324] The performance of Polaris was compared against two standards, the
optimization of the BIC score and the clairvoyant DiProg procedure,
across a variety of biologically and clinically realistic E values and
sample sizes. Clairvoyant can mean that the procedure has a priori
knowledge of c. To evaluate the performance of each procedure, both the
recall, the fraction of true edges recovered, and the precision were
measured, and the fraction of recovered edges that can be true. FIGS.
29A29D illustrate the exemplary results concisely for all three types of
MPNs by using AUPR, or the area under the precisionrecall curve, as the
exemplary performance metric. It was expected that the exemplary Polaris
performs significantly better than BIC, which can be nonspecific for
monotonic relations and slightly worse than the clairvoyant DiProg
algorithms, as Polaris does not have access to the correct value of E.
The results showed this exact trend for recall, precision and AUPR. The
gap between the clairvoyant DiProg and Polaris remained consistent across
all parameter values and relatively low, as opposed to the gap between
Polaris and BIC optimization
[0325] The performance of Polaris against a nonclairvoyant DiProg can be
considered by passing DiProg one of about fifty randomly sampled values
of E. Because of the cost of running DiProg fifty times, the exemplary
model can be limited to CMPN, E to about 0.15, and sample size to about
200. The box plot in FIG. 29B shows the variance of performance for
Polaris (e.g., 2905), the average performance of the nonclairvoyant
DiProg, the performance of the nonclairvoyant DiProg (e.g., 2910) with
the most incorrect value of E (e.g., 2915), and finally, the performance
of the clairvoyant DiProg (e.g., 2920). Again using AUPR as the
performance metric, it was found that the average performance of the
nonclairvoyant DiProg had a significantly lower mean and considerably
larger variance than those of the exemplary Polaris. Moreover, the mean
of the worst case performance of DiProg was almost twice as low as that
of the exemplary Polaris, and the variance was slightly larger. From
these analyses, it can be concluded that when E may not be known, more
accurate and more consistent results can be expected from the exemplary
Polaris than from DiProg.
[0326] The description below demonstrates the efficacy and accuracy of the
.alpha.filter for CMPNs, DMPNs, and XMPNs. On average, the filter can
eliminate approximately half of all possible hypotheses and makes
considerably less than one mistake per network. In fact, for sufficiently
large sample sizes, the false negative rate can drop to almost zero.
Exemplary Biological Example
[0327] The use of the exemplary Polaris on prostate cancer ("PCA") data
can be demonstrated. From the experimental observations, an exemplary
progression model with 3 distinct subprogressions can be posited. (See
e.g., References 81, 82, 88, 90, 97, 99 and 101). To test this theory, a
CMPN was learned based on the copy number alteration ("CAN"), mutation,
and fusion event data on the genes discussed above. The TCGA prostate
adenocarcinoma dataset of 246 sequenced tumors, available through MSKCC's
cBioPortal interface, was used. (See e.g., References 84, 87 and 94).
[0328] It was found that the exemplary learned model, shown in FIG. 30,
validates and unifies the observations above in one triprogression
model. First, it was found that two major progressions, one centered on
TMPRSS2ERG fusion (e.g., below referred to as just "ERG") and another
around CHD1 and SPOP. This confirms the theory of two distinct
progressions defined by SPOP and ERG. (See e.g., Reference 82). Moreover,
the exemplary model captures the associated genes predicted in each
progression. Namely, CHD1, FOXO3 and PRDM1 can be involved in the SPOP
progression and PTEN and TP53 in the ERG progression. Next, it was
postulated that MYC, NCOA2 and NCOR2 can be all involved in a third
progression, even though NCOR2 appears isolated from the other two in the
graph. This decision can be justified by noting previouslyknown
observations. (See e.g., Reference 88, 90 and 99), where it was predicted
that there can be a third progression that includes neither CHD1 nor ERG.
It has also been predicted that there can be a subtype with poor
prognosis that involves the amplification of MYC and NCOA2. It was also
predicted that early onset PCA involves the Androgen receptor ("AR")
pathway and NCOR2 mutation but does not include ERG, CHD1, or PTEN. Other
work has shown an experimental connection between MYC and AR expression,
strengthening the MYC/NCOA2 involvement in the third pathway. Lastly,
FIG. 30 shows several key driver genes (e.g., NKX31, APC, ZFH3, THSD7B,
FOXP1, SHQL, RB, RYBP) in the progression of PCA that have not been
assigned to either the SPOP or ERG progressions. The model proposes an
assignment of these genes to their respective progressions that can be
experimentally tested. It can be noted that FOXP1, SHQ1 and RYBP, all
genes in the 3p14 region, can be closely related in the progression.
Further Exemplary Discussion
[0329] The exemplary Polaris accomplishes its intended tasks effectively
and efficiently. To quantify its efficacy, a theoretical analysis is
provided below, containing a proof of its asymptotic convergence under
some mild conditions. Moreover, the exemplary procedure was empirically
tested on a variety of noise levels and sample sizes. It was found that
it outperforms the standard score for structure learning and closely
trails behind the clairvoyant one. It can be the case, however, that the
exemplary Polaris, by virtue of its machine learning abilities, can
solely and completely solve all the underlying problems in cancer systems
biology.
[0330] FIGS. 28A28D illustrate an exemplary procedure according to an
exemplary embodiment of the present disclosure. The Polaris exemplary
procedure accepts raw cross sectional genomic data and computes a causal
progression model with logical relations among the variables. Initially
(e.g., FIG. 28A), each patient's tumor can be sampled during surgery and
sequenced afterwards. From the sequencing, it can be found that each
tumor has genomic aberrations in certain genes and not others. Most genes
will be common among the tumors, although some may be outliers (e.g.,
gene 2805). This data can then be projected into a high dimensional space
(FIG. 28B) and the genes' cooccurrence frequencies can be encoded as a
joint distribution over the gene variables. The exemplary Polaris can
mine this data for causal relations (FIG. 28C) and can encode the major
causal progressions among the genes in a graphical model. The minor
causes 2810 can account for the outliers in the data and often reflect a
varying spectrum in cancer types among the patients. These minor causes
2810 can be averaged and collapsed into a causal or biological noise
parameter in the model. Finally, many genomic events, for instance CDK
2815 mutation, seem to precipitate from the occurrence two or more
events, for instance EGFR 2820 and MYC 2825 mutations. A language for
expressing this dependence is shown in FIG. 28D. Using the examples in
the FIGS. 28A28D, CDK 2825 can be facilitated to occur only when both
EGFR 2820 and MYC 2825 occur CMPN, when either one occurs DMPN, or when
only one but not both occur XMPN. The examples of conditional probability
distributions CPDs reflect these logical relations.
[0331] As shown in exemplary graphs of FIGS. 29A29D, the performance of
the exemplary Polaris was tested against the optimization of a standard
symmetric score, BIC and a clairvoyant procedure for learning MPNs,
DiProg. Each procedure was tested across several different levels of
noise (e.g. about 0% to about 30%) and across several realistic number of
training samples (e.g. about 50 to about 500). In each case, the network
contained ten variables, common for progression models, although each
procedure can handle a great deal more.
[0332] The exemplary surface plots (e.g., FIGS. 29A, 29C and 29D) show the
performance of each procedure for different MPN types, CMPN (e.g., FIG.
29A), DMPN (e.g., FIG. 29C) and XMPN (e.g., FIG. 29D). The box plots on
the top right demonstrate the dependence of DiProg performance on a
priori knowledge of E. A network with ten variables was learned (e.g.,
about 15% noise and about 200 samples with Polaris, DiProg with the
correct E, and DiProg with a random E). Element 2910 shows the average
performance across the random E values. Element 2915 shows the worst
performance with a random E value. Element 2920 shows the performance
with knowledge of the correct E value. For all four plots, the rate of
both true positives (e.g. recall) and true negatives (e.g. precision) can
be measured by computing the area under the precisionrecall curve, or
the AUPR.
[0333] As shown in FIG. 30, the exemplary Polaris model was used to learn
a CMPN model for prostate cancer. The most commonly implicated oncogenes,
tumor suppressor genes, and gene fusion events were selected from the
literature and used copy number variation and point mutation data from
the TCGA database. Each edge is labeled with the fold change in the
network score when the edge is left out. Based on the topology and the
exemplary literature survey, three distinct progressions within the graph
can be defined, and each is labeled 3005, 3010 and/or 3015.
Exemplary Detailed Comparison of Performance Results on Synthetic Data
[0334] Here, the performance results for the comparison of Polaris to the
optimization BIC and the clairvoyant DiProg can be included. FIGS. 28A30
show the comparison results using recall and precision as performance
metrics and both small and asymptotic sample sizes, for CMPNs, DMPNs and
XMPNs, respectively. The recall and precision can be separated in order
to highlight the asymmetry in Polaris's performance. That can be, the
exemplary Polaris performs considerably better in recall and consistently
introduces a slightly higher number of false edges in the reconstructed
graph. The asymptotic sample size can be included to experimentally
verify the convergence of Polaris. Note that theorem 6 only guaranteed
convergence on graphs without transitive edges, but even with transitive
edges, the exemplary Polaris can converge almost completely at only about
2000 samples.
[0335] FIGS. 31A31D illustrate exemplary graphs providing experimental
performance results for Polaris, BIC, and clairvoyant DiProg on CMPNs,
measured in terms of recall (e.g., FIGS. 31A and 31C) and precision
(e.g., FIGS. 31B and 31D). To show the asymptotic behavior of the three
algorithms, the performance can be plotted for sample sizes up to about
2000 (FIGS. 31C and 31D). For comparison, the performance on more
realistic sample sizes was included (FIGS. 31A and 31B).
[0336] FIGS. 32A32D show exemplary graphs providing experimental
performance results for the Polaris, BIC, and clairvoyant DiProg on
DMPNs, measured in terms of recall (FIGS. 32A and 32C) and precision
(FIGS. 32B and 32D). To show the asymptotic behavior of the three
algorithms, the performance for sample sizes up to about 2000 (FIGS. 32C
and 32D) were plotted. For comparison, the performance on more realistic
sample sizes was included (FIGS. 32A and 32B).
[0337] FIGS. 33A33D illustrate exemplary graphs providing experimental
performance results for Polaris, BIC, and clairvoyant DiProg on XMPNs,
measured in terms of recall (FIGS. 33A and 33C) and precision (FIGS. 33B
and 33D). To show the asymptotic behavior of the three algorithms, the
performance for sample sizes up to about 2000 was plotted. (See FIGS. 33C
and 33D). For comparison, the performances on more realistic sample sizes
were included. (See FIGS. 33A and 33B). FIGS. 33A33D demonstrates the
efficacy and correctness of the .alpha.filter in rejecting hypotheses
prior to optimization of the score, in each of the three types of MPNs.
For each type of MPN, the average number of rejected true hypotheses can
be considerably smaller than one and converges to zero for medium sample
sizes. The .alpha.filter can be particularly effective at pruning the
hypothesis space of XMPNs, rejecting approximately 1000 hypotheses on
average, out of a possible 1300 hypotheses. It can be slightly less
effective for CMPNs, rejecting between about 500 and about 1000
hypotheses. Finally, it can be least effective for DMPNs, rejecting
between about 150 and about 350 hypotheses.
[0338] FIGS. 34A34F illustrate exemplary graphs providing the
.alpha.filter rejects hypotheses prior to optimization of the score.
FIGS. 34A, 34C and 34F show the efficacy, measured in terms of the number
of hypotheses eliminated prior to optimization. FIGS. 34B, 34D, and 34F
show the error rate, measured in terms of the average number of true
hypotheses rejected.
Exemplary Time Complexity of Polaris Optimization
[0339] The evaluation of the exemplary Polaris scores for all hypotheses
can dominate the computational complexity of the exemplary procedure. The
asymptotic complexity of this computation can be analyzed, and it can be
shown that its parametric complexity can be exponential, where the
exponent can be determined by the parameter. For a fixed (e.g., small)
value of the parameter, Polaris can be polynomial and tractable. To
estimate the complexity, the complexity of computing the score for any
single hypothesis can be determined. Then, this function can be
multiplied by the number of hypotheses to get the total cost, which can
be O(MN2(N1)k).
[0340] Here, the parameter k can be the maximum number of parents for any
node (and can be safely bounded by 3, in practice), and the input size
can be determined by M and N: respectively, the number of samples, and
the number of variables. In practice, the .alpha. filter helps
performance tremendously, as it avoids the log likelihood ("LL")
computation for at least nearly half of the hypotheses. (See e.g., FIGS.
34A34F).
Exemplary Computing the Score for a Single Hypothesis
[0341] A large part of the score computation effort can be expended in
computing .alpha. and the LL. The .alpha. computation can be divided into
computing .theta.+'s and .theta.+'s, which can be just the probabilities
of each row in the matrix, encoding Conditional Probability
Distributions, ("CPD"). Both computations can entail counting the number
of samples that correspond to each row and thus in total, take O(MN)
time. The maximum likelihood ("ML") parameters in the LL score can be
precisely the .theta.+'s and .theta.'s computed for .alpha.. Actually
computing the LL given the ML parameters can benefit from iterating
through the samples one more time and matching each sample to its
corresponding CPD row. Thus, LL computation also takes O(MN) time.
Combining all, the total local score computation for one node still takes
O(MN) time.
Exemplary Proofs of Theorems on Asymptotic Convergence
[0342] Provided below is a description of exemplary properties about the
asymptotic performance of Polaris.
[0343] Lemma 2 (Convergence of .alpha.Filter):
[0344] For a sufficiently large sample size, M, the .alpha.filter
produces no false negatives for Conjunctive, Disjunctive and Exclusive
Disjunctive Monotonic Progressive Networks: CMPNs, DMPNs, and XMPNs,
respectively.
[0345] Exemplary Proof:
[0346] By the law of large numbers, the empirical estimates for all rows
of the CPDs will converge to their corresponding true parameter values.
To show that the .alpha. filter will not create false negatives, it can
be shown that a for all true parent sets must be strictly positive for
all rows of the CPDs. The .alpha. values for positives rows can be always
1, and will thus never be negative. The .alpha. values for negative rows
can be negative, if .theta.+<.theta., for negative row I of a CPD and
.theta.+ as appropriately defined for each of the MPN types. Thus, it can
be shown that for all 3 types of MPNs, each negative row will have a
strictly positive .alpha.. In all three cases, the fact that the
conditional probability for all negative rows of all CPDs can be strictly
below and that for the positive rows can be strictly above .dielect
cons. can be used.
[0347] Exemplary Case I:
[0348] Here, .theta.+ refers to the conditional probability of 1 positive
row, which can be by definition larger than , or restated,
.theta.+>0. Combined with the fact that .theta.<, it follows that
.theta.+>.theta. and thus, .alpha. will never be negative.
[0349] Exemplary Case II:
[0350] The derivation below establishes that .theta.+ can be always
strictly larger than for the true parents sets in a DMPN. The summation
can be over all values of the parents that may not be all zeroes. Here, n
refers to the number of parents in Pa(X). That can be, n=Pa(X). The
inequality can exploit the fact that each conditional probability
corresponds to a positive row and can be thus strictly larger than
.dielect cons..
[0351] Case III:
[0352] The derivation below shows, just like in the DMPN, that
.theta.+> for all true parents sets in the XMPN. The reasoning behind
this can be similar to that above, except for the summation can be over
the rows in which exactly one parent takes value 1 and the rest take
value 0. To denote this, the standard notation Pai(X) can be used to mean
the ith parent of X and Pai(X) to mean all parents except for the ith
parent of X.
[0353] Lemma 3 (Consistency of Polaris):
[0354] Polaris can be a statistically consistent score.
[0355] Exemplary Proof:
[0356] Let M be the number of samples generated by the graph G*=(V, E*).
Let G=(V, E) be the graph learned by maximizing the Polaris score, and
GBIC be the graph learned by maximizing the BIC score, both for a
sufficiently large M. The exemplary Polaris score can consist of three
terms: (i) the loglikelihood (LL) term, (ii) the regularization term
from BIC and (iii) the monotonicity term. Each of these terms can grow at
different rates. The LL term can grow linearly (O(M)) with the number of
samples. The regularization term can grow logarithmically (O(log M)). The
monotonicity term does not grow (O(1)), since the sum of a scores can
grow linearly with the number of samples, M, but it can be weighted by
1/M. Consequently, it can be subsumed by the other two terms. Thus, any
perturbation to the graph G that would increase the monotonicity score
but decrease the BIC score can also decrease the Polaris score. From the
consistence of BIC theorem, it can be known that any perturbation to the
undirected skeleton or vstructures of GBIC can result in a lower BIC
score. It follows that for sufficiently large M, the addition of the
monotonicity term may not change the undirected skeleton or vstructures
of GBIC. Therefore, G can be Iequivalent to GBIC and by transitivity, G
can be Iequivalent to G*
[0357] Exemplary Theorem 6 (Convergence Conditions for Polaris):
[0358] For a sufficiently large sample size, M, under the assumptions of
no transitive edges and faithful temporal priority relations between
nodes and their parents at least for nodes that have exactly one parent,
optimizing Polaris convergences to the exact structure for MPNs. Proof:
Let G*=(V, E*) be the graph that generates the data and G, the graph
learned by optimizing the Polaris score. By the Polaris consistency
Lemma, for sufficiently large M, the undirected skeleton and vstructures
of G can be the same as those of G*. Below, it is shown that under
assumptions of temporal priority for all parentchild relations, G=G*.
[0359] Next, it can be shown that the parent set of each node can be
learned correctly, by considering nodes that have zero parents, one
parent or two or more parents. It then follows that all of the edges in
the undirected skeleton of G* can be oriented correctly and thus G=G*.
[0360] Exemplary Case IV:
[0361] Xi has 0 parents. If Xi has no parents, then the undirected
skeleton around Xi will only include the edges to the children of Xi.
Thus, the empty parent set can be learned correctly.
[0362] Exemplary Case V:
[0363] Xi has 1 parent. Let Xj be the parent of Xi.
[0364] Exemplary Case V (a):
[0365] Xj has 0 parents. By definition, Xj has 0 parents and Xi has
exactly 1 parent, Xj. Reorienting the edge Xj.fwdarw.Xi to Xj.fwdarw.Xi
results in an Iequivalent graph globally, because the edge may not be
involved in a vstructure in either orientation. Thus, the BIC score for
both orientations can be the same, and in order for Polaris to correctly
choose Xj.fwdarw.Xi over Xi.fwdarw.Xj, it must be the case that
.alpha.Xi.fwdarw.Xj.fwdarw..alpha.Xj Xi. In the derivation below, it can
be shown that this condition can be equivalent to the condition for
temporal priority. Namely, .alpha.Xi.fwdarw.Xj.fwdarw..alpha.Xj.fwdarw.Xi
can be equivalent to P(Xi)<P(Xj). To conserve space, let
P(XiXj)=.theta.+ and P(XiX.sup.j)=.theta.. Also, the identity
P(Xi)=P(XiXj)P(Xj)+P(XiX.sup.j)P(X.sup.j)=.theta.+P(Xj)+.theta.P(X.s
up.j) can be used. The following statements can be all equivalent:
[0366] Exemplary Case V (b):
[0367] Xj has 1 or more parents. Incorrectly reorienting the edge
Xj.fwdarw.Xi to Xj.rarw.Xi makes Xi a parent of Xj. Because G* can be
acyclic and has no transitive edges, there can be no edges between Xi and
the true parents of Xj. Thus, making Xi a new parent of Xj creates a new
vstructure (e.g., case VI proves that if Xj has 2 or more parents, then
they can be all unwed), consisting of Xi, Xj, and the true parents of Xj,
that may not be in G*. This can contradict the consistency of Polaris,
and thus the edge Xj.fwdarw.Xi will never be reoriented.
[0368] Case VI:
[0369] Xi has 2 or more parents. Because G* has no transitive edges, there
cannot be any edge between any two parents of Xi. Thus, the parents of Xi
can be unwed and form a vstructure with Xi. Because Polaris can be
consistent, this vstructure can be learned correctly.
[0370] Exemplary Corollary 1 (Convergence Conditions for Polaris with
Filtering):
[0371] For a sufficiently large sample size, M, under the assumptions of
no transitive edges and faithful temporal priority relations, filtering
with the .alpha.filter and then optimizing Polaris convergences to the
exact structure for MPNs. Proof:
[0372] In Lemma 1, it was shown that .alpha.filtering removes no true
parent sets. In Theorem 6, it was shown that given a hypothesis space
that includes the true parent sets, optimizing Polaris returns the true
graph. Because the .alpha.filter does not remove the true parent sets
from the hypothesis space, optimizing Polaris will still return the
correct structure on the filtered hypothesis space.
[0373] FIG. 35 illustrates a flow diagram of an exemplary method for
generating a model of progression about at disease. For example, at
procedure 3505, biomedical data about one or more patients can be
obtained. A graph can be generated from the biomedical data at procedure
3510. At procedure 3515, states of the disease can be determined, and at
procedure 3520, transitions among the states can be determined. At
procedure 3525, the model of progression can be generated. At procedure
3530, further biomedical data from a further patient can be obtained, and
information about a disease that the further patient may have can be
generated at procedure 3535.
[0374] FIG. 9 shows a block diagram of an exemplary embodiment of a system
according to the present disclosure, which can implement the exemplary
embodiments of the method and procedures described herein. For example,
exemplary procedures in accordance with the present disclosure described
herein can be performed by a processing arrangement and/or a computing
arrangement 902. Such processing/computing arrangement 902 can be, for
example, entirely or a part of, or include, but not limited to, a
computer/processor 904 that can include, for example, one or more
microprocessors, and use instructions stored on a computeraccessible
medium (e.g., RAM, ROM, hard drive, or other storage device).
[0375] As shown in FIG. 9, for example, a computeraccessible medium 906
(e.g., as described herein above, a storage device such as a hard disk,
floppy disk, memory stick, CDROM, RAM, ROM, etc., or a collection
thereof) can be provided (e.g., in communication with the processing
arrangement 902). The computeraccessible medium 906 can contain
executable instructions 908 thereon. In addition or alternatively, a
storage arrangement 910 can be provided separately from the
computeraccessible medium 906, which can provide the instructions to the
processing arrangement 902 so as to configure the processing arrangement
to execute certain exemplary procedures, processes and methods, as
described herein above, for example.
[0376] Further, the exemplary processing arrangement 902 can be provided
with or include an input/output arrangement 914, which can include, for
example, a wired network, a wireless network, the internet, an intranet,
a data collection probe, a sensor, etc. For example, anatomical data 920
can be provided to the input/output arrangement 914. As shown in FIG. 9,
the exemplary processing arrangement 902 can be in communication with an
exemplary display arrangement 912, which, according to certain exemplary
embodiments of the present disclosure, can be a touchscreen configured
for inputting information to the processing arrangement in addition to
outputting information from the processing arrangement, for example.
Further, the exemplary display 912 and/or a storage arrangement 910 can
be used to display and/or store data in a useraccessible format and/or
userreadable format.
[0377] The foregoing merely illustrates the principles of the disclosure.
Various modifications and alterations to the described embodiments will
be apparent to those skilled in the art in view of the teachings herein.
It will thus be appreciated that those skilled in the art will be able to
devise numerous systems, arrangements, and procedures which, although not
explicitly shown or described herein, embody the principles of the
disclosure and can be thus within the spirit and scope of the disclosure.
Various different exemplary embodiments can be used together with one
another, as well as interchangeably therewith, as should be understood by
those having ordinary skill in the art. In addition, certain terms used
in the present disclosure, including the specification, drawings and
claims thereof, can be used synonymously in certain instances, including,
but not limited to, for example, data and information. It should be
understood that, while these words, and/or other words that can be
synonymous to one another, can be used synonymously herein, that there
can be instances when such words can be intended to not be used
synonymously. Further, to the extent that the prior art knowledge has not
been explicitly incorporated by reference herein above, it is explicitly
incorporated herein in its entirety. All publications referenced are
incorporated herein by reference in their entireties.
EXEMPLARY REFERENCES
[0378] The following references are hereby incorporated by reference in
their entirety. [0379] [1] BEERENWINKEL, N.; ERIKSSON, N., AND STURMFELS,
B. Conjunctive bayesian networks. Bernoulli (2007), 893909. [0380] [2]
BEERENWINKEL, N., RAHNENFUHRER, J.; DAUMER, M., HOFFMANN, D., KAISER, R.;
SELBIG, J., AND LENGAUER, T. Learning multiple evolutionary pathways from
crosssectional data. Journal of Computational Biology 12, 6 (2005),
584598. [0381] [3] BELL, D.; BERCHUCK, A.; BIRRER, M.; CHIEN, J.;
CRAMER, D., DAO, F., DHIR, R.; DISAIA, P.; GABRA, H.; AND GLENN, P.
Integrated genomic analyses of ovarian carcinoma. [0382] [4] DESPER, R.,
JIANG, F.; KALLIONIEMI, 0., MOCH, H., PAPADIMITRIOU, C., AND SCHAFFER, A.
Inferring tree models for oncogenesis from comparative genome
hybridization data. Journal of Computational Biology 6, 1 (1999), 3751.
[0383] [5] DESPER, R., JIANG, F., KALLIONIEMI, 0., MOCH, H.,
PAPADIMITRIOU, C., AND SCHAFFER, A. Distancebased reconstruction of tree
models for oncogenesis. Journal of Computational Biology 7, 6 (2000),
789803. [0384] [6] EDMONDS, J. Optimum branchings. Journal of Research
of the National Bureau of Standards B 71 (1967), 233240. [0385] [7]
EFRON, B. Bootstrap methods: another look at the jackknife. The annals of
Statistics (1979), 126. [0386] [8] EFRON, B. The jackknife, the
bootstrap and other resampliny plans, vol. 38. SIAM, 1982. [0387] [9]
EFRON, B. LargeScale Inference: Empirical Bayes Methods for Estimation,
Testing, and Prediction. Cambridge University Press, 2013. [0388] [10]
EFRON, B., AND MORRIS, C. Stein's estimation rule and its competitorsan
empirical bayes approach. Journal of the American Statistical Association
68, 341 (1973), 117130. [0389] [11] GERSTUNG, M., BAUDIS, M., MOCH, H.,
AND BEERENWINKEL, N. Quantifying cancer progression with conjunctive
bayesian networks. Bioinformatics 25, 21 (2009), 28092815. [0390] [12]
GERSTUNG, M., ERIKSSON, N., LIN, J., VOGELSTEIN, B., AND BEERENWINKEL, N.
The temporal order of genetic and pathway alterations in tumorigenesis.
PloS one 6, 11 (2011), e27136. [0391] [13] GUNAWAN, B., AND ET AL. An
oncogenetic tree model in gastrointestinal stromal tumours (gists)
identifies different pathways of cytogenetic evolution with prognostic
implications. The Journal of pathology 211, 4 (2007), 463470. [0392]
[14] HANAHAN, D., AND WEINBERG, R. A. The hallmarks of cancer. Cell 100,
1 (2000), 5770. [0393] [15] HANAHAN, D., AND WEINBERG, R. A. Hallmarks
of cancer: The next generation. Cell 144 (2011), 646674. [0394] [16]
HITCHCOCK, C. Probabilistic causation. In The Stanford Encyclopedia of
Philosophy, E. Zalta, Ed., winter 2012 ed. 2012. [0395] [17] HJELM, M.
New probabilistic network models and algorithms for oncogenesis.
[0396] Journal of Computational Biology, 853865 (13). [0397] [18] HUANG,
Q., Yu, G.; MCCORMICK, S., MO; J., DATTA, B., MAHIMKAR, M., LAZARUS, P.,
SCHAFFER, A. A., DESPER, R., AND SCHANTZ, S. Genetic differences detected
by comparative genomic hybridization in head and neck squamous cell
carcinomas from different tumor sites: construction of oncogenetic trees
for tumor progression. Genes, Chromosomes and Cancer 34, 2 (2002),
224233. [0398] [19] Ki, M., AND ET AL. Mapping the hallmarks of lung
adenocarcinoma with massively parallel sequencing. Cell 150, 6 (2012),
11071120. [0399] [20] IONITA, I., DARUWALA, R., AND MISHRA, B. Mapping
TumorSuppressor genes with multipoint statistics from
CopyNumberVariation data. American Journal of Human Genetics 79, 1
(July 2006), 1322. PMID: 16773561 PMCID: 1474131. [0400] [21] KAINU, T.,
AND ET AL. Somatic deletions in hereditary breast cancers implicate 13q21
as a putative novel breast cancer susceptibility locus. Proceedings of
the National Academy of Sciences 97,17 (2000), 96039608. [0401] [22]
KLEINBERG, S. Causality, Probability, and Time. Cambridge University
Press, 2012. [0402] [23] KNUTSEN, T., GOBU, V., KNAUS, R., PADILLANASH,
H., AUGUSTUS, M., STRAUSBERG, R., KIRSCH, I., SIROTKIN, K., AND RIED, T.
The interactive online sky/mfish & database and the entrez cancer
chromosomes search database: Linkage of chromosomal aberrations with the
genome sequence. Genes, Chromosomes and Cancer 44, 1 (2005), 5264.
[0403] [24] LONGERICH, T., MUELLER, M., BREUHAHN, K., SCHIRMACHER, P.,
BENNER, A., AND HEISS, C. Oncogenetic tree modeling of human
hepatocarcinogenesis. International Journal of Cancer 130, 3 (2012),
575583. [0404] [25] Luo, J., SOLIMINI, N. L., AND ELLEDGE, S. J.
Principles of cancer therapy: Oncogene and nononcogene addiction. Cell
136, 5 (March 2009), 823837. [0405] [26] PATHARE, S., SCHAFFER, A.,
BEERENWINKEL, N., AND MAHIMKAR, M. Construction of oncogenetic tree
models reveals multiple pathways of oral cancer progression.
International journal of cancer 124, 12 (2009), 28642871. [0406] [27]
RADMACHER, M., SIMON, R., DESPER, R., TAETLE, R., SCHAFFER, A., AND
NELSON, M. Graph models of oncogenesis with an application to melanoma.
Journal of theoretical biology 212, 4 (2001), 535548. [0407] [28]
REICHENBACH, H. The Direction of Time. University of California Press,
1956. [0408] [29] SAMUELSON, E.; KARLSSON; S., PARTHEEN, K., NILSSON, S.,
SZPIRER, C., AND BEHBOUDI, A. Baccgharray identified specific
smallscale genomic imbalances in diploid dmbainduced rat mammary
tumors. BMC cancer 12, 1 (2012), 352. [0409] [30] SUPPES, P. A
probabilistic theory of causality. North Holland Publishing Company,
1970. [0410] [31] TIBSHIRANI, R. Regression shrinkage and selection via
the lasso. Journal of the Royal Statistical Society: Series B 58,1
(1996), 267288. [0411] [32] VOGELSTEIN, B.; FEARON, E. R., HAMILTON, S.
R., KERN, S. E., PREISINGER, A. C., LEPPERT, M., SMITS, A. M., AND Bos,
J. L. Genetic alterations during colorectaltumor development. New
England Journal of Medicine 319, 9 (1988), 525532. [0412] [33]
VOGELSTEIN, B., AND KINZLER, K. Cancer genes and the pathways they
control. Nature medicine 10, 8 (2004), 789799. [0413] [34] XUE, W., AND
ET AL. A cluster of cooperating tumorsuppressor gene candidates in
chromosomal deletions. Proceedings of the National Academy of Sciences
109, 21 (2012), 82128217. [0414] [35] ZHANG, K., AND SHASHA, D. Simple
fast algorithms for the editing distance between trees and related
problems. SIAM journal on computing 18, 6 (1989), 12451262. [0415] [36]
P. M. Illari, F. Russo, and J. Williamson, eds., Causality in the
Sciences. Oxford University Press, 2011. [0416] [37] C. Hitchcock,
"Probabilistic causation," in The Stanford Encyclopedia of Philosophy (E.
N. Zalta, ed.), winter 2012 ed., 2012. [0417] [38] J. B. Haldane, The
Causes of Evolution. Princeton University Press, 1990. [0418] [39] D.
Hume, An Enquiry Concerning Human Understanding. 1748. [0419] [40] H.
Kyburg, "Discussion: Salmon's paper," Philosophy of Science, 1965. [0420]
[41] P. Suppes, A Probabilistic Theory of Causality. NorthHolland
Publishing Company, 1970. [0421] [42] H. Reichenbach, The Direction of
Time. University of California Press, 1956. [0422] [43] N. Cartwright,
Causal Laws and Effective Strategies. Noes, 1979. [0423] [44] B. Skyrms,
Causal Necessity. Yale University Press, 1980. [0424] [45] E. Fells,
Probabilistic Causality. Cambridge University Press, 1991. [0425] [46] J.
Pearl, Causality: Models, Reasoning, and Inference. Cambridge University
Press, 2000. [0426] [47] P. Menzies, "Counterfactual theories of
causation," in The Stanford Encyclopedia of Philosophy (E. N. Zalta,
ed.), spring 2014 ed., 2014. [0427] [48] D. Lewis, "Causation," Journal
of Philosophy, 1973. [0428] [49] J. Woodward, "Causation and
manipulability," in The Stanford Encyclopedia of Philosophy (E. N. Zalta,
ed.), winter 2013 ed., 2013. [0429] [50] D. Koller and N. Friedman,
Probabilistic Graphical Models: Principles and TechniquesAdaptive
Computation and Machine Learning. The MIT Press, 2009. [0430] [51] J.
Pearl, Probabilistic reasoning in intelligent systems: networks of
plausible inference. Morgan Kaufmann, 1988. [0431] [52] T. Verma and J.
Pearl, "Equivalence and synthesis of causal models," in Uncertainty in
Artifical Intelligence Proceedings of the Sixth Conference (M. Henrion,
R. Shachter, L. Kanal, and J. Lemmer, eds.), (San Francisco, Calif.,
USA), pp. 220227, Morgan Kaufmann, 1990. [0432] [53] D. M. Chickering,
"Learning bayesian networks is npcomplete," in Learning from data, pp.
121130, Springer, 1996. [0433] [54] D. M. Chickering, D. Heckerman, and
C. Meek, "Largesample learning of bayesian networks is nphard," The
Journal of Machine Learning Research, vol. 5, pp. 12871330, 2004. [0434]
[55] P. Spirtes, C. N. Glymour, and R. Scheines, Causation, prediction,
and search, vol. 81. MIT press, 2000. [0435] [56] I. Tsamardinos, C. F.
Aliferis, A. R. Statnikov, and E. Statnikov, "Algorithms for large scale
markov blanket discovery.," in FLAIRS Conference, vol. 2003, pp. 376381,
2003. [0436] [57] A. M. Carvalho, "Scoring functions for learning
bayesian networks," Inescid Tec. Rep, 2009. [0437] [58] N. Beerenwinkel,
N. Eriksson, and B. Sturmfels, "Conjunctive bayesian networks,"
Bernoulli, 2007. [0438] [59] M. Gerstung, M. Baudis, H. Moch, and N.
Beerenwinkel, "Quantifying cancer progression with conjunctive bayesian
networks," Bioinformatics, vol. 25, no. 21, pp. 28092815, 2009. [0439]
[60] T. K. Moon, "The expectationmaximization algorithm," Signal
processing magazine, IEEE, vol. 13, no. 6, pp. 4760, 1996. [0440] [61]
S. Kirkpatrick, "Optimization by simulated annealing: Quantitative
studies," Journal of statistical physics, vol. 34, no. 56, pp. 975986,
1984. [0441] [62] L. O. Loohuis, G. Caravagna, A. Graudenzi, D.
Ramazzotti, G. Mauri, M. Antoniotti, and B. Mishra, "Inferring tree
causal models of cancer progression with probability raising." Submitted
for publication (available at arXiv.org)., 2013. [0442] [63] R. Desper,
F. Jiang, O.P. Kallionierni, H. Moch, C. Papadimitriou, and A. Schaffer,
"Inferring tree models for oncogenesis from comparative genome
hybridization data," Journal of Computational Biology, 1999. [0443] [64]
N. Beerenwinkel, J. R.ahnenfiihrer, Ni. Daumer, D. Hoffmann, R. Kaiser,
J. Selbig, and T. Lengauer, "Learning multiple evolutionary pathways from
crosssectional data," Journal of Computational Biology, 2005. [0444]
[65] A. Szabo and K. Boucher, "Estimating an oncogenetic tree when false
negatives and positives are present," Mathematical biosciences, 2002.
[0445] [66] B. Efron, The Jackknife, the Bootstrap, and Other Resampling
Plans. SIAM, 1982. [0446] [67] B. Efron, LargeScale Inference: Empirical
Bayes Methods for Estimation, Testing, and Prediction. Cambridge
University Press, 2013. [0447] [68] H. B. Mann and D. R. Whitney, "On a
test of whether one of two random variables is stochastically larger than
the other," Annals of Mathematical Statistics, vol. 18, no. 1, pp. 5060,
1947. [0448] [69] G. Schwarz, "Estimating the dimension of a model,"
Annals of Statistics, 1978. [0449] [70] D. Heckerman, D. Geiger, and D.
Chickering, "Learning bayesian networks: The combination of knowledge and
statistical data," Machine Learning, 1995. [0450] [71] R. W. Hamming,
"Errordetecting and errorcorrecting codes," Bell System Technical
Journal, 1950. [0451] [72] "The cancer genome atlas."
http://cancergenome.nih.gov/, 2005. [0452] [73] M. Scutari, "Learning
bayesian networks with the bnlearn r package," Journal of Statistical
Software, 2010. [0453] [74] "The TRONCO package for translational
oncology." Available at standard R repositories. [0454] [75] "Hidden
conjunctive bayesian networks."
http://www.silva.bsse.ethz.ch/cbg/software/ctcbn. [0455] [76] D.
Margaritis, Learning Bayesian Network Model Structure from Data. PhD
thesis, School of Computer Science, CarnegieMellon University,
Pittsburgh, Pa., 2003. [0456] [77] H. S. Farahani and J. Lagergren,
"Learning oncogenetic networks by reducing to mixed integer linear
programming," PLoS ONE, 2013. [0457] [78] R. Piazza, S. Valletta, N.
Winkelmann, S. Redaelli, R. Spinelli, A. Pirola, L. Antolini, L. Mologni,
C. Donadoni, E. Papaemmanuil, S. Schnittger, D.W. Kim, J. Boultwood, F.
Rossi, G. Gaipa, G. P. D. Martini, P. F. di Celle, H. G. Jang, V. Fantin,
C. R. Bignell, V. Magistroni, T. Haferlach, E. M. Pogliani, P. J.
Campbell, A. J. Chase, W. J. Tapper, N. C. P. Cross, and C.
GambacortiPasserini, "Recurrent setbpl mutations in atypical chronic
myeloid leukemia," Nature Genetics, 2013. [0458] [79] M. Imielinski et
al., "Mapping the hallmarks of lung adenocarcinoma with massively
parallel sequencing," Cell, vol. 150, no. 6, pp. 11071120, 2012.
* * * * *