United States Patent Application 20110239315
Kind Code: A1
Bonas; Ulla; et al.
September 29, 2011
MODULAR DNA-BINDING DOMAINS AND METHODS OF USE
Abstract
The present invention refers to methods for selectively recognizing a
base pair in a DNA sequence by a polypeptide, to modified polypeptides
which specifically recognize one or more base pairs in a DNA sequence
and, to DNA which is modified so that it can be specifically recognized
by a polypeptide and to uses of the polypeptide and DNA in specific DNA
targeting as well as to methods of modulating expression of target genes
in a cell.
Inventors:
Bonas; Ulla (Halle, DE)
Boch; Jens (Halle, DE)
Streubel; Jana (Halle, DE)
Serial No.: 019526
Series Code: 13
Filed: February 2, 2011
Current U.S. Class: 800/13; 435/188; 435/252.3; 435/254.11; 435/320.1; 435/325; 435/419; 530/333; 536/23.1; 536/24.1; 800/295; 800/298
Class at Publication: 800/13; 435/188; 435/325; 435/419; 435/252.3; 435/254.11; 435/320.1; 530/333; 536/23.1; 536/24.1; 800/295; 800/298
International Class: A01K 67/00 20060101 A01K067/00; C12N 9/96 20060101 C12N009/96; C12N 5/10 20060101 C12N005/10; C12N 1/15 20060101 C12N001/15; C12N 1/21 20060101 C12N001/21; C12N 15/63 20060101 C12N015/63; C07K 1/00 20060101 C07K001/00; C07H 21/04 20060101 C07H021/04; A01H 15/00 20060101 A01H015/00; A01H 5/00 20060101 A01H005/00
Foreign Application Data
| Date | Code | Application Number |
| Jan 12, 2009 | DE | 102009004659.3 |
| Jul 13, 2009 | EP | 09165328 |
Claims
1. A method for producing a polypeptide that selectively recognizes at
least one base pair in a target DNA sequence, the method comprising
synthesizing a polypeptide comprising a repeat domain, wherein the repeat
domain comprises at least one repeat unit derived from a transcription
activator-like (TAL) effector, wherein the repeat unit comprises a
hypervariable region which determines recognition of a base pair in the
target DNA sequence, wherein the repeat unit is responsible for the
recognition of one base pair in the DNA sequence, and wherein the
hypervariable region comprises a member selected from the group
consisting of: (a) HD for recognition of C/G; (b) NI for recognition of
A/T; (c) NG for recognition of T/A; (d) NS for recognition of C/G or A/T
or T/A or G/C; (e) NN for recognition of G/C or A/T; (f) IG for
recognition of T/A; (g) N for recognition of C/G or T/A; (h) HG for
recognition of C/G or T/A; (i) H for recognition of T/A; (j) NK for
recognition of G/C; (k) NH for recognition of G/C; (l) NP for recognition
of A/T or C/G or T/A; (m) NT for recognition of A/T or G/C; (n) HN for
recognition of A/T or G/C; (o) SH for recognition of G/C; (p) SN for
recognition of G/C; and (q) IS for recognition of A/T; wherein the repeat
domain comprises at least one repeat unit which comprises a hypervariable
region comprising (k), (l), (m), (n), (o), (p), or (q).
2. (canceled)
3. The method of claim 1, wherein the hypervariable region corresponds to
amino acids 12 and 13 in the repeat unit.
4. The method of claim 1, wherein the repeat domain comprises 1.5 to 40.5
repeat units.
5. The method of claim 1, wherein the repeat domain comprises 11.5 to
33.5 repeat units.
6. The method of claim 1, wherein the polypeptide further comprises at
least one additional domain that is operably linked to the repeat domain.
7. The method of claim 6, wherein the additional domain comprises a
bacterial, viral, fungal, oomycete, human, animal, plant, or artificial
protein, or part thereof.
8. The method of claim 6, wherein the additional domain comprises a
protein or functional part or domain thereof, that is capable of
modifying DNA or RNA.
9. The method of claim 6, wherein the additional domain comprises a
protein or functional part or domain thereof selected from the group
consisting of: a transcription activator, a transcription repressor, a
resistance-mediating protein, a nuclease, a topoisomerase, a ligase, an
integrase, a recombinase, a resolvase, a methylase, an acetylase, a
demethylase, and a deacetylase.
10. The method of claim 1, wherein the repeat domain of the polypeptide
is synthesized by expressing a DNA sequence encoding the polypeptide and
where the DNA sequence encoding the polypeptide is assembled by
preassembling the repeat units in one or more target vectors that can
subsequently be assembled into a final vector comprising the DNA sequence
encoding the polypeptide.
11. The method of claim 1, wherein the repeat unit comprises 30 to 40
amino acids.
12. The method of claim 11, wherein the repeat unit comprises 33, 34, 35
or 39 amino acids.
13. The method of claim 1, wherein the polypeptide recognizes at least 2,
3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, or 20 base
pairs in the target DNA sequence.
14. The method of claim 13, wherein the polypeptide recognizes all of the
base pairs in the target DNA sequence.
15. The method of claim 13, wherein the polypeptide is capable of binding
to the target DNA sequence.
16. A polypeptide produced by the method of claim 1.
17. The polypeptide of claim 16, wherein the polypeptide is not naturally
occurring.
18. A polynucleotide molecule comprising a coding sequence for the
polypeptide produced by the method of claim 1.
19. The polynucleotide molecule of claim 18, wherein the polynucleotide
molecule is not naturally occurring.
20. An expression cassette comprising a promoter operably linked to the
polynucleotide molecule of claim 18.
21. A non-human host cell comprising the expression cassette of claim 20.
22. The host cell of claim 21, wherein the host cell is a bacterial cell,
a fungal cell, an animal cell, or a plant cell.
23. A transformed, non-human organism comprising the expression cassette
of claim 20.
24. The transformed organism of claim 23, wherein the organism is a
fungus, an animal, or a plant.
25. A method for selectively recognizing a base pair in a DNA sequence by
a polypeptide, the method comprising constructing a polypeptide
comprising a repeat domain, wherein the repeat domain comprises at least
one repeat unit derived from a TAL effector, wherein the repeat unit
comprises a hypervariable region which determines recognition of a base
pair in the DNA sequence, wherein the repeat unit is responsible for the
recognition of one base pair in the DNA sequence, and wherein the
hypervariable region comprises a member selected from the group
consisting of: (a) HD for recognition of C/G; (b) NI for recognition of
A/T; (c) NG for recognition of T/A; (d) NS for recognition of C/G or A/T
or T/A or G/C; (e) NN for recognition of G/C or A/T; (f) IG for
recognition of T/A; (g) N for recognition of C/G or T/A; (h) HG for
recognition of C/G or T/A; (i) H for recognition of T/A; (j) NK for
recognition of G/C; (k) NH for recognition of G/C; (l) NP for recognition
of A/T or C/G or T/A; (m) NT for recognition of A/T or G/C; (n) HN for
recognition of A/T or G/C; (o) SH for recognition of G/C; (p) SN for
recognition of G/C; and (q) IS for recognition of A/T; wherein the repeat
domain comprises at least one repeat unit which comprises a hypervariable
region comprising (k), (l), (m), (n), (o), (p), or (q).
26. (canceled)
27. The method of claim 25, wherein the hypervariable region corresponds
to amino acids 12 and 13 in the repeat unit.
28. The method of claim 25, wherein the repeat domain comprises 1.5 to
40.5 repeat units.
29. The method of claim 25, wherein the repeat domain comprises 11.5 to
33.5 repeat units.
30. The method of claim 25, wherein the polypeptide further comprises at
least one additional domain that is operably linked to the repeat domain.
31. The method of claim 25, wherein the repeat unit comprises 30 to 40
amino acids.
32. The method of claim 31, wherein the repeat unit comprises 33, 34, 35
or 39 amino acids.
33. The method of claim 25, wherein the repeat domain comprising repeat
units is inserted in a bacterial, viral, fungal, oomycete, human, animal
or plant polypeptide to achieve a targeted recognition and preferably
binding of one or more specified base pairs in a DNA sequence, and
optionally wherein the repeat unit is derived from the repeat domains of
AvrBs3-like effectors which are further optionally modified in order to
obtain a pre-selected specific activity to one or more base pairs in a
DNA sequence.
34. The method of claim 25, wherein the repeat domain comprising the
repeat unit is contained in a polypeptide controlling the transcription
of a gene, optionally in transcription activator or repressor proteins,
optionally in AvrBs3-like proteins, e.g. in AvrBs3 or Hax effector
proteins.
35. The method of claim 25, wherein the N-terminal region of a repeat
domain confers a recognition specificity for a T/A 5' of the recognition
specificity of the repeat unit.
36. The method of claim 25, wherein the base pair in the DNA sequence is
inserted into an expression control element combined with a gene, the
expression control element being targeted by a transcription control
protein comprising the hypervariable region in the repeat unit
recognizing the base pair located in the expression control element to
specifically control the expression of the gene, wherein the expression
control element is preferably a promoter.
37. The method of claim 36, wherein the gene is a resistance mediating
gene in order to obtain a disease resistant organism, the expression
control element being optionally the target sequence for an AvrBs3-like
effector protein.
38. A method of modulating expression of a target gene in a cell, wherein
cells are provided which contain a polypeptide wherein the polypeptide
comprises a repeat domain, wherein the repeat domain comprises at least
one repeat unit derived from a TAL effector, wherein the repeat unit
comprises a hypervariable region which determines recognition of a base
pair in a DNA sequence, wherein the repeat unit is responsible for the
recognition of one base pair in the DNA sequence, and wherein the
hypervariable region comprises a member selected from the group
consisting of: (a) HD for recognition of C/G; (b) NI for recognition of
A/T; (c) NG for recognition of T/A; (d) NS for recognition of C/G or A/T
or T/A or G/C; (e) NN for recognition of G/C or A/T; (f) IG for
recognition of T/A; (g) N for recognition of C/G or T/A; (h) HG for
recognition of C/G or T/A; (i) H for recognition of T/A; (j) NK for
recognition of G/C; (k) NH for recognition of G/C; (l) NP for recognition
of A/T or C/G or T/A; (m) NT for recognition of A/T or G/C; (n) HN for
recognition of A/T or G/C; (o) SH for recognition of G/C; (p) SN for
recognition of G/C; and (q) IS for recognition of A/T; wherein the repeat
domain comprises at least one repeat unit which comprises a hypervariable
region comprising (k), (l), (m), (n), (o), (p), or (q).
39. (canceled)
40. A polypeptide comprising a repeat domain, wherein the repeat domain
comprises at least one repeat unit derived from a TAL effector, wherein
the repeat unit comprises a hypervariable region which determines
recognition of a base pair in a DNA sequence, wherein the repeat unit is
responsible for the recognition of one base pair in the DNA sequence, and
wherein the hypervariable region comprises a member selected from the
group consisting of: (a) HD for recognition of C/G; (b) NI for
recognition of A/T; (c) NG for recognition of T/A; (d) NS for recognition
of C/G or A/T or T/A or G/C; (e) NN for recognition of G/C or A/T; (f) IG
for recognition of T/A; (g) N for recognition of C/G or T/A; (h) HG for
recognition of C/G or T/A; (i) H for recognition of T/A; (j) NK for
recognition of G/C; (k) NH for recognition of G/C; (l) NP for recognition
of A/T or C/G or T/A; (m) NT for recognition of A/T or G/C; (n) HN for
recognition of A/T or G/C; (o) SH for recognition of G/C; (p) SN for
recognition of G/C; and (q) IS for recognition of A/T; wherein the repeat
domain comprises at least one repeat unit which comprises a hypervariable
region comprising (k), (l), (m), (n), (o), (p), or (q).
41. (canceled)
42. A polynucleotide molecule comprising a coding sequence for the
polypeptide of claim 40.
43. A DNA which is modified to include at least one base pair located
in a target DNA sequence so that the base pair can be specifically
recognized by a polypeptide comprising a repeat domain, wherein the
repeat domain comprises at least one repeat unit derived from a TAL
effector, wherein the repeat unit comprises a hypervariable region which
determines recognition of a base pair in the DNA sequence, wherein the
repeat unit is responsible for the recognition of one base pair in the
DNA sequence, and wherein the base pair is selected from the group
consisting of: (a) G/C for recognition of NH; (b) A/T or C/G or T/A for
recognition of NP; (c) A/T or G/C for recognition of NT; (d) A/T or G/C
for recognition of HN; (e) G/C for recognition of SH; (f) G/C for
recognition of SN; and (g) A/T for recognition of IS.
44. The DNA of claim 43, wherein at least one additional base pair is
selected from the group consisting of: (h) C/G for recognition by HD; (i)
A/T for recognition by NI; (j) T/A for recognition by NG; (k) C/G or A/T
or T/A or G/C for recognition by NS; (l) G/C or A/T for recognition by
NN; (m) T/A for recognition by IG; (n) C/G or T/A for recognition by N;
(o) T/A for recognition by HG; (p) T/A for recognition by H; and (q) G/C
for recognition by NK.
45. The DNA of claim 43, wherein the base pair is located in a promoter
or other gene regulatory sequence.
46. The DNA of claim 43, wherein the DNA is not naturally occurring.
47. A vector comprising the DNA of claim 43.
48. A non-human host cell comprising the DNA of claim 43.
49. The host cell of claim 43, wherein the host cell is a bacterial cell,
a fungal cell, an animal cell, or a plant cell.
50. A transformed, non-human organism comprising the DNA of claim 43.
51. The transformed organism of claim 50, wherein the organism is a
fungus, an animal, or a plant.
52. A method for producing a DNA comprising a target DNA sequence that is
selectively recognized by a polypeptide comprising a repeat domain,
wherein the repeat domain comprises at least one repeat unit derived from
a TAL effector, wherein the repeat unit comprises a hypervariable region
which determines recognition of a base pair in the target DNA sequence,
wherein the repeat unit is responsible for the recognition of one base
pair in the target DNA sequence, the method comprising synthesizing a DNA
comprising a base pair that is capable of being recognized by the repeat
unit, and wherein the base pair is selected from the group consisting of:
(a) C/G for recognition by HD; (b) A/T for recognition by NI; (c) T/A for
recognition by NG; (d) C/G or A/T or T/A or G/C for recognition by NS; (e)
G/C or A/T for recognition by NN; (f) T/A for recognition by IG; (g) C/G
or T/A for recognition by N; (h) T/A for recognition by HG; (i) T/A for
recognition by H; (j) G/C for recognition by NK; (k) G/C for recognition
of NH; (l) A/T or C/G or T/A for recognition of NP; (m) A/T or G/C for
recognition of NT; (n) A/T or G/C for recognition of HN; (o) G/C for
recognition of SH; (p) G/C for recognition of SN; and (q) A/T for
recognition of IS and wherein the repeat domain comprises at least one
repeat unit which comprises a hypervariable region comprising at least
one member selected from the group consisting of (i) NH for recognition
of G/C; (ii) NP for recognition of A/T or C/G or T/A; (iii) NT for
recognition of A/T or G/C; (iv) HN for recognition of A/T or G/C; (v) SH
for recognition of G/C; (vi) SN for recognition of G/C; and (vii) IS for
recognition of A/T.
53. (canceled)
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation-in-part of U.S. Ser. No.
13/016,297, filed Jan. 28, 2011 which is a continuation of International
Application PCT/IB2010/000154, filed Jan. 12, 2010, which designates the
U.S. and was published by the International Bureau in English on Jul. 15,
2010, and which claims the benefit of U.S. Provisional Patent Application
No. 61/225,043, filed Jul. 12, 2009, European (EP) Patent Application No.
09165328.7, filed Jul. 13, 2009, German (DE) Patent Application No.
102009004659.3, filed Jan. 12, 2009; all of which are hereby incorporated
herein in their entirety by reference.
TECHNICAL FIELD OF THE INVENTION
[0002] The present invention refers to methods for selectively recognizing
a base pair in a target DNA sequence by a polypeptide, to modified
polypeptides which specifically recognize one or more base pairs in a
target DNA sequence and, to DNA which is modified so that it can be
specifically recognized by a polypeptide and to uses of the polypeptide
and DNA in specific DNA targeting as well as to methods of modulating
expression of target genes in a cell.
BACKGROUND OF THE INVENTION
[0003] Phytopathogenic bacteria of the genus Xanthomonas cause severe
diseases on many important crop plants. The bacteria translocate an
arsenal of effectors including members of the large transcription
activator-like (TAL)/AvrBs3-like effector family via the type III
secretion system into plant cells (Kay & Bonas (2009) Curr. Opin.
Microbiol. 12:37-43, White & Yang (2009) Plant Physiol. doi:10.1104/pp.
1109.139360; Schornack et al. (2006) J. Plant Physiol. 163:256-272). TAL
effectors, key virulence factors of Xanthomonas, contain a central domain
of tandem repeats, nuclear localization signals (NLSs), and an activation
domain (AD) and act as transcription factors in plant cells (Kay et al.
(2007) Science 318:648-651; Romer et al. (2007) Science 318:645-648; Gu et
al. (2005) Nature 435:1122-1125; FIG. 1a). The type member of this
effector family, AvrBs3 from Xanthomonas campestris pv. vesicatoria,
contains 17.5 repeats and induces expression of UPA (upregulated by
AvrBs3) genes including the Bs3 resistance gene in pepper plants (Kay et
al. (2007) Science 318:648-651; Romer et al. (2007) Science 318:645-648;
Marois et al. (2002) Mol. Plant-Microbe Interact. 15:637-646). The number
and order of repeats in a TAL effector determine its specific activity
(Herbers et al. (1992) Nature 356:172-174). The repeats were shown to be
essential for DNA-binding of AvrBs3 and constitute a novel DNA-binding
domain (Kay et al. (2007) Science 318:648-651). How this domain contacts
DNA and what determines specificity has remained enigmatic.
[0004] Selective gene expression is mediated via the interaction of
protein transcription factors with specific nucleotide sequences within
the regulatory region of the gene. The manner in which DNA-binding
protein domains are able to discriminate between different DNA sequences
is an important question in understanding crucial processes such as the
control of gene expression in differentiation and development.
[0005] The ability to specifically design and generate DNA-binding domains
that recognize a desired DNA target is highly desirable in biotechnology.
Such ability can be useful for the development of custom transcription
factors with the ability to modulate gene expression upon target DNA
binding. Examples include the extensive work done with the design of
custom zinc finger DNA-binding proteins specific for a desired target DNA
sequence (Choo et al. (1994) Nature 372:645; Pomerantz et al. (1995)
Science 267:93-96; Liu et al. (1997) Proc. Natl. Acad. Sci. USA
94:5525-5530; Guan et al. (2002) Proc. Natl. Acad. Sci. USA 99:13296-13301; U.S.
Pat. No. 7,273,923; U.S. Pat. No. 7,220,719). Furthermore, polypeptides
containing designer DNA-binding domains can be utilized to modify the
actual target DNA sequence by the inclusion of DNA modifying domains,
such as a nuclease catalytic domain, within the polypeptide. Examples of
such include the DNA binding domain of a meganuclease/homing endonuclease
DNA recognition site in combination with a non-specific nuclease domain
(see US Pat. Appl. 2007/0141038), modified meganuclease DNA recognition
site and/or nuclease domains from the same or different meganucleases
(see U.S. Pat. App. Pub. 20090271881), and zinc finger domains in
combination with a domain with nuclease activity, typically from a type
IIS restriction endonuclease such as FokI (Bibikova et al. (2003) Science
300:764; Urnov et al. (2005) Nature 435:646; Shukla et al. (2009) Nature
459:437-441; Townsend et al. (2009) Nature 459:442-445; Kim et al. (1996)
Proc. Natl. Acad. Sci. USA 93:1156-1160; U.S. Pat. No. 7,163,824). The
current methods utilized for identifying custom zinc finger DNA-binding
domains employ combinatorial selection-based methods utilizing large
randomized libraries (typically >10^8 in size) to generate
multi-finger domains with desired DNA specificity (Greisman & Pabo (1997)
Science 275:657-661; Hurt et al. (2003) Proc. Natl. Acad. Sci. USA
100:12271-12276; Isalan et al. (2001) Nat. Biotechnol. 19:656-660). Such
methods are time intensive, technically demanding and potentially quite
costly. The identification of a simple recognition code for the
engineering of DNA-binding polypeptides would represent a significant
advancement over the current methods for designing DNA-binding domains
that recognize a desired nucleotide target.
BRIEF SUMMARY OF THE INVENTION
[0006] The present invention provides a method for producing a polypeptide
that selectively recognizes a base pair in a DNA sequence, the method
comprising synthesizing a polypeptide comprising a repeat domain, wherein
the repeat domain comprises at least one repeat unit derived from a
transcription activator-like (TAL) effector, wherein the repeat unit
comprises a hypervariable region which determines recognition of a base
pair in the DNA sequence, wherein the repeat unit is responsible for the
recognition of one base pair in the DNA sequence. These polypeptides of
the invention comprise repeat units of the present invention and can be
constructed by a modular approach by preassembling repeat units in target
vectors that can subsequently be assembled into a final destination
vector. The invention provides the polypeptide produced by this method
as well as DNA sequences encoding the polypeptides and host organisms and
cells comprising such DNA sequences.
[0007] The present invention provides a method for selectively recognizing
a base pair in a target DNA sequence by a polypeptide wherein said
polypeptide comprises at least a repeat domain comprising repeat units
wherein said repeat units each comprise a hypervariable region which
determines recognition of a base pair in said target DNA sequence.
[0008] More specifically, the inventors have determined those amino acids
in a DNA-binding polypeptide responsible for selective recognition of
base pairs in a target DNA sequence. With elucidation of the recognition
code, a general principle for recognizing specific base pairs in a target
DNA sequence by selected amino acids in a polypeptide has been
determined. The inventors have found that distinct types of repeat units
that are part of a repeat unit array of varying length have the capacity
to recognize one defined/specific base pair. Within each repeat unit
forming a repeat domain, a hypervariable region is responsible for the
specific recognition of a base pair in a target DNA sequence.
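The recognition code can be written down as a simple lookup table. The following is a minimal sketch, not part of the original disclosure, that transcribes the hypervariable regions and base pairs listed in the claims. It assumes that each base pair is denoted by its first-named nucleotide (C/G as "C", A/T as "A", T/A as "T", G/C as "G"); the names RECOGNITION_CODE, recognized_bases and recognizes are hypothetical and chosen for illustration only.

```python
# Recognition code transcribed from the claims. Assumption: each base pair
# is represented by its first-named nucleotide (C/G -> "C", A/T -> "A",
# T/A -> "T", G/C -> "G"). All names here are illustrative.
RECOGNITION_CODE = {
    "HD": {"C"},
    "NI": {"A"},
    "NG": {"T"},
    "NS": {"A", "C", "G", "T"},
    "NN": {"A", "G"},
    "IG": {"T"},
    "N": {"C", "T"},
    "HG": {"C", "T"},
    "H": {"T"},
    "NK": {"G"},
    "NH": {"G"},
    "NP": {"A", "C", "T"},
    "NT": {"A", "G"},
    "HN": {"A", "G"},
    "SH": {"G"},
    "SN": {"G"},
    "IS": {"A"},
}


def recognized_bases(hypervariable_region):
    """Return the set of bases recognized by one repeat unit."""
    try:
        return RECOGNITION_CODE[hypervariable_region]
    except KeyError:
        raise ValueError(
            f"no specificity listed for {hypervariable_region!r}"
        ) from None


def recognizes(hypervariable_region, base):
    """True if the repeat type is listed as recognizing the given base."""
    return base.upper() in recognized_bases(hypervariable_region)
```

For example, recognizes("NN", "G") is true under this table, whereas recognizes("NI", "T") is false.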
[0009] Thus, the present invention provides not only a method for
selectively recognizing a base pair in a target DNA sequence by a
polypeptide comprising at least a repeat domain comprising repeat units
but also methods wherein target DNA sequences can be generated which are
selectively recognized by repeat domains in a polypeptide.
[0010] The invention also provides for a method for constructing
polypeptides that recognize specific DNA sequences. These polypeptides of
the invention comprise repeat units of the present invention and can be
constructed by a modular approach by preassembling repeat units in target
vectors that can subsequently be assembled into a final destination
vector.
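The modular approach can be illustrated in silico. The sketch below is an assumption-laden illustration rather than the cloning procedure itself: it picks one repeat type per target base from the single-base entries of the claimed code (NI for A, HD for C, NK for G, NG for T), splits the resulting repeat array into blocks that stand in for preassembled target vectors, and then joins the blocks in order as a stand-in for the final destination vector. The block size of four and all function names are hypothetical.

```python
# Hypothetical default choice of one repeat type per base, taken from the
# single-base entries of the claimed recognition code.
PREFERRED_REPEAT = {"A": "NI", "C": "HD", "G": "NK", "T": "NG"}


def design_repeat_array(target):
    """Choose one repeat type for each base of the target sequence."""
    target = target.upper()
    unknown = set(target) - set(PREFERRED_REPEAT)
    if unknown:
        raise ValueError(f"unsupported bases: {sorted(unknown)}")
    return [PREFERRED_REPEAT[base] for base in target]


def preassemble(repeats, block_size=4):
    """Split the repeat array into ordered blocks (stand-ins for the
    intermediate target vectors; the block size is an assumption)."""
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    return [
        repeats[i:i + block_size]
        for i in range(0, len(repeats), block_size)
    ]


def final_assembly(blocks):
    """Join preassembled blocks in order (stand-in for the final vector)."""
    return [repeat for block in blocks for repeat in block]
```

With these definitions, final_assembly(preassemble(design_repeat_array("ACGTAC"))) returns the same repeat order as design_repeat_array("ACGTAC"), which is the property a modular assembly scheme has to preserve.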
[0011] The invention also provides a method for targeted modulation of
gene expression by constructing modular repeat units specific for a
target DNA sequence of interest, modifying a polypeptide by the addition
of said repeat units so as to enable said polypeptide to now recognize
the target DNA, introducing or expressing said modified polypeptide in a
prokaryotic or eukaryotic cell so as to enable said modified polypeptide
to recognize the target DNA sequence, and modulating the expression of
said target gene in said cell as a result of such recognition.
[0012] The invention also provides a method for directed modification of a
target DNA sequence by the construction of a polypeptide including at
least a repeat domain of the present invention that recognizes said
target DNA sequence, wherein said polypeptide also contains a functional
domain capable of modifying the target DNA (such as via site specific
recombination, restriction or integration of donor target sequences)
thereby enabling targeted DNA modifications in complex genomes.
[0013] The invention further provides for the production of modified
polypeptides including at least a repeat domain comprising repeat units
wherein a hypervariable region within each of the repeat units determines
selective recognition of a base pair in a target DNA sequence.
[0014] In a further embodiment of the invention, DNA is provided which
encodes for a polypeptide containing a repeat domain as described above.
[0015] In a still further embodiment of the invention, DNA is provided
which is modified to include one or more base pairs located in a target
DNA sequence so that each of said base pairs can be specifically
recognized by a polypeptide including a repeat domain having
corresponding repeat units, each repeat unit comprising a hypervariable
region which determines recognition of the corresponding base pair in
said DNA.
[0016] In a still further embodiment of the invention, uses of those
polypeptides and DNAs are provided. Additionally provided are plants,
plant parts, seeds, plant cells and other non-human host cells
transformed with the isolated nucleic acid molecules of the present
invention and the proteins or polypeptides encoded by the coding
sequences of the present invention. Still further, the polypeptides and
DNA described herein can be introduced into animal and human cells as
well as cells of other organisms like fungi or plants.
[0017] In summary, the invention focuses on a method for selectively
recognizing base pairs in a target DNA sequence by a polypeptide wherein
said polypeptide comprises at least a repeat domain comprising repeat
units wherein each repeat unit contains a hypervariable region which
determines recognition of a base pair in said target DNA sequence wherein
consecutive repeat units correspond to consecutive base pairs in said
target DNA sequence.
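Because consecutive repeat units correspond to consecutive base pairs, a predicted target sequence can be read off an ordered list of hypervariable regions. The following sketch illustrates this under stated assumptions: the table transcribes the code from the claims with each base pair denoted by its first-named nucleotide, ambiguous positions are written with standard IUPAC letters (R = A/G, Y = C/T, H = A/C/T, N = any), and a leading T is optionally prepended to reflect the T/A specificity that the claims attribute to the N-terminal region. The function name and the example repeat array are hypothetical.

```python
# Claimed recognition code, one string of recognized bases per repeat type
# (assumption: a base pair is denoted by its first-named nucleotide).
CODE = {
    "HD": "C", "NI": "A", "NG": "T", "NS": "ACGT", "NN": "AG", "IG": "T",
    "N": "CT", "HG": "CT", "H": "T", "NK": "G", "NH": "G", "NP": "ACT",
    "NT": "AG", "HN": "AG", "SH": "G", "SN": "G", "IS": "A",
}

# Standard IUPAC letters for the base sets that occur in CODE.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("ACT"): "H", frozenset("ACGT"): "N",
}


def predict_target_box(repeats, leading_t=True):
    """Translate an ordered repeat array into a predicted target sequence.

    One repeat unit yields one position; leading_t prepends the T assumed
    to be specified by the N-terminal region.
    """
    box = "T" if leading_t else ""
    for hypervariable_region in repeats:
        box += IUPAC[frozenset(CODE[hypervariable_region])]
    return box


# Hypothetical five-repeat array, used only to show the mapping.
print(predict_target_box(["HD", "NG", "NS", "NI", "NN"]))  # TCTNAR
```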
BRIEF DESCRIPTION OF THE FIGURES
[0018] FIG. 1. Model for DNA-target specificity of TAL effectors.
(A) TAL effectors contain central tandem repeat units (red), nuclear
localization signals (NLS) and an activation domain (AD). Amino acid
sequence of the first repeat of AvrBs3. Hypervariable amino acids 12 and
13 are shaded in gray. (B) Hypervariable amino acids at position 12 and
13 of the 17.5 AvrBs3 repeat units are aligned to the UPA-box consensus
(21). (C) Repeat units of TAL effectors and predicted target sequences in
promoters of induced genes were aligned manually. Nucleotides in the
upper DNA strand that correspond to the hypervariable amino acids in each
repeat were counted based on the following combinations of eight
effectors and experimentally identified target genes: AvrBs3/Bs3, UPA10,
UPA12, UPA14, UPA19, UPA20, UPA21, UPA23, UPA25,
AvrBs3Δrep16/Bs3-E, AvrBs3Δrep109/Bs3, AvrHah1/Bs3,
AvrXa27/Xa27, PthXo1/Xa13, PthXo6/OsTFX1, PthXo7/OsTFIIAγ1 (see
FIG. 5). Predominant combinations (n>4) are shaded in gray. An
asterisk indicates that amino acid 13 is missing in this repeat type. (D)
DNA target specificity code (R=A/G; N=A/C/G/T) of repeat types based on
the hypervariable amino acids 12 and 13 (experimentally proven in this
study).
[0019] FIG. 2. Target DNA sequences of Hax2, Hax3, and Hax4.
(A) Amino acids 12 and 13 of the Hax2, Hax3, and Hax4 repeat units and
predicted target DNA specificities (Hax-box). (B) Hax-boxes were cloned
in front of the minimal Bs4 promoter into a GUS reporter vector. (C)
Specific inducibility of the Hax-boxes by Hax effectors. GUS reporter
constructs were codelivered via A. tumefaciens into N. benthamiana with
35S-driven hax2, hax3, hax4, and empty T-DNA (-), respectively (error
bars indicate SD; n=3 samples; 4-MU, 4-methyl-umbelliferone). 35S::uidA
(+) served as control. Leaf discs were stained with X-Gluc
(5-bromo-4-chloro-3-indolyl-β-D-glucuronide).
[0020] FIG. 3. DNA base pair recognition specificities of repeat types.
(A) Hax4- and ArtX-box-derivatives were cloned in front of the minimal
Bs4 promoter into a GUS reporter vector. (B) Specificity of NG-, HD-,
NI-, and NS-repeat units. Hax4-inducibility of Hax4-box derivatives
permutated in repeat type target bases (gray background). (C) Specificity
of NN-repeat units. Artificial effector ArtX1 and predicted target DNA
sequences. ArtX1-inducibility of ArtX1 box derivatives permutated in
NN-repeat target bases (gray background). (D) Artificial effectors ArtX2
and ArtX3 and derived DNA target sequences. (E) Specific inducibility of
ArtX-boxes by artificial effectors. (A)-(E) GUS reporter constructs were
co-delivered via A. tumefaciens into N. benthamiana with 35S-driven hax4,
artX1, artX2, or artX3 genes, and empty T-DNA (-), respectively.
35S::uidA (+) served as control. Leaf discs were stained with X-Gluc. For
quantitative data see FIG. 11.
[0021] FIG. 4. A minimal number of repeat units is required for
transcriptional activation.
(A) Artificial ArtHD effectors with different numbers (0.5-15.5) of
HD-repeat units (total 1.5 to 16.5 repeat units). (B) An ArtHD target box
consisting of TA and 17 C was cloned in front of the minimal Bs4 promoter
into a GUS reporter vector. (C) Promoter activation by ArtHD effectors
with different number of repeat units. 35S-driven effector gene or empty
T-DNA (-) were codelivered via A. tumefaciens with the GUS-reporter
construct into N. benthamiana (error bars indicate SD; n=3 samples;
4-MU). 35S::uidA (+) served as control. Leaf discs were stained with
X-Gluc.
[0022] FIG. 5. Alignment of DNA target sequences in promoters of induced
genes with the hypervariable amino acids 12 and 13 of TAL effector repeat
units.
(A) Repeat units of AvrBs3, AvrBs3Δrep16, AvrBs3Δrep109, and
AvrHah1 were aligned to the UPA-box in the promoter of the pepper ECW-30R
Bs3 gene (accession: EU078684). AvrBs3Δrep16 and
AvrBs3Δrep109 are deletion derivatives of AvrBs3 in which repeat
units 11-14 and repeat units 12-14 were deleted, respectively. AvrBs3,
AvrBs3Δrep109, and AvrHah1, but not AvrBs3Δrep16, induce the
hypersensitive response (HR) in ECW-30R plants. (B) Repeat units of AvrBs3,
AvrBs3Δrep16, AvrBs3Δrep109, and AvrHah1 were aligned to the non-functional
UPA-box in the promoter of the pepper ECW Bs3-E gene (accession:
EU078683). AvrBs3Δrep16, but not AvrBs3, AvrBs3Δrep109, or
AvrHah1, induces the HR in pepper ECW plants. (C) Repeat units of AvrXa27
were aligned to a putative target sequence in the promoter of the rice
Xa27 gene. Xa27 (accession: AY986492) is induced by AvrXa27 in rice
cultivar IRBB27 leading to an HR, but not xa27 (accession: AY986491) in
rice cultivar IR24. (D) Repeat units of PthXo1 were aligned to a putative
target sequence in the promoter of the rice Xa13/Os8N3 gene. Xa13
(accession: DQ421396) is induced by PthXo1 in rice cultivar IR24 leading
to susceptibility, but not xa13 (accession: DQ421394) in rice cultivar
IRBB13. (E) Repeat units of PthXo6 were aligned to a putative target
sequence in the promoter of the rice OsTFX1 gene (accession: AK108319).
OsTFX1 is induced by PthXo6 in rice cultivar IR24. (F) Repeat units of
PthXo7 were aligned to a putative target sequence in the promoter of the
rice OsTFIIA.gamma.1 gene (CB097192). OsTFIIA.gamma.1 is induced by
PthXo7 in rice cultivar IR24. (A)-(F) Numbers above the DNA sequences
indicate nucleotide distance to the first ATG in the coding region.
Repeat/base combinations not matching our predicted target specificity
(amino acids 12/13: NI=A; HD=C; NG=T; NS=A/C/G/T; NN=A/G; IG=T) are
coloured in red. Repeat units with unknown target DNA specificity are
coloured in green.
[0023] FIG. 6. The DNA region protected by AvrBs3.DELTA.rep16 is 4 bp
shorter than with AvrBs3.
[0024] Summary of DNaseI footprint analyses with AvrBs3 and
AvrBs3.DELTA.rep16 (see FIGS. 7, 8).
(A) Bs3 (top) and Bs3-E (middle) promoter sequences protected by AvrBs3
and AvrBs3.DELTA.rep16, respectively. DNaseI footprinting revealed that
AvrBs3 protected 37 nucleotides of the sense strand and 36 nucleotides of
the antisense strand of the Bs3 promoter, and AvrBs3.DELTA.rep16
protected 30 nucleotides of the sense strand and 32 nucleotides of the
antisense strand of the Bs3-E promoter. The UPA-box and the predicted
AvrBs3.DELTA.rep16-box are underlined. UPA20-ubm-r16 (lower part)
promoter sequences protected by AvrBs3 and AvrBs3.DELTA.rep16. The
UPA20-ubm-r16 promoter is a UPA20 promoter derivative with a 2 bp
substitution (GA to CT, bold italic) that results in recognition by both
AvrBs3 and AvrBs3.DELTA.rep16. DNaseI footprinting revealed that 35
nucleotides of the sense strand and 34 nucleotides of the antisense
strand are protected by AvrBs3 (UPA-box is underlined), and 31
nucleotides of the sense strand and 32 nucleotides of the antisense
strand are protected by AvrBs3.DELTA.rep16 (AvrBs3.DELTA.rep16-box is
underlined). DNA regions shaded in green (AvrBs3) or red
(AvrBs3.DELTA.rep16) refer to the core footprints which were protected by
AvrBs3 and AvrBs3.DELTA.rep16, respectively, in every experiment, even
with low protein amounts (equal molarity of DNA and protein dimers). DNA
regions shaded in gray refer to nucleotides which were not protected in
all of the 4 experiments at all protein concentrations by the given
proteins. Please note that the 5' ends of the AvrBs3- and
AvrBs3.DELTA.rep16-protected regions are identical. Dashed vertical lines
indicate the differences between the 3' ends of the AvrBs3- and
AvrBs3.DELTA.rep16-protected promoter regions which corroborates our
model that one repeat contacts one base pair in the DNA. (B) Alignment of
AvrBs3 and AvrBs3.DELTA.rep16 target DNA sequences in the UPA20-ubm-r16
promoter with AvrBs3 and AvrBs3.DELTA.rep16 repeat regions (hypervariable
amino acids at position 12 and 13). Repeat/base combinations not matching
our predicted target specificity (amino acids 12/13: NI=A; HD=C; NG=T;
NS=A/C/G/T) are coloured in red.
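As a worked check of the one-repeat-per-base-pair reading, the footprint lengths reported above for the UPA20-ubm-r16 promoter can be compared directly. The sketch below only restates numbers given in this figure description; the variable and function names are illustrative assumptions.

```python
# Footprint lengths (nucleotides) on the UPA20-ubm-r16 promoter, as stated above.
footprint = {
    "AvrBs3": {"sense": 35, "antisense": 34},
    "AvrBs3.DELTA.rep16": {"sense": 31, "antisense": 32},
}

# AvrBs3.DELTA.rep16 lacks repeat units 11-14, i.e. four repeat units.
deleted_repeat_units = 14 - 11 + 1


def shortening(strand):
    """Difference in protected length between AvrBs3 and the deletion derivative."""
    return footprint["AvrBs3"][strand] - footprint["AvrBs3.DELTA.rep16"][strand]


for strand in ("sense", "antisense"):
    print(strand, shortening(strand), "nt shorter;",
          deleted_repeat_units, "repeat units deleted")
```

On the sense strand the difference equals the number of deleted repeat units; on the antisense strand the reported difference is 2 nucleotides.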
[0025] FIG. 7. Bs3 and Bs3-E promoter sequences protected by AvrBs3 and
AvrBs3.DELTA.rep16, respectively.
[0026] A representative DNaseI footprint experiment is shown. AvrBs3
DNaseI footprint on the Bs3 promoter sequence (A, upper/sense DNA strand;
B, lower/antisense DNA strand). AvrBs3.DELTA.rep16 DNaseI footprint on
the Bs3-E promoter sequence (C, upper/sense DNA strand; D,
lower/antisense DNA strand).
(A)-(D) (top) Fluorescently labelled PCR product was incubated with a
5.times. molar excess (calculated for protein dimers) of His6::AvrBs3,
His6::AvrBs3.DELTA.rep16, and BSA, respectively, treated with DNaseI and
analyzed on a capillary sequencer. The y axis of the electropherogram
shows the relative fluorescence intensity corresponding to the
5'-6-FAM-labelled sense strand (a, c) or the 5'-HEX-labelled antisense
strand (b, d) of the PCR product on an arbitrary scale. The traces for
the reactions with His6::AvrBs3 (green) or His6::AvrBs3.DELTA.rep16
(red), respectively, and BSA (black, negative control) were superimposed.
A reduction of peak height in the presence of AvrBs3 or
AvrBs3.DELTA.rep16, respectively, in comparison to the negative control
corresponds to protection. The protected region is indicated by green
(AvrBs3) or red (AvrBs3.DELTA.rep16) vertical lines. (middle)
Electropherogram of the DNA sequence. Orange coloured peaks with numbers
correspond to the DNA nucleotide size standard. The predicted target
boxes of the effectors in the DNA sequence are underlined. Nucleotides
covered are marked by a green (AvrBs3) or red (AvrBs3.DELTA.rep16) box.
Numbers below refer to nucleotide positions relative to the transcription
start (+1) in the presence of AvrBs3 (a, b) or AvrBs3.DELTA.rep16 (c, d),
respectively. (bottom) DNA PCR product used for DNaseI footprinting,
amplified from the Bs3 (a, b) or Bs3-E (c, d) promoters, respectively.
The protected regions on the single DNA strands are indicated by gray
boxes. Numbers below refer to nucleotide positions relative to the
transcription start (+1) in the presence of AvrBs3 (a, b) or
AvrBs3.DELTA.rep16 (c, d), respectively. The experiments were repeated
three times with similar results.
[0027] FIG. 8. UPA20-ubm-r16 promoter sequence protected by AvrBs3 and
AvrBs3.DELTA.rep16.
[0028] A representative DNaseI footprint experiment. AvrBs3 and
AvrBs3.DELTA.rep16 DNaseI footprint on the UPA20-ubm-r16 promoter
sequence (A, upper, sense DNA strand; B, lower, antisense DNA strand).
(top) Fluorescently labelled PCR product was incubated with a 5.times.
molar excess of His6::AvrBs3, His6::AvrBs3.DELTA.rep16 and BSA
(calculated for protein dimers), respectively, treated with DNaseI and
analyzed on a capillary sequencer. The y axis of the electropherogram
shows the relative fluorescence intensity corresponding to the
5'-6-FAM-labelled sense strand (a) or the 5'-HEX-labelled antisense
strand (b) of the PCR product on an arbitrary scale. The traces for the
reactions with His6::AvrBs3 (green), His6::AvrBs3.DELTA.rep16 (red) and
the negative control BSA (black) were superimposed. A reduction of peak
height in the presence of AvrBs3 and AvrBs3.DELTA.rep16 in comparison to
the negative control corresponds to protection. The protected regions are
indicated by green (AvrBs3) and red (AvrBs3.DELTA.rep16) vertical lines.
(middle) Electropherogram of the DNA sequence. Orange coloured peaks with
numbers correspond to the DNA nucleotide size standard. Nucleotides
covered by AvrBs3 are marked by green lines and a green box (with the UPA
box underlined), nucleotides covered by AvrBs3.DELTA.rep16 are marked by
red lines and a red box (with the AvrBs3.DELTA.rep16-box underlined). The
UPA20-ubm-r16 mutation (GA to CT) is indicated in italics. (bottom) DNA
PCR product used for DNaseI footprinting, amplified from the
UPA20-ubm-r16 promoter. The protected regions on the single DNA strands
are indicated by gray boxes. Numbers below refer to nucleotide positions
relative to the transcription start (+1) of the UPA20 wildtype promoter
in the presence of AvrBs3. The experiment was repeated three times with
similar results.
[0029] FIG. 9. GUS reporter constructs.
[0030] Target DNA sequences (TAL effector-box) were inserted 5' of the
minimal tomato Bs4 promoter (41) (pBs4; -50 to +25) sequence and
transferred by GATEWAY recombination into the A. tumefaciens T-DNA vector
pGWB330 constructing a fusion to a promoterless uidA
(.beta.-glucuronidase, GUS) gene. attB1, attB2; GATEWAY recombination
sites.
[0031] FIG. 10. Recognition specificity of the putative repeat 0 in Hax3.
(A) Amino acids 12 and 13 of Hax3-repeat units and four possible target
Hax3-boxes with permutations in the position corresponding to repeat 0.
(B) The target boxes were cloned in front of the minimal tomato Bs4
promoter into a GUS reporter vector. (C) GUS activities with 35S-driven
hax3 or empty T-DNA (-) codelivered via A. tumefaciens with the GUS
reporter constructs into N. benthamiana leaf cells (4-MU,
4-methyl-umbelliferone; n=3; error bars indicate SD). For qualitative
assays, leaf discs were stained with X-Gluc. The experiment was performed
twice with similar results.
[0032] FIG. 11. DNA base pair recognition specificities of repeat types.
[0033] Hax4- and ArtX-box-derivatives were cloned in front of the minimal
Bs4 promoter into a GUS reporter vector. Quantitative data for FIG. 3.
(A) Specificity of NG-, HD-, NI-, and NS-repeat units. Hax4-inducibility
of Hax4-box derivatives permutated in repeat type target bases. (B)
Specificity of NN-repeat units. ArtX1-inducibility of ArtX1-box
derivatives permutated in NN-repeat target bases. (C) Specific
inducibility of ArtX-boxes by artificial effectors ArtX1, ArtX2, and
ArtX3, respectively.
[0034] (A)-(C) GUS reporter constructs were codelivered via A. tumefaciens
into N. benthamiana leaf cells together with 35S-driven hax4, artX1,
artX2, artX3 genes (gray bars), and empty T-DNA (a, b, white bars; c, -),
respectively (n=3; error bars indicate SD). 35S::uidA (+) served as
control. The experiments were performed three times with similar results.
[0035] FIG. 12. Predicted target DNA sequences for AvrXa10.
(A) Amino acids 12 and 13 of the AvrXa10-repeat units and two possible
target boxes with predicted NN type repeat-specificity A or G. (B)
AvrXa10 target boxes were cloned in front of the minimal Bs4 promoter
into a GUS reporter vector. (C) GUS assay of 35S-driven avrXa10, hax3
(specificity control), or empty T-DNA (-) codelivered via A. tumefaciens
with GUS reporter constructs into N. benthamiana leaf cells. 35S::uidA
(+) served as constitutive control (n=3; error bars indicate SD). For
qualitative assays, leaf discs were stained with X-Gluc. The experiment
was performed three times with similar results.
[0036] FIG. 13. Recognition specificity of the repeat type IG in Hax2.
(A) Amino acids 12 and 13 of Hax2 repeat units and four possible target
Hax2-boxes for repeat type IG. (B) The Hax2 target boxes were cloned in
front of the minimal Bs4 promoter into a GUS reporter vector. (C) GUS
assay of 35S promoter-driven hax2 or empty T-DNA (-) codelivered via A.
tumefaciens with the GUS reporter constructs into N. benthamiana leaf
cells. 35S::uidA (+) served as constitutive control (n=3; error bars
indicate SD). For qualitative assays, leaf discs were stained with X-Gluc.
The experiment was performed three times with similar results.
[0037] FIG. 14. Hax2 induces expression of PAP1 in A. thaliana.
(A) Leaves of A. thaliana were inoculated with A. tumefaciens strains
delivering T-DNA constructs for 35S-driven expression of hax2, hax3, and
hax4, respectively. Expression of hax2, but not of hax3 and hax4 induced
purple pigmentation suggestive of anthocyanin production. The photograph
was taken 7 days post inoculation. (B) Transgenic A. thaliana line
carrying hax2 under control of an ethanol-inducible promoter. Plants of a
segregating T2 population were sprayed with 10% ethanol to induce
expression of the transgene. Only hax2-transgenic plants accumulated
anthocyanin. The photograph was taken 6 days post treatment. (C)
Semiquantitative RT-PCR of hax2 (29 cycles), PAP1 (32 cycles), and
elongation factor Tu (EF-Tu, 32 cycles) with cDNA from hax2-transgenic
plants of three independent A. thaliana lines before (-) and 24 h after
(+) spraying with 10% ethanol. (D) Amino acids 12 and 13 of Hax2 repeat
units and target DNA sequence of Hax2. (E) The promoter of PAP1 from A.
thaliana Col-0 contains an imperfect Hax2-box. Mismatches to the
predicted Hax2-box are coloured in red. A putative TATA-box, the natural
transcription start site (+1), and the first codon of the PAP1 coding
sequence are indicated.
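The search for an imperfect target box in a promoter, as described for panel (E), can be sketched as a sliding-window comparison. This is a minimal illustration under stated assumptions: the function names are hypothetical, the ambiguity symbols are limited to R and N, and the toy sequences are invented for the example and are not the PAP1 promoter or the Hax2-box.

```python
# Ambiguity symbols allowed in a predicted target box (subset used here).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "N": "ACGT",
}


def mismatches(box, window):
    """0-based positions where the window base is not allowed by the box."""
    return [i for i, (b, w) in enumerate(zip(box, window)) if w not in IUPAC[b]]


def best_match(promoter, box):
    """Return (start, mismatch_positions) for the window with fewest mismatches.

    Returns None if the promoter is shorter than the box.
    """
    best = None
    for start in range(len(promoter) - len(box) + 1):
        mm = mismatches(box, promoter[start:start + len(box)])
        if best is None or len(mm) < len(best[1]):
            best = (start, mm)
    return best


# Hypothetical sequences for illustration only.
toy_box = "TACRCNT"
toy_promoter = "GGTTACGGATAAGG"
print(best_match(toy_promoter, toy_box))
```

The returned mismatch positions correspond to the bases that would be marked as deviating from the predicted box.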
[0038] FIG. 15. Table I. Predicted DNA target sequences of TAL effectors
[0039] The table shows repeat sequences of TAL effectors and the predicted
DNA target sequences deduced from amino acids 12 and 13 of the repeat units.
[0040] The annotations show:
(A) Xcv, Xanthomonas campestris pv. vesicatoria; Xg, Xanthomonas
gardneri; Xca, Xanthomonas campestris pv. armoraciae; Xoo, Xanthomonas
oryzae pv. oryzae; Xac, Xanthomonas axonopodis pv. citri; Xau,
Xanthomonas citri pv. aurantifolii; Xcm, Xanthomonas campestris pv.
malvacearum; Xam, Xanthomonas axonopodis pv. manihotis; Xoc, Xanthomonas
oryzae pv. oryzicola. (B) A star (*) indicates a deletion of amino acid
13. (C) Target DNA specificity deduced from amino acids 12 and 13 of the
repeat units. A thymidine nucleotide is added at the 5' end due to the
specificity of the putative repeat 0. The sequence of the upper (sense)
strand of the double stranded DNA is given in ambiguous code (R=A/G;
N=A/C/G/T; .cndot.=unknown specificity).
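The deduction described in annotation (C) can be illustrated with a short sketch. It assumes the specificities given in the figure legends above (NI=A; HD=C; NG=T; IG=T; NN=A/G; NS=A/C/G/T); the function name, the use of a plain dot for unknown specificity, and the example repeat array are illustrative assumptions and do not correspond to any effector in Table I.

```python
# Amino acids 12/13 -> sense-strand symbol, as listed in the figure legends.
RVD_TO_IUPAC = {
    "NI": "A",
    "HD": "C",
    "NG": "T",
    "IG": "T",
    "NN": "R",  # A or G
    "NS": "N",  # A, C, G or T
}

UNKNOWN = "."  # stand-in for the dot that marks unknown specificity


def predict_target_box(rvds):
    """Predicted sense-strand target: 5' T (repeat 0) plus one symbol per repeat."""
    return "T" + "".join(RVD_TO_IUPAC.get(rvd, UNKNOWN) for rvd in rvds)


# Hypothetical repeat array; "XX" is a placeholder for an uncharacterized type.
example = ["NI", "HD", "NN", "NG", "NS", "XX"]
print(predict_target_box(example))
```

Repeat types that are absent from the lookup table are reported as unknown rather than guessed.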
[0041] FIG. 16. Protein sequences of AvrBs3, Hax2, Hax3, Hax4
[0042] For each of the protein sequences, the N-terminus, C-terminus as
well as the single repeat sequences are shown.
[0043] FIG. 17. The effector ARTBs4 induces expression of the minimal Bs4
promoter
(A) Amino acids 12 and 13 of the Hax4 repeat units and predicted target
DNA specificity (Hax4 box). The Hax4(mut) box contains four base pair
exchanges in comparison to the Hax4 box. (B) Amino acids 12 and 13 of the
artificial effector ARTBs4 repeat units and predicted target DNA
specificity (ARTBs4 box). (C) The Hax4 box was cloned in front of the
minimal Bs4 promoter into a GUS reporter vector. The ARTBs4 box is
naturally present in the minimal Bs4 promoter. (D) Specific inducibility
of the Hax4 and ARTBs4 boxes by Hax4 and ARTBs4, respectively. GUS
reporter constructs were codelivered via Agrobacterium tumefaciens into
N. benthamiana with 35S-driven hax4 (grey bars), ARTBs4 (white bars) and
empty T-DNA (ev, black bars), respectively (error bars indicate SD).
4-MU, 4-methyl-umbelliferone. 35S::uidA (GUS, grey bar) served as
control. Leaf disks were stained with X-Gluc
(5-bromo-4-chloro-3-indolyl-.beta.-D-glucuronide).
[0044] FIG. 18. Diagram for "Golden gate" cloning of repeat domains and
effectors
(A) Building blocks consisting of individual repeat units (or other
protein domains) are subcloned with flanking type II restriction enzyme
target sites (e.g. BsaI) that generate specific overhangs. Matching
overhangs are indicated with identical letters (A to O). Different repeat
types are cloned as building blocks for each position (e.g. repeat 1,
repeat 2, etc.). The repeat specificities are: NI=A, HD=C, NG=T, NN=G or
A. (B) The building blocks are assembled into a target vector by ligation
of matching overhangs using "Golden gate" cloning (restriction-ligation).
In general, the resulting assembly product does not contain any of the
target sites used for cloning.
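The ordering imposed by matching overhangs can be illustrated with a minimal sketch. It assumes each building block is described only by the labels of its two overhangs and the repeat type it carries; the `Block` type, the `assemble` function, and the example labels are hypothetical and do not model actual restriction digestion or ligation.

```python
from typing import NamedTuple


class Block(NamedTuple):
    left: str    # overhang label on the 5' side
    right: str   # overhang label on the 3' side
    repeat: str  # repeat type carried by the block, e.g. "HD"


def assemble(blocks, start, end):
    """Order blocks by chaining matching overhang labels from start to end."""
    by_left = {}
    for block in blocks:
        if block.left in by_left:
            raise ValueError(f"overhang {block.left!r} is used by more than one block")
        by_left[block.left] = block
    ordered, current = [], start
    while current != end:
        if current not in by_left:
            raise ValueError(f"no block with left overhang {current!r}")
        block = by_left.pop(current)
        ordered.append(block.repeat)
        current = block.right
    if by_left:
        raise ValueError("unused blocks remain")
    return ordered


# Blocks supplied in arbitrary order; the overhang labels fix the final order.
blocks = [Block("B", "C", "HD"), Block("A", "B", "NI"), Block("C", "D", "NG")]
print(assemble(blocks, start="A", end="D"))
```

Because every junction label is used exactly once, the pool of blocks can be mixed in a single reaction and still yields one defined order, which is the property the figure relies on.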
[0045] FIG. 19. Alternative method for generation of designer effectors
via Golden Gate cloning
[0046] FIGS. 19 A-D depict various vectors described in the methods
disclosed in Example 3 below as well as provide a schematic of the
method.
[0047] FIG. 20. Experiments to analyze novel repeat specificities
[0048] Artificial TALs were assembled with the first six repeats of the
TAL Hax3. Repeat 7 to 11.5 were assembled using one repeat type with
unknown specificity. Four possible target DNA boxes were used containing
six A, C, G, or T, respectively. Similarly, artificial TALs and reporters
were constructed with 2, 3, or 4 test repeats. The target DNA boxes
were inserted into the Bs4 minimal promoter upstream of a promoterless
uidA reporter gene.
[0049] FIG. 21. TAL repeat specificities
[0050] Agrobacterium-mediated expression of artificial TALs and
corresponding reporter constructs in Nicotiana benthamiana. Leaf disks
were sampled two days post transformation, stained for GUS reporter
activity and destained with ethanol. A blue colour indicates expression
of the reporter construct and, therefore, activity of the TAL. Empty
vector (ev) and constitutively expressed GUS were used as negative and
positive control, respectively. Novel repeat specificities are colored in red.
Repeat types with strong DNA recognition properties are: NH, NP, NT, and
HN. Repeat types with weak DNA recognition properties are: NG, N*, NK,
SH, SN, IS.
[0051] FIG. 22. Quantitative analysis of known repeat specificities.
[0052] Artificial TALs were assembled with the first six repeats of the
TAL Hax3. Repeat 7 to 11.5 were assembled using one repeat type. Four
possible target DNA boxes were used containing six A, C, G, or T,
respectively upstream of the Bs4 minimal promoter and a promoterless uidA
reporter gene. The data show that repeat type NN has much stronger
DNA-recognition properties than the other repeat types. Repeat type NI is
very weak and does not show a preference in this setup. Repeat type NS
was previously shown to recognize all four DNA bases but shows a
preference for A and G here. EV: empty vector control.
[0053] FIG. 23. Quantitative analysis of novel repeats with multiple
specificities
[0054] Quantitative analysis of novel repeats with multiple specificities.
Artificial TALs were assembled with the first six repeats of the TAL
Hax3. Repeat 7 to 11.5 were assembled using one repeat type. Four
possible target DNA boxes were used containing six A, C, G, or T,
respectively upstream of the Bs4 minimal promoter and a promoterless uidA
reporter gene (see, FIG. 20).
[0055] FIG. 24. Quantitative analysis of novel repeats with only one
specificity
[0056] Artificial TALs were assembled with the first six repeats of the
TAL Hax3. Repeat 7 to 11.5 were assembled using one repeat type. Four
possible target DNA boxes were used containing six A, C, G, or T,
respectively upstream of the Bs4 minimal promoter and a promoterless uidA
reporter gene. The data show that repeat type NH is much stronger than
repeat type NK, but also recognizes only one specific base (G).
[0057] FIG. 25. Quantitative analysis of novel repeats with novel
specificities
[0058] Artificial TALs were assembled with the first six repeats of the
TAL Hax3. Repeat 7 to 11.5 were assembled using one repeat type. Four
possible target DNA boxes were used containing six A, C, G, or T,
respectively upstream of the Bs4 minimal promoter and a promoterless uidA
reporter gene. These repeat types show only very low activity in the
reporter assay, likely due to their weak DNA interaction potential.
[0059] FIG. 26. Experimental setup to study specificity of repeat types
with low DNA recognition potential
[0060] The artificial effectors were assembled to contain 6, 4, 3, or 2
repeats, respectively, with unknown specificity (designated XX) in
addition to Hax3 repeats. Target boxes in the reporter constructs contain
A, C, G, or T, respectively, at positions corresponding to the "XX"
repeats. The rest of the target DNA boxes is equivalent to the Hax3 box.
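The construction of the four reporter boxes can be sketched as a substitution at the positions that face the test repeats. This is an illustration only: the function name, the 12-nucleotide reference box, and the chosen positions are hypothetical and do not reproduce the Hax3 box.

```python
def reporter_boxes(reference_box, test_positions):
    """Return one box per base, with that base placed at every test position."""
    boxes = {}
    for base in "ACGT":
        chars = list(reference_box)
        for pos in test_positions:
            chars[pos] = base
        boxes[base] = "".join(chars)
    return boxes


# Hypothetical reference box; the last three positions face the test repeats.
reference = "TACGTACGTACG"
for base, box in reporter_boxes(reference, range(9, 12)).items():
    print(base, box)
```

All positions outside the test stretch are left unchanged, so any difference in reporter activity between the four boxes can be attributed to the test repeats.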
[0061] FIG. 27A-C. Experimental setup to study specificity of repeat types
with low DNA recognition potential
[0062] The artificial effectors were assembled to contain 4, 3, or 2
repeats, respectively, as "test repeats" with unknown specificity
(designated X) in addition to Hax3 repeats (see, FIG. 26 for details).
Target boxes in the reporter constructs contain A, C, G, or T,
respectively, at positions corresponding to the test repeats. The rest of
the target DNA boxes is equivalent to the Hax3 box. Although TALs with
four or more combined N* repeats do not show a specificity, a combination
of three or two N* repeats indicates a specificity for T, or T and C,
respectively. N* and NI are obviously repeat types with weak DNA
recognition properties. FIG. 27A: HD; FIG. 27B: N*; and FIG. 27C: NI.
SEQUENCE LISTING
[0063] The nucleotide and amino acid sequences listed in the accompanying
figures and the sequence listing are shown using standard letter
abbreviations for nucleotide bases, and one-letter code for amino acids.
The nucleotide sequences follow the standard convention of beginning at
the 5' end of the sequence and proceeding forward (i.e., from left to
right in each line) to the 3' end. Only one strand of each nucleic acid
sequence is shown, but the complementary strand is understood to be
included by any reference to the displayed strand. The amino acid
sequences follow the standard convention of beginning at the amino
terminus of the sequence and proceeding forward (i.e., from left to right
in each line) to the carboxy terminus.
DETAILED DESCRIPTION OF THE INVENTION
[0064] The present invention now will be described more fully hereinafter
with reference to the accompanying drawings, in which some, but not all
embodiments of the inventions are shown. Indeed, these inventions may be
embodied in many different forms and should not be construed as limited
to the embodiments set forth herein; rather, these embodiments are
provided so that this disclosure will satisfy applicable legal
requirements. Like numbers refer to like elements throughout.
[0065] Many modifications and other embodiments of the inventions set
forth herein will come to mind to one skilled in the art to which these
inventions pertain having the benefit of the teachings presented in the
foregoing descriptions and the associated drawings. Therefore, it is to
be understood that the inventions are not to be limited to the specific
embodiments disclosed and that modifications and other embodiments are
intended to be included within the scope of the appended claims. Although
specific terms are employed herein, they are used in a generic and
descriptive sense only and not for purposes of limitation.
[0066] A number of terms that are used throughout this disclosure are
defined hereinbelow.
[0067] The term "repeat domain" is used to describe the DNA recognition
domain from a TAL effector, or artificial version thereof that is made
using the methods disclosed, consisting of modular repeat units that when
present in a polypeptide confer target DNA specificity. A repeat domain
comprised of repeat units can be added to any polypeptide in which DNA
sequence targeting is desired and is not limited to use in TAL
effectors.
[0068] The term "repeat unit" is used to describe the modular portion of a
repeat domain from a TAL effector, or an artificial version thereof, that
contains one amino acid or two adjacent amino acids that determine
recognition of a base pair in a target DNA sequence. Repeat units taken
together recognize a defined target DNA sequence and constitute a repeat
domain. Repeat units can be added to any polypeptide in which DNA
sequence targeting is desired and are not limited to use in TAL
effectors.
[0069] The term "recognition code" is used to describe the relationship
between the amino acids in positions 12 and 13 of a repeat unit and the
corresponding DNA base pair in a target DNA sequence that such amino
acids confer recognition of, as follows: HD for recognition of C/G; NI
for recognition of A/T; NG for recognition of T/A; NS for recognition of
C/G or A/T or T/A or G/C; NN for recognition of G/C or A/T; IG for
recognition of T/A; N for recognition of C/G or T/A; HG for recognition
of C/G or T/A; H for recognition of T/A; NK for recognition of G/C; NH
for recognition of G/C; NP for recognition of A/T, C/G, or T/A; NT for
recognition of A/T or G/C; NH for recognition of A/T or G/C; SH for
recognition of G/C; SN for recognition of G/C; and IS for recognition of
A/T. Additional specificities for the amino acids in positions 12 and 13
of a repeat unit and the corresponding DNA base pair in a target DNA
sequence have been reported: HA for recognition of C/G;
ND for recognition of C/G; HI for recognition of C/G; HN for recognition
of G/C; and NA for recognition of G/C (Moscou & Bogdanove (2009) Science
326:1501).
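Read in the reverse direction, the recognition code suggests how a series of repeat units could be chosen for a given target sequence. The following is a minimal sketch under stated assumptions: it uses one repeat type per base (NI for A, HD for C, NG for T, NN for G), it takes the first letter of each base pair above as the sense-strand base, and the function name is hypothetical. Because NN is listed above for both G/C and A/T, the choice for G is not strictly selective; the text also lists NK and NH for G/C.

```python
# One repeat type per sense-strand base (an illustrative choice, not the only one).
BASE_TO_RVD = {"A": "NI", "C": "HD", "T": "NG", "G": "NN"}


def design_repeats(target):
    """Choose amino acids 12/13 for each base of a sense-strand target sequence."""
    target = target.upper()
    unknown = set(target) - set(BASE_TO_RVD)
    if unknown:
        raise ValueError(f"unsupported bases: {sorted(unknown)}")
    return [BASE_TO_RVD[base] for base in target]


print(design_repeats("CATG"))
```

The resulting list gives the order of repeat types from the amino-terminal to the carboxy-terminal end of the repeat domain, matching the 5' to 3' order of the target.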
[0070] As used herein, "effector" (or "effector protein" or "effector
polypeptide") refers to constructs or their encoded polypeptide products
in which said polypeptide is able to recognize a target DNA sequence. The
effector protein includes a repeat domain comprised of 1.5 or more repeat
units and also may include one or more functional domains such as a
regulatory domain. In preferred embodiments of the invention, the
"effector" is additionally capable of exerting an effect, such as
regulation of gene expression. Although the present invention is not
dependent on a particular biological mechanism, it is believed that the
proteins or polypeptides of the invention that recognize a target DNA
sequence bind to the target DNA sequence.
[0071] The term "naturally occurring" is used to describe an object that
can be found in nature as distinct from being produced by man. For
example, a polypeptide or polynucleotide sequence that is present in an
organism (including viruses) that can be isolated from a source in nature
and which has not been intentionally modified by man in the laboratory is
naturally occurring. Generally, the term naturally occurring refers to an
object as present in a wild-type individual, such as would be typical for
the species.
[0072] The terms "modulating expression," "inhibiting expression," and
"activating expression" of a gene refer to the ability of a polypeptide
of the present invention to activate or inhibit transcription of a gene.
Activation includes prevention of subsequent transcriptional inhibition
(i.e., prevention of repression of gene expression) and inhibition
includes prevention of subsequent transcriptional activation (i.e.,
prevention of gene activation). Modulation can be assayed by determining
any parameter that is indirectly or directly affected by the expression
of the target gene. Such parameters include, e.g., changes in RNA or
protein levels, changes in protein activity, changes in product levels,
changes in downstream gene expression, changes in reporter gene
transcription (luciferase, CAT, beta-galactosidase, GFP (see, e.g.,
Misteli & Spector (1997) Nature Biotechnology 15:961-964)); changes in
signal transduction, phosphorylation and dephosphorylation,
receptor-ligand interactions, second messenger concentrations (e.g.,
cGMP, cAMP, IP3, and Ca2+), cell growth, neovascularization, in vitro, in
vivo, and ex vivo. Such functional effects can be measured by any means
known to those skilled in the art, e.g., measurement of RNA or protein
levels, measurement of RNA stability, identification of downstream or
reporter gene expression, e.g., via chemiluminescence, fluorescence,
colorimetric reactions, antibody binding, inducible markers, ligand
binding assays; changes in intracellular second messengers such as cGMP
and inositol triphosphate (IP3); changes in intracellular calcium levels;
cytokine release, and the like.
[0073] A "regulatory domain" refers to a protein or a protein subsequence
that has transcriptional modulation activity. Typically, a regulatory
domain is covalently or non-covalently linked to a polypeptide of the
present invention to modulate transcription. Alternatively, a polypeptide
of the present invention can act alone, without a regulatory domain, or
with multiple regulatory domains to modulate transcription. Transcription
factor polypeptides from which one can obtain a regulatory domain include
those that are involved in regulated and basal transcription. Such
polypeptides include transcription factors, their effector domains,
coactivators, silencers, nuclear hormone receptors (see, e.g., Goodrich
et al. (1996) Cell 84:825-30 for a review of proteins and nucleic acid
elements involved in transcription; transcription factors in general are
reviewed in Barnes & Adcock (1995) Clin. Exp. Allergy 25 Suppl. 2:46-9
and Roeder (1996) Methods Enzymol. 273:165-71). Databases dedicated to
transcription factors are known (see, e.g., Science (1995) 269:630).
Nuclear hormone receptor transcription factors are described in, for
example, Rosen et al. (1995) J. Med. Chem. 38:4855-74. The C/EBP family
of transcription factors are reviewed in Wedel et al. (1995)
Immunobiology 193:171-85. Coactivators and co-repressors that mediate
transcription regulation by nuclear hormone receptors are reviewed in,
for example, Meier (1996) Eur. J. Endocrinol. 134(2):158-9; Kaiser et al.
(1996) Trends Biochem. Sci. 21:342-5; and Utley et al. (1998) Nature
394:498-502. GATA transcription factors, which are involved in
regulation of hematopoiesis, are described in, for example, Simon (1995)
Nat. Genet. 11:9-11; Weiss et al. (1995) Exp. Hematol. 23:99-107. TATA
box binding protein (TBP) and its associated TAF polypeptides (which
include TAF30, TAF55, TAF80, TAF110, TAF150, and TAF250) are described in
Goodrich & Tjian (1994) Curr. Opin. Cell Biol. 6:403-9 and Hurley (1996)
Curr. Opin. Struct. Biol. 6:69-75. The STAT family of transcription
factors are reviewed in, for example, Barahmand-Pour et al. (1996) Curr.
Top. Microbiol. Immunol. 211:121-8. Transcription factors involved in
disease are reviewed in Aso et al. (1996) J. Clin. Invest. 97:1561-9.
Kinases, phosphatases, and other proteins that modify polypeptides
involved in gene regulation are also useful as regulatory domains for
polypeptides of the present invention. Such modifiers are often involved
in switching on or off transcription mediated by, for example, hormones.
Kinases involved in transcription regulation are reviewed in Davis (1995)
Mol. Reprod. Dev. 42:459-67, Jackson et al. (1993) Adv. Second Messenger
Phosphoprotein Res. 28:279-86, and Boulikas (1995) Crit. Rev. Eukaryot.
Gene Expr. 5:1-77, while phosphatases are reviewed in, for example,
Schonthal (1995) Semin. Cancer Biol. 6:239-48. Nuclear tyrosine kinases
are described in Wang (1994) Trends Biochem. Sci. 19:373-6. Useful
domains can also be obtained from the gene products of oncogenes (e.g.,
myc, jun, fos, myb, max, mad, rel, ets, bcl, mos family members) and
their associated factors and modifiers. Oncogenes are described in, for
example, Cooper, Oncogenes, 2nd ed., The Jones and Bartlett Series in
Biology, Boston, Mass., Jones and Bartlett Publishers, 1995. The ets
transcription factors are reviewed in Wasylyk et al. (1993) Eur. J.
Biochem. 211:7-18 and Crepieux et al. (1994) Crit. Rev. Oncog. 5:615-38.
Myc oncogenes are reviewed in, for example, Ryan et al. (1996) Biochem.
J. 314:713-21. The jun and fos transcription factors are described in,
for example, The Fos and Jun Families of Transcription Factors, Angel &
Herrlich, eds. (1994). The max oncogene is reviewed in Hurlin et al. Cold
Spring Harb. Symp. Quant. Biol. 59:109-16. The myb gene family is
reviewed in Kanei-Ishii et al. (1996) Curr. Top. Microbiol. Immunol.
211:89-98. The mos family is reviewed in Yew et al. (1993) Curr. Opin.
Genet. Dev. 3:19-25. Polypeptides of the present invention can include
regulatory domains obtained from DNA repair enzymes and their associated
factors and modifiers. DNA repair systems are reviewed in, for example,
Vos (1992) Curr. Opin. Cell Biol. 4:385-95; Sancar (1995) Ann. Rev.
Genet. 29:69-105; Lehmann (1995) Genet. Eng. 17:1-19; and Wood (1996)
Ann. Rev. Biochem. 65:135-67. DNA rearrangement enzymes and their
associated factors and modifiers can also be used as regulatory domains
(see, e.g., Gangloff et al. (1994) Experientia 50:261-9; Sadowski (1993)
FASEB J. 7:760-7).
[0074] Similarly, regulatory domains can be derived from DNA modifying
enzymes (e.g., DNA methyltransferases, topoisomerases, helicases,
ligases, kinases, phosphatases, polymerases) and their associated factors
and modifiers. Helicases are reviewed in Matson et al. (1994) Bioessays
16:13-22, and methyltransferases are described in Cheng (1995) Curr.
Opin. Struct. Biol. 5:4-10. Chromatin associated proteins and their
modifiers (e.g., kinases, acetylases and deacetylases), such as histone
deacetylase (Wolffe Science 272:371-2 (1996)) are also useful as domains
for addition to the effector of choice. In one preferred embodiment, the
regulatory domain is a DNA methyl transferase that acts as a
transcriptional repressor (see, e.g., Van den Wyngaert et al. FEBS Lett.
426:283-289 (1998); Flynn et al. J. Mol. Biol. 279:101-116 (1998); Okano
et al. Nucleic Acids Res. 26:2536-2540 (1998); and Zardo & Caiafa, J.
Biol. Chem. 273:16517-16520 (1998)). In another preferred embodiment,
endonucleases such as FokI are used as transcriptional repressors, which
act via gene cleavage (see, e.g., WO95/09233; and PCT/US94/01201).
Factors that control chromatin and DNA structure, movement and
localization and their associated factors and modifiers; factors derived
from microbes (e.g., prokaryotes, eukaryotes and viruses) and factors that
associate with or modify them can also be used to obtain chimeric
proteins. In one embodiment, recombinases and integrases are used as
regulatory domains. In one embodiment, histone acetyltransferase is used
as a transcriptional activator (see, e.g., Jin & Scotto (1998) Mol. Cell.
Biol. 18:4377-4384; Wolffe (1996) Science 272:371-372; Taunton et al.
Science 272:408-411 (1996); and Hassig et al. PNAS 95:3519-3524 (1998)).
In another embodiment, histone deacetylase is used as a transcriptional
repressor (see, e.g., Jin & Scotto (1998) Mol. Cell. Biol. 18:4377-4384;
Syntichaki & Thireos (1998) J. Biol. Chem. 273:24414-24419; Sakaguchi et
al. (1998) Genes Dev. 12:2831-2841; and Martinez et al. (1998) J. Biol.
Chem. 273:23781-23785).
[0075] As used herein, "gene" refers to a nucleic acid molecule or portion
thereof which comprises a coding sequence, optionally containing introns,
and control regions which regulate the expression of the coding sequence
and the transcription of untranslated portions of the transcript.
[0076] Thus, the term "gene" includes, besides coding sequence, regulatory
sequence such as the promoter, enhancer, 5' untranslated regions, 3'
untranslated region, termination signals, poly adenylation region and the
like. Regulatory sequence of a gene may be located proximal to, within,
or distal to the coding region.
[0077] As used herein, "target gene" refers to a gene whose expression is
to be modulated by a polypeptide of the present invention.
[0078] As used herein, "plant" refers to any of various photosynthetic,
eukaryotic multi-cellular organisms of the kingdom Plantae,
characteristically producing embryos, containing chloroplasts, having
cellulose cell walls and lacking locomotion. As used herein, "plant"
includes any plant or part of a plant at any stage of development,
including seeds, suspension cultures, embryos, meristematic regions,
callus tissue, leaves, roots, shoots, gametophytes, sporophytes, pollen,
microspores, and progeny thereof. Also included are cuttings, and cell or
tissue cultures. As used in conjunction with the present invention, the
term "plant tissue" includes, but is not limited to, whole plants, plant
cells, plant organs, e.g., leaves, stems, roots, meristems, plant seeds,
protoplasts, callus, cell cultures, and any groups of plant cells
organized into structural and/or functional units.
[0079] As used herein, "modulate the expression of a target gene in plant
cells" refers to increasing (activation) or decreasing (repression) the
expression of the target gene in plant cells with a polypeptide of the
present invention, alone or in combination with other transcription
and/or translational regulatory factors, or nucleic acids encoding such
polypeptide, in plant cells.
[0080] As used herein, a "target DNA sequence" refers to a portion of
double-stranded DNA to which recognition by a protein is desired. In one
embodiment, a "target DNA sequence" is all or part of a transcriptional
control element for a gene for which a desired phenotypic result can be
attained by altering the degree of its expression. A transcriptional
control element includes positive and negative control elements such as a
promoter, an enhancer, other response elements, e.g., steroid response
element, heat shock response element, metal response element, a repressor
binding site, operator, and/or a silencer. The transcriptional control
element can be viral, eukaryotic, or prokaryotic. A "target DNA sequence"
also includes a downstream or an upstream sequence which can bind a
protein and thereby modulate, typically prevent, transcription.
[0081] The use of the term "DNA" or "DNA sequence" herein is not intended
to limit the present invention to polynucleotide molecules comprising
DNA. Those of ordinary skill in the art will recognize that the methods
and compositions of the invention encompass polynucleotide molecules
comprised of deoxyribonucleotides (i.e., DNA), ribonucleotides (i.e.,
RNA) or combinations of ribonucleotides and deoxyribonucleotides. Such
deoxyribonucleotides and ribonucleotides include both naturally occurring
molecules and synthetic analogues including, but not limited to,
nucleotide analogs or modified backbone residues or linkages, which are
synthetic, naturally occurring, and non-naturally occurring, which have
similar binding properties as the reference nucleic acid, and which are
metabolized in a manner similar to the reference nucleotides. Examples of
such analogs include, without limitation, phosphorothioates,
phosphoramidates, methyl phosphonates, chiral-methyl phosphonates,
2-O-methyl ribonucleotides, and peptide-nucleic acids (PNAs). The
polynucleotide molecules of the invention also encompass all forms of
polynucleotide molecules including, but not limited to, single-stranded
forms, double-stranded forms, hairpins, stem-and-loop structures, and the
like. Furthermore, it is understood by those of ordinary skill in the art
that a DNA sequence disclosed herein also encompasses the complement
of that exemplified nucleotide sequence.
[0082] As used herein, "specifically binds to a target DNA sequence" means
that the binding affinity of a polypeptide of the present invention to a
specified target DNA sequence is statistically higher than the binding
affinity of the same polypeptide to a generally comparable, but
non-target DNA sequence. It also refers to binding of a repeat domain of
the present invention to a specified target DNA sequence to a detectably
greater degree, e.g., at least 1.5-fold over background, than its binding
to non-target DNA sequences and to the substantial exclusion of
non-target DNA sequences. The Kd of a polypeptide of the present invention
for each DNA sequence can be compared to assess the binding specificity of
the polypeptide to a particular target DNA sequence.
[0083] As used herein, a "target DNA sequence within a target gene" refers
to a functional relationship between the target DNA sequence and the
target gene in that recognition of the target DNA sequence by a polypeptide
of the present invention will modulate the expression of the target
gene. The target DNA sequence can be physically located anywhere inside
the boundaries of the target gene, e.g., 5' ends, coding region, 3' ends,
upstream and downstream regions outside of cDNA encoded region, or inside
enhancer or other regulatory region, and can be proximal or distal to the
target gene.
[0084] As used herein, "endogenous" refers to nucleic acid or protein
sequence naturally associated with a target gene or a host cell into
which it is introduced.
[0085] As used herein, "exogenous" refers to nucleic acid or protein
sequence not naturally associated with a target gene or a host cell into
which it is introduced, including non-naturally occurring multiple copies
of a naturally occurring nucleic acid, e.g., DNA sequence, or naturally
occurring nucleic acid sequence located in a non-naturally occurring
genome location.
[0086] As used herein, "genetically modified plant (or transgenic plant)"
refers to a plant which comprises within its genome an exogenous
polynucleotide. Generally, and preferably, the exogenous polynucleotide
is stably integrated within the genome such that the polynucleotide is
passed on to successive generations. The exogenous polynucleotide may be
integrated into the genome alone or as part of a recombinant expression
cassette. "Transgenic" is used herein to include any cell, cell line,
callus, tissue, plant part or plant, the genotype of which has been
altered by the presence of exogenous nucleic acid including those
transgenics initially so altered as well as those created by sexual
crosses or asexual propagation from the initial transgenic. The term
"transgenic" as used herein does not encompass the alteration of the
genome (chromosomal or extra-chromosomal) by conventional plant breeding
methods or by naturally occurring events such as random
cross-fertilization, non-recombinant viral infection, non-recombinant
bacterial transformation, non-recombinant transposition, or spontaneous
mutation.
[0087] As used herein, "minimal promoter" or substantially similar term
refers to a promoter element, particularly a TATA element, that is
inactive or that has greatly reduced promoter activity in the absence of
upstream activation. In the presence of a suitable transcription factor,
the minimal promoter functions to permit transcription.
[0088] As used herein, "repressor protein" or "repressor" refers to a
protein that binds to an operator of DNA or to RNA to prevent transcription
or translation, respectively.
[0089] As used herein, "repression" refers to inhibition of transcription
or translation by binding of a repressor protein to a specific site on DNA
or mRNA. Preferably, repression includes a significant change in
transcription or translation level of at least 1.5 fold, more preferably
at least two fold, and even more preferably at least five fold.
[0090] As used herein, "activator protein" or "activator" refers to a
protein that binds to an operator of DNA or to RNA to enhance transcription
or translation, respectively.
[0091] As used herein, "activation" refers to enhancement of transcription
or translation by binding of an activator protein to a specific site on DNA
or mRNA. Preferably, activation includes a significant change in
transcription or translation level of at least 1.5 fold, more preferably
at least two fold, and even more preferably at least five fold.
[0092] As used herein, "derivative" or "analog" of a molecule refers to a
portion derived from or a modified version of the molecule.
[0093] As used herein, a "repeat unit derived from a transcription
activator-like (TAL) effector" refers to a repeat unit from a TAL
effector or a modified or artificial version of one or more TAL effectors
that is produced by any of the methods disclosed herein.
[0094] In the following, the invention is specifically described with
respect to the transcription activator-like (TAL) effector family, the
members of which are translocated via the type III secretion system into
plant cells. The type member of this effector family is AvrBs3. Hence, the
TAL effector family is also named the AvrBs3-like family of proteins. Both
expressions are
used synonymously and can be interchanged. Non-limiting examples of the
AvrBs3-like family are as follows: AvrBs4 and the members of the Hax
sub-family Hax2, Hax3, and Hax4 as well as Brg11. AvrBs3 and the other
members of its family are characterized by their binding capability to
specific DNA sequences in promoter regions of target genes and induction
of expression of these genes. They have conserved structural features
that enable them to act as transcriptional activators of plant genes.
AvrBs3-like family members and homologous effectors typically have in their
C-terminal region nuclear localisation sequences (NLS) and a
transcriptional activation domain (AD). The central region contains
repeat units of typically 34 or 35 amino acids. The repeat units are
nearly identical, but variable at certain positions and it has now been
found how these positions determine the nucleotide sequence binding
specificity of the proteins.
[0095] It was shown for AvrBs3 that the repeat units are responsible for
binding to DNA. The DNA-binding specificity of AvrBs3 and probably other
members of the AvrBs3-family seems to be mediated by the central repeat
domain of the proteins. In AvrBs3, this repeat domain consists of 17.5
repeat units; in homologous proteins it comprises 1.5 to 33.5 repeat
units, which are typically 34 amino acids each. Other repeat unit
lengths are also known (e.g. 30, 33, 35, 39, 40, 42 amino acids). The
last repeat in the repeat domain is usually only a half repeat of 19 or
20 amino acids in length. The individual repeat units are generally not
identical. They vary at certain amino acid positions; among these,
positions 12 and 13 are hypervariable, while positions 4, 11, 24,
and 32 vary with high frequency but at a lower frequency than 12 and 13
(variations at other positions also occur, but at lower frequency). The
comparison of different AvrBs3-like proteins from Xanthomonas reveals 80
to 97% overall sequence identity with most differences confined to the
repeat domain. For example, AvrBs3 and the AvrBs3-like family member
AvrBs4 differ exclusively in their repeat domain region, with the
exception of a four amino acid deletion in the C-terminus of AvrBs4 with
respect to AvrBs3.
[0096] In FIG. 16, the amino acid sequences of AvrBs3 as well as the amino
acid sequences of the members of the Hax sub-family are shown. Of
particular importance for the present invention are the repeat units,
which are identical except for the hypervariable amino acids at positions
12 and 13 and the variable amino acids at positions 4 and 24. Hence, each
repeat unit of these proteins is given separately.
[0097] As stated above, it has already been described that the repeat
units within the repeat domains determine recognition or binding
capability and specificity of type III effector proteins of the
AvrBs3-family. However, the underlying principle was not known until the
present invention.
[0098] The inventors have discovered that one repeat unit within a repeat
domain is responsible for the recognition of one specific DNA base pair
in a target DNA sequence. This finding is, however, only one element of
the invention. The inventors additionally discovered that a hypervariable
region within each repeat unit of a repeat domain is responsible for
recognition of one specific DNA base pair in a target DNA sequence.
Within a repeat unit, the hypervariable region (corresponding to amino acid
positions 12 and 13) is typically responsible for this recognition
specificity. Hence each variation in these amino acids reflects a
corresponding variation in target DNA recognition and preferably also
recognition capacity.
[0099] As used herein, "hypervariable region" is intended to mean
positions 12 and 13 or equivalent positions in a repeat unit of the
present invention. It is recognized that positions 12 and 13 of the
invention correspond to positions 12 and 13 in the full-length repeat
units of AvrBs3 and other TAL effectors as disclosed herein. It is
further recognized that "equivalent positions" is intended to mean
positions that correspond to positions 12 and 13, respectively, in a repeat
unit of the present invention. One can readily determine such equivalent
positions by
aligning any repeat unit with a full-length repeat unit of AvrBs3.
[0100] It has, therefore, been shown for the first time that one repeat
unit in a repeat domain of a DNA-binding protein recognizes one base pair
in the target DNA, and that one amino acid or two adjacent amino acid
residues in a repeat unit, typically within the hypervariable regions of
a repeat unit, determine which base pair in the target DNA is recognized.
Based on this finding, a person skilled in the art would be able to
specifically target base pairs in a target DNA sequence of interest by
modifying a polypeptide within its repeat units of the repeat domain to
specifically target base pairs in the desired target DNA sequence. Based
on this finding, the inventors have identified a recognition code for
DNA-target specificities of different repeat types and were able to
predict target DNA sequences of several TAL effectors which could be
confirmed experimentally. This will additionally facilitate the
identification of host genes that are regulated by TAL effectors. The
linear array of repeat units which recognizes a linear sequence of bases
in the target DNA is a novel DNA-protein interaction. The modular
architecture of the repeat domain and the recognition code identified by
the inventors for targeting DNA with high specificity allows the
efficient design of specific DNA-binding domains for use in a variety of
technological fields.
[0101] In one embodiment of the present invention, the repeat domains are
included in a transcription factor, for instance in transcription factors
active in plants, particularly preferred in type III effector proteins,
e.g. in effectors of the AvrBs3-like family. However, after having
uncovered the correlation between the repeat units in a repeat domain on
the one hand and the base sequence in the target DNA on the other hand,
the modular architecture of the repeat domain can be used in any protein
which shall be used for targeting specific target DNA sequences. By
introducing repeat domains comprising repeat units into a polypeptide
wherein the repeat units are modified in order to comprise one
hypervariable region per repeat unit and wherein the hypervariable region
determines recognition of a base pair in a target DNA sequence, the
targeting of a large variety of proteins to pre-determined target DNA
sequences becomes possible.
[0102] As one repeat unit within a repeat domain has been found to be
responsible for the specific recognition of one base pair in a DNA,
several repeat units can be combined with each other wherein each repeat
unit includes a hypervariable region that is responsible for the
recognition of a particular base pair in a target DNA sequence by that
repeat unit.
[0103] Techniques to specifically modify DNA sequences in order to obtain
a specified codon for a specific amino acid are known in the art.
[0104] Methods for mutagenesis and polynucleotide alterations have been
widely described. See, for example, Kunkel (1985) Proc. Natl. Acad. Sci.
USA 82:488-492; Kunkel et al. (1987) Methods in Enzymol. 154:367-382;
U.S. Pat. No. 4,873,192; Walker and Gaastra, eds. (1983) Techniques in
Molecular Biology (MacMillan Publishing Company, New York) and the
references cited therein. All these publications are herein incorporated
by reference.
[0105] The following examples provide methods for constructing new repeat
units and testing the specific binding activities of artificially
constructed repeat units specifically recognizing base pairs in a target
DNA sequence.
[0106] The number of repeat units to be used in a repeat domain can be
ascertained by one skilled in the art by routine experimentation.
Generally, at least 1.5 repeat units are considered as a minimum,
although typically at least about 8 repeat units will be used. The repeat
units do not have to be complete repeat units, as repeat units of half
the size can be used. Moreover, the methods and polypeptides disclosed
herein do not depend on repeat domains with a particular number of repeat
units. Thus, a polypeptide of the invention can comprise, for example,
1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, 10,
10.5, 11, 11.5, 12, 12.5, 13, 13.5, 14, 14.5, 15, 15.5, 16, 16.5, 17,
17.5, 18, 18.5, 19, 19.5, 20, 20.5, 21, 21.5, 22, 22.5, 23, 23.5, 24,
24.5, 25, 25.5, 26, 26.5, 27, 27.5, 28, 28.5, 29, 29.5, 30, 30.5, 31,
31.5, 32, 32.5, 33, 33.5, 34, 34.5, 35, 35.5, 36, 36.5, 37, 37.5, 38,
38.5, 39, 39.5, 40, 40.5, 41, 41.5, 42, 42.5, 43, 43.5, 44, 44.5, 46,
46.5, 47, 47.5, 48, 48.5, 49, 49.5, 50, 50.5 or more repeat units.
Typically, AvrBs3 contains 17.5 repeat units and induces expression of
UPA (up-regulated by AvrBs3) genes. The number and order of repeat units
will determine the corresponding activity and DNA recognition
specificity. As further examples, the AvrBs3 family member Hax2 includes
21.5 repeat units, Hax3 11.5 repeat units and Hax4 14.5 repeat units.
Preferably, a polypeptide of the invention comprises about 8 to about
39 repeat units. More preferably, a polypeptide of the invention
comprises about 11.5 to about 33.5 repeat units.
[0107] A typical consensus sequence of a repeat with 34 amino acids (in
one-letter code) is shown below:
TABLE-US-00001
LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG
[0108] A further consensus sequence for a repeat unit with 35 amino acids
(in one-letter code) is as follows:
TABLE-US-00002
LTPEQVVAIASNGGGKQALETVQRLLPVLCQAPHD
[0109] The repeat units which can be used in one embodiment of the
invention have an identity with the consensus sequences described above
of at least 35%, 40%, 50%, 60%, 70%, 75%, 80%, 85%, 90% or 95%. In
preferred embodiments, the repeat sequences of AvrBs3, Hax2, Hax3 and
Hax4 and further members of the AvrBs3-family are used. The repeat unit
sequences of these members are indicated in FIG. 16. These repeat unit
sequences can be modified by exchanging one or more of the amino acids.
The modified repeat unit sequences have an identity with the original
repeat sequence of the respective member of the AvrBs3-family of
at least 35%, 40%, 50%, 60%, 70%, 75%, 80%, 85%, 90% or 95%.
[0110] In preferred embodiments, the amino acids in positions 12 and 13
are altered. In still further embodiments, amino acids in positions 4, 11,
24, and 32 are altered. Preferably, the number of amino acids per repeat
is in a range of 20-45 amino acids, further 32-40 amino acids, still
further 32-39 amino acids, and further optionally 32, 34, 35 or 39 amino
acids per repeat unit.
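The identity thresholds and the altered positions 12 and 13 described above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, the comparison is a simple ungapped position-by-position count rather than a true alignment, and the "modified" repeat unit is an invented example built from the 34 amino acid consensus sequence given above.

```python
# Consensus repeat units quoted in the text (one-letter code).
CONSENSUS_34 = "LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG"
CONSENSUS_35 = "LTPEQVVAIASNGGGKQALETVQRLLPVLCQAPHD"


def hypervariable_region(repeat_unit: str) -> str:
    """Return the residues at positions 12 and 13 (1-based) of a repeat unit.

    Assumption: the repeat unit is full length and ungapped, so its
    positions line up with the reference without an alignment step.
    """
    if len(repeat_unit) < 13:
        raise ValueError("repeat unit too short to contain positions 12 and 13")
    return repeat_unit[11:13]


def percent_identity(repeat_unit: str, reference: str) -> float:
    """Percentage of identical residues between two equal-length sequences.

    Hypothetical simplification: no gaps are handled; sequences of
    different lengths would need a real alignment first.
    """
    if len(repeat_unit) != len(reference):
        raise ValueError("sequences must have equal length for this comparison")
    matches = sum(a == b for a, b in zip(repeat_unit, reference))
    return 100.0 * matches / len(reference)


# Invented example: the consensus with positions 12 and 13 changed to HD.
modified = CONSENSUS_34[:11] + "HD" + CONSENSUS_34[13:]

print(hypervariable_region(CONSENSUS_34))                   # NG
print(hypervariable_region(modified))                       # HD
print(round(percent_identity(modified, CONSENSUS_34), 1))   # 94.1
```

Exchanging only positions 12 and 13 in a 34 amino acid repeat unit leaves 32 of 34 residues unchanged, so such a unit stays within the 90% identity level listed above but falls just short of 95%.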
[0111] Specifically, the hypervariable region in a repeat unit determines
the specific recognition of one base pair in a target DNA sequence. More
specifically, the inventors have found the following correlation of
recognition specificity between amino acids found at positions 12 and 13
in a repeat unit and base pairs in the target DNA sequence:
[0112] HD for recognition of C/G
[0113] NI for recognition of A/T
[0114] NG for recognition of T/A
[0115] NS for recognition of C/G or A/T or T/A or G/C
[0116] NN for recognition of G/C or A/T
[0117] IG for recognition of T/A
[0118] N for recognition of C/G or T/A
[0119] HG for recognition of T/A
[0120] H for recognition of T/A
[0121] NK for recognition of G/C
[0122] NH for recognition of G/C
[0123] NP for recognition of A/T or C/G or T/A
[0124] NT for recognition of A/T or G/C
[0125] HN for recognition of A/T or G/C
[0126] SH for recognition of G/C
[0127] SN for recognition of G/C and
[0128] IS for recognition of A/T.
[0129] It has to be noted that the amino acids are represented in the
single letter code. The nucleotides are given as base pairs, wherein the
first base is located in the upper strand and the second base in the
lower strand; for example C/G means that C is located in the upper
strand, G in the lower strand.
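As a non-authoritative illustration, the correlation listed above can be written as a lookup table that maps the residues at positions 12 and 13 to the upper-strand base of each recognized base pair. In the Python sketch below, the table contents are taken from the list above; the function names, the use of IUPAC ambiguity letters (R, Y, H, N) to express alternatives, and the example array of repeat units are assumptions introduced only for this example.

```python
# Recognition code from the list above: hypervariable region -> upper-strand
# base(s) of the recognized base pair. "N" and "H" are the single-residue
# entries of the list.
RECOGNITION_CODE = {
    "HD": "C", "NI": "A", "NG": "T", "NS": "CATG", "NN": "GA", "IG": "T",
    "N": "CT", "HG": "T", "H": "T", "NK": "G", "NH": "G", "NP": "ACT",
    "NT": "AG", "HN": "AG", "SH": "G", "SN": "G", "IS": "A",
}

# Standard IUPAC ambiguity letters for the base sets that occur in the table.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("ACT"): "H", frozenset("ACGT"): "N",
}

# Position-wise pairing partner of each letter (not a reverse complement).
_PAIRING = str.maketrans("ACGTRYHN", "TGCAYRDN")


def predict_upper_strand(regions):
    """Predict the upper-strand sequence recognized by an ordered list of
    hypervariable regions, one letter per repeat unit."""
    letters = []
    for region in regions:
        if region not in RECOGNITION_CODE:
            raise KeyError(f"no recognition specificity listed for {region!r}")
        letters.append(IUPAC[frozenset(RECOGNITION_CODE[region])])
    return "".join(letters)


def paired_lower_strand(upper):
    """Return the lower-strand bases paired with each upper-strand position."""
    return upper.translate(_PAIRING)


# Hypothetical array of five repeat units.
upper = predict_upper_strand(["HD", "NI", "NG", "NN", "NS"])
print(upper)                       # CATRN
print(paired_lower_strand(upper))  # GTAYN
```

Here R stands for G or A, Y for C or T, H for A, C or T, and N for any base, so an ambiguous repeat unit appears in the prediction as a single ambiguity letter.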
[0130] The methods of the present invention can further comprise making a
repeat unit in which one or more of the hypervariable regions is selected
from the following group in order to determine recognition of one of the
following base pairs: HA for recognition of C/G; ND for recognition of
C/G; HI for recognition of C/G; HN for recognition of G/C; and NA for
recognition of G/C.
[0131] With respect to the single amino acids N and H, respectively, amino
acid 13 of AvrBs3 appears to be missing from the repeat unit when
compared by multiple amino acid sequence alignments with the other repeat
units.
[0132] In one embodiment of the invention, the N-terminal domain of
AvrBs3-like proteins confers recognition specificity for a T, 5' of the
recognition specificity of said repeat.
[0133] In a particularly preferred embodiment of the invention, repeat
units of the protein family AvrBs3 are used. Examples for the members of
this protein family have been specified above. Particularly, the members
of the protein family have an amino acid homology of at least 95%, at
least 90%, at least 80%, at least 85%, at least 70%, at least 75%, at
least 60%, at least 50%, at least 40% or at least 35% to the amino acid
sequence of AvrBs3, particularly to the amino acid sequence of the repeat
unit of AvrBs3. Having this in mind, the hypervariable region in a repeat
unit can be deduced by an amino acid comparison between the members of
the AvrBs3 family. In particularly preferred embodiments, the amino acids
are in positions 12 and 13 of a repeat unit of AvrBs3. However, variable
regions may also be located in different amino acid positions. Examples
for variable positions are amino acids numbers 4, 11, 24, and 32. In a
further embodiment of the invention, the amino acids responsible for the
specific recognition of a base pair in a DNA sequence are located in
positions which typically do not vary between the members of the AvrBs3
family or in positions which are variable but not hypervariable.
[0134] To summarize, the inventors have found that repeat units determine
the recognition of one base pair on a DNA sequence and that the
hypervariable region within a repeat unit determines the recognition
specificity of the corresponding repeat unit. Hence, the sequence of
repeat units correlates with a specific linear order of base pairs in a
target DNA sequence. The inventors have found this correlation with
respect to AvrBs3 and verified it with respect to a representative number
of members of the AvrBs3-like family of proteins. With respect to
AvrBs3-like family members, amino acid residues in positions 12 and 13 in
a repeat unit of 34 or other amino acids length correlate with defined
binding specificities of AvrBs3-like proteins. The discovery of this core
principle provides a powerful tool to customize a polypeptide with its
cognate target DNA template for a variety of applications including, but
not limited to, modulation of gene expression and targeted genome
engineering.
[0135] In the present invention, polypeptides can be designed which
comprise a repeat domain with repeat units wherein in the repeat units
hypervariable regions are included which determine recognition of a base
pair in a target DNA sequence. In one embodiment of the invention, each
repeat unit includes a hypervariable region which determines recognition
of one base pair in a target DNA sequence. In a further embodiment, 1 or
2 repeat units in a repeat domain are included which do not specifically
recognize a base pair in a target DNA sequence. Considering the
recognition code found by the inventors, a modular arrangement of repeat
units is feasible wherein each repeat unit is responsible for the
specific recognition of one base pair in a target DNA sequence.
Consequently, a sequence of repeat units corresponds to a sequence of
base pairs in a target DNA sequence so that one repeat unit corresponds to one
base pair.
[0136] Provided that a target DNA sequence to which recognition by a
protein is desired is known, the person skilled in the art is
able to specifically construct a modular series of repeat units,
including specific recognition amino acid sequences, and assemble these
repeat units into a polypeptide in the appropriate order to enable
recognition of and binding to the desired target DNA sequence. Any
polypeptide can be modified by being combined with a modular repeat unit
DNA-binding domain of the present invention. Such examples include
polypeptides that are transcription activator and repressor proteins,
resistance-mediating proteins, nucleases, topoisomerases, ligases,
integrases, recombinases, resolvases, methylases, acetylases,
demethylases, deacetylases, and any other polypeptide capable of
modifying DNA, RNA, or proteins.
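The assembly of a modular series of repeat units for a known target DNA sequence can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a prescribed design procedure: the choice of one hypervariable region per base (NI, HD, NN, NG) is just one selection from the recognition code given above (NN is listed there as recognizing G/C or A/T, and NK or NH could be chosen for G/C instead), the 34 amino acid consensus sequence is used as a hypothetical scaffold, and all function names are invented for illustration.

```python
from typing import Dict, List

# 34 amino acid consensus repeat unit quoted in the text; used here as an
# assumed scaffold into which positions 12 and 13 are written.
CONSENSUS_34 = "LTPEQVVAIASNGGGKQALETVQRLLPVLCQAHG"

# Assumed choice of one hypervariable region per upper-strand base, taken
# from the recognition code listed in the text.
PREFERRED_REGION: Dict[str, str] = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def design_hypervariable_regions(target: str) -> List[str]:
    """Return one hypervariable region per base of the target (upper strand)."""
    target = target.upper()
    unknown = sorted(set(target) - set(PREFERRED_REGION))
    if unknown:
        raise ValueError(f"unsupported base(s) in target: {unknown}")
    return [PREFERRED_REGION[base] for base in target]


def build_repeat_units(regions: List[str], scaffold: str = CONSENSUS_34) -> List[str]:
    """Write each two-residue region into positions 12 and 13 of the scaffold."""
    units = []
    for region in regions:
        if len(region) != 2:
            raise ValueError("this sketch only handles two-residue regions")
        units.append(scaffold[:11] + region + scaffold[13:])
    return units


# Hypothetical four base pair target, read on the upper strand.
regions = design_hypervariable_regions("CATG")
print(regions)  # ['HD', 'NI', 'NG', 'NN']
for unit in build_repeat_units(regions):
    print(unit)
```

The order of the returned repeat units follows the order of the bases in the target, matching the one repeat unit per base pair relationship described above. According to the embodiment in which the N-terminal domain confers specificity for a T, the chosen target site would then lie immediately 3' of a T.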
[0137] The modular repeat unit DNA-binding domain of the present invention
can be combined with cell compartment localisation signals such as
nuclear localisation signals, to function at any other regulatory
regions, including but not limited to, transcriptional regulatory regions
and translational termination regions.
[0138] In a further embodiment of the invention, these modularly designed
repeat units are combined with an endonuclease domain capable of cleaving
DNA when brought into proximity with DNA as a result of binding by the
repeat domain. Such endonucleolytic breaks are known to stimulate the
rate of homologous recombination in eukaryotes, including fungi, plants,
and animals. The ability to stimulate homologous recombination at a
specific site as a result of a site-specific endonucleolytic break allows
the recovery of transformed cells that have integrated a DNA sequence of
interest at the specific site, at a much higher frequency than is
possible without having made the site-specific break. In addition,
endonucleolytic breaks such as those caused by polypeptides formed from a
repeat domain and an endonuclease domain are sometimes repaired by the
cellular DNA metabolic machinery in a way that alters the sequence at the
site of the break, for instance by causing a short insertion or deletion
at the site of the break compared to the unaltered sequence. These
sequence alterations can cause inactivation of the function of a gene or
protein, for instance by altering a protein-coding sequence to make a
non-functional protein, modifying a splice site so that a gene transcript
is not properly cleaved, making a non-functional transcript, changing the
promoter sequence of a gene so that it can no longer be appropriately
transcribed, etc.
[0139] Breaking DNA using site specific endonucleases can increase the
rate of homologous recombination in the region of the breakage. In some
embodiments, the Fok I (Flavobacterium okeanokoites) endonuclease may be
utilized in an effector to induce DNA breaks. The Fok I endonuclease
domain functions independently of the DNA binding domain and cuts a
double stranded DNA typically as a dimer (Li et al. (1992) Proc. Natl.
Acad. Sci. U.S.A 89 (10):4275-4279, and Kim et al. (1996) Proc. Natl.
Acad. Sci. U.S.A 93 (3):1156-1160; the disclosures of which are
incorporated herein by reference in their entireties). A single-chain
FokI dimer has also been developed and could also be utilized (Mino et
al. (2009) J. Biotechnol. 140:156-161). An effector could be constructed
that contains a repeat domain for recognition of a desired target DNA
sequence as well as a FokI endonuclease domain to induce DNA breakage at
or near the target DNA sequence similar to previous work done employing
zinc finger nucleases (Townsend et al. (2009) Nature 459:442-445; Shukla
et al. (2009) Nature 459, 437-441, all of which are herein incorporated
by reference in their entireties). Utilization of such effectors could
enable the generation of targeted changes in genomes which include
additions, deletions and other modifications, analogous to those uses
reported for zinc finger nucleases as per Bibikova et al. (2003) Science
300, 764; Urnov et al. (2005) Nature 435, 646; Wright et al. (2005) The
Plant Journal 44:693-705; and U.S. Pat. Nos. 7,163,824 and 7,001,768, all
of which are herein incorporated by reference in their entireties.
[0140] The FokI endonuclease domain can be cloned by PCR from the genomic
DNA of the marine bacterium Flavobacterium okeanokoites (ATCC) prepared by
standard methods. The sequence of the FokI endonuclease is available on
Pubmed (Acc. No. M28828 and Acc. No. J04623, the disclosures of which are
incorporated herein by reference in their entireties). The I-Sce I
endonuclease from the yeast Saccharomyces cerevisiae has been used to
produce DNA breaks that increase the rate of homologous recombination.
I-Sce I is an endonuclease encoded by a mitochondrial intron which has an
18 bp recognition sequence, and therefore a very low frequency of
recognition sites within a given DNA, even within large genomes (Thierry
et al. (1991) Nucleic Acids Res. 19 (1):189-190; the disclosure of which
is incorporated herein by reference in its entirety). The infrequency of
cleavage sites recognized by I-SceI makes it suitable to use for
enhancing homologous recombination. Additional description regarding the
use of I-Sce I to induce said DNA breaks can be found in U.S. Pat. Appl.
20090305402, which is incorporated herein by reference in its entirety.
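The low frequency of an 18 bp recognition sequence can be made concrete with a rough estimate. The Python sketch below assumes, purely for illustration, a random sequence in which all four bases are equally likely and independent, which real genomes are not; the function name and the genome length of 3,000,000,000 bp are hypothetical choices and are not taken from the text.

```python
def expected_site_count(genome_length_bp: int, site_length_bp: int,
                        both_strands: bool = True) -> float:
    """Expected number of chance occurrences of one specific site.

    Assumes a random sequence with equal, independent base frequencies,
    and a site that is not identical to its own reverse complement.
    """
    # Number of start positions at which the site could occur on one strand.
    positions = max(genome_length_bp - site_length_bp + 1, 0)
    # Probability that a given position matches: (1/4) ** site length.
    per_strand = positions / 4 ** site_length_bp
    return 2 * per_strand if both_strands else per_strand


GENOME_BP = 3_000_000_000  # hypothetical genome length

print(4 ** 18)                              # number of distinct 18 bp sequences
print(expected_site_count(GENOME_BP, 18))   # well below one chance occurrence
print(expected_site_count(GENOME_BP, 6))    # a 6 bp site, for comparison
```

Under these assumptions, an 18 bp site is expected to occur by chance fewer than 0.1 times in a sequence of that length, whereas a 6 bp site would be expected more than a million times.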
[0141] The recognition site for I-Sce I has been introduced into a range
of different systems. Subsequent cutting of this site with I-Sce I
increases homologous recombination at the position where the site has
been introduced. Enhanced frequencies of homologous recombination have
been obtained with I-Sce I sites introduced into the extra-chromosomal
DNA in Xenopus oocytes, the mouse genome, and the genomic DNA of the
tobacco plant Nicotiana plumbaginifolia. See, for example, Segal et al.
(1995) Proc. Natl. Acad. Sci. U.S.A. 92 (3):806-810; Choulika et al.
(1995) Mol. Cell. Biol. 15 (4):1968-1973; and Puchta et al. (1993)
Nucleic Acids Res. 21 (22):5034-5040; the disclosures of which are
incorporated herein by reference in their entireties. It will be
appreciated that any other endonuclease domain that works with
heterologous DNA binding domains can be utilized in an effector and that
the I-Sce I endonuclease is one such non-limiting example. The limitation
of the use of endonucleases that have a DNA recognition and binding
domain such as I-Sce I is that the recognition site has to be introduced
by standard methods of homologous recombination at the desired location
prior to the use of said endonuclease to enhance homologous recombination
at that site, if such site is not already present in the desired
location. Methods have been reported that enable the design and synthesis
of novel endonucleases, such as by modifying known endonucleases or
making chimeric versions of one or more such endonucleases, that
recognize novel target DNA sequences, thus paving the way for generation
of such engineered endonuclease domains to cleave endogenous target DNA
sequences of interest (Chevalier et al. (2002) Molecular Cell 10:895-905;
WO2007/060495; WO2009/095793; Fajardo-Sanchez et al. (2008) Nucleic Acids
Res. 36:2163-2173, all of which are incorporated by reference in their
entireties). As such, it could be envisioned that such endonuclease
domains could be similarly engineered so as to render the DNA-binding
activity non-functional but leaving the DNA cleaving function active and
to utilize said similarly engineered endonuclease cleavage domain in an
effector to induce DNA breaks similar to the use of FokI above. In such
applications, target DNA sequence recognition would preferably be
provided by the repeat domain of the effector but DNA cleavage would be
accomplished by the engineered endonuclease domain.
[0142] As mentioned above, an effector includes a repeat domain with
specific recognition for a desired specific target sequence. In preferred
embodiments, the effector specifically binds to an endogenous chromosomal
DNA sequence. The specific nucleic acid sequence or more preferably
specific endogenous chromosomal sequence can be any sequence in a nucleic
acid region where it is desired to enhance homologous recombination. For
example, the nucleic acid region may be a region which contains a gene in
which it is desired to introduce a mutation, such as a point mutation or
deletion, or a region into which it is desired to introduce a gene
conferring a desired phenotype.
[0143] Further embodiments relate to methods of generating a modified
plant in which a desired addition has been introduced. The methods can
include obtaining a plant cell that includes an endogenous target DNA
sequence into which it is desired to introduce a modification; generating
a double-stranded cut within the endogenous target DNA sequence with an
effector that includes a repeat domain that binds to an endogenous target
DNA sequence and an endonuclease domain;
[0144] introducing an exogenous nucleic acid that includes a sequence
homologous to at least a portion of the endogenous target DNA into the
plant cell under conditions which permit homologous recombination to
occur between the exogenous nucleic acid and the endogenous target DNA
sequence; and generating a plant from the plant cell in which homologous
recombination has occurred. Other embodiments relate to genetically
modified cells and plants made according to the method described above
and herein. It should be noted that the target DNA sequence could be
artificial or naturally occurring. It will be appreciated that such
methods could be used in any organism (such non-limiting organisms to
include animals, humans, fungi, oomycetes, bacteria and viruses) using
techniques and methods known in the art and utilized for such purposes in
such organisms.
[0145] In a further embodiment of the invention, these modularly designed
repeat domains are combined with one or more domains responsible for the
modulation or control of the expression of a gene, for instance of plant
genes, animal genes, fungal genes, oomycete genes, viral genes, or human
genes. Methods for modulating gene expression by generating DNA-binding
polypeptides containing zinc finger domains are known in the art (U.S.
Pat. Nos. 7,285,416, 7,521,241, 7,361,635, 7,273,923, 7,262,054,
7,220,719, 7,070,934, 7,013,219, 6,979,539, 6,933,113, 6,824,978, each of
which is herein incorporated by reference in its entirety). For
instance, these effectors of the AvrBs3-like family are modified in order
to bind to specific target DNA sequences. Such polypeptides might for
instance be transcription activators or repressor proteins of
transcription which are modified by the method of the present invention
to specifically bind to genetic control regions in a promoter of or other
regulatory region for a gene of interest in order to activate, repress or
otherwise modulate transcription of said gene.
[0146] In a still further embodiment of the invention, the target DNA
sequences are modified in order to be specifically recognized by a
naturally occurring repeat domain or by a modified repeat domain. As one
example, the target DNA sequences for members of the AvrBs3-like family
can be inserted into promoters to generate novel controllable promoters
that can be induced by the corresponding AvrBs3 effector. Secondary
inducible systems can be constructed using a trans-activator and a target
gene, wherein the trans-activator is a polypeptide wherein said
polypeptide comprises at least a repeat domain comprising repeat units of
the present invention that bind to said target gene and induce
expression. The trans-activator and the target gene can be introduced
into one cell line but may also be present in different cell lines and
later be introgressed. In a further embodiment, disease-resistant plants
can be constructed by inserting the target DNA sequence of a repeat
domain containing polypeptide of the present invention in front of a gene
which after expression leads to a defence reaction of the plant by
activating a resistance-mediating gene.
[0147] In a further embodiment, custom DNA-binding polypeptides can be
constructed by rearranging repeat unit types thus allowing the generation
of repeat domains with novel target DNA binding specificity. Individual
repeat units are nearly identical at the DNA level which precludes
classical cloning strategies. The present invention provides a quick and
inexpensive strategy to assemble custom polypeptides with repeat domains
of the present invention. To improve the cloning versatility of such
polypeptides, a two-step assembly method was designed. This method was
used to assemble polypeptides with novel repeat types to study their
target DNA recognition and binding specificity.
[0148] Summarily, any DNA sequence can be modified to enable binding by a
repeat domain containing polypeptide of the present invention by
introducing base pairs into any DNA region or specific regions of a gene
or a genetic control element to specifically target a polypeptide having
a repeat domain comprised of repeat units that will bind said modified
DNA sequence in order to facilitate specific recognition and binding to
each other.
[0149] The inventors have demonstrated that a truly modular DNA
recognizing and preferably binding polypeptide can be efficiently
produced, wherein the binding motif of said polypeptide is a repeat
domain comprised of repeat units which are selected on the basis of their
recognition capability of a combination of particular base pairs.
Accordingly, it should be well within the capability of one of normal
skill in the art to design a polypeptide capable of binding to any
desired target DNA sequence simply by considering the sequence of base
pairs present in the target DNA and combining in the appropriate order
repeat units as binding motifs having the necessary characteristics to
bind thereto. The greater the length of known sequence of the target DNA,
the greater the number of modular repeat units that can be included in
the polypeptide. For example, if the known sequence is only 9 bases long,
then nine repeat units as defined above can be included in the
polypeptide. If the known sequence is 27 bases long, then up to 27 repeat
units could be included in the polypeptide. The longer the target DNA
sequence, the lower the probability of its occurrence in any other given
portion of DNA elsewhere in the genome.
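As a non-limiting illustration of the design principle described above,
the following Python sketch orders one repeat unit per base of a known
target sequence. The one-to-one assignments used (HD for C/G, NI for A/T,
NG for T/A, NK for G/C) are among those listed in this document; the
function names, the reading of the first base of each pair as the base on
the strand being read, and the uniform random-sequence model used for the
probability estimate are assumptions of the sketch.

```python
from typing import List

# Hypervariable regions with a single listed base pair in this document.
# Reading "C/G" as a C on the strand being read is an assumption.
REPEAT_FOR_BASE = {"C": "HD", "A": "NI", "T": "NG", "G": "NK"}


def design_repeat_units(target: str) -> List[str]:
    """Return one hypervariable region per base, in target order."""
    target = target.upper()
    unknown = sorted(set(target) - set(REPEAT_FOR_BASE))
    if unknown:
        raise ValueError(f"unsupported bases in target: {unknown}")
    return [REPEAT_FOR_BASE[base] for base in target]


def chance_match_probability(target_length: int) -> float:
    """Probability that a random position matches a target of this length,
    assuming equally frequent, independent bases (illustrative model)."""
    return 0.25 ** target_length


# Hypothetical 9-base target used only for the example.
units = design_repeat_units("TACGTAGCA")
print(len(units), "repeat units:", "-".join(units))
print(chance_match_probability(9) > chance_match_probability(27))
```

A 9-base target thus yields nine repeat units and a 27-base target would
yield 27, with the chance of a coincidental match elsewhere falling as the
target becomes longer.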
[0150] Moreover, those repeat units selected for inclusion in the
polypeptide could be artificially modified in order to modify their
binding characteristics. Alternatively (or additionally) the length and
amino acid sequence of the repeat unit could be varied as long as its
binding characteristic is not affected.
[0151] Generally, it will be preferred to select those repeat units having
high affinity and high specificity for the target DNA sequence.
[0152] As described herein, effectors can be designed to recognize any
suitable target site, for regulation of expression of any endogenous gene
of choice. Examples of endogenous genes suitable for regulation include
VEGF, CCR5, ER.alpha., Her2/Neu, Tat, Rev, HBV C, S, X, and P, LDL-R,
PEPCK, CYP7, Fibrinogen, ApoB, Apo E, Apo(a), renin, NF-.kappa.B,
I-.kappa.B, TNF-.alpha., FAS ligand, amyloid precursor protein, atrial
natriuretic factor, ob-leptin, ucp-1, IL-1, IL-2, IL-3, IL-4, IL-5, IL-6,
IL-12, G-CSF, GM-CSF, Epo, PDGF, PAF, p53, Rb, fetal hemoglobin,
dystrophin, utrophin, GDNF, NGF, IGF-1, VEGF receptors flt and flk,
topoisomerase, telomerase, bcl-2, cyclins, angiostatin, IGF, ICAM-1,
STATs, c-myc, c-myb, TH, PTI-1, polygalacturonase, EPSP synthase, FAD2-1,
delta-12 desaturase, delta-9 desaturase, delta-15 desaturase, acetyl-CoA
carboxylase, acyl-ACP-thioesterase, ADP-glucose pyrophosphorylase, starch
synthase, cellulose synthase, sucrose synthase, senescence-associated
genes, heavy metal chelators, fatty acid hydroperoxide lyase, viral
genes, protozoal genes, fungal genes, and bacterial genes. In general,
suitable genes to be regulated include cytokines, lymphokines, growth
factors, mitogenic factors, chemotactic factors, onco-active factors,
receptors, potassium channels, G-proteins, signal transduction molecules,
disease resistance genes, and other disease-related genes.
[0153] In another aspect, a method of modulating expression of a target
gene in a cell is provided. The cell may preferably be a plant cell, a
human cell, animal cell, fungal cell or any other living cell. The cells
contain a polypeptide wherein said polypeptide comprises at least a
repeat domain comprising repeat units, and these repeat units contain a
hypervariable region and each repeat unit is responsible for the
recognition of 1 base pair in said target DNA sequence. Said polypeptide
is introduced either as DNA encoding for the polypeptide or the
polypeptide is introduced per se into the cell by methods known in the
art. Regardless of how introduced, the polypeptide should include at
least one repeat domain that specifically recognizes and preferably binds
to a target DNA sequence of base pairs and modulates the expression of a
target gene. In a preferred embodiment, all repeat units contain a
hypervariable region which determines recognition of base pairs in a
target DNA sequence.
[0154] Examples of peptide sequences which can be linked to an effector of
the present invention, for facilitating uptake of effectors into cells,
include, but are not limited to: an 11 amino acid peptide of the tat
protein of HIV; a 20 residue peptide sequence which corresponds to amino
acids 84-103 of the p16 protein (see Fahraeus et al. (1996) Current
Biology 6:84); the third helix of the 60-amino acid long homeodomain of
Antennapedia (Derossi et al. (1994) J. Biol. Chem. 269:10444); the h
region of a signal peptide such as the Kaposi fibroblast growth factor
(K-FGF) h region; or the VP22 translocation domain from HSV (Elliot &
O'Hare (1997) Cell 88:223-233). Other suitable chemical moieties that
provide enhanced cellular uptake may also be chemically linked to
effectors.
[0155] Toxin molecules also have the ability to transport polypeptides
across cell membranes. Often, such molecules are composed of at least two
parts (called "binary toxins"): a translocation or binding domain or
polypeptide and a separate toxin domain or polypeptide. Typically, the
translocation domain or polypeptide binds to a cellular receptor, and
then the toxin is transported into the cell. Several bacterial toxins,
including Clostridium perfringens iota toxin, diphtheria toxin (DT),
Pseudomonas exotoxin A (PE), pertussis toxin (PT), Bacillus anthracis
toxin, and pertussis adenylate cyclase (CYA), have been used in attempts
to deliver peptides to the cell cytosol as internal or amino-terminal
fusions (Arora et al. (1993) J. Biol. Chem. 268:3334-3341; Perelle et al.
(1993) Infect. Immun. 61:5147-5156; Stenmark et al. (1991) J. Cell
Biol. 113:1025-1032; Donnelly et al. (1993) Proc. Natl. Acad. Sci.
USA 90:3530-3534; Carbonetti et al. (1995) Abstr. Annu. Meet. Am. Soc.
Microbiol. 95:295; Sebo et al. (1995) Infect. Immun. 63:3851-3857;
Klimpel et al. (1992) Proc. Natl. Acad. Sci. USA 89:10277-10281; and
Novak et al. (1992) J. Biol. Chem. 267:17186-17193).
[0156] Effectors can also be introduced into an animal cell, preferably a
mammalian cell, via liposomes and liposome derivatives such as
immunoliposomes. The term "liposome" refers to vesicles comprised of one
or more concentrically ordered lipid bilayers, which encapsulate an
aqueous phase. The aqueous phase typically contains the compound to be
delivered to the cell, in this case an effector. The liposome fuses with
the plasma membrane, thereby releasing the effector into the cytosol.
Alternatively, the liposome is phagocytosed or taken up by the cell in a
transport vesicle. Once in the endosome or phagosome, the liposome either
degrades or fuses with the membrane of the transport vesicle and releases
its contents.
[0157] The invention particularly relates to the field of plant and
agricultural technology. In one aspect, the present invention is directed
to a method to modulate the expression of a target gene in plant cells,
which method comprises providing plant cells with a polypeptide modified
according to the invention, said polypeptide being capable of
specifically recognizing a target nucleotide sequence, or a complementary
strand thereof, within a target gene, and allowing said polypeptide to
recognize and particularly bind to said target nucleotide sequence,
whereby the expression of said target gene in said plant cells is
modulated.
[0158] The polypeptide can be provided to the plant cells via any suitable
methods known in the art. For example, the protein can be exogenously
added to the plant cells and the plant cells are maintained under
conditions such that the polypeptide is introduced into the plant cell,
binds to the target nucleotide sequence and regulates the expression of
the target gene in the plant cells. Alternatively, a nucleotide sequence,
e.g., DNA or RNA, encoding the polypeptide can be expressed in the plant
cells and the plant cells are maintained under conditions such that the
expressed polypeptide binds to the target nucleotide sequence and
regulates the expression of the target gene in the plant cells.
[0159] A preferred method to modulate the expression of a target gene in
plant cells comprises the following steps: a) providing plant cells with
an expression system for a polypeptide modified according to the
invention, said polypeptide being capable of specifically recognizing,
and preferably binding, to a target nucleotide sequence, or a
complementary strand thereof, within an expression control element of a
target gene, preferably a promoter; and b) culturing said plant cells
under conditions wherein said polypeptide is produced and binds to said
target nucleotide sequence, whereby expression of said target gene in
said plant cells is modulated.
[0160] Any target nucleotide sequence can be modulated by the present
method. For example, the target nucleotide sequence can be endogenous or
exogenous to the target gene. In an embodiment of the invention the
target nucleotide sequence can be present in a living cell or present in
vitro. In a specific embodiment, the target nucleotide sequence is
endogenous to the plant. The target nucleotide sequence can be located in
any suitable place in relation to the target gene. For example, the
target nucleotide sequence can be upstream or downstream of the coding
region of the target gene. Alternatively, the target nucleotide sequence
is within the coding region of the target gene. Preferably, the target
nucleotide sequence is a promoter of a gene.
[0161] Any target gene can be modulated by the present method. For
example, the target gene can encode a product that affects biosynthesis,
modification, cellular trafficking, metabolism and degradation of a
peptide, a protein, an oligonucleotide, a nucleic acid, a vitamin, an
oligosaccharide, a carbohydrate, a lipid, or a small molecule.
Furthermore, effectors can be used to engineer plants for traits such as
increased disease resistance, modification of structural and storage
polysaccharides, flavors, proteins, and fatty acids, fruit ripening,
yield, color, nutritional characteristics, improved storage capability,
and the like.
[0162] Therefore, the invention provides a method of altering the
expression of a gene of interest in a target cell, comprising:
determining (if necessary) at least part of the DNA sequence of the
structural region and/or a regulatory region of the gene of interest;
designing a polypeptide including the repeat units modified in accordance
with the invention to recognize specific base pairs on the DNA of known
sequence; and causing said modified polypeptide to be present in the
target cell (preferably in the nucleus thereof). It will be apparent
that the DNA sequence need not be determined if it is already known.
[0163] The regulatory region could be quite remote from the structural
region of the gene of interest (e.g. a distant enhancer sequence or
similar).
[0164] In addition, the polypeptide may advantageously comprise functional
domains from other proteins (e.g. catalytic domains from restriction
endonucleases, recombinases, replicases, integrases and the like) or even
"synthetic" effector domains. The polypeptide may also comprise
activation or processing signals, such as nuclear localisation signals.
These are of particular usefulness in targeting the polypeptide to the
nucleus of the cell in order to enhance the binding of the polypeptide to
an intranuclear target (such as genomic DNA).
[0165] The modified polypeptide may be synthesised in situ in the cell as
a result of delivery to the cell of DNA directing expression of the
polypeptide. Methods of facilitating delivery of DNA are well-known to
those skilled in the art and include, for example, recombinant viral
vectors (e.g. retroviruses, adenoviruses), liposomes and the like.
Alternatively, the modified polypeptide could be made outside the cell
and then delivered thereto. Delivery could be facilitated by
incorporating the polypeptide into liposomes etc. or by attaching the
polypeptide to a targeting moiety (such as the binding portion of an
antibody or hormone molecule, or a membrane transition domain, or the
translocation domain of a fungal or oomycete effector, or the
cell-binding B-domain of the classical A-B family of bacterial toxins).
Indeed, one significant advantage of the modified proteins of the
invention in controlling gene expression would be the vector-free
delivery of protein to target cells.
[0166] To the best knowledge of the inventors, design of a polypeptide
containing modified repeat units capable of specifically recognizing base
pairs in a target DNA sequence and its successful use in modulation of
gene expression (as described herein) has never previously been
demonstrated. Thus, the breakthrough of the present invention as
disclosed herein presents numerous possibilities that extend beyond uses
in plants. In one embodiment of the invention, effector polypeptides are
designed for therapeutic and/or prophylactic use in regulating the
expression of disease-associated genes. For example, said polypeptides
could be used to inhibit the expression of foreign genes (e.g., the genes
of bacterial or viral pathogens) in humans, other animals, or plants, or
to modify the expression of mutated host genes (such as oncogenes).
[0167] The invention therefore also provides an effector polypeptide
capable of inhibiting the expression of a disease-associated gene.
Typically the polypeptide will not be a naturally occurring polypeptide
but will be specifically designed to inhibit the expression of the
disease-associated gene. Conveniently the effector polypeptide will be
designed by any of the methods of the invention.
[0168] The invention also relates to the field of genome engineering. An
effector polypeptide can be generated according to the invention to
target a specific DNA sequence in a genome. Said polypeptide can be
modified to contain an activity that directs modification of the target
DNA sequence (e.g. site specific recombination or integration of target
sequences). This method enables targeted DNA modifications in complex
genomes.
[0169] In a still further embodiment of the invention, a polypeptide is
provided which is modified to include at least a repeat domain comprising
repeat units, the repeat units having a hypervariable region for
determining selective recognition of a base pair in a DNA sequence.
[0170] In a preferred embodiment, the polypeptide comprises within said
repeat unit a hypervariable region which is selected from the following
group in order to determine recognition of one of the following base
pairs:
[0171] HD for recognition of C/G
[0172] NI for recognition of A/T
[0173] NG for recognition of T/A
[0174] NS for recognition of C/G or A/T or T/A or G/C
[0175] NN for recognition of G/C or A/T
[0176] IG for recognition of T/A
[0177] N for recognition of C/G or T/A
[0178] HG for recognition of T/A
[0179] H for recognition of T/A
[0180] NK for recognition of G/C
[0181] NH for recognition of G/C
[0182] NP for recognition of A/T or C/G or T/A
[0183] NT for recognition of A/T or G/C
[0184] HN for recognition of A/T or G/C
[0185] SH for recognition of G/C
[0186] SN for recognition of G/C and
[0187] IS for recognition of A/T.
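The assignments of paragraphs [0171] to [0187] can be sketched, purely by
way of illustration, as a Python lookup table together with a check of
whether an ordered set of repeat units is expected to recognize a given
sequence. The names RECOGNIZED_BASES and recognizes are hypothetical, and
each base pair is represented by its first-listed base (for example C for
C/G), which is an assumption about how the notation is read.

```python
from typing import Sequence

# Hypervariable region -> bases it is listed as recognizing ([0171]-[0187]).
# Each base pair X/Y is abbreviated to X (an assumption of this sketch).
RECOGNIZED_BASES = {
    "HD": "C", "NI": "A", "NG": "T", "NS": "CATG", "NN": "GA",
    "IG": "T", "N": "CT", "HG": "T", "H": "T", "NK": "G",
    "NH": "G", "NP": "ACT", "NT": "AG", "HN": "AG", "SH": "G",
    "SN": "G", "IS": "A",
}


def recognizes(repeat_units: Sequence[str], target: str) -> bool:
    """True if every repeat unit lists the base at its position.

    One repeat unit is compared with one base, in order.
    """
    target = target.upper()
    if len(repeat_units) != len(target):
        return False
    return all(base in RECOGNIZED_BASES[unit]
               for unit, base in zip(repeat_units, target))


# Hypothetical four-unit repeat domain used only for the example.
domain = ["HD", "NI", "NG", "NN"]
for sequence in ("CATG", "CATA", "CATC"):
    print(sequence, recognizes(domain, sequence))
```

Because NN is listed for both G/C and A/T, the example domain is expected
to accept CATG and CATA but not CATC.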
[0188] The polypeptides of the present invention can further comprise
within a repeat unit a hypervariable region which is selected from the
following group in order to determine recognition of one of the following
base pairs: HA for recognition of C/G; ND for recognition of C/G; HI for
recognition of C/G; HN for recognition of G/C; and NA for recognition of
G/C.
[0189] The invention also comprises DNA which encodes for any one of the
polypeptides described before.
[0190] In a still further embodiment, DNA is provided which is modified to
include a base pair located in a target DNA sequence so that said base
pair can be specifically recognized by a polypeptide which includes at
least a repeat domain comprising repeat units, the repeat units having a
hypervariable region which determines recognition of said base pair in
said DNA. In one optional embodiment, said base pair is located in a gene
expression control sequence. Due to the modular assembly of the repeat
domain, a sequence of base pairs can be specifically targeted by said
repeat domain.
[0191] In an alternative embodiment of the invention, said DNA is modified
by a base pair selected from the following group in order to receive a
selective and determined recognition by one of the following
hypervariable regions:
[0192] C/G for recognition by HD
[0193] A/T for recognition by NI
[0194] T/A for recognition by NG
[0195] C/G or A/T or T/A or G/C for recognition by NS
[0196] G/C or A/T for recognition by NN
[0197] T/A for recognition by IG
[0198] C/G or T/A for recognition by N
[0199] T/A for recognition by HG
[0200] T/A for recognition by H
[0201] G/C for recognition by NK
[0202] G/C for recognition by NH
[0203] A/T or C/G or T/A for recognition by NP
[0204] A/T or G/C for recognition by NT
[0205] A/T or G/C for recognition by HN
[0206] G/C for recognition by SH
[0207] G/C for recognition by SN and
[0208] A/T for recognition by IS.
[0209] The DNA of the present invention can be modified by a
base pair selected from the following group in order to receive a
selective and determined recognition by one of the following
hypervariable regions: C/G for recognition by HA; C/G for recognition by
ND; C/G for recognition by HI; G/C for recognition by HN; and G/C for
recognition by NA.
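Conversely, and again only as a non-limiting sketch, the DNA sequences
that could be introduced so as to be recognized by a given repeat domain
can be enumerated in Python from the assignments of paragraphs [0192] to
[0208]. The names BASES_FOR_REGION and target_sequences are hypothetical,
and abbreviating each base pair to its first-listed base is an assumption
of the sketch.

```python
from itertools import product
from typing import List, Sequence

# Base pairs listed for each hypervariable region ([0192]-[0208]),
# with each pair X/Y abbreviated to X (an assumption of this sketch).
BASES_FOR_REGION = {
    "HD": "C", "NI": "A", "NG": "T", "NS": "CATG", "NN": "GA",
    "IG": "T", "N": "CT", "HG": "T", "H": "T", "NK": "G",
    "NH": "G", "NP": "ACT", "NT": "AG", "HN": "AG", "SH": "G",
    "SN": "G", "IS": "A",
}


def target_sequences(repeat_units: Sequence[str]) -> List[str]:
    """List every sequence whose bases are all listed for the
    corresponding repeat unit, taking one base per unit in order."""
    options = [BASES_FOR_REGION[unit] for unit in repeat_units]
    # Cartesian product over the per-position choices.
    return ["".join(choice) for choice in product(*options)]


# Hypothetical three-unit repeat domain used only for the example.
print(target_sequences(["HD", "NN", "NG"]))
```

Repeat units with several listed base pairs, such as NS, multiply the
number of candidate sequences, so a domain built from single-assignment
units yields exactly one candidate.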
[0210] In yet another aspect the invention provides a method of modifying
a nucleic acid sequence of interest present in a sample mixture by
binding thereto a polypeptide according to the invention, comprising
contacting the sample mixture with said polypeptide having affinity for
at least a portion of the sequence of interest, so as to allow the
polypeptide to recognize and preferably bind specifically to the sequence
of interest.
[0211] The term "modifying" as used herein is intended to mean that the
sequence is considered modified simply by the binding of the polypeptide.
It is not intended to suggest that the sequence of nucleotides is
changed, although such changes (and others) could ensue following binding
of the polypeptide to the nucleic acid of interest. Conveniently the
nucleic acid sequence is DNA.
[0212] Modification of the nucleic acid of interest (in the sense of
binding thereto by a polypeptide modified to contain modular repeat
units) could be detected in any of a number of methods (e.g. gel mobility
shift assays, use of labelled polypeptides--labels could include
radioactive, fluorescent, enzyme or biotin/streptavidin labels).
[0213] Modification of the nucleic acid sequence of interest (and
detection thereof) may be all that is required (e.g. in diagnosis of
disease). Desirably, however, further processing of the sample is
performed. Conveniently the polypeptide (and nucleic acid sequences
specifically bound thereto) is separated from the rest of the sample.
Advantageously the polypeptide-DNA complex is bound to a solid phase
support, to facilitate such separation. For example, the polypeptide may
be present in an acrylamide or agarose gel matrix or, more preferably, is
immobilised on the surface of a membrane or in the wells of a microtitre
plate.
[0214] In one embodiment of the invention, said repeat domain comprising
repeat units is inserted in a bacterial, viral, fungal, oomycete, human,
animal or plant polypeptide to achieve a targeted recognition and
preferably binding of one or more specified base pairs in a DNA sequence,
and optionally wherein said repeat units are taken from the repeat
domains of AvrBs3-like family of proteins which are further optionally
modified in order to obtain a pre-selected specific binding activity to
one or more base pairs in a DNA sequence.
[0215] The invention encompasses isolated or substantially purified
polynucleotide or protein compositions. An "isolated" or "purified"
polynucleotide or protein, or biologically active portion thereof, is
substantially or essentially free from components that normally accompany
or interact with the polynucleotide or protein as found in its naturally
occurring environment. Thus, an isolated or purified polynucleotide or
protein is substantially free of other cellular material or culture
medium when produced by recombinant techniques, or substantially free of
chemical precursors or other chemicals when chemically synthesized.
Optimally, an "isolated" polynucleotide is free of sequences (optimally
protein encoding sequences) that naturally flank the polynucleotide
(i.e., sequences located at the 5' and 3' ends of the polynucleotide) in
the genomic DNA of the organism from which the polynucleotide is derived.
For example, in various embodiments, the isolated polynucleotide can
contain less than about 5 kb, 4 kb, 3 kb, 2 kb, 1 kb, 0.5 kb, or 0.1 kb
of nucleotide sequence that naturally flank the polynucleotide in genomic
DNA of the cell from which the polynucleotide is derived. A protein that
is substantially free of cellular material includes preparations of
protein having less than about 30%, 20%, 10%, 5%, or 1% (by dry weight)
of contaminating protein. When the protein of the invention or
biologically active portion thereof is recombinantly produced, culture
medium, chemical precursors, or non-protein-of-interest chemicals
optimally represent less than about 30%, 20%, 10%, 5%, or 1% (by dry
weight) of the protein preparation.
[0216] Fragments and variants of the disclosed DNA sequences and proteins
encoded thereby are also encompassed by the present invention. By
"fragment" is intended a portion of the DNA sequence or a portion of the
amino acid sequence and hence protein encoded thereby. Fragments of a DNA
sequence comprising coding sequences may encode protein fragments that
retain biological activity of the native protein and hence DNA
recognition or binding activity to a target DNA sequence as herein
described. Alternatively, fragments of a DNA sequence that are useful as
hybridization probes generally do not encode proteins that retain
biological activity or do not retain promoter activity. Thus, fragments
of a DNA sequence may range from at least about 20 nucleotides, about 50
nucleotides, about 100 nucleotides, and up to the full-length
polynucleotide of the invention.
[0217] "Variants" is intended to mean substantially similar sequences. For
DNA sequences, a variant comprises a DNA sequence having deletions (i.e.,
truncations) at the 5' and/or 3' end; deletion and/or addition of one or
more nucleotides at one or more internal sites in the native
polynucleotide; and/or substitution of one or more nucleotides at one or
more sites in the native polynucleotide. As used herein, a "native" DNA
sequence or polypeptide comprises a naturally occurring DNA sequence or
amino acid sequence, respectively. For DNA sequences, conservative
variants include those sequences that, because of the degeneracy of the
genetic code, encode the amino acid sequence of one of the polypeptides
of the invention. Variant DNA sequences also include synthetically
derived DNA sequences, such as those generated, for example, by using
site-directed mutagenesis but which still encode a protein of the
invention. Generally, variants of a particular DNA sequence of the
invention will have at least about 70%, 75%, 80%, 85%, 90%, 91%, 92%,
93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity to that
particular polynucleotide as determined by sequence alignment programs
and parameters as described elsewhere herein.
[0218] Variants of a particular DNA sequence of the invention (i.e., the
reference DNA sequence) can also be evaluated by comparison of the
percent sequence identity between the polypeptide encoded by a variant
DNA sequence and the polypeptide encoded by the reference DNA sequence.
Percent sequence identity between any two polypeptides can be calculated
using sequence alignment programs and parameters described elsewhere
herein. Where any given pair of polynucleotides of the invention is
evaluated by comparison of the percent sequence identity shared by the
two polypeptides they encode, the percent sequence identity between the
two encoded polypeptides is at least about 70%, 75%, 80%, 85%, 90%, 91%,
92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more sequence identity.
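As an illustrative sketch only, percent sequence identity between two
sequences that have already been aligned can be computed in Python as
shown below. The function name, the use of "-" as the gap character, and
the convention of dividing identical positions by the number of aligned
columns are assumptions; they stand in for, and do not reproduce, the
sequence alignment programs and parameters referred to herein.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of aligned columns holding the same residue in both rows.

    Both inputs must come from the same alignment and therefore have
    equal length. Columns that are gaps in both rows are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == gap and b == gap)]
    if not columns:
        raise ValueError("no aligned columns to compare")
    # A column counts as identical only if both residues match and
    # neither is a gap.
    matches = sum(1 for a, b in columns if a == b and a != gap)
    return 100.0 * matches / len(columns)


# Hypothetical aligned sequences used only for the example.
identity = percent_identity("MKTAYIAKQR", "MKTAYLAKQR")
print(f"{identity:.1f}% identity, at least 70%: {identity >= 70}")
```

The result can then be compared against any of the thresholds recited
above, for example at least about 70%.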
[0219] "Variant" protein is intended to mean a protein derived from the
native protein by deletion (so-called truncation) of one or more amino
acids at the N-terminal and/or C-terminal end of the native protein;
deletion and/or addition of one or more amino acids at one or more
internal sites in the native protein; or substitution of one or more
amino acids at one or more sites in the native protein. Variant proteins
encompassed by the present invention are biologically active, that is
they continue to possess the desired biological activity of the native
protein as described herein. Such variants may result from, for example,
genetic polymorphism or from human manipulation. Biologically active
variants of a protein of the invention will have at least about 70%, 75%,
80%, 85%, 90%, 91%, 92%, 93%, 94%, 95%, 96%, 97%, 98%, 99% or more
sequence identity to the amino acid sequence for the native protein as
determined by sequence alignment programs and parameters described
elsewhere herein. A biologically active variant of a protein of the
invention may differ from that protein by as few as 1-15 amino acid
residues, as few as 1-10, such as 6-10, as few as 5, as few as 4, 3, 2,
or even 1 amino acid residue.
[0220] The proteins of the invention may be altered in various ways
including amino acid substitutions, deletions, truncations, and
insertions. Methods for such manipulations are generally known in the
art. For example, amino acid sequence variants and fragments of the
proteins can be prepared by mutations in the DNA. Methods for mutagenesis
and polynucleotide alterations are well known in the art. See, for
example, Kunkel (1985) Proc. Natl. Acad. Sci. USA 82:488-492; Kunkel et
al. (1987) Methods in Enzymol. 154:367-382; U.S. Pat. No. 4,873,192;
Walker and Gaastra, eds. (1983) Techniques in Molecular Biology
(MacMillan Publishing Company, New York) and the references cited
therein. Guidance as to appropriate amino acid substitutions that do not
affect biological activity of the protein of interest may be found in the
model of Dayhoff et al. (1978) Atlas of Protein Sequence and Structure
(Natl. Biomed. Res. Found., Washington, D.C.), herein incorporated by
reference. Conservative substitutions, such as exchanging one amino acid
with another having similar properties, may be optimal.
[0221] The deletions, insertions, and substitutions of the protein
sequences encompassed herein are not expected to produce radical changes
in the characteristics of the protein. However, when it is difficult to
predict the exact effect of the substitution, deletion, or insertion in
advance of doing so, one skilled in the art will appreciate that the
effect will be evaluated by routine screening assays as described
elsewhere herein or known in the art.
[0222] Variant DNA sequences and proteins also encompass sequences and
proteins derived from a mutagenic and recombinogenic procedure such as
DNA shuffling. Strategies for such DNA shuffling are known in the art.
See, for example, Stemmer (1994) Proc. Natl. Acad. Sci. USA
91:10747-10751; Stemmer (1994) Nature 370:389-391; Crameri et al. (1997)
Nature Biotech. 15:436-438; Moore et al. (1997) J. Mol. Biol.
272:336-347; Zhang et al. (1997) Proc. Natl. Acad. Sci. USA 94:4504-4509;
Crameri et al. (1998) Nature 391:288-291; and U.S. Pat. Nos. 5,605,793
and 5,837,458.
[0223] In a PCR approach, oligonucleotide primers can be designed for
use in PCR reactions to amplify corresponding DNA sequences from cDNA or
genomic DNA extracted from any organism of interest. Methods for
designing PCR primers and PCR cloning are generally known in the art and
are disclosed in Sambrook et al. (1989) Molecular Cloning: A Laboratory
Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.).
See also Innis et al., eds. (1990) PCR Protocols: A Guide to Methods and
Applications (Academic Press, New York); Innis and Gelfand, eds. (1995)
PCR Strategies (Academic Press, New York); and Innis and Gelfand, eds.
(1999) PCR Methods Manual (Academic Press, New York). Known methods of
PCR include, but are not limited to, methods using paired primers, nested
primers, single specific primers, degenerate primers, gene-specific
primers, vector-specific primers, partially-mismatched primers, and the
like.
[0224] In hybridization techniques, all or part of a known polynucleotide
is used as a probe that selectively hybridizes to other corresponding
polynucleotides present in a population of cloned genomic DNA fragments
or cDNA fragments (i.e., genomic or cDNA libraries) from a chosen
organism. The hybridization probes may be genomic DNA fragments, cDNA
fragments, RNA fragments, or other oligonucleotides, and may be labeled
with a detectable group such as ³²P, or any other detectable marker.
Thus, for example, probes for hybridization can be made by labeling
synthetic oligonucleotides based on the DNA sequences of the invention.
Methods for preparation of probes for hybridization and for construction
of cDNA and genomic libraries are generally known in the art and are
disclosed in Sambrook et al. (1989) Molecular Cloning: A Laboratory
Manual (2d ed., Cold Spring Harbor Laboratory Press, Plainview, N.Y.).
[0225] Hybridization of such sequences may be carried out under stringent
conditions. By "stringent conditions" or "stringent hybridization
conditions" is intended conditions under which a probe will hybridize to
its target sequence to a detectably greater degree than to other
sequences (e.g., at least 2-fold over background). Stringent conditions
are sequence-dependent and will be different in different circumstances.
By controlling the stringency of the hybridization and/or washing
conditions, target sequences that are 100% complementary to the probe can
be identified (homologous probing). Alternatively, stringency conditions
can be adjusted to allow some mismatching in sequences so that lower
degrees of similarity are detected (heterologous probing). Generally, a
probe is less than about 1000 nucleotides in length, optimally less than
500 nucleotides in length.
[0226] Typically, stringent conditions will be those in which the salt
concentration is less than about 1.5 M Na ion, typically about 0.01 to
1.0 M Na ion concentration (or other salts) at pH 7.0 to 8.3 and the
temperature is at least about 30 °C for short probes (e.g., 10 to
50 nucleotides) and at least about 60 °C for long probes (e.g.,
greater than 50 nucleotides). Stringent conditions may also be achieved
with the addition of destabilizing agents such as formamide. Exemplary
low stringency conditions include hybridization with a buffer solution of
30 to 35% formamide, 1 M NaCl, 1% SDS (sodium dodecyl sulphate) at
37 °C, and a wash in 1× to 2× SSC (20× SSC = 3.0 M
NaCl/0.3 M trisodium citrate) at 50 to 55 °C. Exemplary moderate
stringency conditions include hybridization in 40 to 45% formamide, 1.0 M
NaCl, 1% SDS at 37 °C, and a wash in 0.5× to 1× SSC at
55 to 60 °C. Exemplary high stringency conditions include
hybridization in 50% formamide, 1 M NaCl, 1% SDS at 37 °C, and a
wash in 0.1× SSC at 60 to 65 °C. Optionally, wash buffers may
comprise about 0.1% to about 1% SDS. Duration of hybridization is
generally less than about 24 hours, usually about 4 to about 12 hours.
The duration of the wash time will be at least a length of time
sufficient to reach equilibrium.
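As a non-limiting worked example, the composition of a diluted SSC wash buffer can be computed from the 20-fold stock definition given above (3.0 M NaCl/0.3 M trisodium citrate). The function name ssc_composition is hypothetical, and the sodium ion total rests on the assumption that each trisodium citrate contributes three sodium ions.

```python
# Stock definition taken from the text: 20-fold SSC = 3.0 M NaCl / 0.3 M trisodium citrate.
REFERENCE_FOLD = 20.0
REFERENCE_NACL_M = 3.0
REFERENCE_CITRATE_M = 0.3

def ssc_composition(fold):
    """Return molar concentrations for an SSC buffer of the given fold strength."""
    scale = fold / REFERENCE_FOLD
    nacl = REFERENCE_NACL_M * scale
    citrate = REFERENCE_CITRATE_M * scale
    # Assumption: trisodium citrate dissociates to give three Na ions.
    sodium = nacl + 3.0 * citrate
    return {"NaCl_M": nacl, "citrate_M": citrate, "Na_ion_M": sodium}

for fold in (2.0, 1.0, 0.5, 0.1):
    print(fold, ssc_composition(fold))
```

On these assumptions a 1-fold SSC wash contains about 0.15 M NaCl and 0.015 M trisodium citrate, so that lowering the fold strength lowers the ionic strength and thereby raises the stringency of the wash.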
[0227] Specificity is typically the function of post-hybridization washes,
the critical factors being the ionic strength and temperature of the
final wash solution. For DNA-DNA hybrids, the T_m can be approximated
from the equation of Meinkoth and Wahl (1984) Anal. Biochem. 138:267-284:
T_m = 81.5 °C + 16.6 (log M) + 0.41 (% GC) - 0.61 (% form) - 500/L;
where M is the molarity of monovalent cations, % GC is the percentage of
guanosine and cytosine nucleotides in the DNA, % form is the percentage
of formamide in the hybridization solution, and L is the length of the
hybrid in base pairs. The T_m is the temperature (under defined ionic
strength and pH) at which 50% of a complementary target sequence
hybridizes to a perfectly matched probe. T_m is reduced by about
1 °C for each 1% of mismatching; thus, T_m, hybridization,
and/or wash conditions can be adjusted to hybridize to sequences of the
desired identity. For example, if sequences with ≥90% identity are
sought, the T_m can be decreased 10 °C. Generally, stringent
conditions are selected to be about 5 °C lower than the thermal
melting point (T_m) for the specific sequence and its complement at a
defined ionic strength and pH. However, severely stringent conditions can
utilize a hybridization and/or wash at 1, 2, 3, or 4 °C lower
than the thermal melting point (T_m); moderately stringent conditions
can utilize a hybridization and/or wash at 6, 7, 8, 9, or 10 °C
lower than the thermal melting point (T_m); low stringency conditions
can utilize a hybridization and/or wash at 11, 12, 13, 14, 15, or
20 °C lower than the thermal melting point (T_m). Using the
equation, hybridization and wash compositions, and desired T_m, those
of ordinary skill will understand that variations in the stringency of
hybridization and/or wash solutions are inherently described. If the
desired degree of mismatching results in a T_m of less than
45 °C (aqueous solution) or 32 °C (formamide solution),
it is optimal to increase the SSC concentration so that a higher
temperature can be used. An extensive guide to the hybridization of
nucleic acids is found in Tijssen (1993) Laboratory Techniques in
Biochemistry and Molecular Biology--Hybridization with Nucleic Acid
Probes, Part I, Chapter 2 (Elsevier, New York); and Ausubel et al., eds.
(1995) Current Protocols in Molecular Biology, Chapter 2 (Greene
Publishing and Wiley-Interscience, New York). See Sambrook et al. (1989)
Molecular Cloning: A Laboratory Manual (2d ed., Cold Spring Harbor
Laboratory Press, Plainview, N.Y.).
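By way of a non-limiting sketch, the melting temperature approximation of Meinkoth and Wahl recited above, together with the adjustment of about 1 degree C per 1% of mismatching, can be expressed as follows. The function names and the example input values are hypothetical, and the logarithm is assumed to be base 10.

```python
import math

def melting_temperature(cation_molar, gc_percent, formamide_percent, length_bp):
    """Approximate the melting temperature (degrees C) of a DNA-DNA hybrid."""
    return (81.5
            + 16.6 * math.log10(cation_molar)  # assumption: "log" means log10
            + 0.41 * gc_percent
            - 0.61 * formamide_percent
            - 500.0 / length_bp)

def mismatch_adjusted_tm(tm, percent_identity):
    """Lower the melting temperature by about 1 degree C per 1% mismatch."""
    return tm - (100.0 - percent_identity)

# Hypothetical example: 1 M monovalent cation, 50% GC, 50% formamide, 500 bp.
tm = melting_temperature(1.0, 50.0, 50.0, 500)
tm_90 = mismatch_adjusted_tm(tm, 90.0)
# Stringent conditions are described as about 5 degrees C below the melting point.
stringent_wash = tm_90 - 5.0
print(round(tm, 1), round(tm_90, 1), round(stringent_wash, 1))
```

With these example inputs the perfectly matched hybrid has a calculated melting temperature of about 70.5 degrees C, and a hybrid sharing 90% identity with the probe has a calculated melting temperature about 10 degrees C lower.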
[0228] It is recognized that the DNA sequences and proteins of the
invention encompass polynucleotide molecules and proteins comprising a
nucleotide or an amino acid sequence that is sufficiently identical to
the DNA sequences or to the amino acid sequence disclosed herein. The
term "sufficiently identical" is used herein to refer to a first amino
acid or nucleotide sequence that contains a sufficient or minimum number
of identical or equivalent (e.g., with a similar side chain) amino acid
residues or nucleotides to a second amino acid or nucleotide sequence
such that the first and second amino acid or nucleotide sequences have a
common structural domain and/or common functional activity. For example,
amino acid or nucleotide sequences that contain a common structural
domain having at least about 70% identity, preferably 75% identity, more
preferably 85%, 90%, 95%, 96%, 97%, 98% or 99% identity are defined
herein as sufficiently identical.
[0229] To determine the percent identity of two amino acid sequences or of
two nucleic acids, the sequences are aligned for optimal comparison
purposes. The percent identity between the two sequences is a function of
the number of identical positions shared by the sequences (i.e., percent
identity = number of identical positions/total number of positions (e.g.,
overlapping positions) × 100). In one embodiment, the two sequences
are the same length. The percent identity between two sequences can be
determined using techniques similar to those described below, with or
without allowing gaps. In calculating percent identity, typically exact
matches are counted.
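The percent identity calculation described above can be sketched, without limitation, for two sequences that have already been aligned. The function name, the use of "-" as the gap character, and the example sequences are assumptions made for illustration; the sketch counts exact matches only and can compute the total either over all alignment columns or over overlapping (ungapped) positions.

```python
def percent_identity(aligned_a, aligned_b, overlapping_only=True, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    If overlapping_only is True, columns containing a gap in either
    sequence are excluded from the total number of positions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    matches = 0
    total = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        gapped = x == gap or y == gap
        if gapped and overlapping_only:
            continue
        total += 1
        if not gapped and x == y:
            matches += 1  # only exact matches are counted
    if total == 0:
        raise ValueError("no positions to compare")
    return 100.0 * matches / total

# Hypothetical aligned sequences with one mismatch and one gap.
seq_a = "ACGT-A"
seq_b = "ACCTGA"
print(percent_identity(seq_a, seq_b))         # over overlapping positions
print(percent_identity(seq_a, seq_b, False))  # over all alignment columns
```

The choice of denominator changes the result, which is why the alignment program and parameters used should be stated whenever a percent identity value is reported.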
[0230] The determination of percent identity between two sequences can be
accomplished using a mathematical algorithm. A preferred, nonlimiting
example of a mathematical algorithm utilized for the comparison of two
sequences is the algorithm of Karlin and Altschul (1990) Proc. Natl.
Acad. Sci. USA 87:2264, modified as in Karlin and Altschul (1993) Proc.
Natl. Acad. Sci. USA 90:5873-5877. Such an algorithm is incorporated into
the NBLAST and XBLAST programs of Altschul et al. (1990) J. Mol. Biol.
215:403. BLAST nucleotide searches can be performed with the NBLAST
program, score=100, wordlength=12, to obtain nucleotide sequences
homologous to the polynucleotide molecules of the invention. BLAST
protein searches can be performed with the XBLAST program, score=50,
wordlength=3, to obtain amino acid sequences homologous to protein
molecules of the invention. To obtain gapped alignments for comparison
purposes, Gapped BLAST can be utilized as described in Altschul et al.
(1997) Nucleic Acids Res. 25:3389. Alternatively, PSI-Blast can be used
to perform an iterated search that detects distant relationships between
molecules. See Altschul et al. (1997) supra. When utilizing BLAST, Gapped
BLAST, and PSI-Blast programs, the default parameters of the respective
programs (e.g., XBLAST and NBLAST) can be used. See
http://www.ncbi.nlm.nih.gov. Another preferred, non-limiting example of a
mathematical algorithm utilized for the comparison of sequences is the
algorithm of Myers and Miller (1988) CABIOS 4:11-17. Such an algorithm is
incorporated into the ALIGN program (version 2.0), which is part of the
GCG sequence alignment software package. When utilizing the ALIGN program
for comparing amino acid sequences, a PAM120 weight residue table, a gap
length penalty of 12, and a gap penalty of 4 can be used. Alignment may
also be performed manually by inspection.
[0231] Unless otherwise stated, sequence identity/similarity values
provided herein refer to the value obtained using the full-length
sequences of the invention and using multiple alignment by means of the
algorithm Clustal W (Nucleic Acids Research, 22(22):4673-4680, 1994) using
the program AlignX included in the software package Vector NTI Suite
Version 7 (InforMax, Inc., Bethesda, Md., USA) using the default
parameters; or any equivalent program thereof. By "equivalent program" is
intended any sequence comparison program that, for any two sequences in
question, generates an alignment having identical nucleotide or amino
acid residue matches and an identical percent sequence identity when
compared to the corresponding alignment generated by CLUSTALW (Version
1.83) using default parameters (available at the European Bioinformatics
Institute website: http://www.ebi.ac.uk/Tools/clustalw/html).
[0232] The DNA sequences of the invention can be provided in expression
cassettes for expression in any prokaryotic or eukaryotic cell and/or
organism of interest including, but not limited to, bacteria, fungi,
algae, plants, and animals. The cassette will include 5' and 3'
regulatory sequences operably linked to a DNA sequence of the invention.
"Operably linked" is intended to mean a functional linkage between two or
more elements. For example, an operable linkage between a polynucleotide
or gene of interest and a regulatory sequence (i.e., a promoter) is a
functional link that allows for expression of the polynucleotide of
interest. Operably linked elements may be contiguous or non-contiguous.
When used to refer to the joining of two protein coding regions, by
operably linked is intended that the coding regions are in the same
reading frame. The cassette may additionally contain at least one
additional gene to be cotransformed into the organism. Alternatively, the
additional gene(s) can be provided on multiple expression cassettes. Such
an expression cassette is provided with a plurality of restriction sites
and/or recombination sites for insertion of the DNA sequence to be under
the transcriptional regulation of the regulatory regions. The expression
cassette may additionally contain selectable marker genes.
[0233] The expression cassette will include in the 5'-3' direction of
transcription, a transcriptional and translational initiation region
(i.e., a promoter), a DNA sequence of the invention, and a
transcriptional and translational termination region (i.e., termination
region) functional in plants or other organism or non-human host cell.
The regulatory regions (i.e., promoters, transcriptional regulatory
regions, and translational termination regions) and/or the DNA sequence
of the invention may be native/analogous to the host cell or to each
other. Alternatively, the regulatory regions and/or DNA sequence of the
invention may be heterologous to the host cell or to each other. As used
herein, "heterologous" in reference to a sequence is a sequence that
originates from a foreign species, or, if from the same species, is
substantially modified from its native form in composition and/or genomic
locus by deliberate human intervention. For example, a promoter operably
linked to a heterologous polynucleotide is from a species different from
the species from which the polynucleotide was derived, or, if from the
same/analogous species, one or both are substantially modified from their
original form and/or genomic locus, or the promoter is not the native
promoter for the operably linked polynucleotide. As used herein, a
chimeric gene comprises a coding sequence operably linked to a
transcription initiation region that is heterologous to the coding
sequence.
[0234] The termination region may be native with the transcriptional
initiation region, may be native with the operably linked DNA sequence of
interest, may be native with the host, or may be derived from another
source (i.e., foreign or heterologous) to the promoter, the DNA sequence
of interest, the plant host, or any combination thereof. Convenient
termination regions for use in plants are available from the Ti-plasmid
of A. tumefaciens, such as the octopine synthase and nopaline synthase
termination regions. See also Guerineau et al. (1991) Mol. Gen. Genet.
262:141-144; Proudfoot (1991) Cell 64:671-674; Sanfacon et al. (1991)
Genes Dev. 5:141-149; Mogen et al. (1990) Plant Cell 2:1261-1272; Munroe
et al. (1990) Gene 91:151-158; Ballas et al. (1989) Nucleic Acids Res.
17:7891-7903; and Joshi et al. (1987) Nucleic Acids Res. 15:9627-9639.
[0235] Where appropriate, the polynucleotides may be optimized for
increased expression in a transformed organism. That is, the
polynucleotides can be synthesized using codons preferred by the host for
improved expression. See, for example, Campbell and Gowri (1990) Plant
Physiol. 92:1-11 for a discussion of host-preferred codon usage. Methods
are available in the art for synthesizing host-preferred genes,
particularly plant-preferred genes. See, for example, U.S. Pat. Nos.
5,380,831, and 5,436,391, and Murray et al. (1989) Nucleic Acids Res.
17:477-498, herein incorporated by reference.
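Merely as an illustrative sketch, re-coding a DNA sequence with host-preferred codons while leaving the encoded protein unchanged might proceed as follows. The preferred-codon table below is hypothetical and is not taken from any real organism or from the cited references; a codon with no entry in the table is left as it is. The standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# HYPOTHETICAL host-preferred codons, for illustration only.
PREFERRED_CODON = {"M": "ATG", "A": "GCC", "L": "CTG", "K": "AAG", "*": "TGA"}

def codons(dna):
    """Split a coding sequence into consecutive codons."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]

def translate(dna):
    """Translate a coding sequence using the standard genetic code."""
    return "".join(CODON_TABLE[c] for c in codons(dna))

def recode(dna):
    """Replace each codon by the preferred synonymous codon, if one is listed."""
    return "".join(PREFERRED_CODON.get(CODON_TABLE[c], c) for c in codons(dna))

original = "ATGGCTTTAAAATAA"   # hypothetical coding sequence
optimized = recode(original)
print(optimized, translate(original) == translate(optimized))
```

Because only synonymous codons are exchanged, the optimized sequence is a conservative variant of the original in the sense used herein.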
[0236] Additional sequence modifications are known to enhance gene
expression in a cellular host. These include elimination of sequences
encoding spurious polyadenylation signals, exon-intron splice site
signals, transposon-like repeats, and other such well-characterized
sequences that may be deleterious to gene expression. The G-C content of
the sequence may be adjusted to levels average for a given cellular host,
as calculated by reference to known genes expressed in the host cell.
When possible, the sequence is modified to avoid predicted hairpin
secondary mRNA structures.
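For illustration only, two of the sequence checks mentioned above, namely calculating the G-C content and locating candidate polyadenylation signals, can be sketched as follows. The motif AATAAA is used here as an assumed example of a polyadenylation signal; the text does not specify which motifs are to be eliminated, and the function names and example sequence are hypothetical.

```python
def gc_content(dna):
    """Return the percentage of G and C nucleotides in a DNA sequence."""
    dna = dna.upper()
    if not dna:
        raise ValueError("empty sequence")
    gc = sum(1 for base in dna if base in "GC")
    return 100.0 * gc / len(dna)

def find_motif(dna, motif="AATAAA"):
    """Return 0-based start positions of every (possibly overlapping) motif hit."""
    dna = dna.upper()
    motif = motif.upper()
    positions = []
    start = dna.find(motif)
    while start != -1:
        positions.append(start)
        start = dna.find(motif, start + 1)
    return positions

example = "GGAATAAACC"   # hypothetical sequence
print(round(gc_content(example), 1), find_motif(example))
```

A sequence flagged by such checks could then be re-coded with synonymous codons so that the motif is removed or the G-C content is shifted without changing the encoded protein.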
[0237] The expression cassettes may additionally contain 5' leader
sequences. Such leader sequences can act to enhance translation.
Translation leaders are known in the art and include: picornavirus
leaders, for example, EMCV leader (Encephalomyocarditis 5' noncoding
region) (Elroy-Stein et al. (1989) Proc. Natl. Acad. Sci. USA
86:6126-6130); potyvirus leaders, for example, TEV leader (Tobacco Etch
Virus) (Gallie et al. (1995) Gene 165(2):233-238), MDMV leader (Maize
Dwarf Mosaic Virus) (Virology 154:9-20), and human immunoglobulin
heavy-chain binding protein (BiP) (Macejak et al. (1991) Nature
353:90-94); untranslated leader from the coat protein mRNA of alfalfa
mosaic virus (AMV RNA 4) (Jobling et al. (1987) Nature 325:622-625);
tobacco mosaic virus leader (TMV) (Gallie et al. (1989) in Molecular
Biology of RNA, ed. Cech (Liss, New York), pp. 237-256); and maize
chlorotic mottle virus leader (MCMV) (Lommel et al. (1991) Virology
81:382-385). See also, Della-Cioppa et al. (1987) Plant Physiol.
84:965-968.
[0238] In preparing the expression cassette, the various DNA fragments may
be manipulated, so as to provide for the DNA sequences in the proper
orientation and, as appropriate, in the proper reading frame. Toward this
end, adapters or linkers may be employed to join the DNA fragments or
other manipulations may be involved to provide for convenient restriction
sites, removal of superfluous DNA, removal of restriction sites, or the
like. For this purpose, in vitro mutagenesis, primer repair, restriction,
annealing, resubstitutions, e.g., transitions and transversions, may be
involved.
[0239] A number of promoters can be used in the practice of the invention.
The promoters can be selected based on the host of interest and the
desired outcome. The nucleic acids can be combined with constitutive,
tissue-preferred, or other promoters for expression in plants. Such
constitutive promoters include, for example, the core CaMV 35S promoter
(Odell et al. (1985) Nature 313:810-812); rice actin (McElroy et al.
(1990) Plant Cell 2:163-171); ubiquitin (Christensen et al. (1989) Plant
Mol. Biol. 12:619-632 and Christensen et al. (1992) Plant Mol. Biol.
18:675-689); pEMU (Last et al. (1991) Theor. Appl. Genet. 81:581-588);
MAS (Velten et al. (1984) EMBO J. 3:2723-2730); ALS promoter (U.S. Pat.
No. 5,659,026), and the like. Other constitutive promoters include, for
example, U.S. Pat. Nos. 5,608,149; 5,608,144; 5,604,121; 5,569,597;
5,466,785; 5,399,680; 5,268,463; 5,608,142; and 6,177,611.
[0240] Tissue-preferred promoters can be utilized to target enhanced
expression within a particular host tissue. Such tissue-preferred
promoters for use in plants include, but are not limited to,
leaf-preferred promoters, root-preferred promoters, seed-preferred
promoters, and stem-preferred promoters. Tissue-preferred promoters
include Yamamoto et al. (1997) Plant J. 12(2):255-265; Kawamata et al.
(1997) Plant Cell Physiol. 38(7):792-803; Hansen et al. (1997) Mol. Gen.
Genet. 254(3):337-343; Russell et al. (1997) Transgenic Res.
6(2):157-168; Rinehart et al. (1996) Plant Physiol. 112(3):1331-1341; Van
Camp et al. (1996) Plant Physiol. 112(2):525-535; Canevascini et al.
(1996) Plant Physiol. 112(2):513-524; Yamamoto et al. (1994) Plant Cell
Physiol. 35(5):773-778; Lam (1994) Results Probl. Cell Differ.
20:181-196; Orozco et al. (1993) Plant Mol. Biol. 23(6):1129-1138;
Matsuoka et al. (1993) Proc Natl. Acad. Sci. USA 90(20):9586-9590; and
Guevara-Garcia et al. (1993) Plant J. 4(3):495-505. Such promoters can be
modified, if necessary, for weak expression.
[0241] Generally, it will be beneficial to express the gene from an
inducible promoter, particularly from a pathogen-inducible promoter. Such
promoters include those from pathogenesis-related proteins (PR proteins),
which are induced following infection by a pathogen; e.g., PR proteins,
SAR proteins, beta-1,3-glucanase, chitinase, etc. See, for example,
Redolfi et al. (1983) Neth. J. Plant Pathol. 89:245-254; Uknes et al.
(1992) Plant Cell 4:645-656; and Van Loon (1985) Plant Mol. Virol.
4:111-116. See also WO 99/43819, herein incorporated by reference.
[0242] Of interest are promoters that are expressed locally at or near the
site of pathogen infection. See, for example, Marineau et al. (1987)
Plant Mol. Biol. 9:335-342; Matton et al. (1989) Molecular Plant-Microbe
Interactions 2:325-331; Somsisch et al. (1986) Proc. Natl. Acad. Sci. USA
83:2427-2430; Somsisch et al. (1988) Mol. Gen. Genet. 2:93-98; and Yang
(1996) Proc. Natl. Acad. Sci. USA 93:14972-14977. See also, Chen et al.
(1996) Plant J. 10:955-966; Zhang et al. (1994) Proc. Natl. Acad. Sci.
USA 91:2507-2511; Warner et al. (1993) Plant J. 3:191-201; Siebertz et
al. (1989) Plant Cell 1:961-968; U.S. Pat. No. 5,750,386
(nematode-inducible); and the references cited therein. Of particular
interest is the inducible promoter for the maize PRms gene, whose
expression is induced by the pathogen Fusarium moniliforme (see, for
example, Cordero et al. (1992) Physiol. Mol. Plant. Path. 41:189-200).
[0243] Chemical-regulated promoters can be used to modulate the expression
of a gene in a plant through the application of an exogenous chemical
regulator. Depending upon the objective, the promoter may be a
chemical-inducible promoter, where application of the chemical induces
gene expression, or a chemical-repressible promoter, where application of
the chemical represses gene expression. Chemical-inducible promoters are
known in the art and include, but are not limited to, the maize In2-2
promoter, which is activated by benzenesulfonamide herbicide safeners,
the maize GST promoter, which is activated by hydrophobic electrophilic
compounds that are used as pre-emergent herbicides, and the tobacco PR-1a
promoter, which is activated by salicylic acid. Other chemical-regulated
promoters of interest include steroid-responsive promoters (see, for
example, the glucocorticoid-inducible promoter in Schena et al. (1991)
Proc. Natl. Acad. Sci. USA 88:10421-10425 and McNellis et al. (1998)
Plant J. 14(2):247-257) and tetracycline-inducible and
tetracycline-repressible promoters (see, for example, Gatz et al. (1991)
Mol. Gen. Genet. 227:229-237, and U.S. Pat. Nos. 5,814,618 and
5,789,156), herein incorporated by reference.
[0244] The expression cassette can also comprise a selectable marker gene
for the selection of transformed cells. Selectable marker genes are
utilized for the selection of transformed cells or tissues. Marker genes
include genes encoding antibiotic resistance, such as those encoding
neomycin phosphotransferase II (NEO) and hygromycin phosphotransferase
(HPT), as well as genes conferring resistance to herbicidal compounds,
such as glufosinate ammonium, bromoxynil, imidazolinones, and
2,4-dichlorophenoxyacetate (2,4-D). Additional selectable markers include
phenotypic markers such as β-galactosidase and fluorescent proteins
such as green fluorescent protein (GFP) (Su et al. (2004) Biotechnol
Bioeng 85:610-9 and Fetter et al. (2004) Plant Cell 16:215-28), cyan
fluorescent protein (CYP) (Bolte et al. (2004) J. Cell Science 117:943-54
and Kato et al. (2002) Plant Physiol 129:913-42), and yellow fluorescent
protein (PhiYFP™ from Evrogen, see, Bolte et al. (2004) J. Cell
Science 117:943-54). For additional selectable markers, see generally,
Yarranton (1992) Curr. Opin. Biotech. 3:506-511; Christopherson et al.
(1992) Proc. Natl. Acad. Sci. USA 89:6314-6318; Yao et al. (1992) Cell
71:63-72; Reznikoff (1992) Mol. Microbiol. 6:2419-2422; Barkley et al.
(1980) in The Operon, pp. 177-220; Hu et al. (1987) Cell 48:555-566;
Brown et al. (1987) Cell 49:603-612; Figge et al. (1988) Cell 52:713-722;
Deuschle et al. (1989) Proc. Natl. Acad. Sci. USA 86:5400-5404; Fuerst et
al. (1989) Proc. Natl. Acad. Sci. USA 86:2549-2553; Deuschle et al.
(1990) Science 248:480-483; Gossen (1993) Ph.D. Thesis, University of
Heidelberg; Reines et al. (1993) Proc. Natl. Acad. Sci. USA 90:1917-1921;
Labow et al. (1990) Mol. Cell. Biol. 10:3343-3356; Zambretti et al.
(1992) Proc. Natl. Acad. Sci. USA 89:3952-3956; Baim et al. (1991) Proc.
Natl. Acad. Sci. USA 88:5072-5076; Wyborski et al. (1991) Nucleic Acids
Res. 19:4647-4653; Hillen and Wissman (1989) Topics Mol. Struc. Biol.
10:143-162; Degenkolb et al. (1991) Antimicrob. Agents Chemother.
35:1591-1595; Kleinschmidt et al. (1988) Biochemistry 27:1094-1104; Bonin
(1993) Ph.D. Thesis, University of Heidelberg; Gossen et al. (1992) Proc.
Natl. Acad. Sci. USA 89:5547-5551; Oliva et al. (1992) Antimicrob. Agents
Chemother. 36:913-919; Hlavka et al. (1985) Handbook of Experimental
Pharmacology, Vol. 78 (Springer-Verlag, Berlin); Gill et al. (1988)
Nature 334:721-724. Such disclosures are herein incorporated by
reference.
[0245] The above list of selectable marker genes is not meant to be
limiting. Any selectable marker gene can be used in the present
invention.
[0246] Numerous plant transformation vectors and methods for transforming
plants are available. See, for example, An, G. et al. (1986) Plant
Physiol., 81:301-305; Fry, J., et al. (1987) Plant Cell Rep. 6:321-325;
Block, M. (1988) Theor. Appl. Genet. 76:767-774; Hinchee, et al. (1990)
Stadler. Genet. Symp. 203-212; Cousins, et al. (1991) Aust. J.
Plant Physiol. 18:481-494; Chee, P. P. and Slightom, J. L. (1992) Gene.
118:255-260; Christou, et al. (1992) Trends. Biotechnol. 10:239-246;
D'Halluin, et al. (1992) Bio/Technol. 10:309-314; Dhir, et al. (1992)
Plant Physiol. 99:81-88; Casas et al. (1993) Proc. Nat. Acad. Sci. USA
90:11212-11216; Christou, P. (1993) In Vitro Cell. Dev. Biol.-Plant;
29P:119-124; Davies, et al. (1993) Plant Cell Rep. 12:180-183; Dong, J.
A. and Mchughen, A. (1993) Plant Sci. 91:139-148; Franklin, C. I. and
Trieu, T. N. (1993) Plant. Physiol. 102:167; Golovkin, et al. (1993)
Plant Sci. 90:41-52; Guo Chin Sci. Bull. 38:2072-2078; Asano, et al.
(1994) Plant Cell Rep. 13; Ayeres N. M. and Park, W. D. (1994) Crit. Rev.
Plant. Sci. 13:219-239; Barcelo, et al. (1994) Plant. J. 5:583-592;
Becker, et al. (1994) Plant. J. 5:299-307; Borkowska et al. (1994) Acta.
Physiol Plant. 16:225-230; Christou, P. (1994) Agro. Food. Ind. Hi Tech.
5: 17-27; Eapen et al. (1994) Plant Cell Rep. 13:582-586; Hartman, et al.
(1994) Bio-Technology 12:919-923; Ritala, et al. (1994) Plant. Mol. Biol.
24:317-325; and Wan, Y. C. and Lemaux, P. G. (1994) Plant Physiol.
104:37-48.
[0247] The methods of the invention involve introducing a polynucleotide
construct comprising a DNA sequence into a host cell. By "introducing" is
intended presenting to the plant the polynucleotide construct in such a
manner that the construct gains access to the interior of the host cell.
The methods of the invention do not depend on a particular method for
introducing a polynucleotide construct into a host cell, only that the
polynucleotide construct gains access to the interior of one cell of the
host. Methods for introducing polynucleotide constructs into bacteria,
plants, fungi and animals are known in the art including, but not limited
to, stable transformation methods, transient transformation methods, and
virus-mediated methods.
[0248] By "stable transformation" is intended that the polynucleotide
construct introduced into a plant integrates into the genome of the host
and is capable of being inherited by progeny thereof. By "transient
transformation" is intended that a polynucleotide construct introduced
into the host does not integrate into the genome of the host.
[0249] For the transformation of plants and plant cells, the DNA sequences
of the invention are inserted using standard techniques into any vector
known in the art that is suitable for expression of the DNA sequences in
a host cell or organism of interest. The selection of the vector depends
on the preferred transformation technique and the target host species to
be transformed.
[0250] Methodologies for constructing plant expression cassettes and
introducing foreign nucleic acids into plants are generally known in the
art and have been previously described. For example, foreign DNA can be
introduced into plants, using tumor-inducing (Ti) plasmid vectors. Other
methods utilized for foreign DNA delivery involve the use of PEG mediated
protoplast transformation, electroporation, microinjection, whiskers, and
biolistics or microprojectile bombardment for direct DNA uptake. Such
methods are known in the art. (U.S. Pat. No. 5,405,765 to Vasil et al.;
Bilang et al. (1991) Gene 100: 247-250; Scheid et al., (1991) Mol. Gen.
Genet., 228: 104-112; Guerche et al., (1987) Plant Science 52: 111-116;
Neuhause et al., (1987) Theor. Appl. Genet. 75: 30-36; Klein et al.,
(1987) Nature 327: 70-73; Howell et al., (1980) Science 208:1265; Horsch
et al., (1985) Science 227: 1229-1231; DeBlock et al., (1989) Plant
Physiology 91: 694-701; Methods for Plant Molecular Biology (Weissbach
and Weissbach, eds.) Academic Press, Inc. (1988) and Methods in Plant
Molecular Biology (Schuler and Zielinski, eds.) Academic Press, Inc.
(1989)). The method of transformation depends upon the plant cell to be
transformed, stability of vectors used, expression level of gene products
and other parameters.
[0251] The DNA sequences of the invention may be introduced into plants by
contacting plants with a virus or viral nucleic acids. Generally, such
methods involve incorporating a polynucleotide construct of the invention
within a viral DNA or RNA molecule. It is recognized that a protein
of the invention may be initially synthesized as part of a viral
polyprotein, which later may be processed by proteolysis in vivo or in
vitro to produce the desired recombinant protein. Further, it is
recognized that promoters of the invention also encompass promoters
utilized for transcription by viral RNA polymerases. Methods for
introducing polynucleotide constructs into plants and expressing a
protein encoded therein, involving viral DNA or RNA molecules, are known
in the art. See, for example, U.S. Pat. Nos. 5,889,191, 5,889,190,
5,866,785, 5,589,367 and 5,316,931; herein incorporated by reference.
[0252] In specific embodiments, the DNA sequences of the invention can be
provided to a plant using a variety of transient transformation methods.
Such transient transformation methods include, but are not limited to,
the introduction of the protein or variants and fragments thereof
directly into the plant or the introduction of a transcript encoding the
protein into the plant. Such methods include, for example, microinjection
or particle bombardment. See, for example, Crossway et al. (1986) Mol.
Gen. Genet. 202:179-185; Nomura et al. (1986) Plant Sci. 44:53-58; Hepler
et al. (1994) Proc. Natl. Acad. Sci. 91: 2176-2180 and Hush et al. (1994)
The Journal of Cell Science 107:775-784, all of which are herein
incorporated by reference. Alternatively, the polynucleotide can be
transiently transformed into the plant using techniques known in the art.
Such techniques include Agrobacterium tumefaciens-mediated transient
expression as described below.
[0253] The cells that have been transformed may be grown into plants in
accordance with conventional ways. See, for example, McCormick et al.
(1986) Plant Cell Reports 5:81-84. These plants may then be grown, and
either pollinated with the same transformed strain or different strains,
and the resulting hybrid having constitutive expression of the desired
phenotypic characteristic identified. Two or more generations may be
grown to ensure that expression of the desired phenotypic characteristic
is stably maintained and inherited and then seeds harvested to ensure
expression of the desired phenotypic characteristic has been achieved. In
this manner, the present invention provides transformed seed (also
referred to as "transgenic seed") having a polynucleotide construct of
the invention, for example, an expression cassette of the invention,
stably incorporated into their genome.
[0254] The present invention may be used for transformation of any plant
species, including, but not limited to, monocots and dicots. Plants of
particular interest include, but are not limited to, grain plants
that provide seeds of interest, oil-seed plants, leguminous plants, and
Arabidopsis thaliana. Seeds of interest include grain seeds, such as
corn, wheat, barley, rice, sorghum, rye, etc. Oil-seed plants include
cotton, soybean, safflower, sunflower, Brassica, maize, alfalfa, palm,
coconut, etc. Leguminous plants include beans and peas. Beans include
guar, locust bean, fenugreek, soybean, garden beans, cowpea, mungbean,
lima bean, fava bean, lentils, chickpea, etc.
[0255] As used herein, the term plant includes plant cells, plant
protoplasts, plant cell tissue cultures from which plants can be
regenerated, plant calli, plant clumps, and plant cells that are intact
in plants or parts of plants such as embryos, pollen, ovules, seeds,
leaves, flowers, branches, fruits, roots, root tips, anthers, and the
like. Progeny, variants, and mutants of the regenerated plants are also
included within the scope of the invention, provided that these parts
comprise the introduced polynucleotides.
[0256] The present invention further encompasses the introduction of the
DNA sequences of the invention into non-plant host cells, including, but
not limited to, bacterial cells, yeast cells, other fungal cells, human
cells, and other animal cells. In addition, the invention encompasses the
introduction of the DNA sequences into animals and other organisms by
both stable and transient transformation methods.
[0257] As discussed herein, a DNA sequence of the present invention can be
expressed in these eukaryotic systems. Synthesis of heterologous
polynucleotides in yeast is well known (Sherman et al. (1982) Methods in
Yeast Genetics, Cold Spring Harbor Laboratory). Two widely utilized
yeasts for production of eukaryotic proteins are Saccharomyces cerevisiae
and Pichia pastoris. Vectors, strains, and protocols for expression in
Saccharomyces and Pichia are known in the art and available from
commercial suppliers (e.g., Invitrogen). Suitable vectors usually have
expression control sequences, such as promoters, including
3-phosphoglycerate kinase or alcohol oxidase, and an origin of
replication, termination sequences and the like as desired.
[0258] The sequences of the present invention can also be ligated to
various expression vectors for use in transfecting cell cultures of
mammalian or insect origin. Illustrative cell cultures useful for the
production of the peptides are mammalian cells. A number of suitable host
cell lines capable of expressing intact proteins have been developed in
the art, and include the HEK293, BHK21, and CHO cell lines. Expression
vectors for these cells can include expression control sequences, such as
an origin of replication, a promoter (e.g. the CMV promoter, a HSV tk
promoter or pgk (phosphoglycerate kinase) promoter), an enhancer (Queen
et al. (1986) Immunol. Rev. 89:49), and necessary processing information
sites, such as ribosome binding sites, RNA splice sites, polyadenylation
sites (e.g., an SV40 large T Ag poly A addition site), and
transcriptional terminator sequences. Other animal cells useful for
production of proteins of the present invention are available, for
instance, from the American Type Culture Collection.
[0259] Appropriate vectors for expressing proteins of the present
invention in insect cells are usually derived from the SF9 baculovirus.
Suitable insect cell lines include mosquito larvae, silkworm, armyworm,
moth and Drosophila cell lines such as a Schneider cell line (See,
Schneider (1987) J. Embryol. Exp. Morphol. 27:353-365).
[0260] As with yeast, when higher animal or plant host cells are employed,
polyadenylation or transcription terminator sequences are typically
incorporated into the vector. An example of a terminator sequence is the
polyadenylation sequence from the bovine growth hormone gene. Sequences
for accurate splicing of the transcript may also be included. An example
of a splicing sequence is the VP1 intron from SV40 (Sprague et al.
(1983) J. Virol. 45:773-781). Additionally, gene sequences to control
replication in the host cell may be incorporated into the vector such as
those found in bovine papilloma virus type-vectors (Saveria-Campo (1985)
DNA Cloning Vol. II a Practical Approach, D. M. Glover, Ed., IRL Press,
Arlington, Va., pp. 213-238).
[0261] Animal and lower eukaryotic (e.g., yeast) host cells are competent
or rendered competent for transfection by various means. There are
several well-known methods of introducing DNA into animal cells. These
include: calcium phosphate precipitation, fusion of the recipient cells
with bacterial protoplasts containing the DNA, treatment of the recipient
cells with liposomes containing the DNA, DEAE dextran, electroporation,
biolistics, and micro-injection of the DNA directly into the cells. The
transfected cells are cultured by means well known in the art (Kuchler
(1997) Biochemical Methods in Cell Culture and Virology, Dowden,
Hutchinson and Ross, Inc.).
[0262] Prokaryotes most frequently are represented by various strains of
E. coli; however, other microbial strains may also be used in the method
of the invention. Commonly used prokaryotic control sequences which are
defined herein to include promoters for transcription initiation,
optionally with an operator, along with ribosome binding sequences,
include such commonly used promoters as the beta lactamase
(penicillinase) and lactose (lac) promoter systems (Chang et al. (1977)
Nature 198:1056), the tryptophan (trp) promoter system (Goeddel et al.
(1980) Nucleic Acids Res. 8:4057) and the lambda-derived PL promoter and
N-gene ribosome binding site (Shimatake et al. (1981) Nature 292:128).
The inclusion of selection markers in DNA vectors transfected in E. coli
is also useful. Examples of such markers include genes specifying
resistance to ampicillin, tetracycline, or chloramphenicol.
[0263] The vector is selected to allow introduction into the appropriate
host cell. Bacterial vectors are typically of plasmid or phage origin.
Appropriate bacterial cells are infected with phage vector particles or
transfected with naked phage vector DNA. If a plasmid vector is used, the
bacterial cells are transfected with the plasmid vector DNA. Expression
systems for expressing a protein of the present invention are available
using Bacillus sp. and Salmonella (Palva et al. (1983) Gene 22:229-235;
Mosbach et al. (1983) Nature 302:543-545).
[0264] With respect to fusion proteins, "operably linked" is intended to
mean a functional linkage between two or more elements or domains. It is
recognized that a linker of one or more amino acids may be inserted in
between each of the two or more elements to maintain the desired function
of the two or more elements.
[0265] In one embodiment of the invention, fusion proteins comprise a
repeat domain of the invention operably linked to at least one protein or
part or domain thereof. In certain embodiments of the invention, the
protein or part or domain thereof comprises a protein or functional part
or domain thereof, that is capable of modifying DNA or RNA. In other
embodiments, the protein or functional part or domain thereof is capable of
functioning as a transcriptional activator or a transcriptional
repressor. Preferred proteins include, but are not limited to,
transcription activators, transcription repressors,
resistance-mediating proteins, nucleases, topoisomerases, ligases,
integrases, recombinases, resolvases, methylases, acetylases,
demethylases, and deacetylases.
[0266] The following examples are offered by way of illustration and not
by way of limitation.
EXAMPLES
Example 1
Identification of the Basis for DNA Specificity of TAL Effectors
[0267] The fact that AvrBs3 directly binds the UPA-box, a promoter element
in induced target genes (Kay et al. (2007) Science 318, 648-651; Romer et
al. (2007) Science 318:645-648), prompted us to investigate the basis for
DNA-sequence specificity. Each repeat region generally consists of 34
amino acids, and the repeat units are nearly identical; however, amino
acids 12 and 13 are hypervariable (Schornack et al. (2006) J. Plant
Physiol. 163:256-272; FIG. 1A). The most C-terminal repeat of AvrBs3
shows sequence similarity to other repeat units only in its first 20
amino acids and is therefore referred to as half repeat. The repeat units
can be classified into different repeat types based on their
hypervariable 12th and 13th amino acids (FIG. 1B). Because the size of
the UPA-box (18 (20)/19 (21) bp) almost corresponds to the number of
repeat units (17.5) in AvrBs3, we considered the possibility that one
repeat unit of AvrBs3 contacts one specific DNA base pair. When the
repeat types of AvrBs3 (amino acid 12 and 13 of each repeat) are
projected onto the UPA box, it becomes evident that certain repeat types
correlate with specific base pairs in the target DNA. For example, HD and
NI repeat units have a strong preference for C and A, respectively (FIG.
1B). For simplicity, we designate only bases in the upper (sense) DNA
strand. Our model of recognition specificity is supported by the fact
that the AvrBs3 repeat deletion derivative AvrBs3.DELTA.rep16 which lacks
four repeat units (.DELTA.11-14; FIG. 5A, B) recognizes a shorter and
different target DNA sequence (FIGS. 5 to 8). Based on sequence
comparisons of UPA-boxes of AvrBs3-induced pepper genes and mutational
analysis, the target DNA box of AvrBs3 appears to be 1 bp longer than the
number of repeat units in AvrBs3. In addition, a T is conserved at the 5'
end of the UPA box immediately preceding the predicted recognition
specificity of the first repeat (FIG. 1). Intriguingly, secondary
structure predictions of the protein region preceding the first repeat
and the repeat region show similarities, despite lack of amino
acid-sequence conservation. This suggests an additional repeat, termed
repeat 0 (FIG. 1B).
[0268] To further substantiate and extend our model (FIG. 1B), we
predicted the yet unknown target DNA sequences of Xanthomonas TAL
effectors based on the sequence of their repeat units, and inspected the
promoters of known TAL target genes and their alleles for the presence of
putative binding sites. We identified sequences matching the predicted
specificity in promoters of alleles that are induced in response to the
corresponding TAL effector, but not in non-induced alleles (FIGS. 5C-F).
[0269] The presence of these boxes suggests that the induced
genes are direct targets of the corresponding TAL effectors. Based on the
DNA base frequency for different repeat types in the target DNA sequences
using eight TAL effectors we deduced a code for the DNA target
specificity of certain repeat types (FIG. 1C, D; FIG. 5).
[0270] To experimentally validate our model we predicted target DNA
sequences for the TAL effectors Hax2 (21.5 repeat units), Hax3 (11.5
repeat units), and Hax4 (14.5 repeat units) from the
Brassicaceae-pathogen X. campestris pv. armoraciae (22). First, we
derived target DNA boxes for Hax3 and Hax4, because they exclusively
contain repeat-types present in AvrBs3 (amino acid 12/13: NI, HD, NG, NS;
FIG. 1A, FIG. 2A) for which DNA binding and gene activation have been
shown experimentally. The Hax3 and Hax4 target boxes were placed in front
of the minimal (-55 to +25) tomato Bs4 promoter, which has very weak
basal activity (Schornack et al. (2005) Mol. Plant-Microbe Interact.
18:1215-1225; FIG. 2B; FIG. 9), driving a promoterless uidA
(.beta.-glucuronidase, GUS) reporter gene. For transient expression
studies, we transfected the reporter constructs together with cauliflower
mosaic virus 35S-promoter driven effector genes hax3 and hax4 into
Nicotiana benthamiana leaves using Agrobacterium-mediated T-DNA delivery.
Qualitative and quantitative GUS assays demonstrated that promoters
containing the Hax3- or Hax4-box were strongly and specifically induced
in the presence of the corresponding effector (FIG. 2C). Likewise, we
addressed the importance of the first nucleotide (T) in the predicted
target DNA sequence of Hax3 and generated four different Hax3-boxes with
either A, C, G or T at the 5' end (FIG. 10A, B). Coexpression of hax3 and
the reporter constructs in N. benthamiana demonstrated that only a
promoter containing a Hax3-box with a 5' T was strongly induced in the
presence of Hax3 whereas the others led to weaker activation (FIG. 10C).
This indicates that position 0 contributes to promoter activation
specificity of Hax3 and likely other TAL effectors. To address the
possibility that some repeat types confer broader specificity, i.e.,
recognize more than one base, we permutated the Hax4-box (FIG. 3A, B).
Transient GUS assays showed that NI-, HD-, and NG-repeat units in Hax4
strongly favour recognition of the bases A, C, and T, respectively,
whereas NS-repeat units recognize all four bases (FIG. 3B; FIG. 11). As
several TAL effectors contain NN-repeat units (FIG. 5 and FIG. 15, Table
1), we generated ArtX1, an artificial TAL effector with NN-repeat units
and deduced a corresponding DNA recognition sequence using our code (FIG.
3C). Analysis of ArtX1-box derivatives demonstrated that NN-repeat units
recognize both A and G, with preference for G (FIG. 3C). This result
confirms our prediction of the natural AvrXa27-box in rice which contains
either an A or a G at positions corresponding to NN-repeat units (FIG.
5C). In addition, we derived two possible AvrXa10-boxes with either A or
G at positions corresponding to NN-repeat units in AvrXa10. Both reporter
constructs were induced efficiently by AvrXa10 (FIG. 12). Together, these
data strongly suggest that some repeat types recognize specific base
pairs whereas others are more flexible.
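The repeat-type specificities reported in this paragraph can be written down as a small lookup table. The following Python sketch is illustrative only: the function names and the example repeat order are hypothetical, only the repeat types discussed above (HD, NI, NG, NS, NN) are included, and the leading T at position 0 is treated as strictly required, which simplifies the observation that other 5' bases led to weaker activation.

```python
# Repeat type (amino acids 12/13) -> accepted bases, as reported in the text.
# The preferred base is listed first. NS is described as recognizing all four
# bases; NN as recognizing A and G with a preference for G.
CODE = {
    "HD": "C",
    "NI": "A",
    "NG": "T",
    "NS": "ACGT",
    "NN": "GA",
}


def predict_box(repeat_types, leading_t=True):
    """Return one set of accepted bases per position of the predicted box."""
    positions = [set("T")] if leading_t else []
    for rt in repeat_types:
        if rt not in CODE:
            raise KeyError(f"no specificity listed for repeat type {rt!r}")
        positions.append(set(CODE[rt]))
    return positions


def matches(box, dna):
    """True if every base of dna is accepted at its position in the box."""
    return len(dna) == len(box) and all(b in ok for b, ok in zip(dna, box))


def preferred_sequence(repeat_types, leading_t=True):
    """Return the box built from the preferred base of each repeat type."""
    prefix = "T" if leading_t else ""
    return prefix + "".join(CODE[rt][0] for rt in repeat_types)


# Hypothetical repeat order, chosen only to exercise every table entry.
example = ["NI", "HD", "NG", "NN", "NS"]
print(preferred_sequence(example))
print(matches(predict_box(example), "TACTAG"))
```

Keeping a set of accepted bases per position, rather than a single base, lets the same table express both the specific repeat types and the more flexible ones.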
[0271] An exceptional TAL effector is Hax2 because it contains 35 amino
acids per repeat instead of the typical 34 amino acid-repeat units (Kay
et al. (2005) Mol. Plant-Microbe Interact. 18:838-848). In addition, Hax2
contains a rare amino acid combination in its second repeat (amino acids
12/13: IG; FIG. 2A). We permutated the corresponding third base of the
Hax2-box and analyzed reporter gene activation with the effector Hax2
using the transient assay. This showed that an IG repeat confers
specificity for T (FIG. 13). The Hax2-box only leads to promoter
activation by Hax2, but not by Hax3 or Hax4 (FIG. 2C). This demonstrates
that 35 amino acid-repeat units function like 34 amino acid-repeat units.
This is supported by the fact that the TAL effector AvrHah1 which
contains 35 amino acid repeat units, induces Bs3-mediated resistance
(Schornack et al. (2008) New Phytol. 179:546-556). The repeat types of
AvrHah1 match to the UPA-box in the Bs3 promoter (FIG. 5A, B).
[0272] Interestingly, the expression of hax2 in Arabidopsis thaliana leads
to purple coloured leaves, indicating an accumulation of anthocyanin
(FIGS. 14A, B). To identify Hax2 target genes we analyzed promoter
regions of the A. thaliana genome using pattern search (Patmatch, TAIR;
www.arabidopsis.org) with degenerated Hax2-box sequences. One of the
putative Hax2 target genes encodes the MYB transcription factor PAP1
(At1G56650) which controls anthocyanin biosynthesis (Borevitz et al.
(2000) Plant Cell 12:2383-2394). Semiquantitative analysis of the PAP1
transcript level demonstrated that expression of PAP1 is strongly induced
by Hax2 (FIG. 14C). Visual inspection of the PAP1 promoter region
revealed the presence of a suboptimal Hax2-box (FIGS. 14D, E). Based on
the code for TAL effector repeat types (FIG. 1D) and the data described
above we predicted putative target DNA sequences for additional TAL
effectors some of which are important virulence factors (FIG. 15, Table
1).
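A pattern search with a degenerate box sequence, as described above for the A. thaliana promoter regions, can be approximated with a regular expression. This is a minimal sketch and not the Patmatch tool itself: it assumes the degenerate box is written in standard IUPAC nucleotide codes, it scans only the given strand, and the gene names and sequences are made up for illustration.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def box_to_regex(box):
    """Compile a degenerate box into a regex that also finds overlapping hits."""
    body = "".join(IUPAC[c] for c in box.upper())
    # The lookahead consumes no characters, so overlapping matches are reported.
    return re.compile("(?=(" + body + "))")


def find_boxes(promoters, box):
    """Return (gene, 0-based start, matched sequence) for every hit."""
    pattern = box_to_regex(box)
    hits = []
    for gene, seq in promoters.items():
        for m in pattern.finditer(seq.upper()):
            hits.append((gene, m.start(), m.group(1)))
    return hits


# Hypothetical promoter fragments and a hypothetical degenerate box.
promoters = {"geneA": "GGTACGATCC", "geneB": "CCCCCCCC"}
print(find_boxes(promoters, "TACNR"))
```

A fuller search would also scan the reverse complement of each promoter and rank hits by how closely they match the preferred bases.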
[0273] Because the repeat number in TAL effectors ranges from 1.5 to 28.5,
a key question is whether effectors with few repeat units can activate
gene expression. Therefore, we tested how the number of repeat units
influences target gene expression. For this, we constructed artificial
effectors containing the N- and C-terminal regions of Hax3 and a repeat
domain with 0.5 to 15.5 HD-repeat units (specificity for C). For
technical reasons, the first repeat in all cases was NI (specificity for
A). The corresponding target DNA box consists of 17 C-residues preceded
by TA (FIGS. 4A, B). Promoter activation by the artificial effectors was
measured using the transient Bs4-promoter GUS-assay in N. benthamiana.
While at least 6.5 repeat units were needed for gene induction, 10.5 or
more repeat units led to strong reporter gene activation (FIG. 4C). These
data demonstrate that a minimal number of repeat units is required to
recognize the artificial target DNA-box and activate gene expression. The
results also suggest that effectors with fewer repeat units are largely
inactive. We have shown that the repeat region of TAL effectors has a
sequential nature that corresponds to a consecutive target DNA sequence.
Hence, it should be feasible to generate effectors with novel DNA-binding
specificities. Three artificial effectors were generated (ArtX1, ArtX2,
ArtX3), each with randomly assembled 12.5 repeat units (FIGS. 3C, D), and
tested for induction of Bs4 promoter-reporter fusions containing
predicted target DNA-sequences. All three artificial effectors strongly
and specifically induced the GUS reporter only in presence of the
corresponding target DNA-box (FIG. 3E; FIG. 11). Our model for
recognition specificity of TAL effectors, in which one repeat unit
contacts one base pair in the DNA via amino acids 12 and 13 of each
repeat, enables prediction of the binding specificity of TAL effectors and
identification of plant target genes. As many TAL effectors are major
virulence factors the knowledge of plant target genes will greatly
enhance our understanding of plant disease development caused by
xanthomonads. In addition, we successfully designed artificial effectors
that act as transcription factors with specific DNA-binding domains.
Previously, zinc finger transcription factors containing a tandem
arrangement of zinc finger units have been engineered to bind specific
target DNA sequences.
[0274] Similarly, TAL effectors have a linear DNA-binding specificity that
can easily be rearranged. It has not escaped our notice that the
postulated right-handed superhelical structure of the repeat regions in
TAL effectors immediately suggests a possible mechanism for interaction
with the right-handed helix of the genetic material. It will be important
to determine the structure of the novel DNA-binding domain of TAL
effectors complexed with target DNA.
[0275] The following paragraphs describe further embodiments of the
invention:
1) Prediction of DNA-Binding Specificities of Naturally Occurring
AvrBs3-Homologous Proteins and Generation of Resistant Plants.
[0276] The repeat units of the repeat domain of naturally occurring
effectors of the AvrBs3-family encode a corresponding DNA-binding
specificity. These recognition sequences can be predicted with the
recognition code.
[0277] The artificial insertion of the predicted recognition sequences in
front of a gene in transgenic plants leads to expression of the gene if
the corresponding AvrBs3-like effector is translocated into the plant
cell (e.g. during a bacterial infection).
[0278] If the recognition sequence is inserted in front of a gene whose
expression leads to a defence reaction (resistance-mediating gene) of the
plant, such constructed transgenic plants are resistant against an
infection of plant pathogenic bacteria which translocate the
corresponding effector.
2) The Identification of Plant Genes Whose Expression is Induced by a
Specific Effector of the AvrBs3-Family
[0279] The prediction of DNA target sequences of a corresponding effector
of the AvrBs3-family in the promoter region of plant genes is an
indication for the inducible expression of these genes by the effector.
Using the method according to the invention it is possible to predict
inducible plant genes. Predictions are particularly straightforward in
sequenced genomes.
3) Use of Other Effectors as Transcriptional Activators in Expression
Systems
[0280] Analogous to the use of Hax3 and Hax4, the predicted DNA binding
sequences of other members of the AvrBs3-family can be inserted into
promoters to generate new controllable promoters which can be induced by
the corresponding effector.
4) Construction of a Secondarily Inducible System
[0281] Two constructs are introduced into plants. First, a hax3 gene whose
expression is under control of an inducible promoter. Secondly, a target
gene that contains the Hax3-box in the promoter.
[0282] Induction of the expression of hax3 leads to production of the Hax3
protein that then induces the expression of the target gene. The
described two-component construction leads to a twofold expression switch
which allows a variable expression of the target gene. The
trans-activator and the target gene can also be present first in
different plant lines and can be introgressed at will. Analogous to this,
Hax4 and the corresponding Hax4-box can be used. This system can also be
used with other members of the AvrBs3-family or artificial derivatives
and predicted DNA-target sequences. The functionality of the system could
already be verified. Transgenic Arabidopsis thaliana plants were
constructed, which contain an inducible avrBs3 gene as well as a Bs3 gene
under control of its native promoter, whose expression can be induced by
AvrBs3. The induction of expression of avrBs3 leads to expression of Bs3
and therefore to cell death. See, WO 2009/042753, herein incorporated by
reference.
5) Construction of Disease-Resistant Plants
[0283] If the DNA target sequence of an AvrBs3-similar effector is
inserted in front of a gene whose expression leads to a defence reaction
(resistance-mediating gene) of the plant, correspondingly constructed
transgenic plants will be resistant against infection of plant pathogenic
organisms, which make this effector available. Such a
resistance-mediating gene can for example lead to a local cell death
which prevents spreading of the organisms/pathogens, or induce the basal
or systemic resistance of the plant cell.
6) Generation of Repeat Domains for the Detection of a Specific DNA
Sequence and Induction of Transcription of Following Genes
[0284] The modular architecture of the central repeat domain enables the
targeted construction of definite DNA binding specificities and with this
the induction of transcription of selected plant genes. The DNA binding
specificities can be artificially inserted in front of target
genes so that novel effector-DNA-box variants are generated for the
inducible expression of target genes. Moreover, repeat domains can be
constructed that recognize a naturally occurring DNA sequence in
organisms. The advantage of this approach is that the expression of any
gene in non-transgenic organisms can be induced if a corresponding
effector of the invention is present in the cells of this organism.
[0285] Introduction of the effector can be done in different ways:
(1) transfer via bacteria with a protein transport system (e.g. type-III
secretion system); (2) cell-bombardment with an artificial
AvrBs3-protein; (3) transfer of a DNA-segment that leads to production of
the effector, via introgression, Agrobacterium, viral vectors or
cell-bombardment; or (4) other methods that result in uptake of the
effector protein by the target cell.
[0286] The central repeat domain of effectors of the AvrBs3-family is a
new type of DNA binding domain (Kay et al., 2007). The decryption of the
specificity of the single repeat units now allows the targeted adaptation
of the DNA-binding specificity of this region. The DNA binding region can
be translationally fused to other functional domains to generate
sequence-specific effects. Below, four examples of such protein fusions
are given.
7) Construction of Transcriptional Activators for the Inducible Expression
of Genes in Cells of Living Organisms
[0287] The effectors of the AvrBs3-like family induce the expression of
genes in plant cells. For this, the C-terminus of the protein is
essential, which contains a transcriptional activation domain and nuclear
localization sequences that mediate the import of the protein into the
plant nucleus. The C-terminus of the AvrBs3-homologous protein can be
modified in such a way that it mediates the expression of genes in
fungal, animal, or human systems. Thereby, effectors can be constructed that
function as transcriptional activators in humans, other animals, or
fungi. Thus, the methods according to the invention can be applied not
only to plants, but also to other living organisms.
8) Use of Effectors as Transcriptional Repressors
[0288] The DNA binding specificity of the repeat domain can be used
together with other domains in protein fusions to construct effectors
that act as specific repressors. These effectors exhibit a DNA binding
specificity that has been generated in such a way that they bind to
promoters of target genes. In contrast to the TAL effectors which are
transcription activators, these effectors are constructed to block the
expression of target genes. Like classical repressors, these effectors
are expected to cover promoter sequences by their recognition of, or
binding to, a target DNA sequence and make them inaccessible for factors
that otherwise control the expression of the target genes. Alternatively,
or in addition, the repeat domains can be fused to a
transcription-repressing domain, such as an EAR motif (Ohta et al. Plant
Cell 13:1959-1968 (2001)).
9) Use of Repeat Domains for Labelling and Isolation of Specific Sequences
[0289] The capability of a repeat domain to recognize a specific target
DNA sequence can be used together with other domains to label specific DNA
sequences. C-terminally a GFP ("green-fluorescent-protein") can for
example be fused to an artificial repeat domain that detects a desired
DNA sequence. This fusion protein binds in vivo and in vitro to a
corresponding DNA sequence. The position of this sequence on the
chromosome can be localized using the fused GFP-protein. In an analogous
way, other protein domains that enable a cellular localization of the
protein (e.g. by FISH) can be fused to a specific artificial repeat
domain which targets the protein to a corresponding DNA sequence in the
genome of the cell. In addition, the DNA recognition specificity of
repeat domains of the invention can be used to isolate specific DNA
sequences. For this, the AvrBs3-like protein can be immobilized to a
matrix and interacts with corresponding DNA molecules that contain a
matching sequence. Therefore, specific DNA sequences can be isolated from
a mixture of DNA molecules.
10) Use of Repeat Domains for the Endonucleolytic Cleavage of DNA
[0290] The DNA recognition specificity of the repeat domain can be fused
to a suitable restriction endonuclease to specifically cleave DNA.
Therefore, the sequence-specific binding of the repeat domain leads to
localization of the fusion protein to few specific sequences, so that the
endonuclease specifically cleaves the DNA at the desired location. By
means of the recognition of target DNA sequences, unspecific nucleases
such as FokI can be changed into specific endonucleases analogous to work
done with zinc finger nucleases. For example, the optimal distance
between the two effector DNA target sites that would be required to
support dimerization of two FokI domains would be determined. This would
be accomplished by analysis of a collection of constructs in which the
two DNA binding sites are separated by differently sized spacer
sequences. Using this approach enables one to determine the distances
that allow nuclease-mediated DNA cleavage to occur and the functional
analysis of additional effector nucleases that target different DNA
sequences. In an alternative approach, a newly developed single-chain
FokI dimer (Mino et al. (2009) J Biotechnol 140:156-161) is employed. In
this approach two FokI catalytic domains are translationally fused to a
single repeat domain of the invention. Thus, functionality of a
corresponding nuclease no longer relies on intermolecular dimerization of
two FokI domains that are located on two different proteins. This type of
construct has been used successfully in the context of zinc finger-based
DNA binding motifs. Moreover, these methods enable very specific cuts at
only a few positions in complex DNA-molecules. These methods can amongst
other things be used to introduce double-strand breaks in vivo and
selectively incorporate donor DNA at these positions. These methods can
also be used to specifically insert transgenes.
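The collection of constructs with differently sized spacers mentioned above can be enumerated programmatically before synthesis. The sketch below is only one way to lay this out. It assumes that the second binding site is placed on the opposite strand, an arrangement borrowed from zinc finger nuclease designs and not stated in the text, and it fills the spacer with an arbitrary repeating filler. All site sequences and names are hypothetical.

```python
# Translation table for complementing a DNA string.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def spacer_series(left_site, right_site, spacer_lengths, filler="ACGT"):
    """Map each spacer length to a test target: left site, spacer, inverted right site.

    Assumption: the right site is inverted so that the two bound proteins face
    each other across the spacer; the spacer content is an arbitrary filler.
    """
    constructs = {}
    for n in spacer_lengths:
        spacer = (filler * (n // len(filler) + 1))[:n]
        constructs[n] = left_site + spacer + reverse_complement(right_site)
    return constructs


# Hypothetical binding sites and an arbitrary range of spacer lengths.
for length, seq in spacer_series("TACC", "TGGA", [0, 3, 6]).items():
    print(length, seq)
```

Each construct would then be tested for cleavage to find which spacer lengths permit FokI dimerization.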
11) Construction of Repeat Domains with Custom-Designed Repeat Order
[0291] Due to the high similarity between the individual repeat units of a
repeat domain, construction of a custom DNA-binding polypeptide as
described above might not be feasible through
traditional cloning methods. As detailed in this example, a repeat domain
with a repeat unit order that matches a desired DNA-sequence in a
promoter of interest, such as the Bs4 promoter (FIGS. 17B, C), is
determined based on the recognition code of the present invention.
Generation of a specific 11.5 repeat unit order was accomplished using
"Golden gate" cloning (Engler et al. (2008) PLoS ONE 3:e3647). As
building blocks, we subcloned the N- and C-terminus of Hax3 as well as
the 12 individual repeat units resembling the 11.5 repeat units. Each
building block contained individual flanking BsaI sites (FIG. 18) that
allowed an ordered assembly of the fragments into a custom effector
polypeptide. The effector (ARTBs4) was correctly assembled from the total
of 14 fragments into a BsaI-compatible binary vector that allows
Agrobacterium-mediated expression of the custom effector polypeptide as
an N-terminally tagged GFP fusion in plant cells (FIG. 18).
12) Use of Effectors as Viral Repressors
[0292] The nucleotide binding specificity of the repeat domain can be used
to design effectors that disrupt viral replication in cells. These
effectors will exhibit a nucleotide binding specificity targeted to
nucleotide sequences in viral origins of replication and other sequences
critical to viral function. No additional protein domains need to be
fused to these repeat domain proteins in order to block viral function.
They act like classical repressors by covering origins of replication or
other key sequences, including promoters, enhancers, long terminal repeat
units, and internal ribosome entry sites, by binding and making them
inaccessible for host or viral factors, including viral encoded
RNA-dependent RNA polymerase, nucleocapsid proteins and integrases, which
participate in viral replication and function. This type of strategy has
been used successfully with zinc-finger proteins (Sera (2005) J. Virol.
79:2614-2619; Takenaka et al. (2007) Nucl Acids Symposium Series
51:429-430).
[0293] Summarizing, the present invention additionally covers isolated
nucleic acid molecules to be used in any of the methods of the present
invention, transformed plants comprising a heterologous polynucleotide
stably incorporated in their genome and comprising the nucleotide
molecule described above, preferably operably linked to a promoter
element and/or operably linked to a gene of interest. The transformed
plant is preferably a monocot or a dicot. The invention covers also seeds
of the transformed plants. The invention covers human and non-human host
cells transformed with any of the polynucleotides of the invention or the
polypeptides of the invention. The promoters used in combination with any
of the nucleotides and polypeptides of the invention are preferably
tissue specific promoters, chemical-inducible promoters and promoters
inducible by pathogens.
[0294] While the present invention can be used in animal and plant
systems, one preferred optional embodiment refers to the use in plant
systems. The term plant includes plant cells, plant protoplasts, plant
cell tissue cultures from which plants can be regenerated, plant calli,
plant clumps and plant cells that are intact in plants or parts of plants
such as embryos, pollen, ovules, seed, leaves, flowers, branches, fruits,
roots, root tips, anthers and the like. Progeny, variants, and mutants of
the regenerated plants are also included within the scope of the
invention, provided that these parts comprise the introduced
polynucleotides.
Materials and Methods
[0295] Bacterial strains and growth conditions. Escherichia coli were
cultivated at 37 °C in lysogeny broth (LB) and Agrobacterium
tumefaciens GV3101 at 30 °C in yeast extract broth (YEB)
supplemented with appropriate antibiotics.
[0296] Plant material and inoculations. Nicotiana benthamiana plants were
grown in the greenhouse (day and night temperatures of 23 °C and
19 °C, respectively) with 16 h light and 40 to 60% humidity.
Mature leaves of five- to seven-week-old plants were inoculated with
Agrobacterium using a needleless syringe as described previously (S1).
Inoculated plants were transferred to a Percival growth chamber (Percival
Scientific) with 16 h light, 22 °C day and 18 °C night
temperature.
[0297] Construction of artificial effectors. The construction of effectors
with modified repeat region was based on ligation of Esp3I (Fermentas)
restriction fragments. Esp3I cuts outside of its recognition sequence and
typically once per repeat. To construct a GATEWAY (Invitrogen)-compatible
ENTRY-vector for generation of effectors of the invention, the N- and
C-termini of hax3 were amplified by PCR using a proof reading polymerase
(HotStar HiFidelity Polymerase Kit; Qiagen), combined by SOE (splicing by
overlap extension)-PCR and inserted into pCR8/GW/TOPO resulting in a
hax3-derivative with 1.5 repeat units (pC3SE26; first repeat=NI; last
half repeat=NG). A 1 bp frame-shift preceding the start codon was
inserted by site-directed mutagenesis to allow in frame N-terminal
fusions using GATEWAY recombination (Invitrogen) resulting in pC3SEIF.
Single repeat units were amplified from TAL effectors using a forward
primer binding to most repeat units and repeat-specific reverse primers.
Both primers included the naturally present Esp3I sites. To avoid
amplification of more than one repeat, template DNA was digested with
Esp3I prior to the PCR reaction. PCR-products were digested with Esp3I
and cloned into Esp3I-digested pC3SE26 yielding Hax3-derivatives with 2.5
repeat units where a single repeat can be excised with Esp3I
(HD-repeat=repeat 5 of Hax3; NI-repeat=repeat 11 of Hax3;
NG-repeat=repeat 4 of Hax4; NN-repeat=G13N mutant of repeat 4 of
Hax4). The ArtHD effector backbone construct consists of the N- and
C-terminus of Hax3 with the last half repeat mutated into a HD-repeat.
The resulting construct was restricted by Esp3I and dephosphorylated. DNA
fragments encoding repeat units were excised with Esp3I from
pC3SE26-derivatives containing a single HD-repeat and purified via
agarose gels. Ligation was performed using a molar excess of insert to
vector to facilitate concatemer ligation and transformed into E. coli.
The number of repeat units was determined in recombinant plasmids using
StuI and HincII. ArtX1-3 effectors with a random combination of repeat
types were generated by isolating DNA fragments encoding repeat units as
described above from cloned single NI-, HD-, NN-, and NG-repeat units
(specificities for A, C, G/A, and T, respectively). The fragments were
added in equal molar amounts each to the concatemer ligation reaction
with vector pC3SEIF. Plasmids containing effectors of the invention with
12.5 repeat units were chosen for subsequent analysis. Effectors were
cloned by GATEWAY-recombination (Invitrogen) into pGWB6 (S2) for
expression of N-terminal GFP-effector fusions. Oligonucleotide sequences
are available upon request. All constructs were sequenced.
[0298] GUS reporter constructs. The minimal Bs4 promoter was amplified by
PCR and inserted into pENTR/D-TOPO (Invitrogen) with target DNA boxes at
the 5' end (S3; FIG. S5). Promoter derivatives were cloned into pGWB3
(S2) containing a promoterless uidA gene.
[0299] Construction of hax2-transgenic A. thaliana. hax2 was cloned under
control of the inducible alcA promoter from Aspergillus nidulans into a
GATEWAY-compatible derivative of the binary T-DNA vector binSRNACatN
(Zeneca Agrochemicals) containing the 35S-driven alcR ethanol-dependent
regulator gene and a nptII selection marker. AlcR drives
ethanol-dependent induction of the alcA promoter (S4). T-DNA containing
these genes was transformed into A. thaliana Col-0 via A. tumefaciens
using floral dip inoculation (S5). Transformants were selected as
kanamycin-resistant plants on sterile medium.
[0300] Construction of ARTBs4, an artificial effector. "Golden gate"
cloning (Engler et al. (2008) PLoS ONE 3:e3647) was used to assemble
effectors with 11.5 specifically ordered repeat units. The N- and
C-terminus of Hax3 and 12 individual repeat units resembling the 11.5
repeat units were subcloned. Each building block contained individual
flanking BsaI sites that allowed an ordered assembly of the fragments
into an artificial effector. For the targeted assembly of effectors with
any desired repeat composition, the building block repertoire of repeat
units was expanded. To allow for target specificity to any of the four
natural bases (A, C, G, and T) in DNA, four different repeat types were
chosen, based on the amino acids 12 and 13 per repeat unit. The four
repeat types and their specificities are: NI=A; HD=C; NG=T, NN=G or A. To
generate a universally applicable assembly kit, four units corresponding
to each of the four repeat unit types were cloned with flanking BsaI
sites for each of the 12 repeat positions. The sum of 48 building blocks
resembles a library that can be used to assemble effectors with 11.5
repeat units with any composition of the four repeat unit types.
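The building-block library described above can be illustrated with a short Python sketch. It assumes the simple one-repeat-per-base mapping stated in the text (NI=A, HD=C, NG=T, NN=G or A) and ignores any flanking bases; the function and variable names are hypothetical and only serve the illustration.

```python
from itertools import product

# Repeat type chosen for each target base, following the code stated above.
# NN also recognizes A, so a G position is the least specific choice here.
REPEAT_FOR_BASE = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}
N_POSITIONS = 12  # 11.5 repeat units occupy 12 repeat positions


def building_block_library():
    """All (position, repeat type) building blocks: 12 positions x 4 types."""
    types = sorted(REPEAT_FOR_BASE.values())
    return list(product(range(1, N_POSITIONS + 1), types))


def blocks_for_target(target):
    """Pick one building block per repeat position for a 12-base target."""
    target = target.upper()
    if len(target) != N_POSITIONS:
        raise ValueError(f"target must be {N_POSITIONS} bases long")
    unknown = set(target) - set(REPEAT_FOR_BASE)
    if unknown:
        raise ValueError(f"unsupported bases: {sorted(unknown)}")
    return [(pos, REPEAT_FOR_BASE[base]) for pos, base in enumerate(target, start=1)]
```

Every block returned by `blocks_for_target` is a member of the 48-entry library, which is what makes a single kit sufficient for any 12-position composition of the four repeat types.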
[0301] β-Glucuronidase (GUS) assays. For transient GUS assays,
Agrobacterium strains delivering effector constructs and GUS reporter
constructs were mixed 1:1, and inoculated into Nicotiana benthamiana
leaves with an OD600 of 0.8. Two leaf discs (0.9 cm diameter) were
sampled two days post infiltration (dpi) and quantitative GUS activity
was determined using 4-methyl-umbelliferyl-β-D-glucuronide (MUG), as
described previously (S1). Proteins were quantified using Bradford assays
(BioRad). Data correspond to triplicate samples from different plants.
For qualitative GUS assays, leaf discs were sampled 2 dpi, incubated in
X-Gluc (5-bromo-4-chloro-3-indolyl-β-D-glucuronide) staining
solution (S3), destained in ethanol, and dried. Experiments were
performed at least twice with similar results.
[0302] Expression of hax2, hax3, and hax4. hax2, hax3, and hax4 were
expressed in planta under control of the constitutive cauliflower mosaic
virus 35S promoter using pAGH2, pAGH3, and pAGH4 (S6).
[0303] DNaseI footprinting. DNaseI footprinting was performed as described
(S7) with the following modifications: Fluorescently labeled PCR products
of Bs3 and Bs3-E promoter DNA were generated using plasmids
pCRBluntII-TOPO::FPBs3 (Bs3 promoter fragment from -211 to +108) and
pCRBluntII-TOPO::FPBs3-E (Bs3-E promoter fragment from -224 to +108),
respectively, as template and Phusion DNA polymerase (Finnzymes).
Fluorescently labeled PCR product of UPA20-ubm-r16 promoter DNA was
generated using plasmid pCRBluntII-TOPO::FPU20-ubm-r16 (UPA20 promoter
fragment from -213 to +86 containing the ubm-r16 mutation (S7)) as
template and Phusion DNA polymerase (Finnzymes). Plasmids
pCRBluntII-TOPO::FPBs3, pCRBluntII-TOPO::FPBs3-E and
pCRBluntII-TOPO::FPU20-ubm-r16 were sequenced, using the Thermo Sequenase
Dye Primer Manual Cycle Sequencing Kit (USB) according to the
manufacturer's instructions. An internal Gene Scan-500 LIZ Size Standard
(Applied Biosystems) was used to determine the DNA fragment size.
Example 2
Identification of a TAL Repeat Unit That Binds to G Nucleotides
[0304] The DNA binding domain of TAL effectors is composed of
tandem-arranged 34-amino acid repeat units. The amino acid sequences of
the repeat units are mostly conserved, except for two adjacent highly
variable residues (HVRs) at positions 12 and 13 that define DNA target
specificity (Boch et al. (2009) Science 326:1509-1512; Moscou & Bogdanove
(2009) Science 326:1501). Functional analysis identified HVR motifs that
bind preferentially to A (NI), C (HD), T (NG, IG) or equally well to G and
A (NN) (Boch et al. (2009) Science 326:1509-1512). Bioinformatic analysis
revealed HVRs that in the given promoter-TAL effector interactions match
specifically to G (Moscou & Bogdanove (2009) Science 326:1501). However,
this analysis was based on a single (HN & NA) or two (NK) interaction
sites. In our view, the number of interaction sites is too low to draw
reliable conclusions on the HVR specificity. Yet, these HVRs can be
considered as suitable candidates that may mediate specific binding to G.
[0305] In order to clarify the target specificity of HVRs with unknown
specificity, we made use of the well-characterized interaction between
AvrBs3 and the UPA box in the Bs3 promoter. Using site-directed
mutagenesis, we replaced the HVR NI in the 5th and the 6th
repeat unit by NK, resulting in AvrBs3-NK5/6. In the wildtype Bs3
promoter, the NI residues of the 5th and the 6th repeat both
match A nucleotides. Using site-directed mutagenesis, we replaced the
two A nucleotides in the Bs3 promoter by two C, G or T nucleotides. The
wildtype Bs3 promoter and the three promoter mutants were fused to a
uidA reporter gene and tested via Agrobacterium tumefaciens transient
expression in combination with either wildtype AvrBs3 or
AvrBs3-NK5/6 in Nicotiana benthamiana leaves. GUS assays revealed
that AvrBs3-NK5/6 activated the GUS reporter only in combination
with the "GG" Bs3 promoter mutant, while AvrBs3 activated only the Bs3
wildtype promoter construct.
[0306] Our analysis suggests that NK pairs specifically to G and thus
provides an option to generate more specific repeat arrays and also to
specifically target G-rich target sequences.
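The reasoning behind this experiment can be sketched as a small lookup in Python. The sketch models only the two substituted repeat positions and treats recognition as all-or-none, which is a simplifying assumption; the NK=G entry is the specificity suggested by this example, and all names are hypothetical.

```python
# Preferred bases per HVR. NI, HD, NG, IG and NN follow the text above;
# NK = G is the specificity suggested by this example (working assumption).
HVR_BASES = {
    "NI": {"A"},
    "HD": {"C"},
    "NG": {"T"},
    "IG": {"T"},
    "NN": {"G", "A"},
    "NK": {"G"},
}

# The four promoter variants differ only at the two bases facing repeats 5 and 6.
PROMOTER_VARIANTS = ("AA", "CC", "GG", "TT")


def predicted_activation(hvr):
    """Predict, per promoter variant, whether both substituted repeats match."""
    allowed = HVR_BASES[hvr]
    return {v: all(base in allowed for base in v) for v in PROMOTER_VARIANTS}
```

Under these assumptions, `predicted_activation("NI")` marks only the "AA" variant as a match and `predicted_activation("NK")` marks only "GG", which is the pattern the GUS assays are reported to show.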
Example 3
Method for Generation of Designer Effectors via Golden Gate Cloning
[0307] The DNA binding domain of TAL effectors is composed of
tandem-arranged 34-amino acid repeat units. The amino acid sequences of
the repeat units are mostly conserved, except for two adjacent highly
variable residues (HVRs) at positions 12 and 13 that define DNA target
specificity (Boch et al. (2009) Science 326:1509-1512; Moscou & Bogdanove
(2009) Science 326:1501). Different HVR motifs bind with different levels
of specificity to individual A, C, G or T nucleotides. Importantly,
statistical analysis suggests that tandem-arranged repeat units do not
interfere with the specificity of adjacent units (Moscou & Bogdanove
(2009) Science 326:1501). Thus modular assembly of repeat units with
pre-characterized specificities is likely to provide an efficient way for
generation of DNA-recognition modules with desired DNA specificity.
[0308] However, the generation of DNA constructs that encode desired
repeat domains is challenging due to the fact that the repeat units are
almost identical. In the past we have used chemical synthesis to generate
effector genes that encode 17.5 repeat units with the desired HVR
composition. To maximize the differences between repeat units at the DNA
level we exploited the degeneracy of the genetic code. The
codon-optimized sequence of the 17.5 repeat unit encoding DNA sequence
was, in contrast to the corresponding TAL effector wildtype gene,
PCR-amplifiable and amenable to PCR-based mutagenesis. Our findings also
demonstrate that chemical synthesis of effector repeat domains is
generally feasible. However, chemical synthesis does not allow rapid and
cost-efficient generation of multiple effectors with desired HVR
composition. Furthermore this approach will most likely not allow
generation of repeat domains with 20 or more repeat units.
[0309] The recently developed "Golden-Gate cloning" provides an
alternative approach for generation of repeat unit arrays of desired
composition. The strategy is based on the use of type IIS restriction
enzymes, which cut outside of their recognition sequence. We will work
with the type IIS enzyme BsaI, which creates a 4-bp sticky end. Because
the recognition and cleavage sites are separated in type IIS
enzymes, BsaI restriction can in principle generate 256 (4^4)
different sticky ends, which provides the basis for multi-fragment
ligations. With proper design of the cleavage sites, two or more
fragments cut by type IIS restriction enzymes can be ligated into a
product lacking the original restriction site (Engler et al. (2008) PLoS
ONE 3:e3647; Engler et al. (2009) PLoS ONE 4:e5553).
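The count of possible 4-bp sticky ends can be checked with a brief Python sketch. It enumerates every four-base overhang and includes a helper for testing whether two overhangs are complementary; the helper names are hypothetical, and the convention that each overhang is written 5' to 3' on its own strand is an assumption of this sketch.

```python
from itertools import product

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def all_overhangs(length=4):
    """Enumerate every possible overhang sequence of the given length."""
    return ["".join(p) for p in product(BASES, repeat=length)]


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def can_anneal(overhang_a, overhang_b):
    """True if two overhangs, each written 5'->3' on its own strand, pair fully."""
    return overhang_a == reverse_complement(overhang_b)
```

With a length of four, `all_overhangs()` returns 4^4 = 256 distinct sequences, matching the number given above.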
[0310] However, in practice there are two limitations to this method. First, due
to exonuclease activity in some reactions, single-stranded overhanging
DNA sticky ends are reduced from four to three bases, effectively making
the number of compatible sticky ends only 16 (2^4). Second, the
efficiency of the ligation reactions decreases precipitously with large
numbers of inserts, such as would be needed to create an effector with
17.5 repeat units as typically found in naturally occurring functional
TAL effectors. To circumvent these limitations, we have designed a
two-stage ligation process that allows the effective production of
effectors of 20, 30, 40 or more repeat units.
[0311] The basis for our "repeat-array building kit" is a set of "insert
plasmids" that contain individual repeat units (one repeat unit per
plasmid), "intermediate vectors" that contain repeat domains consisting
of sets of 10 repeat units, and one "acceptor vector" that contains the
N- and C-terminal non-repeat region of a TAL effector. All repeat units
are designed in such a way that the BsaI recognition sites flank the
insert in the insert plasmids.
[0312] To simplify the explanation of the multi-fragment ligation we
define herein the different ends of the repeat unit genes with upper case
letters (instead of the sequence overhang of the sticky end) and indicate
their orientation (N- or C-terminus of the repeat unit) with N or C in
square brackets (e.g. A[C]). The insert plasmid containing the 1st
repeat unit gene is designed in such a way that BsaI treatment creates
A[N] and B[C] termini. The 2nd repeat unit gene has B[N] and C[C]
termini upon BsaI cleavage, while BsaI cleavage of the insert plasmid
with the 3rd repeat unit gene results in C[N] and D[C] termini, and
so on. Since only compatible ends can be fused, the B[C] terminus of the
1st repeat unit gene will fuse specifically to the B[N] terminus of
the 2nd repeat unit gene. Similarly, the C[C] terminus of the
2nd repeat unit gene will ligate specifically to the C[N] terminus
of the 3rd repeat unit gene, and so on.
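The end-matching rule described in this paragraph can be sketched in Python. Each fragment is represented only by the letters of its N- and C-terminal ends, and the order of ligation is found by following matching letters; the function name and the fragment names are hypothetical, and real overhang sequences are deliberately left out.

```python
def assembly_order(fragments, start="A"):
    """Return fragment names in ligation order.

    fragments maps a name to (n_end, c_end) letters, e.g. ("A", "B") for a
    fragment with A[N] and B[C] termini. A C-terminal end ligates only to
    the N-terminal end carrying the same letter.
    """
    by_n_end = {}
    for name, (n_end, _c_end) in fragments.items():
        if n_end in by_n_end:
            raise ValueError(f"end {n_end}[N] is used by more than one fragment")
        by_n_end[n_end] = name

    order, current = [], start
    while current in by_n_end:
        name = by_n_end.pop(current)  # each fragment is used once
        order.append(name)
        current = fragments[name][1]  # continue from this fragment's C end

    if by_n_end:
        raise ValueError(f"fragments not reachable from {start}[N]: {sorted(by_n_end.values())}")
    return order
```

Because the order is fully determined by the end labels, the fragments can be supplied in any order, which mirrors mixing all insert plasmids in a single reaction.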
[0313] BsaI digestion releases the repeat units with 4-bp sticky overhangs
that are compatible only with the designed adjacent repeat units. The
BsaI recognition site itself remains in the cleaved insert plasmid vector
and the released insert has no BsaI recognition site. The repeat units
are joined together in the order specified by the overhanging ends in a
cut-ligation reaction (cleavage and ligation running simultaneously). Due
to the simultaneous action of BsaI and ligase, religation of repeat
units into the insert donor vector is avoided, since such religation
restores the BsaI recognition site and the product is cut again. By
contrast, the desired ligation products lack the
BsaI recognition sites. This experimental design makes this cloning
procedure highly efficient.
[0314] To generate effectors that are designed to recognize specific base
sequences, four variants are made for each repeat unit position. These
variants are individual repeat units with specific nucleotide recognition
specificity (e.g. HD residues at positions 12 and 13 for recognition of a
C base, NI for A, and so on). The variant for each position is made with
the appropriate sticky ends for each repeat unit, for example A[N] and
B[C] termini for repeat unit 1, such that there are four possible insert
plasmids for repeat unit one, chosen based on the desired DNA
recognition. There are four variants for repeat unit 2, with different
nucleotide recognition specificity and B[N] and C[C] termini, and so on
for each repeat position.
[0315] Ligations are carried out in two stages. In the first stage, 10
repeat units are combined into an intermediate vector. Different sets of
10 repeat units can be combined in intermediate vectors. Intermediate
vector 1 contains repeat units 1-10, intermediate vector 2 contains
repeat units 11-20 and so on. In the second stage, separately assembled
10 repeat units are combined into acceptor vectors. The acceptor vector
also contains the N- and C-terminal non-repeat areas of the effector,
such that a complete effector comprised of 10, 20, 30, 40 or other
multiples of 10 repeat units is assembled in the final construct. The
intermediate vector has BsaI sites in the insert for introducing the 10
repeat unit fragments and also has flanking BpiI sites in the flanking
vector sequence. BpiI is another type IIS enzyme with a recognition site
distinct from BsaI. Using BsaI, the 10 repeat units are first assembled
into the "intermediate vector" and using BpiI the assembled 10-mers are
released as one fragment. This fragment is ligated in a BpiI cut-ligase
reaction with the acceptor vector, which contains BpiI sites between the
N- and C-terminal non-repeat areas of the TAL effector. In this case only
2-4 inserts are ligated into the acceptor vector. This makes
each ligation highly specific and allows 40 or more repeat
units to be assembled easily.
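The two-stage scheme can be summarized with a small planning sketch in Python. It splits a desired repeat array into sets of 10 for the first-stage (BsaI) ligations and reports how many assembled sets enter the second-stage (BpiI) ligation; the function name is hypothetical, and the sketch assumes, as in the text, arrays that are multiples of 10 repeat units.

```python
def two_stage_plan(repeat_types, set_size=10):
    """Split a repeat array into the sets assembled in intermediate vectors.

    Returns a list of lists: element i holds the repeat types for
    intermediate vector i + 1 (repeat units 1-10, 11-20, and so on).
    The number of returned sets equals the number of inserts ligated
    into the acceptor vector in the second stage.
    """
    if not repeat_types or len(repeat_types) % set_size != 0:
        raise ValueError(f"number of repeat units must be a multiple of {set_size}")
    return [
        list(repeat_types[i:i + set_size])
        for i in range(0, len(repeat_types), set_size)
    ]
```

For a 40-repeat array, for example, the plan contains four sets of 10, so neither stage requires more than 10 inserts in a single ligation.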
[0316] The acceptor vector into which the repeat unit array is finally
cloned represents a GATEWAY Entry clone and thus allows
recombination-based transfer of the effector into any desired expression
construct. Currently the acceptor vector is designed to generate a
TAL-type transcription factor. However, with few modifications the
acceptor vector allows also fusions of the repeat array to the FokI
endonuclease or other desired functional domains.
[0317] A schematic of this method is provided in FIG. 19A-D.
Example 4
Production and Testing of Target DNA-Specific Nucleases
[0318] Fusion proteins comprising a repeat domain of the invention that
recognizes a target DNA sequence and a FokI nuclease
("TAL-type-nucleases") are produced as described by any of the methods
disclosed herein or known in the art. The fusion proteins are tested for
nuclease activity by incubation with corresponding target DNA. The repeat
domain DNA target site is cloned into the multiple cloning site of a
plasmid vector (e.g., bluescript). As negative controls, either an "empty
vector" that contains no TAL-nuclease target site or cloned target sites
with mutations are used. Before treatment of the DNA substrate with the
TAL-type nuclease, the vector is linearized by treatment with a suitable
standard endonuclease that cleaves in the vector backbone. This
linearized vector is incubated with in vitro generated repeat domain-FokI
nuclease fusion proteins and the products analyzed by agarose gel
electrophoresis. The detection of two DNA fragments in gel
electrophoresis is indicative of specific nuclease-mediated cleavage. By
contrast, the negative controls that do not contain a target site that is
recognized by the repeat domain are unaffected by treatment with the repeat
domain-FokI nuclease fusion protein. DNA-driven, cell-free systems for in
vitro gene expression and protein synthesis are used to generate repeat
domain-FokI nuclease fusion proteins (e.g. T7 High-Yield Protein
Expression System; Promega). To use such systems, repeat domain-FokI
nuclease fusion protein nucleotide sequences are cloned downstream of a T7
RNA polymerase promoter. Such fusion proteins that are produced via in vitro
transcription and translation are used in DNA cleavage assays without
further purification.
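The expected outcome of the cleavage assay can be worked out with a short Python sketch. Given the size of the circular plasmid, the position of the linearizing cut and the position of the nuclease cut, it returns the sizes of the two fragments expected on the gel; the function name, the zero-based coordinate convention and the treatment of each cut as a single position are assumptions of this illustration.

```python
def expected_fragments(plasmid_length, linearization_site, target_site):
    """Return the two fragment sizes (bp) after both cuts, smallest first.

    Positions are zero-based coordinates on the circular plasmid. The
    plasmid is first linearized at linearization_site; a second cut at
    target_site then splits the linear molecule into two fragments.
    """
    for pos in (linearization_site, target_site):
        if not 0 <= pos < plasmid_length:
            raise ValueError("positions must lie within the plasmid")
    first = (target_site - linearization_site) % plasmid_length
    if first == 0:
        raise ValueError("the two cut sites coincide")
    return tuple(sorted((first, plasmid_length - first)))
```

A negative control without a recognized target site would, by contrast, remain a single band at the full plasmid length.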
Example 5
Determination of Additional Recognition Specificities
[0319] Further experiments were conducted essentially as described
hereinabove to determine the recognition specificities of additional
amino acid pairs in the hypervariable region. DNA binding domains were
constructed using Golden Gate Cloning as described in Example 3. The
experiments conducted and the experimental results obtained are provided
in FIGS. 20-27 and their respective figure legends.
[0320] From these experiments, the recognition specificity for the amino
acids found at positions 12 and 13 in a repeat unit and the base pair in
the target DNA sequence were determined for the following amino acid
pairs:
[0321] NH for recognition of G/C
[0322] NP for recognition of A/T or C/G or T/A
[0323] NT for recognition of A/T or G/C
[0324] HN for recognition of A/T or G/C
[0325] SH for recognition of G/C
[0326] SN for recognition of G/C and
[0327] IS for recognition of A/T.
[0328] It is recognized that the recognition specificities set forth in
this Example can be used in the methods of the present invention. It is
further recognized that the recognition specificities set forth in this
Example can be used to produce compositions of the present invention,
such as, for example, polypeptides and DNA. Preferably, the recognition
specificities set forth in this Example are used in such methods or to
produce such compositions in combination with any of the other
recognition specificities disclosed herein.
[0329] The articles "a" and "an" are used herein to refer to one or more
than one (i.e., to at least one) of the grammatical object of the
article. By way of example, "an element" means one or more elements.
[0330] Throughout the specification the word "comprise," or variations
such as "comprises" or "comprising," will be understood to imply the
inclusion of a stated element, integer or step, or group of elements,
integers or steps, but not the exclusion of any other element, integer or
step, or group of elements, integers or steps.
[0331] All publications and patent applications mentioned in the
specification are indicative of the level of those skilled in the art to
which this invention pertains. All publications and patent applications
are herein incorporated by reference to the same extent as if each
individual publication or patent application was specifically and
individually indicated to be incorporated by reference. Additionally,
each of the following patent applications is hereby incorporated by
reference in its entirety: DE 10 2009 004 659.3 filed Jan. 12, 2009, EP
09165328 filed Jul. 13, 2009, and U.S. 61/225,043 filed Jul. 13, 2009.
[0332] Although the foregoing invention has been described in some detail
by way of illustration and example for purposes of clarity of
understanding, it will be obvious that certain changes and modifications
may be practiced within the scope of the appended claims.
Sequence CWU
1
113134PRTXanthomonas campestris pv. vesicatoria 1Leu Thr Pro Glu Gln Val
Val Ala Ile Ala Ser His Asp Gly Gly Lys1 5
10 15Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val
Leu Cys Gln Ala 20 25 30His
Gly219DNAArtificial Sequencepredicted 2tatataaacc tnnccctct
19323DNAArtificial Sequencepredicted
binding domain sequence 3tgttattctc acactctcct tat
23413DNAArtificial Sequencepredicted binding domain
sequence 4tacacccaaa cat
13516DNAArtificial Sequencepredicted binding domain sequence
5tacctaaact aaatat
16616DNAArtificial Sequenceconstructed binding domain sequence
6taccaaaaca aaaaaa
16716DNAArtificial Sequenceconstructed binding domain sequence
7tacccaaacc aaacac
16816DNAArtificial Sequenceconstructed binding domain sequence
8taccgaaacg aaagag
16916DNAArtificial Sequenceconstructed binding domain sequence
9taaataaaat aaatat
161016DNAArtificial Sequenceconstructed binding domain sequence
10taggtaaagt aaatat
161116DNAArtificial Sequenceconstructed binding domain sequence
11tatttaaatt aaatat
161216DNAArtificial Sequenceconstructed binding domain sequence
12tccctaacct cactct
161316DNAArtificial Sequenceconstructed binding domain sequence
13tgcctaagct gagtgt
161416DNAArtificial Sequenceconstructed binding domain sequence
14ttcctaatct tatttt
161516DNAArtificial Sequenceconstructed binding domain sequence
15tacctccact acatat
161616DNAArtificial Sequenceconstructed binding domain sequence
16tacctggact agatat
161716DNAArtificial Sequenceconstructed binding domain sequence
17tacctttact atatat
161814DNAArtificial Sequencepredicted binding domain sequence
18tattctggga cgtt
141914DNAArtificial Sequenceconstructed binding domain sequence
19tattctaaaa catt
142014DNAArtificial Sequenceconstructed binding domain sequence
20tattctccca cctt
142114DNAArtificial Sequenceconstructed binding domain sequence
21tattctttta cttt
142214DNAArtificial Sequencepredicted binding domain sequence
22tatgcggtcc ctct
142314DNAArtificial Sequencepredicted binding domain sequence
23tatgggtgcc ctat
142419DNAArtificial Sequencepredicted binding domain sequence
24tacccccccc ccccccccc
192519DNACapsicum annuum 25tatataaacc taaccatcc
192619DNACapsicum annuum 26tatataaacc tctctattc
192718DNAOryza sativa
27tagaagaaga gacccata
182818DNAOryza sativa 28tagaagagac caatagag
182925DNAOryza sativa 29tgcatctccc cctactgtac accac
253025DNAOryza sativa
30gatatgtccc cctccaacta tataa
253124DNAOryza sativa 31tataaaaggc cctcaccaac ccat
243223DNAOryza sativa 32tataatcccc aaatcccctc ctc
233337DNACapsicum annuum
33ttttattata taaacctaac catcctcaca acttcaa
373436DNACapsicum annuum 34gttgtgagga tggttaggtt tatataataa aattgg
363530DNACapsicum annuum 35tttattatat aaacctctct
attccactaa 303632DNACapsicum annuum
36gtggaataga gaggtttata taataaaatt gg
323735DNAArtificial Sequenceconstructed binding domain sequence
37catctttata taaacctctc cctttgtgac attct
353834DNAArtificial Sequenceconstructed binding domain sequence
38gtcacaaagg gagaggttta tataaagatg aaga
343931DNAArtificial Sequenceconstructed binding domain sequence
39catctttata taaacctctc cctttgtgac a
314032DNAArtificial Sequenceconstructed binding domain sequence
40cacaaaggga gaggtttata taaagatgaa ga
324119DNAArtificial Sequenceconstructed binding domain sequence
41tatataaacc tctcccttt
194244DNACapsicum annuum 42caattttatt atataaacct aaccatcctc acaacttcaa
gtta 444344DNACapsicum annuum 43ttgaagttgt
gaggatggtt aggtttatat aataaaattg gtca
444444DNACapsicum annuum 44ccaattttat tatataaacc tctctattcc actaaaccat
cctc 444546DNACapsicum annuum 45gatggtttag
tggaatagag aggtttatat aataaaattg gtcagg
464643DNAArtificial Sequenceconstructed binding domain sequence
46 tcttcatctt tatataaacc tctccctttg tgacattctg aga 43

SEQ ID NO 47: length 44, DNA, Artificial Sequence, constructed binding domain sequence
cagaatgtca caaagggaga ggtttatata aagatgaaga gaga 44

SEQ ID NO 48: length 20, DNA, Artificial Sequence, constructed binding domain sequence
ccgcggccgc ccccttcacc 20

SEQ ID NO 49: length 75, DNA, Solanum lycopersicum
ttctttcttg tatataactt tgtccaaaat atcatcaatt gatctcatcc atacaattta 60
tttttaatcg aatct 75

SEQ ID NO 50: length 25, DNA, Artificial Sequence, sequence generated during cloning
tctagaccca agggtgggcg cgccg 25

SEQ ID NO 51: length 13, DNA, Artificial Sequence, constructed binding domain sequence
aacacccaaa cat 13

SEQ ID NO 52: length 13, DNA, Artificial Sequence, constructed binding domain sequence
cacacccaaa cat 13

SEQ ID NO 53: length 13, DNA, Artificial Sequence, constructed binding domain sequence
gacacccaaa cat 13

SEQ ID NO 54: length 17, DNA, Artificial Sequence, predicted binding domain sequence
tatataaaca catatct 17

SEQ ID NO 55: length 17, DNA, Artificial Sequence, predicted binding domain sequence
tatataagca cgtatct 17

SEQ ID NO 56: length 23, DNA, Artificial Sequence, constructed binding domain sequence
tgatattctc acactctcct tat 23

SEQ ID NO 57: length 23, DNA, Artificial Sequence, constructed binding domain sequence
tgctattctc acactctcct tat 23

SEQ ID NO 58: length 23, DNA, Artificial Sequence, constructed binding domain sequence
tggtattctc acactctcct tat 23

SEQ ID NO 59: length 150, DNA, Arabidopsis thaliana
tgtttttata aattttctca catactcaca ctctctataa gacctccaat catttgtgaa 60
accatactat atataccctc ttccttgacc aatttactta taccttttac aatttgttta 120
tatattttac gtatctatct ttgttccatg 150

SEQ ID NO 60: length 19, DNA, Artificial Sequence, predicted binding domain sequence
tctntaaacc tnnccctct 19

SEQ ID NO 61: length 15, DNA, Artificial Sequence, predicted binding domain sequence
trtaaacctr accct 15

SEQ ID NO 62: length 23, DNA, Artificial Sequence, predicted binding domain sequence
tgttattctc acactctcct tat 23

SEQ ID NO 63: length 13, DNA, Artificial Sequence, predicted binding domain sequence
tacacccnnn cat 13

SEQ ID NO 64: length 16, DNA, Artificial Sequence, predicted binding domain sequence
tacctnnact anatat 16

SEQ ID NO 65: length 17, DNA, Artificial Sequence, predicted binding domain sequence
tananaarca crnntct 17

SEQ ID NO 66: length 18, DNA, Artificial Sequence, predicted binding domain sequence
tarntnrrra ranccatt 18

SEQ ID NO 67: length 25, DNA, Artificial Sequence, predicted binding domain sequence
trcanctncc attactrtaa aannn 25

SEQ ID NO 68: length 24, DNA, Artificial Sequence, predicted binding domain sequence
tanarrrrrc acncannaan cnnt 24

SEQ ID NO 69: length 23, DNA, Artificial Sequence, predicted binding domain sequence
tataanrccn aaatcnrnrc ctn 23

SEQ ID NO 70: length 19, DNA, Artificial Sequence, predicted binding domain sequence
tataattant antccnctt 19

SEQ ID NO 71: length 19, DNA, Artificial Sequence, predicted binding domain sequence
tataaacctc ttttncctt 19

SEQ ID NO 72: length 17, DNA, Artificial Sequence, predicted binding domain sequence
tatacacctc ttttact 17

SEQ ID NO 73: length 25, DNA, Artificial Sequence, predicted binding domain sequence
tacacacctc ctaccacctc tactt 25

SEQ ID NO 74: length 19, DNA, Artificial Sequence, predicted binding domain sequence
tataaatctc ttttncctt 19

SEQ ID NO 75: length 19, DNA, Artificial Sequence, predicted binding domain sequence
tctctatctc aaccccttt 19

SEQ ID NO 76: length 19, DNA, Artificial Sequence, predicted binding domain sequence
tctccatata actcccttt 19

SEQ ID NO 77: length 16, DNA, Artificial Sequence, predicted binding domain sequence
tacacatnan accact 16

SEQ ID NO 78: length 15, DNA, Artificial Sequence, predicted binding domain sequence
tcatccacan cccrt 15

SEQ ID NO 79: length 15, DNA, Artificial Sequence, predicted binding domain sequence
taccacatar cattr 15

SEQ ID NO 80: length 14, DNA, Artificial Sequence, predicted binding domain sequence
taaracnnrt crat 14

SEQ ID NO 81: length 10, DNA, Artificial Sequence, predicted binding domain sequence
tcccttrcct 10

SEQ ID NO 82: length 27, DNA, Artificial Sequence, predicted binding domain sequence
tanaancrcc cnnnccnnrr atrannn 27

SEQ ID NO 83: length 25, DNA, Artificial Sequence, predicted binding domain sequence
trcntcrtac ncrcrcrrrr rrrct 25

SEQ ID NO 84: length 18, DNA, Artificial Sequence, predicted binding domain sequence
tananaccna cacnacct 18

SEQ ID NO 85: length 21, DNA, Artificial Sequence, predicted binding domain sequence
tatrtntara rarntnratn t 21

SEQ ID NO 86: length 17, DNA, Artificial Sequence, predicted binding domain sequence
tacacacctc ttttaat 17

SEQ ID NO 87: length 20, DNA, Artificial Sequence, predicted binding domain sequence
tanaancrcc cntnccnnrt 20

SEQ ID NO 88: length 17, DNA, Artificial Sequence, predicted binding domain sequence
tacacatctt taaaact 17

SEQ ID NO 89: length 28, DNA, Artificial Sequence, predicted binding domain sequence
tananrtrnn nrnncncccn ncncccct 28

SEQ ID NO 90: length 19, DNA, Artificial Sequence, predicted binding domain sequence
tanaaacctc ttttncctt 19

SEQ ID NO 91: length 23, DNA, Artificial Sequence, predicted binding domain sequence
tanarrarca cnnncrctcc ctt 23

SEQ ID NO 92: length 30, DNA, Artificial Sequence, predicted binding domain sequence
tananaaacr ccctctaccr narrtrcnnn 30

SEQ ID NO 93: length 16, DNA, Artificial Sequence, predicted binding domain sequence
tatrtntara racnnt 16

SEQ ID NO 94: length 17, DNA, Artificial Sequence, predicted binding domain sequence
tarraaacnn rrraanc 17

SEQ ID NO 95: length 17, DNA, Artificial Sequence, predicted binding domain sequence
tancnnrcnt rrcctct 17

SEQ ID NO 96: length 21, DNA, Artificial Sequence, predicted binding domain sequence
tananrtrnn nrnnancacc t 21

SEQ ID NO 97: length 19, DNA, Artificial Sequence, predicted binding domain sequence
tanaaarcnr nrcracrnt 19

SEQ ID NO 98: length 21, DNA, Artificial Sequence, predicted binding domain sequence
tannnncntc rtntcnccar t 21

SEQ ID NO 99: length 19, DNA, Artificial Sequence, predicted binding domain sequence
tanaaarcnr nrcracrnt 19

SEQ ID NO 100: length 21, DNA, Artificial Sequence, predicted binding domain sequence
tannnncntc rtntcnccar t 21

SEQ ID NO 101: length 21, DNA, Artificial Sequence, predicted binding domain sequence
tccctnrccn aarcnncact t 21

SEQ ID NO 102: length 28, DNA, Artificial Sequence, predicted binding domain sequence
tccrrttcnn ctncccnrar cnncnrnt 28

SEQ ID NO 103: length 14, DNA, Artificial Sequence, predicted binding domain sequence
tarannrncn ccct 14

SEQ ID NO 104: length 25, DNA, Artificial Sequence, predicted binding domain sequence
trcntcrnac ncrcrcrrrr rrrct 25

SEQ ID NO 105: length 22, DNA, Artificial Sequence, predicted binding domain sequence
trcccaarac ccnrrcnrcn nn 22

SEQ ID NO 106: length 19, DNA, Artificial Sequence, predicted binding domain sequence
tanaaarcnr nrcracrnt 19

SEQ ID NO 107: length 18, DNA, Artificial Sequence, predicted binding domain sequence
tncatattcr atcrnrtr 18

SEQ ID NO 108: length 21, DNA, Artificial Sequence, predicted binding domain sequence
tncatataat tcratcrnrt r 21

SEQ ID NO 109: length 20, DNA, Artificial Sequence, predicted binding domain sequence
tataacaccc tcnacatant 20

SEQ ID NO 110: length 1164, PRT, Xanthomonas campestris pv. vesicatoria
N-terminus (1)..(288)
Repeat 1 (289)..(322)
Repeat 2 (323)..(356)
Repeat 3 (357)..(390)
Repeat 4 (391)..(424)
Repeat 5 (425)..(458)
Repeat 6 (459)..(492)
Repeat 7 (493)..(526)
Repeat 8 (527)..(560)
Repeat 9 (561)..(594)
Repeat 10 (595)..(628)
Repeat 11 (629)..(662)
Repeat 12 (663)..(696)
Repeat 13 (697)..(730)
Repeat 14 (731)..(764)
Repeat 15 (765)..(798)
Repeat 16 (799)..(832)
Repeat 17 (833)..(866)
Repeat 17.5 (867)..(886)
C-terminus (887)..(1164)
Met Asp
Pro Ile Arg Ser Arg Thr Pro Ser Pro Ala Arg Glu Leu Leu1 5
10 15Pro Gly Pro Gln Pro Asp Gly Val
Gln Pro Thr Ala Asp Arg Gly Val 20 25
30Ser Pro Pro Ala Gly Gly Pro Leu Asp Gly Leu Pro Ala Arg Arg
Thr 35 40 45Met Ser Arg Thr Arg
Leu Pro Ser Pro Pro Ala Pro Ser Pro Ala Phe 50 55
60Ser Ala Gly Ser Phe Ser Asp Leu Leu Arg Gln Phe Asp Pro
Ser Leu65 70 75 80Phe
Asn Thr Ser Leu Phe Asp Ser Leu Pro Pro Phe Gly Ala His His
85 90 95Thr Glu Ala Ala Thr Gly Glu
Trp Asp Glu Val Gln Ser Gly Leu Arg 100 105
110Ala Ala Asp Ala Pro Pro Pro Thr Met Arg Val Ala Val Thr
Ala Ala 115 120 125Arg Pro Pro Arg
Ala Lys Pro Ala Pro Arg Arg Arg Ala Ala Gln Pro 130
135 140Ser Asp Ala Ser Pro Ala Ala Gln Val Asp Leu Arg
Thr Leu Gly Tyr145 150 155
160Ser Gln Gln Gln Gln Glu Lys Ile Lys Pro Lys Val Arg Ser Thr Val
165 170 175Ala Gln His His Glu
Ala Leu Val Gly His Gly Phe Thr His Ala His 180
185 190Ile Val Ala Leu Ser Gln His Pro Ala Ala Leu Gly
Thr Val Ala Val 195 200 205Lys Tyr
Gln Asp Met Ile Ala Ala Leu Pro Glu Ala Thr His Glu Ala 210
215 220Ile Val Gly Val Gly Lys Gln Trp Ser Gly Ala
Arg Ala Leu Glu Ala225 230 235
240Leu Leu Thr Val Ala Gly Glu Leu Arg Gly Pro Pro Leu Gln Leu Asp
245 250 255Thr Gly Gln Leu
Leu Lys Ile Ala Lys Arg Gly Gly Val Thr Ala Val 260
265 270Glu Ala Val His Ala Trp Arg Asn Ala Leu Thr
Gly Ala Pro Leu Asn 275 280 285Leu
Thr Pro Glu Gln Val Val Ala Ile Ala Ser His Asp Gly Gly Lys 290
295 300Gln Ala Leu Glu Thr Val Gln Arg Leu Leu
Pro Val Leu Cys Gln Ala305 310 315
320His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn Gly
Gly 325 330 335Gly Lys Gln
Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys 340
345 350Gln Ala His Gly Leu Thr Pro Gln Gln Val
Val Ala Ile Ala Ser Asn 355 360
365Ser Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val 370
375 380Leu Cys Gln Ala His Gly Leu Thr
Pro Glu Gln Val Val Ala Ile Ala385 390
395 400Ser Asn Gly Gly Gly Lys Gln Ala Leu Glu Thr Val
Gln Arg Leu Leu 405 410
415Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Glu Gln Val Val Ala
420 425 430Ile Ala Ser Asn Ile Gly
Gly Lys Gln Ala Leu Glu Thr Val Gln Ala 435 440
445Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Glu
Gln Val 450 455 460Val Ala Ile Ala Ser
Asn Ile Gly Gly Lys Gln Ala Leu Glu Thr Val465 470
475 480Gln Ala Leu Leu Pro Val Leu Cys Gln Ala
His Gly Leu Thr Pro Glu 485 490
495Gln Val Val Ala Ile Ala Ser Asn Ile Gly Gly Lys Gln Ala Leu Glu
500 505 510Thr Val Gln Ala Leu
Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr 515
520 525Pro Glu Gln Val Val Ala Ile Ala Ser His Asp Gly
Gly Lys Gln Ala 530 535 540Leu Glu Thr
Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly545
550 555 560Leu Thr Pro Glu Gln Val Val
Ala Ile Ala Ser His Asp Gly Gly Lys 565
570 575Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val
Leu Cys Gln Ala 580 585 590His
Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn Gly Gly 595
600 605Gly Lys Gln Ala Leu Glu Thr Val Gln
Arg Leu Leu Pro Val Leu Cys 610 615
620Gln Ala His Gly Leu Thr Pro Glu Gln Val Val Ala Ile Ala Ser Asn625
630 635 640Ser Gly Gly Lys
Gln Ala Leu Glu Thr Val Gln Ala Leu Leu Pro Val 645
650 655Leu Cys Gln Ala His Gly Leu Thr Pro Glu
Gln Val Val Ala Ile Ala 660 665
670Ser Asn Ser Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu
675 680 685Pro Val Leu Cys Gln Ala His
Gly Leu Thr Pro Glu Gln Val Val Ala 690 695
700Ile Ala Ser His Asp Gly Gly Lys Gln Ala Leu Glu Thr Val Gln
Arg705 710 715 720Leu Leu
Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Glu Gln Val
725 730 735Val Ala Ile Ala Ser His Asp
Gly Gly Lys Gln Ala Leu Glu Thr Val 740 745
750Gln Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr
Pro Glu 755 760 765Gln Val Val Ala
Ile Ala Ser His Asp Gly Gly Lys Gln Ala Leu Glu 770
775 780Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala
His Gly Leu Thr785 790 795
800Pro Gln Gln Val Val Ala Ile Ala Ser Asn Gly Gly Gly Arg Pro Ala
805 810 815Leu Glu Thr Val Gln
Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly 820
825 830Leu Thr Pro Glu Gln Val Val Ala Ile Ala Ser His
Asp Gly Gly Lys 835 840 845Gln Ala
Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala 850
855 860His Gly Leu Thr Pro Gln Gln Val Val Ala Ile
Ala Ser Asn Gly Gly865 870 875
880Gly Arg Pro Ala Leu Glu Ser Ile Val Ala Gln Leu Ser Arg Pro Asp
885 890 895Pro Ala Leu Ala
Ala Leu Thr Asn Asp His Leu Val Ala Leu Ala Cys 900
905 910Leu Gly Gly Arg Pro Ala Leu Asp Ala Val Lys
Lys Gly Leu Pro His 915 920 925Ala
Pro Ala Leu Ile Lys Arg Thr Asn Arg Arg Ile Pro Glu Arg Thr 930
935 940Ser His Arg Val Ala Asp His Ala Gln Val
Val Arg Val Leu Gly Phe945 950 955
960Phe Gln Cys His Ser His Pro Ala Gln Ala Phe Asp Asp Ala Met
Thr 965 970 975Gln Phe Gly
Met Ser Arg His Gly Leu Leu Gln Leu Phe Arg Arg Val 980
985 990Gly Val Thr Glu Leu Glu Ala Arg Ser Gly
Thr Leu Pro Pro Ala Ser 995 1000
1005Gln Arg Trp Asp Arg Ile Leu Gln Ala Ser Gly Met Lys Arg Ala
1010 1015 1020Lys Pro Ser Pro Thr Ser
Thr Gln Thr Pro Asp Gln Ala Ser Leu 1025 1030
1035His Ala Phe Ala Asp Ser Leu Glu Arg Asp Leu Asp Ala Pro
Ser 1040 1045 1050Pro Met His Glu Gly
Asp Gln Thr Arg Ala Ser Ser Arg Lys Arg 1055 1060
1065Ser Arg Ser Asp Arg Ala Val Thr Gly Pro Ser Ala Gln
Gln Ser 1070 1075 1080Phe Glu Val Arg
Val Pro Glu Gln Arg Asp Ala Leu His Leu Pro 1085
1090 1095Leu Ser Trp Arg Val Lys Arg Pro Arg Thr Ser
Ile Gly Gly Gly 1100 1105 1110Leu Pro
Asp Pro Gly Thr Pro Thr Ala Ala Asp Leu Ala Ala Ser 1115
1120 1125Ser Thr Val Met Arg Glu Gln Asp Glu Asp
Pro Phe Ala Gly Ala 1130 1135 1140Ala
Asp Asp Phe Pro Ala Phe Asn Glu Glu Glu Leu Ala Trp Leu 1145
1150 1155Met Glu Leu Leu Pro Gln
1160

SEQ ID NO 111: length 1321, PRT, Xanthomonas campestris pv. armoraciae
N-terminus (1)..(288)
Repeat 1 (289)..(323)
Repeat 2 (324)..(358)
Repeat 3 (359)..(393)
Repeat 4 (394)..(428)
Repeat 5 (429)..(463)
Repeat 6 (464)..(498)
Repeat 7 (499)..(533)
Repeat 8 (534)..(568)
Repeat 9 (569)..(603)
Repeat 10 (604)..(638)
Repeat 11 (639)..(673)
Repeat 12 (674)..(708)
Repeat 13 (709)..(743)
Repeat 14 (744)..(778)
Repeat 15 (779)..(813)
Repeat 16 (814)..(848)
Repeat 17 (849)..(883)
Repeat 18 (884)..(918)
Repeat 19 (919)..(953)
Repeat 20 (954)..(988)
Repeat 21 (989)..(1023)
Repeat 21.5 (1024)..(1043)
C-terminus (1044)..(1321)
Met Asp Pro Ile Arg Ser Arg
Thr Pro Ser Pro Ala Arg Glu Leu Leu1 5 10
15Ser Gly Pro Gln Pro Asp Gly Val Gln Pro Thr Ala Asp
Arg Gly Val 20 25 30Ser Pro
Pro Ala Gly Gly Pro Leu Asp Gly Leu Pro Ala Arg Arg Thr 35
40 45Met Ser Arg Thr Arg Leu Pro Ser Pro Pro
Ala Pro Ser Pro Ala Phe 50 55 60Ser
Ala Asp Ser Phe Ser Asp Leu Leu Arg Gln Phe Asp Pro Ser Leu65
70 75 80Phe Asn Thr Ser Leu Phe
Asp Ser Leu Pro Pro Phe Gly Ala His His 85
90 95Thr Glu Ala Ala Thr Gly Glu Trp Asp Glu Val Gln
Ser Gly Leu Arg 100 105 110Ala
Ala Asp Ala Pro Pro Pro Thr Met Arg Val Ala Val Thr Ala Ala 115
120 125Arg Pro Pro Arg Ala Lys Pro Ala Pro
Arg Arg Arg Ala Ala Gln Pro 130 135
140Ser Asp Ala Ser Pro Ala Ala Gln Val Asp Leu Arg Thr Leu Gly Tyr145
150 155 160Ser Gln Gln Gln
Gln Glu Lys Ile Lys Pro Lys Val Arg Ser Thr Val 165
170 175Ala Gln His His Glu Ala Leu Val Gly His
Gly Phe Thr His Ala His 180 185
190Ile Val Ala Leu Ser Gln His Pro Ala Ala Leu Gly Thr Val Ala Val
195 200 205Lys Tyr Gln Asp Met Ile Ala
Ala Leu Pro Glu Ala Thr His Glu Ala 210 215
220Ile Val Gly Val Gly Lys Gln Trp Ser Gly Ala Arg Ala Leu Glu
Ala225 230 235 240Leu Leu
Thr Val Ala Gly Glu Leu Arg Gly Pro Pro Leu Gln Leu Asp
245 250 255Thr Gly Gln Leu Leu Lys Ile
Ala Lys Arg Gly Gly Val Thr Ala Val 260 265
270Glu Ala Val His Ala Trp Arg Asn Ala Leu Thr Gly Ala Pro
Leu Asn 275 280 285Leu Thr Pro Glu
Gln Val Val Ala Ile Ala Ser Asn Asn Gly Gly Lys 290
295 300Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val
Leu Cys Gln Ala305 310 315
320Pro His Asp Leu Thr Pro Glu Gln Val Val Ala Ile Ala Ser Ile Gly
325 330 335Gly Gly Lys Gln Ala
Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu 340
345 350Cys Gln Ala Pro His Asp Leu Thr Pro Glu Gln Val
Val Ala Ile Ala 355 360 365Ser Asn
Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu 370
375 380Pro Val Leu Cys Gln Ala Pro His Cys Leu Thr
Pro Glu Gln Val Val385 390 395
400Ala Ile Ala Ser Asn Ile Gly Gly Lys Gln Ala Leu Glu Thr Val Gln
405 410 415Ala Leu Leu Pro
Val Leu Cys Gln Ala Pro His Cys Leu Thr Pro Glu 420
425 430Gln Val Val Ala Ile Ala Ser Asn Gly Gly Gly
Lys Gln Ala Leu Glu 435 440 445Thr
Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala Pro His Asp Leu 450
455 460Thr Pro Glu Gln Val Val Ala Ile Ala Ser
Asn Gly Gly Gly Lys Gln465 470 475
480Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala
Pro 485 490 495His Asp Leu
Thr Pro Glu Gln Val Val Ala Ile Ala Ser His Asp Gly 500
505 510Gly Lys Gln Ala Leu Glu Thr Val Gln Arg
Leu Leu Pro Val Leu Cys 515 520
525Gln Ala Pro His Asp Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser 530
535 540Asn Gly Gly Gly Lys Gln Ala Leu
Glu Thr Val Gln Arg Leu Leu Pro545 550
555 560Val Leu Cys Gln Ala Pro His Asp Leu Thr Pro Glu
Gln Val Val Ala 565 570
575Ile Ala Ser His Asp Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg
580 585 590Leu Leu Pro Val Leu Cys
Gln Ala Pro His Asp Leu Thr Pro Glu Gln 595 600
605Val Val Ala Ile Ala Ser Asn Ile Gly Gly Lys Gln Ala Leu
Glu Thr 610 615 620Val Gln Ala Leu Leu
Pro Val Leu Cys Gln Ala Pro His Cys Leu Thr625 630
635 640Pro Glu Gln Val Val Ala Ile Ala Ser His
Asp Gly Gly Lys Gln Ala 645 650
655Leu Glu Thr Val Gln Ala Leu Leu Pro Val Leu Cys Gln Ala Pro His
660 665 670Asp Leu Thr Pro Glu
Gln Val Val Ala Ile Ala Ser Asn Ile Gly Gly 675
680 685Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro
Val Leu Cys Gln 690 695 700Ala Pro His
Asp Leu Thr Arg Glu Gln Val Val Ala Ile Ala Ser His705
710 715 720Asp Gly Gly Lys Gln Ala Leu
Glu Thr Val Gln Arg Leu Leu Pro Val 725
730 735Leu Cys Gln Ala Pro His Asp Leu Thr Pro Glu Gln
Val Val Ala Ile 740 745 750Ala
Ser Asn Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu 755
760 765Leu Pro Val Leu Cys Gln Ala Pro His
Asp Leu Thr Pro Glu Gln Val 770 775
780Val Ala Ile Ala Ser His Asp Gly Gly Lys Gln Ala Leu Glu Thr Val785
790 795 800Gln Arg Leu Leu
Pro Val Leu Cys Gln Ala Pro His Asp Leu Thr Pro 805
810 815Glu Gln Val Val Ala Ile Ala Ser Asn Gly
Gly Gly Lys Gln Ala Leu 820 825
830Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala Pro His Asp
835 840 845Leu Thr Pro Glu Gln Val Val
Ala Ile Ala Ser His Asp Gly Gly Lys 850 855
860Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys Gln
Ala865 870 875 880Pro His
Asp Leu Thr Pro Glu Gln Val Val Ala Ile Ala Ser His Asp
885 890 895Gly Gly Lys Gln Ala Leu Glu
Thr Val Gln Arg Leu Leu Pro Val Leu 900 905
910Cys Gln Ala Pro His Asp Leu Thr Pro Glu Gln Val Val Ala
Ile Ala 915 920 925Ser Asn Gly Gly
Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu 930
935 940Pro Val Leu Cys Gln Ala Pro His Asp Leu Thr Pro
Glu Gln Val Val945 950 955
960Ala Ile Ala Ser Asn Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln
965 970 975Ala Leu Leu Pro Val
Leu Cys Gln Ala Pro His Asp Leu Thr Pro Glu 980
985 990Gln Val Val Ala Ile Ala Ser Asn Ile Gly Gly Lys
Gln Ala Leu Glu 995 1000 1005Thr
Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala Pro His Asp 1010
1015 1020Leu Thr Pro Glu Gln Val Val Ala Ile
Ala Ser Asn Gly Gly Gly 1025 1030
1035Lys Gln Ala Leu Glu Ser Ile Phe Ala Gln Leu Ser Arg Pro Asp
1040 1045 1050Pro Ala Leu Ala Ala Leu
Thr Asn Asp Arg Leu Val Ala Leu Ala 1055 1060
1065Cys Ile Gly Gly Arg Ser Ala Leu Asn Ala Val Lys Asp Gly
Leu 1070 1075 1080Pro Asn Ala Leu Thr
Leu Ile Arg Arg Ala Asn Ser Arg Ile Pro 1085 1090
1095Glu Arg Thr Ser His Leu Val Ala Asp His Thr Gln Val
Val Arg 1100 1105 1110Val Leu Gly Phe
Phe Gln Cys His Ser His Pro Ala Gln Ala Phe 1115
1120 1125Asp Glu Ala Met Thr Gln Phe Gly Met Ser Arg
His Gly Leu Leu 1130 1135 1140Gln Leu
Phe Arg Arg Val Gly Val Thr Glu Leu Glu Ala Arg Ser 1145
1150 1155Gly Thr Leu Pro Pro Ala Ser Gln Arg Trp
Asp Arg Ile Leu Gln 1160 1165 1170Ala
Ser Gly Met Lys Arg Ala Lys Pro Ser Pro Thr Ser Thr Gln 1175
1180 1185Thr Pro Asp Gln Ala Ser Leu His Ala
Phe Ala Asp Ser Leu Glu 1190 1195
1200Arg Asp Leu Asp Ala Pro Ser Pro Met His Glu Gly Asp Gln Thr
1205 1210 1215Arg Ala Ser Ser Arg Lys
Arg Ser Arg Ser Asp Arg Ala Val Thr 1220 1225
1230Gly Pro Ser Ala Gln Gln Ser Phe Glu Val Arg Val Pro Glu
Gln 1235 1240 1245Arg Asp Ala Leu His
Leu Pro Leu Leu Ser Trp Gly Val Lys Arg 1250 1255
1260Pro Arg Thr Arg Ile Gly Gly Leu Leu Asp Pro Gly Thr
Pro Met 1265 1270 1275Asp Ala Asp Leu
Val Ala Ser Ser Thr Val Val Trp Glu Gln Asp 1280
1285 1290Ala Asp Pro Phe Ala Gly Thr Ala Asp Asp Phe
Pro Ala Phe Asn 1295 1300 1305Glu Glu
Glu Leu Ala Trp Leu Met Glu Leu Leu Pro His 1310
1315 1320

SEQ ID NO 112: length 960, PRT, Xanthomonas campestris pv. armoraciae
N-terminus (1)..(288)
Repeat 1 (289)..(322)
Repeat 2 (323)..(356)
Repeat 3 (357)..(390)
Repeat 4 (391)..(424)
Repeat 5 (425)..(458)
Repeat 6 (459)..(492)
Repeat 7 (493)..(526)
Repeat 8 (527)..(560)
Repeat 9 (561)..(594)
Repeat 10 (595)..(628)
Repeat 11 (629)..(662)
Repeat 11.5 (663)..(682)
C-terminus (683)..(960)
Met Asp
Pro Ile Arg Ser Arg Thr Pro Ser Pro Ala Arg Glu Leu Leu1 5
10 15Ser Gly Pro Gln Pro Asp Gly Val
Gln Pro Thr Ala Asp Arg Gly Val 20 25
30Ser Pro Pro Ala Gly Gly Pro Leu Asp Gly Leu Pro Ala Arg Arg
Thr 35 40 45Met Ser Arg Thr Arg
Leu Pro Ser Pro Pro Ala Pro Ser Pro Ala Phe 50 55
60Ser Ala Asp Ser Phe Ser Asp Leu Leu Arg Gln Phe Asp Pro
Ser Leu65 70 75 80Phe
Asn Thr Ser Leu Phe Asp Ser Leu Pro Pro Phe Gly Ala His His
85 90 95Thr Glu Ala Ala Thr Gly Glu
Trp Asp Glu Val Gln Ser Gly Leu Arg 100 105
110Ala Ala Asp Ala Pro Pro Pro Thr Met Arg Val Ala Val Thr
Ala Ala 115 120 125Arg Pro Pro Arg
Ala Lys Pro Ala Pro Arg Arg Arg Ala Ala Gln Pro 130
135 140Ser Asp Ala Ser Pro Ala Ala Gln Val Asp Leu Arg
Thr Leu Gly Tyr145 150 155
160Ser Gln Gln Gln Gln Glu Lys Ile Lys Pro Lys Val Arg Ser Thr Val
165 170 175Ala Gln His His Glu
Ala Leu Val Gly His Gly Phe Thr His Ala His 180
185 190Ile Val Ala Leu Ser Gln His Pro Ala Ala Leu Gly
Thr Val Ala Val 195 200 205Lys Tyr
Gln Asp Met Ile Ala Ala Leu Pro Glu Ala Thr His Glu Ala 210
215 220Ile Val Gly Val Gly Lys Gln Trp Ser Gly Ala
Arg Ala Leu Glu Ala225 230 235
240Leu Leu Thr Val Ala Gly Glu Leu Arg Gly Pro Pro Leu Gln Leu Asp
245 250 255Thr Gly Gln Leu
Leu Lys Ile Ala Lys Arg Gly Gly Val Thr Ala Val 260
265 270Glu Ala Val His Ala Trp Arg Asn Ala Leu Thr
Gly Ala Pro Leu Asn 275 280 285Leu
Thr Pro Glu Gln Val Val Ala Ile Ala Ser Asn Ile Gly Gly Lys 290
295 300Gln Ala Leu Glu Thr Val Gln Arg Leu Leu
Pro Val Leu Cys Gln Ala305 310 315
320His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser His Asp
Gly 325 330 335Gly Lys Gln
Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys 340
345 350Gln Ala His Gly Leu Thr Pro Glu Gln Val
Val Ala Ile Ala Ser Asn 355 360
365Ile Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Ala Leu Leu Pro Val 370
375 380Leu Cys Gln Ala His Gly Leu Thr
Pro Glu Gln Val Val Ala Ile Ala385 390
395 400Ser His Asp Gly Gly Lys Gln Ala Leu Glu Thr Val
Gln Arg Leu Leu 405 410
415Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala
420 425 430Ile Ala Ser His Asp Gly
Gly Lys Gln Ala Leu Glu Thr Val Gln Arg 435 440
445Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln
Gln Val 450 455 460Val Ala Ile Ala Ser
His Asp Gly Gly Lys Gln Ala Leu Glu Thr Val465 470
475 480Gln Arg Leu Leu Pro Val Leu Cys Gln Ala
His Gly Leu Thr Pro Gln 485 490
495Gln Val Val Ala Ile Ala Ser Asn Ser Gly Gly Lys Gln Ala Leu Glu
500 505 510Thr Val Gln Arg Leu
Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr 515
520 525Pro Gln Gln Val Val Ala Ile Ala Ser Asn Ser Gly
Gly Lys Gln Ala 530 535 540Leu Glu Thr
Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly545
550 555 560Leu Thr Pro Gln Gln Val Val
Ala Ile Ala Ser Asn Ser Gly Gly Lys 565
570 575Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val
Leu Cys Gln Ala 580 585 590His
Gly Leu Thr Pro Glu Gln Val Val Ala Ile Ala Ser His Asp Gly 595
600 605Gly Lys Gln Ala Leu Glu Thr Val Gln
Arg Leu Leu Pro Val Leu Cys 610 615
620Gln Ala His Gly Leu Thr Pro Glu Gln Val Val Ala Ile Ala Ser Asn625
630 635 640Ile Gly Gly Lys
Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val 645
650 655Leu Cys Gln Ala His Gly Leu Thr Pro Gln
Gln Val Val Ala Ile Ala 660 665
670Ser Asn Gly Gly Gly Arg Pro Ala Leu Glu Ser Ile Val Ala Gln Leu
675 680 685Ser Arg Pro Asp Pro Ala Leu
Ala Ala Leu Thr Asn Asp His Leu Val 690 695
700Ala Leu Ala Cys Leu Gly Gly Arg Pro Ala Leu Asp Ala Val Lys
Lys705 710 715 720Gly Leu
Pro His Ala Pro Ala Leu Ile Lys Arg Thr Asn Arg Arg Ile
725 730 735Pro Glu Arg Thr Ser His Arg
Val Ala Asp His Ala Gln Val Val Arg 740 745
750Val Leu Gly Phe Phe Gln Cys His Ser His Pro Ala Gln Ala
Phe Asp 755 760 765Asp Ala Met Thr
Gln Phe Gly Met Ser Arg His Gly Leu Leu Gln Leu 770
775 780Phe Arg Arg Val Gly Val Thr Glu Leu Glu Ala Arg
Ser Gly Thr Leu785 790 795
800Pro Pro Ala Ser Gln Arg Trp Asp Arg Ile Leu Gln Ala Ser Gly Met
805 810 815Lys Arg Ala Lys Pro
Ser Pro Thr Ser Thr Gln Thr Pro Asp Gln Ala 820
825 830Ser Leu His Ala Phe Ala Asp Ser Leu Glu Arg Asp
Leu Asp Ala Pro 835 840 845Ser Pro
Met His Glu Gly Asp Gln Thr Arg Ala Ser Ser Arg Lys Arg 850
855 860Ser Arg Ser Asp Arg Ala Val Thr Gly Pro Ser
Ala Gln Gln Ser Phe865 870 875
880Glu Val Arg Val Pro Glu Gln Arg Asp Ala Leu His Leu Pro Leu Leu
885 890 895Ser Trp Gly Val
Lys Arg Pro Arg Thr Arg Ile Gly Gly Leu Leu Asp 900
905 910Pro Gly Thr Pro Met Asp Ala Asp Leu Val Ala
Ser Ser Thr Val Val 915 920 925Trp
Glu Gln Asp Ala Asp Pro Phe Ala Gly Thr Ala Asp Asp Phe Pro 930
935 940Ala Phe Asn Glu Glu Glu Leu Ala Trp Leu
Met Glu Leu Leu Pro Gln945 950 955
960

SEQ ID NO 113: length 1062, PRT, Xanthomonas campestris pv. armoraciae
N-terminus (1)..(288)
Repeat 1 (289)..(322)
Repeat 2 (323)..(356)
Repeat 3 (357)..(390)
Repeat 4 (391)..(424)
Repeat 5 (425)..(458)
Repeat 6 (459)..(492)
Repeat 7 (493)..(526)
Repeat 8 (527)..(560)
Repeat 9 (561)..(594)
Repeat 10 (595)..(628)
Repeat 11 (629)..(662)
Repeat 12 (663)..(696)
Repeat 13 (697)..(730)
Repeat 14 (731)..(764)
Repeat 14.5 (765)..(784)
C-terminus (785)..(1062)
Met Asp
Pro Ile Arg Ser Arg Thr Pro Ser Pro Ala Arg Glu Leu Leu1 5
10 15Ser Gly Pro Gln Pro Asp Gly Val
Gln Pro Thr Ala Asp Arg Gly Val 20 25
30Ser Pro Pro Ala Gly Gly Pro Leu Asp Gly Leu Pro Ala Arg Arg
Thr 35 40 45Met Ser Arg Thr Arg
Leu Pro Ser Pro Pro Ala Pro Ser Pro Ala Phe 50 55
60Ser Ala Asp Ser Phe Ser Asp Leu Leu Arg Gln Phe Asp Pro
Ser Leu65 70 75 80Phe
Asn Thr Ser Leu Phe Asp Ser Leu Pro Pro Phe Gly Ala His His
85 90 95Thr Glu Ala Ala Thr Gly Glu
Trp Asp Glu Val Gln Ser Gly Leu Arg 100 105
110Ala Ala Asp Ala Pro Pro Pro Thr Met Arg Val Ala Val Thr
Ala Ala 115 120 125Arg Pro Pro Arg
Ala Lys Pro Ala Pro Arg Arg Arg Ala Ala Gln Pro 130
135 140Ser Asp Ala Ser Pro Ala Ala Gln Val Asp Leu Arg
Thr Leu Gly Tyr145 150 155
160Ser Gln Gln Gln Gln Glu Lys Ile Lys Pro Lys Val Arg Ser Thr Val
165 170 175Ala Gln His His Glu
Ala Leu Val Gly His Gly Phe Thr His Ala His 180
185 190Ile Val Ala Leu Ser Gln His Pro Ala Ala Leu Gly
Thr Val Ala Val 195 200 205Lys Tyr
Gln Asp Met Ile Ala Ala Leu Pro Glu Ala Thr His Glu Ala 210
215 220Ile Val Gly Val Gly Lys Gln Trp Ser Gly Ala
Arg Ala Leu Glu Ala225 230 235
240Leu Leu Thr Val Ala Gly Glu Leu Arg Gly Pro Pro Leu Gln Leu Asp
245 250 255Thr Gly Gln Leu
Leu Lys Ile Ala Lys Arg Gly Gly Val Thr Ala Val 260
265 270Glu Ala Val His Ala Trp Arg Asn Ala Leu Thr
Gly Ala Pro Leu Asn 275 280 285Leu
Thr Pro Glu Gln Val Val Ala Ile Ala Ser Asn Ile Gly Gly Lys 290
295 300Gln Ala Leu Glu Thr Val Gln Ala Leu Leu
Pro Val Leu Cys Gln Ala305 310 315
320His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser His Asp
Gly 325 330 335Gly Lys Gln
Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val Leu Cys 340
345 350Gln Ala His Gly Leu Thr Pro Glu Gln Val
Val Ala Ile Ala Ser His 355 360
365Asp Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val 370
375 380Leu Cys Gln Ala His Gly Leu Thr
Pro Gln Gln Val Val Ala Ile Ala385 390
395 400Ser Asn Gly Gly Gly Lys Gln Ala Leu Glu Thr Val
Gln Arg Leu Leu 405 410
415Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Glu Gln Val Val Ala
420 425 430Ile Ala Ser Asn Ser Gly
Gly Lys Gln Ala Leu Glu Thr Val Gln Ala 435 440
445Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Gln
Gln Val 450 455 460Val Ala Ile Ala Ser
Asn Ser Gly Gly Lys Gln Ala Leu Glu Thr Val465 470
475 480Gln Ala Leu Leu Pro Val Leu Cys Gln Ala
His Gly Leu Thr Pro Glu 485 490
495Gln Val Val Ala Ile Ala Ser Asn Ile Gly Gly Lys Gln Ala Leu Glu
500 505 510Thr Val Gln Arg Leu
Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr 515
520 525Pro Gln Gln Val Val Ala Ile Ala Ser His Asp Gly
Gly Lys Gln Ala 530 535 540Leu Glu Thr
Val Gln Arg Leu Leu Pro Val Leu Cys Gln Ala His Gly545
550 555 560Leu Thr Pro Gln Gln Val Val
Ala Ile Ala Ser Asn Gly Gly Gly Lys 565
570 575Gln Ala Leu Glu Thr Val Gln Arg Leu Leu Pro Val
Leu Cys Gln Ala 580 585 590His
Gly Leu Thr Pro Glu Gln Val Val Ala Ile Ala Ser Asn Ile Gly 595
600 605Gly Lys Gln Ala Leu Glu Thr Val Gln
Ala Leu Leu Pro Val Leu Cys 610 615
620Gln Ala His Gly Leu Thr Pro Gln Gln Val Val Ala Ile Ala Ser Asn625
630 635 640Ser Gly Gly Lys
Gln Ala Leu Glu Thr Val Gln Ala Leu Leu Pro Val 645
650 655Leu Cys Gln Ala His Gly Leu Thr Pro Glu
Gln Val Val Ala Ile Ala 660 665
670Ser Asn Ile Gly Gly Lys Gln Ala Leu Glu Thr Val Gln Arg Leu Leu
675 680 685Pro Val Leu Cys Gln Ala His
Gly Leu Thr Pro Glu Gln Val Val Ala 690 695
700Ile Ala Ser Asn Gly Gly Gly Lys Gln Ala Leu Glu Thr Val Gln
Arg705 710 715 720Leu Leu
Pro Val Leu Cys Gln Ala His Gly Leu Thr Pro Glu Gln Val
725 730 735Val Ala Ile Ala Ser Asn Ile
Gly Gly Lys Gln Ala Leu Glu Thr Val 740 745
750Gln Ala Leu Leu Pro Val Leu Cys Gln Ala His Gly Leu Thr
Pro Glu 755 760 765Gln Val Val Ala
Ile Ala Ser Asn Gly Gly Gly Arg Pro Ala Leu Glu 770
775 780Ser Ile Val Ala Gln Leu Ser Arg Pro Asp Pro Ala
Leu Ala Ala Leu785 790 795
800Thr Asn Asp His Leu Val Ala Leu Ala Cys Leu Gly Gly Arg Pro Ala
805 810 815Leu Asp Ala Val Lys
Lys Gly Leu Pro His Ala Pro Ala Leu Ile Lys 820
825 830Arg Thr Asn Arg Arg Ile Pro Glu Arg Thr Ser His
Arg Val Ala Asp 835 840 845His Ala
Gln Val Val Arg Val Leu Gly Phe Phe Gln Cys His Ser His 850
855 860Pro Ala Gln Ala Phe Asp Asp Ala Met Thr Gln
Phe Gly Met Ser Arg865 870 875
880His Gly Leu Leu Gln Leu Phe Arg Arg Val Gly Val Thr Glu Leu Glu
885 890 895Ala Arg Ser Gly
Thr Leu Pro Pro Ala Ser Gln Arg Trp Asp Arg Ile 900
905 910Leu Gln Ala Ser Gly Met Lys Arg Ala Lys Pro
Ser Pro Thr Ser Thr 915 920 925Gln
Thr Pro Asp Gln Ala Ser Leu His Ala Phe Ala Asp Ser Leu Glu 930
935 940Arg Asp Leu Asp Ala Pro Ser Pro Met His
Glu Gly Asp Gln Thr Arg945 950 955
960Ala Ser Ser Arg Lys Arg Ser Arg Ser Asp Arg Ala Val Thr Gly
Pro 965 970 975Ser Ala Gln
Gln Ser Phe Glu Val Arg Val Pro Glu Gln Arg Asp Ala 980
985 990Leu His Leu Pro Leu Ser Trp Arg Val Lys
Arg Pro Arg Thr Ser Ile 995 1000
1005Gly Gly Gly Leu Pro Asp Pro Gly Thr Pro Thr Ala Ala Asp Leu
1010 1015 1020Ala Ala Ser Ser Thr Val
Met Arg Glu Gln Asp Glu Asp Pro Phe 1025 1030
1035Ala Gly Ala Ala Asp Asp Phe Pro Ala Phe Asn Glu Glu Glu
Leu 1040 1045 1050Ala Trp Leu Met Glu
Leu Leu Pro Gln 1055 1060