GSK’s Published Kinase Inhibitor Set

Speaking about open-source drug discovery (such as it is) and sharing of data sets (such as they are), I really should mention a significant example in this area: the GSK Published Kinase Inhibitor Set. (It was mentioned in the comments to this post.) The company has made 367 compounds available to any academic investigator working in the kinase field, as long as they make their results publicly available (at ChEMBL, for example). The people at GSK doing this are David Drewry and William Zuercher, for the record – here’s a recent paper from them and their co-workers on the compound set and its behavior in reporter-gene assays.
Why are they doing this? To seed discovery in the field. There’s an awful lot of chemical biology to be done in the kinase field, far more than any one organization could take on, and the more sets of eyes (and cerebral cortices) that are on these problems, the better. So far, there have been about 80 collaborations, mostly in Europe and North America, all the way from broad high-content phenotypic screening to targeted efforts against rare tumor types.
The plan is to continue to firm up the collection, making more data available for each compound as work is done on them, and to add more compounds with different selectivity profiles and chemotypes. Now, the compounds so far are all things that have been published on by GSK in the past, obviating concerns about IP. There are, though, a multitude of other compounds in the literature from other companies, and you have to think that some of these would be useful additions to the set. How, though, does one get this to happen? That’s the stage that things are in now. Beyond that, there’s the possibility of some sort of open network to optimize entirely new probes and tools, but there’s plenty that could be done even before getting to that stage.
So if you’re in academia, and interested in kinase pathways, you absolutely need to take a look at this compound set. And for those of us in industry, we need to think about the benefits that we could get by helping to expand it, or by starting similar efforts of our own in other fields. The science is big enough for it. Any takers?

23 comments on “GSK’s Published Kinase Inhibitor Set”

  1. Varadh says:

    Looks a good beginning towards the future through open network and towards open source.

  2. Varadh says:

    Looks a good beginning towards the future through open network and towards open source.

  3. Bill says:

    Is there a website for the GSK Published Kinase Inhibitor Set? Because you would think there would be, but I can’t find it, which doesn’t speak so well for the effort.

  4. ChrisL says:

    Like (3) Bill I am having trouble getting the structures for the GSK set. The “recent paper” PlosOne reference that Derek links to did not help me. The references in the paper to PubChem data deposition were totally unhelpful. Why cannot the chemistry structures be deposited as a plain vanilla sdf file that everyone can read and examine in their favorite software? This is a good example of how to obfuscate any reasonable medicinal chemistry discussion by hiding the chemistry data in nearly inaccessible formats.

  5. DCRogers says:

    “The company has made 367 compounds available to any academic investigator working in the kinase field”
    Sadly, there is less here than meets the eye — I want more negative data. Without it, modeling is well-nigh useless — only able to regurgitate the blindingly obvious surrounded by vast uncovered spaces of I-don’t-know-Jack.
    The authors acknowledge the importance of negative data stating “importantly [includes] compounds inactive at their original kinase target”. But it’s doubtful to me that a mere 367 compounds is capturing more than a fragment of what can go structurally wrong, or provide even a glimpse of the variety of structural motifs that might be worth exploring.
    That said, I bet they have kinase assay data that would make me drool — I’ll wait until they make that available before I pop the champagne cork.

  6. William Zuercher says:

    @Bill and @ChrisL: Thank you for the interest in seeing the compounds in the set. We wish to eschew any obfuscation, so we’ll see if we can work with your suggestions on a simple link to a PDF or other easily digestible way to access the structures and data. In the meantime, here is how to get the structures and data from ChEMBL: from the main ChEMBL site, follow the “Activity Source Filter” link. Deselect all options save the “GSK Published Kinase Inhibitor Set” and click update. Any search will now be conducted only on the PKIS. To retrieve all data, enter the wildcard “%” into the search field and search for compound, target, or assay data with the three buttons to the right.
    Any further suggestions, questions, or comments are most welcome!

  7. William Zuercher says:

    I don’t know why the funky characters popped into my last post. The wildcard symbol for ChEMBL searching is the percent sign.
    @DCRogers: A large and growing body of data is available at ChEMBL.

  8. Chris Hayes says:

    I’m an academic in the UK, and we have recently used the GSK PKIS set in a phenotypic screen. Bill Zuercher, and all involved at GSK, have been fantastic and they have helped us much more than I could have hoped. It has been very, very open indeed, and much more open than collaborations with some academic colleagues!
    I would encourage all interested parties to contact GSK (that probably means Bill, as his e-mail is in the PLoS One paper!), and I’m sure that he will try to help you as much as he’s helped us.
    This is a fantastic (free!) resource for academics.
    Many thanks Bill (If you are reading this post).

  9. DCRogers says:

    @7: “A large and growing body of data is available at ChEMBL”
    Sorry for being slightly sour in my previous note: you are correct that ChEMBL has a growing number of interesting large-scale data sets. That said, you get a lot of noise because data samples arrive from different sources, with widely varying quality and techniques.
    To re-spin my comment in a positive direction, it would be great if the larger universe of associated screening data you must have searched around your compound set was available in ChEMBL.
    (Or perhaps it is, and I need to update my slightly-musty database?)
    Anyhow, I should have been a bit more appreciative of the efforts you must have gone through to encourage the release of this data set — I realize it must have been quite a bit of work getting it out through legal, accounting, etc. You folks deserve much thanks for that!

  10. @DCRogers: they have deposited in ChEMBL binding data at 2 concentrations (1 uM and 100 nM) for a panel of 220 kinases. It’s all there for the viewing whether you obtain the compounds or not. All of this data comes from Nanosyn.
    If you google “GSK PKIS” you can find a few powerpoint presentations that David and Bill have given on the PKIS.
    I’ll echo the comments of others. My lab has this compound set and both Bill and David have been great to work with.

  11. NigelR says:

    Given all of the data in ChEMBL, is it possible to define both the set of published kinase inhibitors that gives maximal interpretable coverage of the kinome (i.e. not just maximising coverage by using pan inhibitors) and the profiles that are still needed to increase that coverage?
    At least having a list of which kinase inhibitors would be valuable to be released by the less enlightened pharmas would be better than a random release. It would also help direct academic med chem efforts to the unrepresented regions.

  12. David Borhani says:

    The ChEMBL download doesn’t work for me. I selected only the GSK set, searched with %, and got zero hits. When I search with *, I get 1,225,703 hits (even though the source filter says GSK is the only source selected).
    An SDF file from the source would make life a bit easier.

  13. William Zuercher says:

    @David Borhani: The search worked earlier. As luck would have it, a new version of ChEMBL was just released today. If you email me your address, I will send an SD file.

  14. DCRogers says:

    @10: “they have deposited in ChEMBL binding data at 2 concentrations (1 uM and 100 nM) for a panel of 220 kinases”
    Thanks Matt, I will update my ChEMBL database and have a go!
    Kudos to the authors, who deserve appreciation rather than brickbats. I fully withdraw my earlier comment — this’ll teach me to think twice before writing cranky messages prior to my first cup of coffee in the morning.

  15. David Drewry says:

    Thank you, Derek, for posting this. As you mentioned we need more people thinking about (and talking about) the benefits of sharing compounds. Drug discovery is difficult, and we will make more headway in collaboration. Discovering new medicines is a rare event, but not because we are not trying, rather we just don’t know enough.
    Sharing well annotated compound sets so that many more experiments can be run is one way to improve our collective knowledge base and make discoveries.
    Bill and I would also like to thank everyone for their comments and suggestions to improve data accessibility. Our friends at ChEMBL recently posted some information and instructions on their blog that will help:

  16. Anonymous says:

    Derek you should post the MTA. Very restrictive. While an admirable first step for a FIPCO, this is more a batched standard MTA than open source drug discovery. We are still waiting on compounds more than 8 months after the inquiry. But I hope it works to guide GSK toward an outcome for patients, and that the experience leads to true open access.

  17. Anonymous says:

    @Anonymous 16: We spent significant effort to make the agreement minimally restrictive and believe that the resulting MTA is consistent with the broad aim of openly advancing kinase science. Most of the collaborations have had no issue whatsoever with the MTA template, and we’ve been able to get the compound set into their hands within 4-6 weeks of initial contact. Any delays are due to changes requested by the recipient institution.

  18. William Zuercher says:

    I posted comment 17.

  19. Steric clash says:

    @18 Let’s have a look!

  20. William Zuercher says:

    @19: I am happy to provide a copy of the MTA upon email request.

  21. CDD Data Guy says:

    Given the significant interest in this dataset, as well as the comments above that getting the data desired from ChEMBL was tricky, the data team at Collaborative Drug Discovery (CDD) have gathered the PKIS data that ChEMBL has kindly made available, and processed it so it could be accessed via CDD’s public access web site (public access accounts are free).
    The transfer to CDD makes the data available in a more med-chemist friendly manner. There was also some tidying up of the data set. For example there are actually only 364 compounds (some duplicates were due to salt forms or alternate names of the same molecule) and the target names were normalized where possible (for example, the kinases IKKA, IKKB and IKKE were called IKK-alpha, IKK-beta, IKK-epsilon for the dataset from UNC).
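    The target-name tidying described above can be sketched roughly as follows. This is only an illustration of the idea, not CDD’s actual processing: the suffix table, the regular expression, and the function name `normalize_target` are all assumptions made for the example, and real kinase nomenclature has many cases this does not cover.

```python
import re

# Hypothetical alias table (an assumption for this sketch):
# Greek-letter suffixes mapped to the single-letter codes used in compact names.
GREEK_SUFFIXES = {
    "alpha": "A",
    "beta": "B",
    "gamma": "G",
    "delta": "D",
    "epsilon": "E",
}

# A stem of letters/digits, an optional separator, then a spelled-out Greek letter.
_PATTERN = re.compile(
    r"([A-Za-z0-9]+)[-_ ]?(alpha|beta|gamma|delta|epsilon)",
    flags=re.IGNORECASE,
)


def normalize_target(name: str) -> str:
    """Map names such as 'IKK-alpha' to the compact form 'IKKA'.

    Names that do not end in a spelled-out Greek letter are returned
    unchanged apart from surrounding whitespace.
    """
    cleaned = name.strip()
    match = _PATTERN.fullmatch(cleaned)
    if match is None:
        return cleaned
    stem, suffix = match.groups()
    return stem.upper() + GREEK_SUFFIXES[suffix.lower()]


# Collapse spelling variants so that each kinase appears once.
raw_names = ["IKK-alpha", "IKKA", "IKK-beta", "IKK-epsilon", "IKKE"]
unique_targets = sorted({normalize_target(n) for n in raw_names})
print(unique_targets)
```

    Merging salt forms of the same molecule is a separate problem that needs structure-aware tooling, so it is deliberately left out of this sketch.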

  22. Anonymous says:

    Couldn’t find a way to download the SD files from CDD. Is there a way to do it? Thanks.

  23. Costa says:

    Can anybody please tell me where to request the PKIS? I learned that Dr. Zuercher left GSK.
