Searching

2D and 3D coordinates

By default, compounds are returned with 2D coordinates. Use the record_type keyword argument to request 3D coordinates instead:

pcp.get_compounds('Aspirin', 'name', record_type='3d')

Advanced search types

By default, requests look for an exact match with the input. Alternatively, you can specify substructure, superstructure, similarity and identity searches using the searchtype keyword argument:

pcp.get_compounds('CC', 'smiles', searchtype='superstructure', listkey_count=3)

The listkey_count and listkey_start arguments can be used for pagination. Each searchtype has its own options that can be specified as keyword arguments. For example, similarity searches have a Threshold, and super/substructure searches have MatchIsotopes. A full list of options is available in the PUG REST Specification.
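As a rough illustration of pagination, the sketch below walks through results one page at a time. The helper name iter_search_pages and its page_size and max_pages parameters are hypothetical and not part of PubChemPy. It assumes listkey_start is a zero-based offset and that each call repeats the search, so paging this way can be slow. The fetch argument is expected to be pcp.get_compounds or any callable with the same calling convention.

```python
def iter_search_pages(fetch, identifier, namespace, searchtype,
                      page_size=3, max_pages=5, **options):
    """Yield successive pages of search results.

    `fetch` is a callable such as pcp.get_compounds. Extra keyword
    arguments (for example Threshold for similarity searches) are
    passed through unchanged via **options.
    """
    for page in range(max_pages):
        results = fetch(
            identifier,
            namespace,
            searchtype=searchtype,
            listkey_start=page * page_size,  # assumed zero-based offset
            listkey_count=page_size,
            **options
        )
        if not results:
            return  # nothing left to fetch
        yield results
        if len(results) < page_size:
            return  # a short page means this was the last one


# Possible usage (needs network access, so it is left as a comment):
#
#   import pubchempy as pcp
#   for page in iter_search_pages(pcp.get_compounds, 'CC', 'smiles',
#                                 'superstructure', page_size=3, max_pages=2):
#       print(page)
```

The max_pages cap is there only to keep an exploratory loop from issuing an unbounded number of slow requests.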

Note: These search types are considerably slower than exact-match lookups.

Getting a full results list for common compound names

For some very common names, PubChem maintains a filtered whitelist of human-chosen CIDs with the intention of reducing confusion about which is the ‘right’ result. In the past, a search for Glucose would return four different results, each with different stereochemistry information. But now, a single result is returned, which has been chosen as ‘correct’ by the PubChem team.

Unfortunately it isn’t directly possible to return to the previous behaviour, but there is a straightforward workaround: search for substances with that name (which are completely unfiltered) and then get the compounds that are derived from those substances.

There are a few different ways you can do this using PubChemPy, but the easiest is probably using the get_cids function:

>>> pcp.get_cids('2-nonenal', 'name', 'substance', list_return='flat')
[17166, 5283335, 5354833]

This searches the substance database for ‘2-nonenal’, and gets the CID for the compound associated with each substance. By default, this returns a mapping between each SID and CID, but the list_return='flat' parameter flattens this into just a single list of unique CIDs.
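To make the flattening step concrete, here is a minimal offline sketch of what it amounts to. It assumes the unflattened mapping is a list of dictionaries with 'SID' and 'CID' keys. The flatten_cids helper is hypothetical, and the SIDs in the sample data are placeholders rather than real PubChem identifiers. The ordering PubChem itself uses for the flat list is not specified here; the sketch simply keeps first-seen order.

```python
def flatten_cids(mapping):
    """Collapse an SID-to-CID mapping into a list of unique CIDs."""
    seen = set()
    flat = []
    for entry in mapping:
        # A substance may have no associated compound, so 'CID' can be absent.
        for cid in entry.get('CID', []):
            if cid not in seen:
                seen.add(cid)
                flat.append(cid)
    return flat


# Placeholder SIDs; the CIDs are the ones shown in the example above.
sample_mapping = [
    {'SID': 1, 'CID': [17166]},
    {'SID': 2, 'CID': [5283335]},
    {'SID': 3, 'CID': [17166]},   # duplicate CID from another substance
    {'SID': 4},                   # substance without a compound
    {'SID': 5, 'CID': [5354833]},
]

print(flatten_cids(sample_mapping))  # [17166, 5283335, 5354833]
```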

You can then use Compound.from_cid to get the full Compound record, equivalent to what is returned by get_compounds:

>>> cids = pcp.get_cids('2-nonenal', 'name', 'substance', list_return='flat')
>>> [pcp.Compound.from_cid(cid) for cid in cids]
[Compound(17166), Compound(5283335), Compound(5354833)]
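The list comprehension above makes one request per CID. If the list is long, one possible alternative is to fetch the records in batches. The sketch below assumes that get_compounds accepts a list of CIDs as its identifier when the namespace is 'cid'. The fetch_in_chunks helper and its chunk_size default are hypothetical, chosen only for illustration.

```python
def fetch_in_chunks(fetch, cids, chunk_size=100):
    """Fetch records for many CIDs using one call per chunk.

    `fetch` is a callable such as pcp.get_compounds, assumed here to
    accept a list of CIDs together with the 'cid' namespace.
    """
    compounds = []
    for start in range(0, len(cids), chunk_size):
        chunk = cids[start:start + chunk_size]
        compounds.extend(fetch(chunk, 'cid'))
    return compounds


# Possible usage (needs network access, so it is left as a comment):
#
#   import pubchempy as pcp
#   cids = pcp.get_cids('2-nonenal', 'name', 'substance', list_return='flat')
#   compounds = fetch_in_chunks(pcp.get_compounds, cids)
```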