API documentation

This part of the documentation is automatically generated from the PubChemPy source code and comments.

Search functions

pubchempy.get_compounds(identifier, namespace=u'cid', searchtype=None, as_dataframe=False, **kwargs)

Retrieve the specified compound records from PubChem.

Parameters:
  • identifier – The compound identifier to use as a search query.
  • namespace – (optional) The identifier type, one of cid, name, smiles, sdf, inchi, inchikey or formula.
  • searchtype – (optional) The advanced search type, one of substructure, superstructure or similarity.
  • as_dataframe – (optional) Automatically extract the Compound properties into a pandas DataFrame and return that.

pubchempy.get_substances(identifier, namespace=u'sid', as_dataframe=False, **kwargs)

Retrieve the specified substance records from PubChem.

Parameters:
  • identifier – The substance identifier to use as a search query.
  • namespace – (optional) The identifier type, one of sid, name or sourceid/<source name>.
  • as_dataframe – (optional) Automatically extract the Substance properties into a pandas DataFrame and return that.

pubchempy.get_assays(identifier, namespace=u'aid', **kwargs)

Retrieve the specified assay records from PubChem.

Parameters:
  • identifier – The assay identifier to use as a search query.
  • namespace – (optional) The identifier type.

pubchempy.get_properties(properties, identifier, namespace=u'cid', searchtype=None, as_dataframe=False, **kwargs)

Retrieve the specified properties from PubChem.

Parameters:
  • properties – The properties to retrieve.
  • identifier – The compound, substance or assay identifier to use as a search query.
  • namespace – (optional) The identifier type.
  • searchtype – (optional) The advanced search type, one of substructure, superstructure or similarity.
  • as_dataframe – (optional) Automatically extract the properties into a pandas DataFrame.
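The search functions above are wrappers around PubChem's PUG REST service, where a request is addressed by domain, optional search type, namespace, identifier, operation and output format. The sketch below assembles such URLs with the standard library only, to show how these parameters map onto a request. The helper pug_rest_url is hypothetical and is not part of PubChemPy; PubChemPy's own request code may differ, for example by sending some identifiers in a POST body instead of the URL path.

```python
from urllib.parse import quote

# Base URL of the PubChem PUG REST service.
API_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pug_rest_url(identifier, namespace="cid", domain="compound",
                 searchtype=None, operation=None, output="JSON"):
    """Build a PUG REST URL (hypothetical helper, not PubChemPy's own code)."""
    # A list of identifiers is sent as one comma-separated path segment.
    if isinstance(identifier, (list, tuple)):
        identifier = ",".join(str(i) for i in identifier)
    # Percent-encode characters such as "/" and "=" that can occur in SMILES,
    # but keep commas so that identifier lists stay readable.
    encoded = quote(str(identifier), safe=",")
    # Optional segments (searchtype, operation) are dropped when they are None.
    parts = [API_BASE, domain, searchtype, namespace, encoded, operation, output]
    return "/".join(p for p in parts if p)


# The kind of request get_compounds('aspirin', namespace='name') corresponds to.
print(pug_rest_url("aspirin", namespace="name"))
# The kind of request a get_properties() call for two properties of CID 2244 corresponds to.
print(pug_rest_url(2244, operation="property/MolecularFormula,MolecularWeight"))
# A search type is placed between the domain and the namespace.
print(pug_rest_url("c1ccccc1", namespace="smiles", searchtype="substructure"))
# Substances can be addressed by depositor source name and source ID.
print(pug_rest_url("747285", namespace="sourceid/DTP.NCI", domain="substance"))
```

The identifier is always placed in the path here for readability; structure inputs such as SDF do not fit comfortably in a URL, which is one reason a real client may prefer POST.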

Compound

class pubchempy.Compound(record)

Corresponds to a single record from the PubChem Compound database.

The PubChem Compound database is constructed from the Substance database using a standardization and deduplication process. Each Compound is uniquely identified by a CID.

Initialize with a record dict from the PubChem PUG REST service.

For most users, the from_cid() class method is probably a better way of creating Compounds.

Parameters: record (dict) – A compound record returned by the PubChem PUG REST service.

record

The raw compound record returned by the PubChem PUG REST service.

classmethod from_cid(cid, **kwargs)

Retrieve the Compound record for the specified CID.

Usage:

from pubchempy import Compound
c = Compound.from_cid(6819)

Parameters: cid (int) – The PubChem Compound Identifier (CID).

to_dict(properties=None)

Return a dictionary containing Compound data. Optionally specify a list of the desired properties.

synonyms, aids and sids are not included unless explicitly specified using the properties parameter. This is because they each require an extra request.

to_series(properties=None)

Return a pandas Series containing Compound data. Optionally specify a list of the desired properties.

synonyms, aids and sids are not included unless explicitly specified using the properties parameter. This is because they each require an extra request.

cid

The PubChem Compound Identifier (CID).

Note

When searching using a SMILES or InChI query that is not present in the PubChem Compound database, an automatically generated record may be returned that contains properties that have been calculated on the fly. These records will not have a CID property.

elements

List of element symbols for atoms in this Compound.

atoms

List of Atoms in this Compound.

bonds

List of Bonds between Atoms in this Compound.

synonyms

A ranked list of all the names associated with this Compound.

Requires an extra request. Result is cached.

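sids

A list of all SIDs for Substances associated with this Compound.

Requires an extra request. Result is cached.

aids

A list of all AIDs for Assays associated with this Compound.

Requires an extra request. Result is cached.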
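The "extra request, result is cached" behaviour of synonyms, sids and aids (and the reason to_dict() and to_series() leave them out unless asked) can be pictured with functools.cached_property. The class below is a hypothetical stand-in, not PubChemPy's implementation: fetch_synonyms only simulates the network call and counts how often it runs, to show that the first access makes one request and later accesses reuse the stored result.

```python
from functools import cached_property


class LazyCompound:
    """Hypothetical stand-in showing lazy, cached extra requests (not PubChemPy code)."""

    def __init__(self, cid):
        self.cid = cid
        self.requests_made = 0  # counts simulated network calls

    def fetch_synonyms(self):
        # Placeholder for a real request such as .../compound/cid/<cid>/synonyms/JSON.
        self.requests_made += 1
        return [f"synonym-{self.cid}-a", f"synonym-{self.cid}-b"]

    @cached_property
    def synonyms(self):
        # Computed on first access only; the result is then stored on the instance.
        return self.fetch_synonyms()


c = LazyCompound(2244)
first = c.synonyms   # triggers the (simulated) request
second = c.synonyms  # served from the cache
print(c.requests_made)  # 1
```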

charge

Formal charge on this Compound.

molecular_formula

Molecular formula.

molecular_weight

Molecular weight.

canonical_smiles

Canonical SMILES, with no stereochemistry information.

isomeric_smiles

Isomeric SMILES.

inchi

InChI string.

inchikey

InChIKey.

iupac_name

Preferred IUPAC name.

xlogp

XLogP.

exact_mass

Exact mass.

monoisotopic_mass

Monoisotopic mass.

tpsa

Topological Polar Surface Area.

complexity

Complexity.

h_bond_donor_count

Hydrogen bond donor count.

h_bond_acceptor_count

Hydrogen bond acceptor count.

rotatable_bond_count

Rotatable bond count.

fingerprint

PubChem CACTVS fingerprint.

Each bit in the fingerprint represents the presence or absence of one of 881 chemical substructures.

More information at ftp://ftp.ncbi.nlm.nih.gov/pubchem/specifications/pubchem_fingerprints.txt
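The individual substructure bits can be recovered from the raw fingerprint string. This sketch assumes the fingerprint is a hexadecimal string laid out as in the specification linked above: a 4-byte (8 hex digit) prefix giving the bit length (881), then the 881 bits read most-significant first, then 7 padding bits to complete the last byte. decode_cactvs_fingerprint is a hypothetical helper, and the input in the example is synthetic, not a real PubChem record.

```python
def decode_cactvs_fingerprint(hex_fp, n_bits=881):
    """Turn a hex fingerprint into a string of '0'/'1' characters (hypothetical helper).

    Assumes an 8-hex-digit length prefix, then the fingerprint bits
    most-significant first, then padding bits up to a whole byte.
    """
    length = int(hex_fp[:8], 16)
    if length != n_bits:
        raise ValueError(f"expected {n_bits} bits, prefix says {length}")
    payload = hex_fp[8:]
    if len(payload) * 4 < n_bits:
        raise ValueError("fingerprint payload is too short")
    # Expand each hex digit to exactly 4 bits, keeping leading zeros.
    bits = "".join(format(int(d, 16), "04b") for d in payload)
    # Keep the first n_bits bits; anything after that is padding.
    return bits[:n_bits]


# Synthetic input: prefix 0x371 (= 881), then 111 bytes with only bits 0 and 880 set.
synthetic = "00000371" + "80" + "00" * 109 + "80"
bits = decode_cactvs_fingerprint(synthetic)
on_bits = [i for i, b in enumerate(bits) if b == "1"]
print(len(bits), on_bits)  # 881 [0, 880]
```

Under this assumption, position i in the returned string corresponds to bit i in the specification's list of substructure keys.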

heavy_atom_count

Heavy atom count.

isotope_atom_count

Isotope atom count.

atom_stereo_count

Atom stereocenter count.

defined_atom_stereo_count

Defined atom stereocenter count.

undefined_atom_stereo_count

Undefined atom stereocenter count.

bond_stereo_count

Bond stereocenter count.

defined_bond_stereo_count

Defined bond stereocenter count.

undefined_bond_stereo_count

Undefined bond stereocenter count.

covalent_unit_count

Covalently-bonded unit count.

Atom

class pubchempy.Atom(aid, element, x=None, y=None, z=None, charge=0)

Class to represent an atom in a Compound.

Initialize with an atom ID, element symbol, coordinates and optional charge.

Parameters:
  • aid (int) – Atom ID
  • element (string) – Element symbol.
  • x (float) – X coordinate.
  • y (float) – Y coordinate.
  • z (float) – (optional) Z coordinate.
  • charge (int) – (optional) Formal charge on atom.

aid = None

The atom ID within the owning Compound.

element = None

The element symbol for this atom.

x = None

The x coordinate for this atom.

y = None

The y coordinate for this atom.

z = None

The z coordinate for this atom. Will be None in 2D Compound records.

charge = None

The formal charge on this atom.

to_dict()

Return a dictionary containing Atom data.

set_coordinates(x, y, z=None)

Set all coordinate dimensions at once.

coordinate_type

Whether this atom has 2D or 3D coordinates.
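The link between z, set_coordinates() and coordinate_type can be summarised with a small stand-in class. AtomSketch is hypothetical and only mirrors the behaviour described on this page (z is None in 2D records); it is not PubChemPy's Atom class, and the return values '2d' and '3d' are an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AtomSketch:
    """Hypothetical stand-in mirroring the documented Atom attributes."""
    aid: int
    element: str
    x: Optional[float] = None
    y: Optional[float] = None
    z: Optional[float] = None
    charge: int = 0

    def set_coordinates(self, x, y, z=None):
        # Set all coordinate dimensions at once; omitting z leaves a 2D atom.
        self.x, self.y, self.z = x, y, z

    @property
    def coordinate_type(self):
        # A missing z coordinate is what marks an atom as 2D.
        return "2d" if self.z is None else "3d"


a = AtomSketch(1, "C")
a.set_coordinates(2.0, 0.5)
print(a.coordinate_type)  # 2d
a.set_coordinates(2.0, 0.5, -1.2)
print(a.coordinate_type)  # 3d
```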

Bond

class pubchempy.Bond(aid1, aid2, order=u'single', style=None)

Class to represent a bond between two atoms in a Compound.

Initialize with begin and end atom IDs, bond order and bond style.

Parameters:
  • aid1 (int) – Begin atom ID.
  • aid2 (int) – End atom ID.
  • order (string) – (optional) Bond order.
  • style – (optional) Bond style annotation.

aid1 = None

ID of the begin atom of this bond.

aid2 = None

ID of the end atom of this bond.

order = None

Bond order.

style = None

Bond style annotation.

to_dict()

Return a dictionary containing Bond data.

Substance

class pubchempy.Substance(record)

Corresponds to a single record from the PubChem Substance database.

The PubChem Substance database contains all chemical records deposited in PubChem in their most raw form, before any significant processing is applied. As a result, it contains duplicates, mixtures, and some records that don’t make chemical sense. This means that Substance records contain fewer calculated properties; however, they do have additional information about the original source that deposited the record.

The PubChem Compound database is constructed from the Substance database using a standardization and deduplication process. Hence each Compound may be derived from a number of different Substances.

classmethod from_sid(sid)

Retrieve the Substance record for the specified SID.

Parameters: sid (int) – The PubChem Substance Identifier (SID).

record = None

A dictionary containing the full Substance record that all other properties are obtained from.

to_dict(properties=None)

Return a dictionary containing Substance data.

If the properties parameter is not specified, everything except cids and aids is included. This is because the aids and cids properties each require an extra request to retrieve.

Parameters: properties – (optional) A list of the desired properties.

to_series(properties=None)

Return a pandas Series containing Substance data.

If the properties parameter is not specified, everything except cids and aids is included. This is because the aids and cids properties each require an extra request to retrieve.

Parameters: properties – (optional) A list of the desired properties.

sid

The PubChem Substance Identifier (SID).

synonyms

A ranked list of all the names associated with this Substance.

source_name

The name of the PubChem depositor that was the source of this Substance.

source_id

Unique ID for this Substance within those from the same PubChem depositor source.

standardized_cid

The CID of the Compound that was produced when this Substance was standardized.

May not exist if this Substance was not standardizable.

standardized_compound

Return the Compound that was produced when this Substance was standardized.

Requires an extra request. Result is cached.

deposited_compound

Return a Compound produced from the unstandardized Substance record as deposited.

The resulting Compound will not have a cid and will be missing most properties.

cids

A list of all CIDs for Compounds that were produced when this Substance was standardized.

Requires an extra request. Result is cached.

aids

A list of all AIDs for Assays associated with this Substance.

Requires an extra request. Result is cached.

Assay

class pubchempy.Assay(record)

Corresponds to a single record from the PubChem BioAssay database.

classmethod from_aid(aid)

Retrieve the Assay record for the specified AID.

Parameters: aid (int) – The PubChem Assay Identifier (AID).

record = None

A dictionary containing the full Assay record that all other properties are obtained from.

to_dict(properties=None)

Return a dictionary containing Assay data.

If the properties parameter is not specified, everything is included.

Parameters: properties – (optional) A list of the desired properties.

aid

The PubChem Assay Identifier (AID).

name

The short assay name, used for display purposes.

description

A description of this Assay.

project_category

A category to distinguish projects funded through MLSCN, MLPCN or from literature.

Possible values include mlscn, mlpcn, mlscn-ap, mlpcn-ap, literature-extracted, literature-author, literature-publisher, rnaigi.

comments

Comments and additional information.

results

A list of dictionaries containing details of the results from this Assay.

target

A list of dictionaries containing details of the Assay targets.

revision

Revision identifier for textual description.

aid_version

Incremented when the original depositor updates the record.

pandas functions

Each of the search functions, get_compounds(), get_substances() and get_properties() has an as_dataframe parameter. When set to True, these functions automatically extract properties from each result in the list into a pandas DataFrame and return that instead of the results themselves.

If you already have a list of Compounds or Substances, the functions below allow a DataFrame to be constructed easily.

pubchempy.compounds_to_frame(compounds, properties=None)

Construct a pandas DataFrame from a list of Compound objects.

Optionally specify a list of the desired Compound properties.

pubchempy.substances_to_frame(substances, properties=None)

Construct a pandas DataFrame from a list of Substance objects.

Optionally specify a list of the desired Substance properties.

Exceptions

exception pubchempy.PubChemPyError

Base class for all PubChemPy exceptions.

exception pubchempy.ResponseParseError

PubChem response is uninterpretable.

exception pubchempy.PubChemHTTPError

Generic error class to handle all HTTP error codes.

exception pubchempy.BadRequestError

Request is improperly formed (syntax error in the URL, POST body, etc.).

exception pubchempy.NotFoundError

The input record was not found (e.g. invalid CID).

exception pubchempy.MethodNotAllowedError

Request not allowed (such as invalid MIME type in the HTTP Accept header).

exception pubchempy.TimeoutError

The request timed out, from server overload or too broad a request.

See Avoiding TimeoutError for more information.

exception pubchempy.UnimplementedError

The requested operation has not (yet) been implemented by the server.

exception pubchempy.ServerError

Some problem on the server side (such as a database server down, etc.).
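Because every exception above derives from PubChemPyError, code that only needs to report failures can catch the base class, while more specific handlers can target individual subclasses. The sketch below recreates a minimal version of this hierarchy with stand-in classes and maps HTTP status codes to them. The classes are hypothetical stand-ins rather than imports from pubchempy, the assumption that the HTTP-specific errors subclass PubChemHTTPError is labelled in the code, and the status codes (400, 404, 405, 500, 501, 504) are assumptions based on the PUG REST status codes rather than facts stated on this page. error_for_status is a hypothetical helper.

```python
class PubChemPyError(Exception):
    """Base class for all errors in this sketch."""


class ResponseParseError(PubChemPyError):
    """Response could not be interpreted."""


class PubChemHTTPError(PubChemPyError):
    """Generic HTTP error; the classes below are assumed to derive from it."""

    def __init__(self, code, msg=""):
        super().__init__(f"HTTP {code}: {msg}" if msg else f"HTTP {code}")
        self.code = code


class BadRequestError(PubChemHTTPError):
    """Improperly formed request."""


class NotFoundError(PubChemHTTPError):
    """Input record not found."""


class MethodNotAllowedError(PubChemHTTPError):
    """Request not allowed."""


class ServerError(PubChemHTTPError):
    """Problem on the server side."""


class UnimplementedError(PubChemHTTPError):
    """Operation not implemented by the server."""


class TimeoutError(PubChemHTTPError):  # shadows the built-in within this sketch
    """Request timed out."""


# Assumed mapping from PUG REST HTTP status codes to exception classes.
STATUS_TO_ERROR = {
    400: BadRequestError,
    404: NotFoundError,
    405: MethodNotAllowedError,
    500: ServerError,
    501: UnimplementedError,
    504: TimeoutError,
}


def error_for_status(code, msg=""):
    """Pick the most specific error class for a status code (hypothetical helper)."""
    return STATUS_TO_ERROR.get(code, PubChemHTTPError)(code, msg)


try:
    raise error_for_status(404, "No CID found")
except NotFoundError as e:
    print("not found:", e.code)  # not found: 404
except PubChemPyError as e:
    print("other PubChem error:", e)
```

Unknown status codes fall back to the generic PubChemHTTPError, so a handler for the base class still catches them.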

Changes

  • As of v1.0.3, the atoms and bonds properties on Compounds now return lists of Atom and Bond objects, rather than dicts.
  • As of v1.0.2, search functions now return an empty list instead of raising a NotFoundError exception when no results are found. NotFoundError is still raised when attempting to create a Compound using the from_cid class method with an invalid CID.