How to write a reaction in SMILES format

THE ZYMVOL BLOG


No matter if you are a chemist or not, if you are interested in chemical notation, we hope this post will make you “smile” 🙂

 

What’s SMILES in chemistry?

Simplified Molecular-Input Line-Entry System (SMILES) is a user-friendly, chemical notation method for specifying the structure of molecules and reactions.

It consists of unambiguous, short, linear strings of characters in ASCII format, written in a language made of symbols and simple “grammar” rules.

SMILES was created to facilitate storage, retrieval and modeling of chemical structures and information in computational chemistry. Thanks to its easy and compact format, it requires little computer memory and is convenient for people to use.

Plus, SMILES can be read by molecule editors and converted into two-dimensional and three-dimensional models. This is very useful, for example, when you need to study the structure of proteins such as enzymes!

 

The SMILES notation system was created in the 1980s at the Mid-Continent Ecology Division Laboratory in Duluth, Minnesota, and funded by the U.S. Environmental Protection Agency.

Since then, other organizations have modified and extended SMILES. It also exists as an open standard called OpenSMILES, developed by the Blue Obelisk open-source chemistry community.

 

Five rules for writing SMILES

Understanding SMILES is quite easy!

First of all, keep in mind that in SMILES, each notation string represents the topological structure of a molecule or a reaction.

Similar to the concept of a graph, in SMILES the atoms of a molecule are considered as nodes, bonds are the edges, parentheses indicate branching points and numeric labels designate ring connection points.

Benzoic acid
C1=CC=C(C=C1)C(=O)O

 

And now let’s learn the five rules to write in SMILES format:

Rule 1: Atoms

In SMILES, atoms are represented by their atomic symbols: O for oxygen, Br for bromine, and so on, using lower case for the second letter in two-character symbols.

For elements in the «organic subset» (B, C, N, O, P, S, F, Cl, Br, and I) and with their lowest normal valence, attached hydrogens usually don’t need to be written. That’s why methane (CH4) can be written simply as C.

However, all other elements, as well as organic-subset elements with other valences, must be written in brackets as follows:

 

[OH3+]

Inside brackets, any attached hydrogens have to be indicated by an «H», followed by a digit when there is more than one.

 

[Fe+3]

Meanwhile, formal charges must always be specified by the symbol «+» or «-», followed by a digit when the charge is greater than one.
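To see how the two bracket examples above break down, here is a minimal Python sketch. It only covers the simple pattern described in this rule (symbol, optional hydrogens, optional charge); the function name `parse_bracket_atom` is ours and does not come from any chemistry toolkit.

```python
import re

# Simplified bracket atom: element symbol, optional H count, optional charge.
# Isotopes, chirality and atom classes are deliberately left out of this sketch.
BRACKET_ATOM = re.compile(
    r"^\[(?P<symbol>[A-Z][a-z]?)"   # element symbol, e.g. O or Fe
    r"(?:H(?P<hcount>\d*))?"        # optional H, optionally followed by a digit
    r"(?P<charge>[+-]\d*)?\]$"      # optional + or -, optionally followed by a digit
)


def parse_bracket_atom(text):
    """Split a bracket atom such as [OH3+] into symbol, hydrogens and charge."""
    match = BRACKET_ATOM.match(text)
    if match is None:
        raise ValueError("not a simple bracket atom: " + text)

    hcount = match.group("hcount")
    if hcount is None:
        hydrogens = 0            # no H written inside the brackets
    elif hcount == "":
        hydrogens = 1            # a bare H means exactly one hydrogen
    else:
        hydrogens = int(hcount)

    charge_text = match.group("charge")
    if charge_text is None:
        charge = 0
    else:
        magnitude = int(charge_text[1:] or "1")   # a bare sign means one unit
        charge = magnitude if charge_text[0] == "+" else -magnitude

    return {"symbol": match.group("symbol"), "hydrogens": hydrogens, "charge": charge}


print(parse_bracket_atom("[OH3+]"))
print(parse_bracket_atom("[Fe+3]"))
```

The first call reports an oxygen with three hydrogens and a charge of +1, the second an iron atom with no hydrogens and a charge of +3.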

 

Rule 2: Bonds

Bonds between atoms are represented by different symbols depending on the type:

 

-   For single bonds
=   For double bonds (C=O formaldehyde)
#   For triple bonds (C#N hydrogen cyanide)
:   For aromatic bonds

 

But atoms that are next to each other are assumed to be connected by a single or aromatic bond, so these two may always be omitted, as in ethanol:

 

CCO
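The "omitted single bond" idea can be checked with a few lines of Python. This is a sketch for unbranched chains of organic-subset atoms only; the helper name `chain_bond_orders` is hypothetical.

```python
import re

BOND_ORDER = {"-": 1, "=": 2, "#": 3}
# Two-letter symbols come first so that "Cl" is not read as "C" plus a stray "l".
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[-=#]")


def chain_bond_orders(smiles):
    """Return the bond order between each pair of neighbouring atoms in a chain."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("only unbranched organic-subset chains are supported")

    orders = []
    pending = None       # explicit bond symbol waiting for its second atom
    seen_atom = False
    for token in tokens:
        if token in BOND_ORDER:
            pending = BOND_ORDER[token]
        else:
            if seen_atom:
                # No symbol between two atoms means an implicit single bond.
                orders.append(pending if pending is not None else 1)
            seen_atom = True
            pending = None
    return orders


print(chain_bond_orders("CCO"))     # ethanol, bonds left implicit
print(chain_bond_orders("C-C-O"))   # ethanol, bonds written out
print(chain_bond_orders("C#N"))     # hydrogen cyanide
```

Both spellings of ethanol give the same list of two single bonds, which is why the «-» symbol is normally left out.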

 

Rule 3: Branches

In case of molecules with branches, you just have to know that branches are enclosed in parentheses «( )» and that, when the bond joining the branch to the “parent chain” is written explicitly, it has to appear inside the parentheses.

Have a look at triethylamine. Its SMILES is CCN(CC)CC, where (CC) refers to the branch that starts from the nitrogen atom:

 

CCN(CC)CC
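One way to picture how parentheses work is to read the string left to right and remember the branch point on a stack. The sketch below does that for organic-subset atoms and parentheses only (no explicit bonds or rings); `neighbours` is an illustrative name, not a library function.

```python
import re

TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[()]")


def neighbours(smiles):
    """Return the atom list and, for each atom index, the atoms bonded to it."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("only atoms and parentheses are supported in this sketch")

    atoms = []
    adjacency = {}
    previous = None      # atom that the next atom will be bonded to
    stack = []           # branch points to return to when ")" is reached
    for token in tokens:
        if token == "(":
            stack.append(previous)       # remember where the branch starts
        elif token == ")":
            previous = stack.pop()       # go back to the parent chain
        else:
            index = len(atoms)
            atoms.append(token)
            adjacency[index] = []
            if previous is not None:
                adjacency[index].append(previous)
                adjacency[previous].append(index)
            previous = index
    return atoms, adjacency


atoms, adjacency = neighbours("CCN(CC)CC")
nitrogen = atoms.index("N")
print(nitrogen, adjacency[nitrogen])
```

For triethylamine the nitrogen ends up with three carbon neighbours: one from the chain before it, one from the branch in parentheses and one from the chain after it.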

 

Rule 4: Cyclic structures

Molecules that are shaped in a ring –like aromatic molecules– are also written linearly.

Ring opening and closure are indicated by a digit that follows the atomic symbol at each opening/closure. See for example cyclohexane:

 

C1CCCCC1

 

Curiously, different notations can represent the same cyclic structure, and all are equally valid.

For example, cyclohexene can be written as:

 

C1=CCCCC1
C=1CCCCC1
C1CCCCC=1
C=1CCCCC=1

 

 

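All four describe the same ring. They only differ in where the double bond is written: between two neighbouring atoms, or next to the ring-closure digit (on one side or on both).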

Atoms in aromatic rings are written in lower case to be differentiated, as in benzene. This SMILES represents a hexagonal ring of six carbons with one hydrogen atom attached to each:

 

c1ccccc1
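The pairing of ring digits can be sketched in Python as follows. The code assumes single-digit ring labels and no branches, and the function name `ring_closures` is hypothetical.

```python
import re

BONDS = {"-", "=", "#", ":"}
TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]|[-=#:]|\d")


def ring_closures(smiles):
    """Return (opening_atom, closing_atom) index pairs, one per ring digit."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("unsupported character in " + smiles)

    open_rings = {}      # digit -> index of the atom where the ring was opened
    closures = []
    atom_index = -1
    for token in tokens:
        if token.isdigit():
            if token in open_rings:
                # Second occurrence of the digit: bond back to the opening atom.
                closures.append((open_rings.pop(token), atom_index))
            else:
                open_rings[token] = atom_index
        elif token not in BONDS:
            atom_index += 1              # any other token is an atom
    if open_rings:
        raise ValueError("ring opened but never closed")
    return closures


for text in ["C1CCCCC1", "c1ccccc1", "C1=CCCCC1", "C=1CCCCC=1"]:
    print(text, ring_closures(text))
```

Every string in the loop closes a ring between the first and the sixth atom, so each one describes a six-membered ring.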

 

Rule 5: Disconnected structures

SMILES does not only serve for writing single molecules. You can also represent disconnected compounds or, in other words, atoms not bonded to each other.

But how?

Disconnected structures are written as individual structures separated by a period: «.». The parts are simply written side by side, and the order in which ions or ligands are listed is arbitrary. 

For example, sodium chloride (table salt) is an ionic compound and its SMILES looks like this:

 

[Na+].[Cl-]
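A quick sanity check for a salt written this way is to split it on the period and add up the formal charges. A small sketch, with helper names of our own choosing, that only understands charges written as a sign plus an optional digit:

```python
import re

# A sign (and optional digit) right before the closing bracket of an atom.
CHARGE = re.compile(r"\[[^\]]*?([+-])(\d*)\]")


def split_components(smiles):
    """Split a SMILES string into its disconnected parts."""
    return smiles.split(".")


def net_charge(smiles):
    """Add up the formal charges written inside bracket atoms."""
    total = 0
    for sign, digits in CHARGE.findall(smiles):
        magnitude = int(digits) if digits else 1
        total += magnitude if sign == "+" else -magnitude
    return total


salt = "[Na+].[Cl-]"
print(split_components(salt))
print(net_charge(salt))
```

Sodium chloride splits into two components whose charges cancel out, as expected for a neutral salt.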

 

Are you starting to get it?

Thanks to these five rules, chemists can write very complex topological structures and unique strings for every existing molecular structure.

 

How can SMILES represent unique structures?

As you might have guessed from the rules, SMILES strings describe the two-dimensional graphs that chemists normally use to represent molecules.

Three-dimensional structures can also be obtained from SMILES strings with energy-minimization approaches, which basically predict a molecule’s 3D structure from the most favorable arrangement of its atoms and bonds in terms of free energy.

But how can this notation method be so precise, when the diversity of molecules is so wide and complex?

As mentioned before, there is more than one SMILES for some molecules.

For example: OCC, C-C-O and C(O)C are all generic SMILES for ethanol. Generic SMILES do not take into account chiral or isotopic information. 

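How has this been solved? With canonicalization: algorithms that generate one single specific SMILES among all valid possibilities. 

The result is called a unique SMILES. For the previous example, all the generic SMILES would be converted into the unique SMILES CCO, which then works as an identifier for that specific chemical structure. Keep in mind that the exact string depends on the canonicalization algorithm, so different software can produce different unique SMILES for the same molecule. When chiral and isotopic specifications are included as well, the string is known as an isomeric SMILES.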
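To make the idea of canonicalization concrete, here is a toy Python sketch. It parses a small SMILES into a graph and then tries every possible atom ordering, keeping the smallest description as a canonical key. This brute-force approach is only our illustration: it is not the algorithm used by real toolkits, it only works for tiny molecules, and it returns a key rather than a SMILES string. It supports organic-subset atoms, the bonds «-», «=» and «#», parentheses and single-digit rings.

```python
import re
from itertools import permutations

TOKEN = re.compile(r"Cl|Br|[BCNOPSFI]|[-=#]|[()]|\d")
BOND_ORDER = {"-": 1, "=": 2, "#": 3}


def parse(smiles):
    """Return (atoms, bonds); bonds maps frozenset({i, j}) to a bond order."""
    tokens = TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError("unsupported SMILES feature in " + smiles)
    atoms, bonds = [], {}
    previous, order = None, 1
    stack, open_rings = [], {}
    for token in tokens:
        if token in BOND_ORDER:
            order = BOND_ORDER[token]
        elif token == "(":
            stack.append(previous)
        elif token == ")":
            previous = stack.pop()
        elif token.isdigit():
            if token in open_rings:
                other, open_order = open_rings.pop(token)
                bonds[frozenset((other, previous))] = max(order, open_order)
            else:
                open_rings[token] = (previous, order)
            order = 1
        else:
            atoms.append(token)
            index = len(atoms) - 1
            if previous is not None:
                bonds[frozenset((previous, index))] = order
            previous, order = index, 1
    return atoms, bonds


def canonical_key(smiles):
    """Smallest (labels, edges) description over all atom orderings."""
    atoms, bonds = parse(smiles)
    best = None
    for perm in permutations(range(len(atoms))):   # perm[old] = new index
        labels = [None] * len(atoms)
        for old, new in enumerate(perm):
            labels[new] = atoms[old]
        edges = sorted(
            tuple(sorted(perm[i] for i in pair)) + (order,)
            for pair, order in bonds.items()
        )
        key = (tuple(labels), tuple(edges))
        if best is None or key < best:
            best = key
    return best


ethanol = [canonical_key(s) for s in ["OCC", "C-C-O", "C(O)C", "CCO"]]
print(all(key == ethanol[0] for key in ethanol))
print(canonical_key("COC") == ethanol[0])
```

The first line printed is True, because all four strings describe ethanol, and the second is False, because COC is dimethyl ether: same atoms, different connections.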

 

Using a SMILES generator

Understanding the SMILES “language” is always going to be useful for those who have to deal with chemical notations in computational format.

But don’t worry, you don’t need to learn SMILES by heart, because there are tools to generate SMILES.

For example, when our company launched ZYMSCAN, we knew we wanted to use a SMILES generator to make the user experience as easy as possible for chemists.

ZYMSCAN was created to help users know if a certain reaction can be performed enzymatically without wasting time going through other methods.

It consists of three simple steps, which start with submitting the substrate and product SMILES of the reaction of interest.

Thanks to SMILES’ unambiguous format, we can be sure we understand the user’s very specific needs correctly, because each SMILES describes exactly one molecule.
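Substrate and product can also be combined into a single line. In the reaction flavour of SMILES, a reaction is written as reactants>agents>products, with the molecules on each side separated by periods and the middle part left empty when no agents are given. The sketch below builds such a string; the function name is hypothetical, and this is not necessarily the format ZYMSCAN uses internally.

```python
def reaction_smiles(substrates, products, agents=()):
    """Join molecule SMILES into a reactants>agents>products reaction string."""
    return ">".join(
        ".".join(group) for group in (substrates, agents, products)
    )


# Oxidation of ethanol (CCO) to acetaldehyde (CC=O), with no agents listed.
print(reaction_smiles(["CCO"], ["CC=O"]))
```

This prints CCO>>CC=O: the substrate on the left, the product on the right and nothing between the two «>» symbols.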

 

Are there alternatives to SMILES?

SMILES is not the only linear notation. The International Union of Pure and Applied Chemistry (IUPAC) created its own system to standardize identification in chemical databases: InChI.

Like SMILES, it is open source and freely accessible. The two are the most important and commonly used line notations today, and they complement each other. 

The big difference is that SMILES is not an identifier, but a chemical representation format. Besides, while InChI is well-documented and standardized through IUPAC, there is no up-to-date specification documentation for SMILES.

The latter is the main reason why the US Environmental Protection Agency, which created SMILES, is working on the interoperability of this format. The aim is to establish a formalized specification to promote the exchange of scientific information together with IUPAC’s InChI.


What's an EC Number & how to interpret it


You might have heard about EC numbers.

They are widely used in biochemistry and molecular biology to provide a clear and standardized way to refer to enzymes, especially in scientific literature and databases.

This classification method helps us differentiate among thousands of enzyme types that would otherwise be very hard for everyone to recognise.

Giving enzymes arbitrary, common names worked well when only a few were known.

But now, after more than a century of enzyme discovery, naming more than 8,000 enzymes that way would be simply impossible.

Only a few names have survived in our common language, such as cellulase (EC 3.2.1.4), papain (EC 3.4.22.2) or proteases (all EC 3.4 numbers). For the rest, EC numbers are our best way to talk about the broad world of biocatalysis.

 

What’s an EC Number?

The Enzyme Commission (EC) number is a nomenclature system based on the chemical reactions that enzymes catalyze.

It was developed by the International Union of Biochemistry and Molecular Biology (IUBMB) and consists of a numerical classification scheme where each number is associated with an enzyme-catalyzed reaction.

Note that each EC number is not associated with a specific enzyme, because in nature we can find different enzymes that catalyze the same reaction.

 

How to read an EC number

An EC number is typically represented as a sequence of four numbers separated by periods, which is linked to a recommended name. For example, EC 1.1.1.1 (alcohol dehydrogenase).

Let’s see what each of these four numbers represents:

First Number: The type of reaction or enzyme class

The first number indicates the enzyme class, based on the type of reaction each of these classes catalyzes. There are seven main classes:

EC 1: Oxidoreductases – Enzymes that catalyze oxidation-reduction reactions.

EC 2: Transferases – Enzymes that transfer a functional group (e.g., a methyl or phosphate group).

EC 3: Hydrolases – Enzymes that catalyze the hydrolysis of various bonds.

EC 4: Lyases – Enzymes that break various chemical bonds by means other than hydrolysis and oxidation, often forming a new double bond or ring structure.

EC 5: Isomerases – Enzymes that catalyze the transfer of groups within molecules to yield isomeric forms.

EC 6: Ligases – Enzymes that join two molecules together, typically using ATP.

EC 7: Translocases – Enzymes that catalyze the movement of ions or molecules across membranes or their separation within membranes.

Second and third numbers: Enzyme subclass and sub-subclass

These two numbers provide more detail about the type of molecular group, bond or product involved in the enzyme-catalyzed reaction, narrowing down the specificity of the reaction.

Fourth Number: Enzyme identification

This is the serial number of the enzyme. It represents the specific enzyme identity, related to specific metabolites and/or cofactors involved in the enzyme-catalyzed reaction.
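The four-level structure described above is easy to take apart in code. A minimal Python sketch, in which the function and dictionary names are our own and the class names are the seven listed earlier:

```python
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


def parse_ec_number(text):
    """Split a complete EC number such as 'EC 1.1.1.1' into its four levels."""
    cleaned = text.strip()
    if cleaned.upper().startswith("EC"):
        cleaned = cleaned[2:].strip()        # drop the optional "EC" prefix

    parts = cleaned.split(".")
    if len(parts) != 4 or not all(part.isdigit() for part in parts):
        raise ValueError("expected four numbers separated by periods: " + text)

    enzyme_class, subclass, sub_subclass, serial = (int(part) for part in parts)
    if enzyme_class not in EC_CLASSES:
        raise ValueError("unknown enzyme class: " + str(enzyme_class))

    return {
        "class": enzyme_class,
        "class_name": EC_CLASSES[enzyme_class],
        "subclass": subclass,
        "sub_subclass": sub_subclass,
        "serial": serial,
    }


print(parse_ec_number("EC 1.1.1.1"))
print(parse_ec_number("EC 6.3.2.24"))
```

The sketch only accepts complete, four-part numbers; a partial entry such as «EC 3.4» raises an error instead of being guessed at.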

 

A couple of examples

Now that you know the basics, let’s see how this looks put into practice.

Let’s start with the first entry that appears in the classification:

 

EC 1.1.1.1

– The first «1» indicates that it’s an oxidoreductase (an enzyme that catalyzes the transfer of electrons from one molecule, the reductant, to another, the oxidant).

– The second «1» specifies that it acts on the CH-OH group of donors.

– The third «1» indicates that NAD+ or NADP+ is the acceptor.

– The final «1» is the specific serial number for this enzyme in its group, which is given the recommended name of alcohol dehydrogenase.

 

See? That wasn’t so hard.

Now let’s mix it up a little and look at another one.

 

EC 6.3.2.24

– Here, the first «6» indicates that it’s a ligase (an enzyme that catalyzes the joining of two molecules by forming a new chemical bond).

– The second number is «3» and specifies that the bonds formed are carbon-nitrogen bonds.

– The third number is «2» and specifies that the molecules joined are an acid and an amino acid.

– The last number is «24», the specific serial number for this enzyme in its group. It identifies the two amino acids joined by the enzyme as tyrosine and arginine, which gives the recommended name of tyrosine-arginine ligase.

 

Are you starting to get the hang of it? This classification is quite easy to follow. And, don’t worry, nobody expects you to know all these numbers by heart. You just need to understand the rationale behind them. After all, EC numbers are here to make our lives easier!

 

Is there a complete enzyme classification by EC number?

Of course! People have always loved to bring order to chaos.

In this case, it is the IUBMB who provides an approved and updated enzyme classification list based on the EC number nomenclature: the ExplorEnz Enzyme Database.

The last printed edition of the list was published in 1992. Since then, 29 supplements have been added to that version to register all the newly discovered enzymes and any modifications needed.

You can dive into its tree structure, expanding each list of the classes, subclasses, sub-subclasses and serial numbers, to have a deeper view of the entire classification.
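The same tree can be pictured as nested dictionaries: class, then subclass, then sub-subclass, then a list of serial numbers. A small Python sketch using the EC numbers mentioned in this post (the function name is hypothetical, and this is not how the database itself is implemented):

```python
def build_ec_tree(ec_numbers):
    """Group complete EC numbers into class -> subclass -> sub-subclass -> serials."""
    tree = {}
    for ec in ec_numbers:
        first, second, third, serial = ec.split(".")
        leaf = tree.setdefault(first, {}).setdefault(second, {}).setdefault(third, [])
        leaf.append(serial)
    return tree


tree = build_ec_tree(["1.1.1.1", "3.2.1.4", "3.4.22.2", "6.3.2.24"])
print(tree["3"])
```

Expanding class 3 (hydrolases) shows two subclasses, one holding cellulase and the other papain.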

 

How do I find the enzyme that I’m searching for?

If you’ve ever had to find out if a certain reaction can be performed enzymatically, you may know the answer does not come easily when navigating through the EC number database on your own.

This is a very common problem for industries or research groups considering the possibility of doing a chemical process with enzymes, who usually spend too much time searching in the literature or asking other colleagues.

In those cases, a quick and simple answer in the form of an EC number can save a lot of time and open the door to innovative projects that were long stuck.

At ZYMVOL we developed ZYMSCAN with that in mind: if a chemist wants to know if their reaction can be done enzymatically, they just have to submit their reaction and ZYMSCAN will confirm if it can be done with biocatalysis. If it can, they will receive an email with the EC number of the enzyme family that matches the reaction.

 

As you can see, an EC number can be very useful not only in scientific literature, but also in applied chemistry; and now you know how to interpret it!


Introducing ZYMSCAN: the fastest way to check if there’s an enzyme for your reaction

COMPANY NEWS


“Is there an enzyme for my process?”

We can’t tell you how many times we’ve been asked this question.

Trying to swap your traditional chemical process for an improved biochemical one can be a pain. And this is not surprising at all, given how time-consuming the search for an enzyme for a specific reaction can be. 

The reality is, the individuals that could benefit most from applying enzymes are chemists. However, many lack the resources to answer the fundamental question: «Can I do this reaction with an enzyme?»

That’s why today we are happy to announce the launch of a new platform that will make your life easier: ZYMSCAN.

From now on, you are just a few clicks away from knowing if biocatalysis is right for you.

Oh, and have we mentioned that it is free?

Keep reading to learn more.

 

What is ZYMSCAN?

ZYMSCAN is a free, online tool for users who want to know if a certain reaction can be performed enzymatically without wasting time going through other methods.

 

How does it work?

Using ZYMSCAN is easy. Instead of spending weeks looking for your answer, just follow these three simple steps:

1) Fill out the form and submit the substrate and product of interest in SMILES format.

2) Zymvol will use its computational technology to scan the whole enzyme family landscape in search of the right fit. Our experts will accompany the process to certify the final answer.

3) Receive an email within 1-2 business days with the EC number of the enzyme family found.

Don’t worry, we’ve got you covered.
Just draw your molecule in the editor and we’ll create the SMILES for you.

Don’t know how a SMILES works? Check out some examples.

 

Who is ZYMSCAN for?

– Chemists, Innovation Managers or just about anyone who wishes to implement greener processes and products, but needs to know first whether their reaction can be performed by enzymes.

– Academics or Universities whose research is related to enzymes.

 

Why did we create ZYMSCAN?

Let’s face it: the way many companies pursue biocatalysis today is not always efficient.

How much time do you waste searching for an enzyme? Going through the literature and asking for help, trying to find out if there’s one for the reaction you need?

We realized this was a common problem not only for our clients, but for many other companies looking to “go bio”.

With so many promising projects getting stuck in this first step, we knew it was important to launch a platform that was simple, quick, free of charge and supported by our experts.

If more companies can know from the get-go if there is an enzyme for their processes, we can speed up the transition to a better and greener chemistry everywhere.

Don’t know what reaction to try out?
Use our random SMILES generator!

 

Why does it take 1 business day to receive results?

To ensure optimal performance, ZYMSCAN’s results are obtained through a mix of automated computational work and human curation.

After sending your request, our proprietary Artificial Intelligence software does the search and rapidly matches the input substrate and product with an enzyme family. Afterwards, ZYMVOL experts take care of doing a final check on the information, to guarantee that the answer is as accurate as possible.

 

Will there be more updates coming soon?

ZYMSCAN is currently in beta phase, which means it is still undergoing updates and improvements. As a result, you may encounter incomplete sections, minor disruptions, or changes in content.

We appreciate your patience while these updates are taking place. If you encounter any errors and wish to report them, please send us an email at info@zymvol.com.

Ready to try ZYMSCAN?


How enzymes improved… detergents


Detergents were probably one of the first products that showed the world that enzymes – those proteins that perfectly catalyze the chemical reactions in our bodies – could play an even bigger role in our daily lives.

When detergent manufacturers started incorporating enzymes into their formulations, consumers no longer needed high temperatures and long, fabric-damaging processes to completely clean off dirt from their clothes. 

Enzymes like proteases, amylases and lipases helped remove even the most difficult stains; and since these biocatalysts worked best under mild conditions, families could opt out of using hot water and clothes would still get clean. 

Detergents have shown us that enzymes can revolutionize a whole industry; even to the point that, nowadays, we cannot imagine life without them.

There’s a lot to learn from this landmark of innovation. 

Let’s start from the beginning.

The Discovery

Our fresh laundry owes much to Otto Röhm, a German pharmacist from the late 19th century who, on December 11th 1913, patented the first detergent with enzymes: Burnus.

Otto was working for the Stuttgart city gasworks, where his job focused on leather treatment, when he realized that the trypsin extracted from pig pancreas could help dissolve stains on clothes.

Even though “Burnus” didn’t succeed in the market and needed to be improved due to the low activity of the enzyme, it set a precedent for the future of modern cleaning supplies.

Fifty years later, enzyme detergents exploded in popularity, and manufacturers gradually introduced different types of enzymes into their formulations.

Since then, washing has needed lower temperatures and less water, and diverse types of stains are removed efficiently, saving energy and time.

The success was such that, nowadays, the detergent business is one of the largest single markets for enzymes.

 

An enzyme for every stain

Detergent enzymes detach and degrade all kinds of dirt by breaking down molecules’ chemical bonds. They’re all known as hydrolases, because they need water to do such reactions. But did you know there’s a specific type of enzyme for every kind of dirt stain?

  • Proteases degrade stains composed of protein by breaking the peptide chains. Historically, they were the first enzymes to be used in detergents.
  • Amylases break down starch-based stains and also help keep white fabrics white.
  • Lipases attack fat-based stains by hydrolyzing triglycerides into simpler fats that can be dissolved.
  • Pectinases eliminate stains from other types of complex sugars present in fruits, vegetables, jams and juices.
  • Mannanases remove stains from milkshakes or ice cream and personal hygiene products containing mannans, a type of complex sugar used as a thickening agent.
  • Cellulases improve the cleanness and softness of cotton and linen fabrics by breaking down cellulose fibers and making it harder for dirt particles to attach.

Nowadays, enzyme detergents profit from the combined effect of multi-enzyme systems and are more sustainable thanks to the substitution of traditional detergent components that are harmful for the environment.

But their impact goes way beyond removing stains from clothes.

  • In industrial dishwashers, enzymes promote the decomposition of leftovers, which protects the machines and saves recirculating water that used to need replacing more often.
  • At hospitals, clothes stay white because proteases can remove blood from fabrics. Hospitals also rely on enzyme action to properly clean medical devices, which need different cleaning conditions than normal sterilization machines.
  • Safety in commercial kitchens wouldn’t be the same without lipases. These enzymes are great at removing fat from the floor, which not only cleans it, but helps prevent accidents.

Amazing, right?

And, as if their versatility weren’t enough, there’s one more reason why enzymes have become a powerful ally in the cleaning industry: they make detergents more sustainable.

 

How enzyme detergents help us go green

The need to reduce our environmental footprint without compromising efficiency has found in enzymes a great alternative to traditional chemical processes.

As proteins, enzymes are biodegradable and can replace toxic and polluting compounds, like phosphates and phosphonates, enhancing the protection of our health and the environment.

Plus, since biocatalysis can take place under mild conditions of temperature and pH, this can reduce the environmental impact and energy cost of everyday washing machine use.

Here are two clear cases of enzymes helping reduce the environmental impact of detergents:

 

Cold-active enzymes

We’ve all heard the same story: to get rid of difficult stains, you need to use hot water.

But, if you’ve spent a long time relying on this advice for your weekly washing, you might have realized that setting your machine to the highest temperature is not sustainable at all.

Sensitive fabrics might get damaged. Some might lose their color.

And worst of all, electricity bills skyrocket.

This is because, normally, most of the electricity a washing machine uses goes exclusively to heating the water.

But the good news is, for most types of dirt, you don’t need to use hot water to obtain good results. And even if you’re faced with a difficult stain, enzymes can help!

The introduction of cold-active enzymes in detergent products has allowed washing temperatures to be reduced from 40-60 °C to 30 °C without compromising on cleanliness.

These enzymes come from psychrophilic microorganisms, which are found in cold regions like the Antarctic and Arctic, glaciers and/or deep sea sediments.

Washing at low temperatures has a CO2 saving potential of around 32 million tons annually in the United States and Europe alone, equal to the emissions of 8 million cars (OECD).

The only drawback? Natural cold-active enzymes are not always as abundant and stable as the industry needs.

But no worries: there’s evidence that protein engineering can genetically improve psychrophilic strains, enhancing enzymes and making the use of cold-active enzymes easier for everyone.

 

Compact detergents

Powder or liquid, applied with a dispenser or in individual pods or tablets: no matter the format, over the years formulations have progressed to lower the volume of detergent needed for the same size of wash load.

Thanks to their high activity at low concentrations, tiny amounts of enzymes are enough for compact detergents to wash as efficiently as the others.

Besides, enzymes are not used up and do not lose their activity during the wash; they catalyze the reactions that ease dirt removal over and over again.

This has made it possible for the average dosage of detergent to be reduced by 50%, achieving savings of 30 million tons in Europe over the past two decades (AISE).

Plus, it doesn’t end there:

  • Smaller doses of detergent need less water for washing and can eliminate the need for a pre-wash cycle, which leads to significant water savings.
  • Compact detergents fit in smaller packaging, which reduces the amount of materials needed for storage and transportation. 

Smaller packaging means more packages transported per vehicle along its life cycle. And fewer trips mean, of course, lower CO2 emissions!

Can we make enzyme detergents… even better?

Here’s a tricky question: modern detergents contain enzyme mixes that include proteases to degrade protein stains. But, since enzymes are also proteins, how come they don’t degrade themselves?

The answer is simple. By adding “inhibitors” to detergent formulations, scientists can keep proteases deactivated while the detergent is stored, so that they only work when mixed with water.

But there’s a catch.

Most common inhibitors contain boron, which is toxic for plants and insects. Each time we do our laundry, we are slowly releasing this element into the environment through the discarded water.

So a pretty good solution (using enzymes to make detergents work better) is, at the same time, creating a problem we want to avoid: the pollution of our environment!

 

The IDEA-PS Project

At ZYMVOL, we’re proud to be part of a project that aims to solve this issue in the best way we know: using our computer power (and our team’s smart brains!).

Together with biochemical company CYGYC BIOCON (BIOKATAL), we’re developing a new computational platform that can help us search for more sustainable enzyme inhibitors for detergent formulations.

ZYMVOL Senior Researcher, Dr. Brian Jiménez Garcia, is the principal investigator leading this project, which we have named “IDEA-PS”*. You can read more about it in the news article we released last year.

As our colleague Dr. Jiménez Garcia points out: “We hope that with this software, we will be able to help scientists make more effective formulations that will save resources, energy and water”.

Isn’t that a great goal to work towards?

 


*IDEA-PS is funded by ACCIÓ Tecniospring INDUSTRY programme and MSCActions.