Neural Machine Translation With Tensorflow: Training


Hello everyone, I’m back. Welcome to the third post of the Neural Machine Translation with Tensorflow (NMTwT) series. In the previous posts, we walked through data preparation and model creation, which is great because it means that we have already done all the hard stuff. Today we’re gonna glue things together to get the model up and “training”.


To get the best out of this post, I highly recommend reading the previous posts of this series first.

Let’s get started!

Prepare for training

Before being able to train, we need to create a training op, which is the mechanism to compute gradients and apply them to the model’s parameters.

It’s obvious that we need a loss op to compute gradients from. We also need a threshold value for gradient clipping and a learning rate to control the convergence. Let’s create a method like below:

We start off by getting the global step value. We’re gonna need it to apply gradient clipping and schedule the learning rate.
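To give a feeling for how the global step can drive the learning rate, here is a minimal sketch in plain Python. It assumes an exponential decay schedule, which is my assumption for illustration only: the post does not say which schedule is used. The function name and its parameters are hypothetical, and the formula is the same one that `tf.train.exponential_decay` computes.

```python
def exponential_decay(initial_lr, global_step, decay_steps, decay_rate,
                      staircase=False):
    """Return initial_lr * decay_rate ** (global_step / decay_steps).

    With staircase=True the exponent is an integer, so the learning rate
    drops in discrete jumps every `decay_steps` steps.
    """
    if staircase:
        exponent = global_step // decay_steps
    else:
        exponent = global_step / decay_steps
    return initial_lr * decay_rate ** exponent


# Hypothetical settings: halve the learning rate every 1000 steps.
for step in (0, 500, 1000, 2000):
    print(step, exponential_decay(0.001, step, decay_steps=1000, decay_rate=0.5))
```

The later the global step, the smaller the learning rate, which usually helps the model settle down near the end of training.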

Next, we will get the model’s parameters (trainable variables), compute the gradients and clip the gradients that exceed the threshold:
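If you wonder what clipping actually does to the numbers, the sketch below may help. It assumes clipping by global norm (the behaviour of `tf.clip_by_global_norm`), which is a common choice for NMT models but is an assumption on my side. It is written in plain Python with toy gradients, so it is an illustration of the math rather than the training code itself.

```python
import math


def clip_by_global_norm(gradients, max_norm):
    """Rescale all gradients together so their joint L2 norm is <= max_norm.

    `gradients` is a list of flat lists, one per (toy) variable.
    Returns the clipped gradients and the norm before clipping.
    """
    global_norm = math.sqrt(sum(g * g for grad in gradients for g in grad))
    if global_norm <= max_norm:
        # Already within the threshold: return an unchanged copy.
        return [list(grad) for grad in gradients], global_norm
    scale = max_norm / global_norm
    return [[g * scale for g in grad] for grad in gradients], global_norm


# Toy gradients for two "variables"; their joint norm is 13.
grads = [[3.0, 4.0], [12.0]]
clipped, norm = clip_by_global_norm(grads, max_norm=5.0)
print(norm, clipped)
```

Note that every gradient is scaled by the same factor, so the update direction stays the same and only its size shrinks.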

Finally, we will create an instance of Adam optimizer and apply the clipped gradients:
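For the curious, here is roughly what a single Adam update does to one parameter. This is a plain Python sketch of the textbook update rule with the commonly used default hyperparameters. The function name is hypothetical, and TensorFlow’s own implementation arranges the epsilon term slightly differently, so treat it as an illustration only.

```python
import math


def adam_step(param, grad, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one textbook Adam update to a single scalar parameter.

    m and v are the running first and second moment estimates,
    t is the 1-based step count used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# First step on a toy parameter: the update size is close to the learning rate.
param, m, v = adam_step(param=1.0, grad=0.5, m=0.0, v=0.0, t=1)
print(param)
```

In the real training op, the gradient fed into this update would be the clipped one.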

Now we have all the ingredients we need for the training to begin. Let’s go ahead and call them out. We will begin with the vocabularies:

Next, we will call the create_input_data method which we created before to get the input data:

Well, I love the feeling when all the necessary tools are ready on the table. It’s great that we don’t have to do anything but use them. I also heard that we are done with the network. Fantastic!

Can you guess what we’re gonna do next? You’re right. Let’s use the create_train_op method we created above:

The last thing to do in the preparation step is to set up a Session. We need to initialize the network’s variables, the lookup tables (since we used a lot of lookup ops) and, lastly, the iterator for the dataset:

And that’s it. We are now ready to dive into the training loop.

The training loop

It’s time we trained our model. Let’s first create a Saver to save training checkpoints later on. We can also use it to check if any previous checkpoint exists. Imagine that you had to stop the training unexpectedly. It would be convenient if the model could resume from the last checkpoint, right?
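The resume logic itself is simple enough to sketch in plain Python. In TensorFlow 1.x you would normally rely on `tf.train.latest_checkpoint` together with the Saver’s `restore` method. The helper below is hypothetical and only shows the idea: look at the checkpoint files, find the highest step and continue from there. The `model.ckpt` prefix and the file list are assumptions for illustration.

```python
import re


def latest_checkpoint_step(filenames, prefix="model.ckpt"):
    """Return the highest step found in names like 'model.ckpt-5000.index'.

    Returns None when no checkpoint is present.
    """
    pattern = re.compile(re.escape(prefix) + r"-(\d+)\.index$")
    steps = []
    for name in filenames:
        match = pattern.match(name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None


# Hypothetical content of a checkpoint directory.
files = ["checkpoint",
         "model.ckpt-5000.index", "model.ckpt-5000.meta",
         "model.ckpt-10000.index", "model.ckpt-10000.meta"]
last_step = latest_checkpoint_step(files)
start_step = last_step if last_step is not None else 0
print("Resuming from step", start_step)
```

If no checkpoint exists, training simply starts from step 0.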

Now we will go into the loop. First off, let’s check if the current global step exceeds the number of training iterations.

Next, we’re gonna execute the training op. We also need the loss value, the source sequences, target sequences and the predicted sequences so let’s run those ops too:

Now, the model is up and training. We want to periodically print out the loss value. Also, we want to know how well the model can translate, so let’s output the source sequence, the target sequence, and the predicted sequence too.

A small note though: the sequences come in the shape of (sequence length, batch size), and we only pick one random sequence from the batch (it would be tedious to check them all). Another note: don’t forget to convert indices back to words.
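Here is a small sketch of those two notes in plain Python, using nested lists in place of the arrays returned by the session. The vocabulary and the batch are made up for illustration, and both helper names are hypothetical.

```python
import random


def pick_sequence(batch, column):
    """Extract one sequence from a (sequence length, batch size) batch.

    Each row of `batch` is one time step, so a sequence is a column.
    """
    return [time_step[column] for time_step in batch]


def ids_to_words(ids, vocab):
    """Map word indices back to words using a list-based vocabulary."""
    return [vocab[i] for i in ids]


vocab = ["<unk>", "<s>", "</s>", "hello", "world"]  # hypothetical vocabulary
batch = [[3, 4],   # time step 0 of sequences 0 and 1
         [4, 3],   # time step 1
         [2, 2]]   # time step 2 (the stop token)

column = random.randrange(len(batch[0]))  # pick one random sequence
words = ids_to_words(pick_sequence(batch, column), vocab)
print(" ".join(words))
```

Because the batch is laid out by time step first, picking a sequence means picking a column, not a row.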

Next, let’s trim the part after the stop token of each sequence. Then print them out for the world to see 😉
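Trimming can be as simple as the sketch below. It assumes that the stop token is the string `</s>` and that the stop token itself is dropped along with everything after it. Both are assumptions for illustration, and the function name is hypothetical. In the real loop the tokens would be bytes rather than strings.

```python
def trim_after_stop(tokens, stop_token="</s>"):
    """Return the tokens before the first stop token.

    If the stop token never appears, the sequence is returned unchanged.
    """
    if stop_token in tokens:
        return tokens[:tokens.index(stop_token)]
    return tokens


predicted = ["It", "'s", "like", "this", ".", "</s>", "</s>", "</s>"]
print(" ".join(trim_after_stop(predicted)))
```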

The final step is to save the training checkpoints. You can choose to save every epoch, every two epochs, every 5000 iterations, etc.
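To tie the pieces together, here is a bare-bones skeleton of such a loop in plain Python. The `train_step` and `save_checkpoint` callbacks are hypothetical stand-ins for running the training op and calling the Saver, and the intervals are arbitrary. It is a sketch of the control flow only, not the actual training code.

```python
def run_training(start_step, num_steps, train_step, save_checkpoint,
                 print_every=100, save_every=5000):
    """Run training steps until num_steps, logging and saving periodically.

    train_step() performs one update and returns the loss value;
    save_checkpoint(step) stores a checkpoint for the given step.
    """
    step = start_step
    while step < num_steps:
        loss = train_step()
        step += 1
        if step % print_every == 0:
            print("Step {}: loss {:.4f}".format(step, loss))
        if step % save_every == 0:
            save_checkpoint(step)
    return step


# Toy run with fake callbacks: the loss is constant and "saving" records the step.
saved_steps = []
final_step = run_training(0, 20, lambda: 0.5, saved_steps.append,
                          print_every=10, save_every=5)
# saved_steps is now [5, 10, 15, 20]
```

Passing a non-zero `start_step` is all it takes to continue from a restored checkpoint.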

And that is the training loop. There’s no more code we need to write. It’s time to enjoy the learning process!

Training results

Below are a few of my training results along the way. It’s easy to see that the model still had a hard time dealing with long sequences. We will address this problem in future posts.

Step 2200: loss 88.0679
b' Nó trông như thế này này .'
b' This is what that looked like .'
b'It 's like this .'

Result at step 2200

Step 5300: loss 81.2391
b' Đây là 1 ví dụ tái xanh khác nữa .'
b' This is another <unk> example . '
b'This is a very interesting example .'

Result at step 5300

Step 6100: loss 93.2339
b' Đột nhiên tiếng nói không còn nghe hiền lành nữa , và khi cô ấy yêu cầu tôi tìm sự chăm sóc y tế , tôi nghiêm túc làm theo , và đó là sai lầm thứ hai .'
b' Suddenly the voice didn 't seem quite so benign anymore , and when she insisted that I seek medical attention , I , and which proved to be mistake number two . '
b'Ladies and gentlemen , it was a , and I was going to be a , and I was going to be a , and I was going to be a . '

Result at step 6100

Step 7500: loss 67.5791
b' Nhưng bố mẹ tôi-- tôi nghĩ 1 phần lý do họ có thể công nhận nó là vì họ không hiểu .'
b' But my parents -- I think part of why they kind of are able to appreciate it is because they don 't understand it . '
b'But my parents are doing that we can 't have to do a lot of people . '

Result at step 7500

Step 11600: loss 84.5231
b' Cái mà chúng ta đang tìm kiếm là cái mà chúng ta muốn là chìa khoá dẫn đến khả năng của sự vật .'
b' What we 're finding is that what we want is access to the capacities of things . '
b'What we 're doing is we can change the world to make the difference . '

Result at step 11600

Final words

Great job today, guys! Over the course of the NMTwT series so far, we have built a machine translation model. Together we have walked through the data preparation step, the model creation step and the training loop. That was a tremendous amount of work, so you can be proud of yourselves.

As you might have seen in the training results above, the model doesn’t work well with long sentences, and that’s what we will try to solve in the next post of this series: maybe we need some ATTENTION.

And that’s it for today. Thank you for reading, and don’t hesitate to tell me if you come across any problems. I’ll see you soon!



Trung Tran is a software developer + AI engineer. He also works on networking & cybersecurity on the side. He loves blogging about new technologies and all posts are from his own experiences and opinions.
