Soooo… day 4! Exciting!

Yesterday I didn’t do any work on PyTorch, so today I decided to simply translate my previous RNN (Recurrent Neural Network), built with TensorFlow and Keras, into PyTorch code.

I thought this would be a trivial task, but it turned out to be quite challenging! I don’t know why people say that PyTorch is easier for beginners than TensorFlow; I don’t think that’s true, especially if you don’t have previous machine learning experience.

I spent the day re-implementing that simple TensorFlow model in PyTorch. By the end I had learned how to define new models and create training and evaluation loops. I think the biggest difference between the two frameworks is that TensorFlow abstracts away a lot of the logic involved in model training. That can be a good thing if you want to get simple things done quickly, but implementing the model in PyTorch forced me to think about and understand how my model works and what input and hidden layers I needed to use, and I actually designed my own training loop.

Lessons Learned

  • After today I have a much better understanding of how models, training loops and evaluation loops work. The PyTorch approach allows greater flexibility for people who want to build automation around model definition and training.

TensorFlow vs PyTorch

For completeness, I thought I’d share the two implementations so you can see the difference for yourself.

The TensorFlow way

import tensorflow as tf

model = tf.keras.Sequential([
  # bidirectional LSTM reads each sequence forwards and backwards
  tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(units=128, input_shape=[x_train.shape[1], x_train.shape[2]])
  ),
  tf.keras.layers.Dropout(.2),
  tf.keras.layers.Dense(128, activation='relu'),
  # one output per class, normalized to probabilities
  tf.keras.layers.Dense(y_train.shape[1], activation='softmax'),
])

model.compile(optimizer='adam',
              loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=['accuracy'])

history = model.fit(x_train,
                    y_train,
                    validation_split=0.3,
                    epochs=15)
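
For parity with the PyTorch evaluation loop below, Keras can report accuracy on held-out data with a single call. A minimal sketch, assuming x_test and y_test exist (they’re not shown above):

# hedged sketch: x_test / y_test are assumed to exist, shaped like x_train / y_train
test_loss, test_acc = model.evaluate(x_test, y_test)
print(f"Test accuracy: {test_acc * 100:.2f} %")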

The PyTorch way

import torch
from torch import nn
from torch.utils.data import DataLoader

# use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


class LstmClassification(nn.Module):
    def __init__(self, lstm_size: int, num_layers: int = 1):
        super().__init__()
        self.num_layers = num_layers
        self.lstm_size = lstm_size
        self.lstm = nn.LSTM(
            input_size=5,  # number of features per timestep
            hidden_size=lstm_size,
            bidirectional=True,
            batch_first=True,
            num_layers=num_layers,
        )
        hidden_size = self.lstm_size * 2  # bidirectional: forward + backward outputs
        self.sequential = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(hidden_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, 2),  # two output classes
        )

    def forward(self, x):
        _x = x.float()
        # initial hidden and cell states: one per layer per direction
        h0 = torch.zeros(self.num_layers * 2, x.size(0), self.lstm_size).to(device)
        c0 = torch.zeros(self.num_layers * 2, x.size(0), self.lstm_size).to(device)
        out, _ = self.lstm(_x, (h0, c0))
        # keep only the last timestep's output for classification
        out = out[:, -1, :]

        out = self.sequential(out)
        return out


def train(
    train_loader: DataLoader,
    test_loader: DataLoader,
    lstm_size: int = 128,
    lstm_layers: int = 1,
    learning_rate: float = 1e-3,
    batch_size: int = 12,  # unused here; the batch size is set when building the DataLoaders
    epochs: int = 1,
):

    model = LstmClassification(lstm_size=lstm_size, num_layers=lstm_layers).to(device)
    print("Using model: ", model)

    loss_fn = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(params=model.parameters(), lr=learning_rate)
    total_steps = len(train_loader)
    for epoch in range(epochs):
        model.train()
        for i, (input_seq, target_labels) in enumerate(train_loader):
            input_seq = input_seq.to(device)
            target_labels = target_labels.to(device)

            outputs = model(input_seq)  # forward pass
            loss = loss_fn(outputs, target_labels)  # calculate loss

            optimizer.zero_grad()  # reset gradients
            loss.backward()  # calculate gradients
            optimizer.step()  # update weights

            if (i + 1) % 100 == 0:
                print(
                    f"Epoch [{epoch+1}/{epochs}], Step [{i+1}/{total_steps}], Loss: {loss.item():.4f}"
                )

        # evaluate on the held-out set after every epoch
        model.eval()
        with torch.no_grad():
            n_correct = 0
            n_samples = 0

            for input_seq, target_labels in test_loader:
                input_seq = input_seq.to(device)
                target_labels = target_labels.to(device)

                outputs = model(input_seq)
                _, predicted_labels = torch.max(outputs, 1)  # index of the highest score = predicted class
                n_samples += target_labels.size(0)
                n_correct += (predicted_labels == target_labels).sum().item()

            acc = 100.0 * n_correct / n_samples
            print(f"Accuracy: {acc:.2f} %")

    return model
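
To call train() you need the two DataLoaders. Here’s a minimal sketch of how they can be wired up, assuming x_train, y_train, x_test and y_test are NumPy arrays (sequences shaped (samples, timesteps, 5) and integer class labels):

# hedged sketch: the variable names and shapes here are assumptions, not shown
# in the code above
from torch.utils.data import TensorDataset

train_ds = TensorDataset(torch.tensor(x_train), torch.tensor(y_train))
test_ds = TensorDataset(torch.tensor(x_test), torch.tensor(y_test))

train_loader = DataLoader(train_ds, batch_size=12, shuffle=True)
test_loader = DataLoader(test_ds, batch_size=12)

model = train(train_loader, test_loader, epochs=15)

One difference worth noting: nn.CrossEntropyLoss expects integer class indices as targets, whereas the Keras CategoricalCrossentropy used above expects one-hot encoded labels, so the two versions need their labels prepared differently.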