When will you find overfit condition of your model in TensorFlow?
NUM_WORDS = 10000 (train_data, train_labels), (test_data, test_labels) = keras.datasets.imdb.load_data(num_words=NUM_WORDS) def multi_hot_sequences(sequences, dimension): # Create an all-zero matrix of shape (len(sequences), dimension) results = np.zeros((len(sequences), dimension)) for i, word_indices in enumerate(sequences): results[i, word_indices] = 1.0 # set specific indices of results[i] to 1s return results train_data = multi_hot_sequences(train_data, dimension=NUM_WORDS) test_data = multi_hot_sequences(test_data, dimension=NUM_WORDS)
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/imdb.npz 17465344/17464789 [==============================] - 0s 0us/step
Let’s look at one of the resulting multi-hot vectors. The word indices are sorted by frequency, so it is expected that there are more 1-values near index zero, as we can see in this plot:
[<matplotlib.lines.Line2D at 0x7f2a85ff6a20>]