Here's an example of how you can create a foundational neural network using Python and the Keras library, based on the described architecture, specifically for handling text data such as documents, social media posts, and customer reviews:
```python
from tensorflow.keras.models import Model
from tensorflow.keras.layers import (
    Input, Embedding, Conv1D, GlobalMaxPooling1D, Dense, Dropout, Concatenate
)

# Example hyperparameter values -- placeholders, tune these for your task
vocab_size = 20000    # size of the tokenizer's vocabulary
embedding_dim = 128   # dimensionality of the embedding vectors
num_filters = 64      # number of filters per convolutional layer
hidden_units = 128    # units in each fully connected layer
dropout_rate = 0.5    # dropout probability for regularization
num_classes = 3       # number of target classes

# Define the input layer: a variable-length sequence of token IDs
text_input = Input(shape=(None,), dtype='int32', name='text_input')

# Define the embedding layer
embedding_layer = Embedding(input_dim=vocab_size, output_dim=embedding_dim)(text_input)

# Define convolutional layers with different filter sizes,
# each followed by global max pooling
conv_layers = []
for filter_size in [3, 4, 5]:
    conv = Conv1D(filters=num_filters, kernel_size=filter_size, activation='relu')(embedding_layer)
    pool = GlobalMaxPooling1D()(conv)
    conv_layers.append(pool)

# Concatenate the pooled features from all filter sizes
concat = Concatenate()(conv_layers)

# Define the fully connected layers with dropout regularization
fc1 = Dense(units=hidden_units, activation='relu')(concat)
fc1 = Dropout(dropout_rate)(fc1)
fc2 = Dense(units=hidden_units, activation='relu')(fc1)
fc2 = Dropout(dropout_rate)(fc2)

# Define the output layer for multi-class classification
output = Dense(units=num_classes, activation='softmax')(fc2)

# Create and compile the model
model = Model(inputs=text_input, outputs=output)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```
Explanation of the code:
1. We start by importing the necessary layers and models from the Keras library.
2. We define the input layer (`text_input`) as an `Input` layer with a variable-length sequence of integers, representing the tokenized text data.
3. We create an embedding layer (`embedding_layer`) to convert the integer-encoded text into dense vector representations. The `input_dim` parameter is the size of the vocabulary and `output_dim` is the dimensionality of the embedding vectors. Because the input layer accepts variable-length sequences, no fixed input length needs to be declared here; sequences are typically padded to a common length during preprocessing.
4. We define multiple convolutional layers (`conv_layers`) with different filter sizes (3, 4, and 5) to capture local patterns and features in the text data. Each convolutional layer is followed by a global max-pooling layer to extract the most important features.
5. We concatenate the outputs of the convolutional layers (`concat`) to combine the extracted features.
6. We define two fully connected layers (`fc1` and `fc2`) with a specified number of hidden units and ReLU activations. Dropout regularization is applied after each to prevent overfitting.
7. We define the output layer (`output`) with the number of units equal to the number of classes (`num_classes`) and a softmax activation function for multi-class classification.
8. We create the model by specifying the input and output layers using the `Model` class.
9. Finally, we compile the model with an appropriate optimizer (e.g., Adam), loss function (e.g., categorical cross-entropy), and evaluation metric (e.g., accuracy).
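After compiling, a quick `model.summary()` call is a handy sanity check that the layers are wired up as described above:
```python
# Print layer names, output shapes, and parameter counts
model.summary()
```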
Note: The values assigned above to `vocab_size`, `embedding_dim`, `num_filters`, `hidden_units`, `dropout_rate`, and `num_classes` (and the `max_sequence_length` used when padding sequences during preprocessing) are illustrative placeholders; choose them based on your specific text classification task and dataset.
This foundational neural network architecture can be fine-tuned and adapted for various text classification tasks by adjusting the hyperparameters, adding or modifying layers, and training on domain-specific datasets.
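As one possible adaptation (a sketch, not the only option), the classification head can be swapped for a single sigmoid unit to handle binary tasks such as positive/negative review classification. This reuses `fc2` and `text_input` from the code above:
```python
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model

# Hypothetical binary variant: one sigmoid unit instead of a softmax over classes
binary_output = Dense(units=1, activation='sigmoid')(fc2)
binary_model = Model(inputs=text_input, outputs=binary_output)
binary_model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```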
To train the model, you would need to preprocess your text data, tokenize it, and convert it into integer sequences. You can then use the `fit()` method to train the model on your dataset, specifying the appropriate batch size and number of epochs.
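For instance, a minimal preprocessing and training sketch might look like the following, assuming `texts` is a list of raw strings and `labels` is a list of integer class IDs (both hypothetical placeholders for your own data):
```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

# Hypothetical raw data -- replace with your own corpus and labels
texts = ["great product, would buy again", "terrible service", "it was okay"]
labels = [2, 0, 1]  # integer class IDs
max_sequence_length = 100  # pad/truncate all sequences to this length

# Tokenize the text and convert it to padded integer sequences
tokenizer = Tokenizer(num_words=vocab_size)
tokenizer.fit_on_texts(texts)
sequences = tokenizer.texts_to_sequences(texts)
x_train = pad_sequences(sequences, maxlen=max_sequence_length)

# One-hot encode the labels for the categorical cross-entropy loss
y_train = to_categorical(labels, num_classes=num_classes)

# Train the model
model.fit(x_train, y_train, batch_size=32, epochs=10)
```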
After training, you can evaluate the model's performance on a validation or test set using the `evaluate()` method and make predictions on new text data using the `predict()` method.
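Continuing the sketch above, evaluation and prediction could look like this (with hypothetical `test_texts` and `test_labels`, preprocessed using the same tokenizer):
```python
import numpy as np

# Hypothetical held-out data
test_texts = ["would not recommend", "absolutely fantastic"]
test_labels = [0, 2]
x_test = pad_sequences(tokenizer.texts_to_sequences(test_texts), maxlen=max_sequence_length)
y_test = to_categorical(test_labels, num_classes=num_classes)

# Evaluate on the held-out set
loss, accuracy = model.evaluate(x_test, y_test)
print(f"Test loss: {loss:.4f}, test accuracy: {accuracy:.4f}")

# Predict class probabilities for new text and take the argmax
probs = model.predict(x_test)
predicted_classes = np.argmax(probs, axis=-1)
```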