Visualizations
- stream.visuals.visualize_topic_model(model, model_output=None, dataset=None, three_dim=False, reduce_first=False, reducer='umap', port=8050, embedding_model_name='paraphrase-MiniLM-L3-v2', embeddings_folder_path=None, embeddings_file_path=None, use_average=True)
Visualizes a topic model in 2D or 3D space, employing dimensionality reduction techniques such as UMAP, t-SNE, or PCA. This function facilitates an interactive exploration of topics and their associated documents or words.
- Parameters:
model (AbstractModel) – The trained topic model instance.
model_output (dict, optional) – The output of the topic model, typically including topic-word distributions and document-topic distributions. Required if the model does not have an ‘output’ attribute.
dataset (TMDataset, optional) – The dataset used for training the topic model. Required if the model does not have an ‘output’ attribute.
three_dim (bool, optional) – Flag to visualize in 3D if True, otherwise in 2D. Defaults to False.
reduce_first (bool, optional) – Indicates whether to perform dimensionality reduction on embeddings before computing topic centroids. Defaults to False.
reducer (str, optional) – Choice of dimensionality reduction technique. Supported values are ‘umap’, ‘tsne’, and ‘pca’. Defaults to ‘umap’.
port (int, optional) – The port number on which the visualization dashboard will run. Defaults to 8050.
embedding_model_name (str, optional) – Name of the embedding model used for generating document embeddings. Defaults to “paraphrase-MiniLM-L3-v2”.
embeddings_folder_path (str, optional) – Path to the folder containing precomputed embeddings. If not provided, embeddings will be computed on the fly.
embeddings_file_path (str, optional) – Path to the file containing precomputed embeddings. If not provided, embeddings will be computed on the fly.
- Returns:
Launches a Dash server that hosts the visualization dashboard, facilitating interactive exploration of the topic model.
- Return type:
None
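The parameter rules above can be checked before the dashboard is launched. The following is a minimal sketch, not part of the library: `build_visualization_kwargs` and `launch_dashboard` are hypothetical helper names, and the only facts taken from this page are the supported `reducer` values, the default port, and the rule that `model_output` and `dataset` are required when the model has no `output` attribute.

```python
SUPPORTED_REDUCERS = ("umap", "tsne", "pca")  # values listed in the parameter docs


def build_visualization_kwargs(model, model_output=None, dataset=None,
                               three_dim=False, reducer="umap", port=8050):
    """Hypothetical helper: check the arguments and return them as keywords."""
    if reducer not in SUPPORTED_REDUCERS:
        raise ValueError(
            f"reducer must be one of {SUPPORTED_REDUCERS}, got {reducer!r}"
        )
    # The docs require model_output and dataset when the model has no 'output'.
    if not hasattr(model, "output") and (model_output is None or dataset is None):
        raise ValueError(
            "model has no 'output' attribute; pass model_output and dataset"
        )
    return {
        "model_output": model_output,
        "dataset": dataset,
        "three_dim": three_dim,
        "reducer": reducer,
        "port": port,
    }


def launch_dashboard(model, **kwargs):
    """Hypothetical wrapper: validate the arguments, then start the Dash server."""
    # Imported lazily so the checks can run without the library installed.
    from stream.visuals import visualize_topic_model

    checked = build_visualization_kwargs(model, **kwargs)
    visualize_topic_model(model, **checked)
```

With a trained model in hand, a call such as `launch_dashboard(model, reducer="pca", three_dim=True)` would then fail early on a misspelled reducer instead of partway through the dashboard setup.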
- stream.visuals.visualize_topics(model, model_output=None, dataset=None, three_dim=False, reducer='umap', port=8050, embedding_model_name='paraphrase-MiniLM-L3-v2', embeddings_folder_path=None, embeddings_file_path=None, use_average=True)
Visualize topics in either 2D or 3D space using UMAP, t-SNE, or PCA dimensionality reduction techniques.
- Parameters:
model (AbstractModel) – The trained topic model instance.
model_output (dict, optional) – The output of the topic model, typically including topic-word distributions and document-topic distributions. Required if the model does not have an ‘output’ attribute.
dataset (TMDataset, optional) – The dataset used for training the topic model. Required if the model does not have an ‘output’ attribute.
three_dim (bool, optional) – Flag to visualize in 3D if True, otherwise in 2D. Defaults to False.
reducer (str, optional) – Choice of dimensionality reduction technique. Supported values are ‘umap’, ‘tsne’, and ‘pca’. Defaults to ‘umap’.
port (int, optional) – The port number on which the visualization dashboard will run. Defaults to 8050.
embedding_model_name (str, optional) – Name of the embedding model used for generating document embeddings. Defaults to “paraphrase-MiniLM-L3-v2”.
embeddings_folder_path (str, optional) – Path to the folder containing precomputed embeddings. If not provided, embeddings will be computed on the fly.
embeddings_file_path (str, optional) – Path to the file containing precomputed embeddings. If not provided, embeddings will be computed on the fly.
- Returns:
- None
The function launches a Dash server to visualize the topic model.
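Both dashboard functions default to port 8050, so starting a second dashboard while the first is still running may need a different `port` value. The sketch below uses only the Python standard library to look for a free local port; `is_port_free` and `find_free_port` are hypothetical helpers, not part of `stream.visuals`.

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if a new socket can currently be bound to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True


def find_free_port(start=8050, attempts=20, host="127.0.0.1"):
    """Return the first free port in the range [start, start + attempts)."""
    for port in range(start, start + attempts):
        if is_port_free(port, host):
            return port
    raise RuntimeError(f"no free port in {start}..{start + attempts - 1}")
```

The result could then be passed on, for example `visualize_topics(model, port=find_free_port())`. The check is only advisory: another process can still claim the port between the probe and the server start.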
- stream.visuals.visualize_topics_as_wordclouds(model, model_output=None, max_words=50)
Visualize topics as word clouds.
- Parameters:
model – Trained topic model.
model_output (dict, optional) – The output of the topic model. Pass this argument when visualizing an OCTIS model.
max_words (int, optional) – Maximum number of words to display in each word cloud. Defaults to 50.
- Raises:
AssertionError – If the model doesn’t have the necessary output for topic visualization.
- Returns:
- None
This function displays word clouds for each topic.
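To illustrate what `max_words` limits, the sketch below keeps only the highest-weighted words of a single topic, which is the selection a word cloud would then draw. The word-to-weight dictionary and the `top_words` helper are assumptions made for illustration; they are not the library's `model_output` format or its internal logic.

```python
def top_words(word_weights, max_words=50):
    """Return the max_words highest-weighted (word, weight) pairs, heaviest first."""
    ranked = sorted(word_weights.items(), key=lambda item: item[1], reverse=True)
    return ranked[:max_words]


# Hypothetical topic: each word mapped to its weight within the topic.
topic = {"model": 0.30, "topic": 0.25, "word": 0.20, "cloud": 0.15, "plot": 0.10}
print(top_words(topic, max_words=3))
# [('model', 0.3), ('topic', 0.25), ('word', 0.2)]
```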