Visual ChatGPT

Visual ChatGPT is a Microsoft system that connects the ChatGPT language model to a set of Visual Foundation Models, so you can send and receive images while chatting. Rather than retraining ChatGPT on images, it routes visual tasks to pretrained vision models that analyze or generate images, and then composes a natural-language reply from their outputs.
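To make this routing idea concrete, here is a minimal illustrative Python sketch; the names caption_image, TOOLS, and answer are hypothetical and are not the repository's actual API:

from typing import Optional

def caption_image(path: str) -> str:
    # Stand-in for a real vision model such as BLIP.
    return "a dog playing in the snow"

TOOLS = {"ImageCaptioning": caption_image}

def answer(user_message: str, image_path: Optional[str] = None) -> str:
    # If an image accompanies the message, run a vision tool first and
    # compose the reply around its textual observation.
    if image_path is not None:
        observation = TOOLS["ImageCaptioning"](image_path)
        return "The image shows " + observation + "."
    return "Text-only reply to: " + user_message

print(answer("What is in this picture?", "dog.jpg"))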

To get started with Visual ChatGPT, you will need to provide it with an image that you want to generate a response for. This can be done by either uploading an image file or providing a URL to an image hosted online.
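If the image is hosted online, one option is to save it to a local file first and then upload that file in the chat. A minimal sketch, assuming the requests library and a hypothetical URL:

import requests

url = "https://example.com/cat.jpg"  # hypothetical URL, for illustration
resp = requests.get(url, timeout=30)
resp.raise_for_status()              # stop on HTTP errors
with open("cat.jpg", "wb") as f:
    f.write(resp.content)            # now upload cat.jpg in the chat UI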

Once the image has been processed by the underlying vision models, Visual ChatGPT generates a natural-language response based on the image's content. This response can answer questions, provide information, or serve as a caption for the image.

To use Visual ChatGPT, you interact with it through a chat interface, much as you would with any other chatbot. Simply provide the image and wait for Visual ChatGPT to generate a response. You can also provide additional context or information to guide the model's response.
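To get a feel for what the captioning step looks like on its own, here is a rough standalone sketch using the BLIP captioning model from the Hugging Face transformers library, one of the vision models this kind of system builds on. The file name dog.jpg is a placeholder, and this is not the project's exact code:

from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Load a pretrained image-captioning model (weights download on first use).
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("dog.jpg").convert("RGB")    # any local image
inputs = processor(image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))  # e.g. "a dog in the snow"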

Enable Visual ChatGPT on your Windows Machine

Follow these steps:

# clone the repo
git clone https://github.com/microsoft/visual-chatgpt.git

# Go to directory
cd visual-chatgpt

# download Anaconda & create a new environment
https://www.anaconda.com/products/distribution

conda create -n visgpt python=3.8

# activate the new environment
conda activate visgpt

# install the required Python packages
pip install -r requirements.txt

# Generate an API key (secret key) from your OpenAI account
# (see the OpenAI – API Keys screenshot below)

# set your private OpenAI key (for Windows)
set OPENAI_API_KEY={Your_Private_Openai_Key}
set OPENAI_API_KEY=sk-...
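# (In PowerShell, use: $env:OPENAI_API_KEY="sk-..."
#  To persist the key across sessions, use: setx OPENAI_API_KEY "sk-...")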

# Start Visual ChatGPT!
# You can control the GPU/CPU assignment with "--load"; the parameter lists which
# Visual Foundation Models to use and the device each one is loaded onto.
# A model and its device are separated by an underscore '_', and the
# model/device pairs are separated by commas ','.
# The available Visual Foundation Models are listed in the repository's README.
# For example, to load ImageCaptioning on the CPU and Text2Image on cuda:0,
# you can use: "ImageCaptioning_cpu,Text2Image_cuda:0"
# (a sketch of how this string is parsed appears after the commands below)

# Advice for CPU Users
python visual_chatgpt.py --load ImageCaptioning_cpu,Text2Image_cpu

# Advice for 1 Tesla T4 15GB  (Google Colab)                       
python visual_chatgpt.py --load "ImageCaptioning_cuda:0,Text2Image_cuda:0"
                                
# Advice for 4 Tesla V100 32GB                            
python visual_chatgpt.py --load "ImageCaptioning_cuda:0,ImageEditing_cuda:0,
    Text2Image_cuda:1,Image2Canny_cpu,CannyText2Image_cuda:1,
    Image2Depth_cpu,DepthText2Image_cuda:1,VisualQuestionAnswering_cuda:2,
    InstructPix2Pix_cuda:2,Image2Scribble_cpu,ScribbleText2Image_cuda:2,
    Image2Seg_cpu,SegText2Image_cuda:2,Image2Pose_cpu,PoseText2Image_cuda:2,
    Image2Hed_cpu,HedText2Image_cuda:3,Image2Normal_cpu,
    NormalText2Image_cuda:3,Image2Line_cpu,LineText2Image_cuda:3"
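For reference, the sketch below shows one way a --load string in this format can be split into model/device pairs, following the comma-and-underscore convention described above. It is illustrative only; the repository's own parsing may differ in detail.

def parse_load(load: str) -> dict:
    # Map e.g. "ImageCaptioning_cpu,Text2Image_cuda:0"
    # to {"ImageCaptioning": "cpu", "Text2Image": "cuda:0"}.
    pairs = {}
    for item in load.split(","):
        model, device = item.strip().split("_", 1)  # split on the first underscore
        pairs[model] = device
    return pairs

print(parse_load("ImageCaptioning_cpu,Text2Image_cuda:0"))
# {'ImageCaptioning': 'cpu', 'Text2Image': 'cuda:0'}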
[Screenshot: OpenAI – API Keys]

Demo

The Visual ChatGPT demo should now be running on your local machine. You can open a web browser and navigate to http://localhost:7868 to interact with the model.
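If you just want to confirm the server is up before opening a browser, a quick check with the requests library (assuming the default port above):

import requests

resp = requests.get("http://localhost:7868", timeout=5)
print(resp.status_code)  # 200 means the Gradio interface is serving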

It’s important to note that while Visual ChatGPT can generate responses based on visual prompts, it is not perfect and may sometimes generate inaccurate or inappropriate responses. As with any AI model, it is important to use it responsibly and carefully evaluate the accuracy of its responses.

Cheers!

Reference: https://github.com/microsoft/visual-chatgpt