1-click Similar Image Generator using AI

Expand Stock Photo Image Library using GPT Vision API and Segmind’s Stable Diffusion API

Ramsri Goutham
8 min read · Nov 17, 2023
Image by Author using Pixabay’s Free Stock Image

Introduction

In the ever-evolving world of digital content creation, finding an image that precisely fits your needs is invaluable. However, a given stock photo that aligns well with your content might be overused and ubiquitous across the internet. Using the same image can be uninspiring and not SEO-friendly. This is where our solution comes into play. In this blog post, we will explore how we can innovatively combine the GPT Vision API and Segmind’s image-to-image API to create a tool that takes a sample image as input and generates similar images in just one click, without any manual intervention.

This tool is designed to revolutionize the way you work with images. It simplifies the process of generating images similar to any stock photo of your choice. The best part? There’s no need to worry about the copyright of the original stock photo. Our tool uses it merely as inspiration, generating images that are similar to the original but are fully copyright-free and created using AI.

Here’s how it works: The GPT Vision API begins by analyzing and captioning your chosen image, providing a detailed description of its contents. This caption, along with the image itself, is then inputted into Segmind’s Stable Diffusion image-to-image model. The result is a set of similar images, crafted with precision and creativity, tailored to match your original selection. Whether you’re a graphic designer, marketer, or content creator, this tool is poised to be a game-changer in your creative toolkit, offering endless possibilities with just a click.

By the end of this tutorial, we are going to build a simple Gradio UI where you can upload any image and generate similar images in one click, as shown in the image below!

Image by Author with Gradio UI

Step 1: Sign up for services and get the API Keys

To get started, you’ll need to sign up for two important services: Segmind and OpenAI. We are going to use OpenAI’s GPT-4 Vision API to caption the uploaded image and Segmind’s image-to-image API to generate images similar to that image, guided by the caption as the text prompt.

Get OpenAI’s API Key

Go to platform.openai.com
Sign up or log in.
Click on the top right to go to “View API Keys”.
Create a new secret key if you don’t have one already and save it securely.
If necessary, add your card so you don’t run out of credits.
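If you want to confirm the new key works before moving on, a minimal sanity check could look like this (the key below is a placeholder; paste in your own or load it from wherever you store it):

import requests

OPENAI_API_KEY = "sk-..."  # placeholder: your newly created key

# Listing the available models is a cheap way to verify the key is valid
response = requests.get(
    "https://api.openai.com/v1/models",
    headers={"Authorization": f"Bearer {OPENAI_API_KEY}"},
)
print(response.status_code)  # 200 means the key works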

Get Segmind’s API Key

Go to Segmind.com and log in or sign up.
Click on the top right to go to the console.
Once in the console, click on the “API keys” tab and “Create New API Key”.
You get a few free credits daily, but for uninterrupted usage you can go to “billing” and “add credits” by paying with your card.

If you want to know how much each API call costs on Segmind, you can go to the corresponding model’s pricing tab and see it. An example of SSD Image2Image pricing is shown here.

Step 2: The Code

The Google Colab notebook containing the full code can be found here.

Now we are going to use Google Colab’s newest feature: Storing Secrets!

Once you are in the Colab notebook, click the key icon on the left. This opens the Secrets panel, where you can securely store API keys and reuse them across Google Colab instances without having to enter them manually again and again.

Create key-value pairs as shown below, with the values being the API keys you created in the previous step!

Now let’s start our coding part!

Install the necessary Python libraries and retrieve your API keys from Secrets as shown below! As a sanity check, print the length of each secret rather than its actual value.
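Assuming a fresh Colab instance, requests and Pillow are already available, while gradio and ipyplot (used later for the UI and visualization) typically need to be installed first:

# Install the extra libraries used later in this tutorial; the rest ship with Colab
!pip install -q gradio ipyplot

With the libraries in place, pull the keys from the Secrets panel: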

from google.colab import userdata
OpenAIkey = userdata.get('OpenAI')
SegmindAPIKey = userdata.get('Segmind')

print(len(OpenAIkey))
print(len(SegmindAPIKey))

Now let’s define our main functions: getbase64image, getImageCaption, and generate_images.

getbase64image(imagepath): This function converts an image file into a Base64-encoded string. It opens an image file from the given path, reads the file’s binary data, and encodes it to Base64, a text-based representation of the binary data.

getImageCaption(b64image): This function receives a Base64-encoded image and sends it to OpenAI's ChatGPT API, specifically to the "gpt-4-vision-preview" model. It requests a concise, descriptive caption suitable for use on a stock photo website. The function then extracts and returns this caption from the API's response.

generate_images(base64image, imagecaption, count): This function is designed to generate a specified number (count) of similar images based on an input image (provided in Base64 format) and a text caption. It uses the Segmind platform's image-to-image generation API. The function sends the base image and the caption to the API, along with specific instructions (negative prompt) to avoid common image generation errors (like blurry images or poorly drawn elements). The API then returns new images, which are saved and the filenames are returned by the function.

import base64
import requests
import random
import io
from PIL import Image
import os

chatgpt_url = "https://api.openai.com/v1/chat/completions"
chatgpt_headers = {
    "content-type": "application/json",
    "Authorization": "Bearer {}".format(OpenAIkey)}


def getbase64image(imagepath):
    with open(imagepath, "rb") as imagefile:
        return base64.b64encode(imagefile.read()).decode('utf-8')

def getImageCaption(b64image):
    chatgpt_payload = {
        "model": "gpt-4-vision-preview",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Create a concise yet fully descriptive one sentence caption for the image to use on a Stock Photo Website"},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{b64image}",
                        },
                    },
                ],
            }
        ],
        "max_tokens": 300,
    }

    # Make the request to OpenAI's API
    response = requests.post(chatgpt_url, json=chatgpt_payload, headers=chatgpt_headers)
    response_json = response.json()
    caption = response_json['choices'][0]['message']['content']
    return caption

def generate_images(base64image, imagecaption, count):
    SegmindAPIKey = userdata.get('Segmind')
    url = "https://api.segmind.com/v1/ssd-img2img"
    generated_image_filenames = []

    for i in range(count):
        currentseed = random.randint(1000, 1000000)

        # Request payload
        data = {
            "image": base64image,
            "prompt": imagecaption + ", stock photo",
            "negative_prompt": "ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, watermark, grainy, signature, cut off, draft",
            "samples": 1,
            "scheduler": "UniPC",
            "num_inference_steps": 30,
            "guidance_scale": "7.5",
            "seed": currentseed,
            "strength": "0.9",
            "base64": False
        }

        response = requests.post(url, json=data, headers={'x-api-key': SegmindAPIKey})

        if response.status_code == 200 and response.headers.get('content-type') == 'image/jpeg':
            image_data = response.content
            image = Image.open(io.BytesIO(image_data))
            image_filename = f"image{i + 1}.jpg"
            image.save(image_filename)
            generated_image_filenames.append(image_filename)

    return generated_image_filenames

Now let’s call the above-defined functions with a sample image!

Image from Pixabay

Download a sample image from Pixabay, e.g., Sparrow-bird, rename it however you want (e.g., sparrow.jpg), and upload it to the Google Colab instance.
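If you prefer to do the upload from within the notebook rather than through the file browser, Colab’s files helper works too (a minimal sketch; make sure the uploaded filename matches what you use below):

from google.colab import files

# Opens a file picker; the chosen file is saved into the Colab working directory
uploaded = files.upload()
print(list(uploaded.keys()))  # e.g. ['sparrow.jpg']

Either way, once sparrow.jpg is in the working directory, run the pipeline: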

# https://pixabay.com/photos/white-throated-sparrow-sparrow-bird-8377444/
image = 'sparrow.jpg'

base64image = getbase64image(image)
imagecaption = getImageCaption(base64image).strip('\"')
print(imagecaption)

count = 3
image_filenames = generate_images(base64image, imagecaption, count)
for filename in image_filenames:
    print(f"Generated image: {filename}")

A sample image caption for the above image could be: Autumn Serenity: A Sparrow Perched on a Branch Amidst Fall Colors.

Now we can visualize the generated similar images using the ipyplot library like this:

import ipyplot
ipyplot.plot_images(image_filenames, max_images=6, img_width=400, force_b64=True)
Image Generated by AI

Step 3: Creating a UI with Gradio

Now let’s convert this code into a UI with the Gradio library so that we can combine these steps and do everything in one click.

import gradio as gr
import requests
import base64
import random
import io
from PIL import Image

# chatgpt_url, chatgpt_headers, and userdata are reused from the earlier Colab cells

# Define your functions here
def getbase64image(image):
    buffered = io.BytesIO()
    image.save(buffered, format="JPEG")
    return base64.b64encode(buffered.getvalue()).decode('utf-8')

def getImageCaption(b64image):
    chatgpt_payload = {
        "model": "gpt-4-vision-preview",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Create a concise yet fully descriptive one sentence caption for the image to use on a Stock Photo Website"},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{b64image}",
                        },
                    },
                ],
            }
        ],
        "max_tokens": 300,
    }

    # Make the request to OpenAI's API
    response = requests.post(chatgpt_url, json=chatgpt_payload, headers=chatgpt_headers)
    response_json = response.json()
    caption = response_json['choices'][0]['message']['content']
    return caption


def generate_images(base64image, imagecaption, count):
    SegmindAPIKey = userdata.get('Segmind')
    url = "https://api.segmind.com/v1/ssd-img2img"
    generated_image_filenames = []

    for i in range(count):
        currentseed = random.randint(1000, 1000000)

        # Request payload
        data = {
            "image": base64image,
            "prompt": imagecaption + ", stock photo",
            "negative_prompt": "ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, watermark, grainy, signature, cut off, draft",
            "samples": 1,
            "scheduler": "UniPC",
            "num_inference_steps": 30,
            "guidance_scale": "7.5",
            "seed": currentseed,
            "strength": "0.9",
            "base64": False
        }

        response = requests.post(url, json=data, headers={'x-api-key': SegmindAPIKey})

        if response.status_code == 200 and response.headers.get('content-type') == 'image/jpeg':
            image_data = response.content
            image = Image.open(io.BytesIO(image_data))
            image_filename = f"image{i + 1}.jpg"
            image.save(image_filename)
            generated_image_filenames.append(image_filename)

    return generated_image_filenames

def process_image(uploaded_image, count):
    base64image = getbase64image(uploaded_image)
    imagecaption = getImageCaption(base64image).strip('\"')
    generated_images = generate_images(base64image, imagecaption, count)
    return imagecaption, generated_images


# Gradio interface
with gr.Blocks() as app:
    gr.Markdown("## 1-Click Similar Image Generator using AI")
    gr.Markdown("Upload an image to generate similar images using AI.")

    with gr.Row():
        with gr.Column():
            image_input = gr.Image(type="pil")
            count_dropdown = gr.Dropdown(label="Number of Images", choices=[1, 2, 3, 4, 5], value=3)
            btn_generate_images = gr.Button('Generate Images')
        with gr.Column():
            gallery = gr.Gallery(label="images", columns=3, preview=True, allow_preview=True, show_download_button=True)
            textbox = gr.Textbox(label="AI Generated Caption")

    btn_generate_images.click(fn=process_image, inputs=[image_input, count_dropdown], outputs=[textbox, gallery])

# Run the app
app.launch(debug=True)

This creates a simple Gradio UI where you can upload any image, choose how many similar images you want, and click the “Generate Images” button. The output images are displayed as a gallery on the right, along with the AI-generated caption.
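If you want to share the demo outside of Colab, Gradio can also generate a temporary public link; a small tweak to the last line above is enough:

# share=True adds a temporary public *.gradio.live URL alongside the local one
app.launch(debug=True, share=True)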

Gradio UI

Conclusion

Say goodbye to the limitations of overused stock photos and welcome a realm of unique, AI-generated visuals, inspired by stock photos but distinctively original. I hope you had a great learning experience as we explored a practical use case of AI!

Happy AI exploration, and if you loved the content, feel free to follow me on Twitter for daily AI content!
