Generate Storybook Illustrations from any text using AI — NextJS 14 Template with OpenAI and Segmind’s API

A functional template built with NextJS 14 and Tailwind CSS to integrate into your production apps

Ramsri Goutham
Stackademic


Image by Author

Introduction

Illustrations are crucial in children’s books, capturing the imagination of young readers and enriching the story. However, drawing these illustrations can be a lengthy and demanding task, particularly for authors and storytellers lacking artistic expertise. This is where generative AI becomes invaluable. The ability to automatically generate illustrations for children’s stories is not just beneficial but also revolutionary. It enables storytellers to easily add colorful and customized images to their stories, improving the reading experience and helping them to present their creative work more effectively to the world.

In this blog post, we’ll see how to use AI to automatically generate illustrations for children’s stories with just a click of a button. This allows us to create custom images tailored to the story instead of using random stock photos. We’ll use OpenAI’s ChatGPT API and Segmind’s Stable Diffusion XL (SDXL) to extract illustration prompts from text and generate images.

We will have a ready-to-use UI and template built with NextJS 14, React, Tailwind CSS, and SwiperJS (a library for image navigation).

Let’s get started!

Code

The full code for the NextJS template can be found in this GitHub repo: Story Illustrations NextJS Template

To run this, sign up for two services, Segmind and OpenAI, and get their API keys. Place the API keys in a local environment variables file (.env.local) if you are running locally, or add them as environment variables if you deploy to Vercel or a similar platform.

.env.local

OPENAI_API_KEY=sk-xxxx
SEGMIND_API_KEY=SG_xxxxx

Then run the command:

npm run dev
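Note: if this is a fresh clone, install the dependencies with npm install before starting the dev server. Once it's running, NextJS serves the app at http://localhost:3000 by default.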

Now let’s understand the two main files of this repository.

The UI (page.tsx)

The frontend UI code is located at app/page.tsx

The UI for generating illustrations

This code is a React component for a webpage that allows users to generate AI-powered storybook illustrations. Users can input text and select an illustration style from options like Watercolor, Comic Book, or Line Art, and specify the number of images to generate.

The component includes a text area for input, dropdowns for style and image-count selection, and a submit button. After submission, the component displays a loading animation and then renders the generated images using Swiper, a popular slider library, with options to view thumbnails and download the images.
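Before diving into the listing, here is a rough outline of how the JSX below is organized (a sketch of the structure, not the exact markup):

Home
├── heading and usage hints
├── left column: textarea (story text), style select, image-count select, Submit button
└── right column: loading spinner while generating, then
    ├── Swiper thumbnail strip
    └── Swiper full-size slides, each with a DownloadButton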

"use client";

import { useState, useEffect } from 'react';
import ReactLoading from 'react-loading';
import toast from 'react-hot-toast'; // needed for the toast.error calls below — assumes react-hot-toast is installed
// import Header from '@/components/Header'
import Image from 'next/image';
import DownloadIcon from '@mui/icons-material/GetApp';
import { Swiper, SwiperSlide } from 'swiper/react'
import { FreeMode, Navigation, Thumbs } from 'swiper/modules'
import 'swiper/css'
import 'swiper/css/free-mode'
import 'swiper/css/navigation'
import 'swiper/css/thumbs'

export const revalidate = 0;

export default function Home() {
const [selectedOption, setSelectedOption] = useState('watercolor');
const [isLoading, setIsLoading] = useState(false);



const [textBoxValue, setTextBoxValue] = useState("Elon Musk has shown again he can influence the digital currency market with just his tweets. After saying that his electric vehicle-making company Tesla will not accept payments in Bitcoin because of environmental concerns, he tweeted that he was working with developers of Dogecoin to improve system transaction efficiency. \n\nFollowing the two distinct statements from him, the world's largest cryptocurrency hit a two-month low, while Dogecoin rallied by about 20 percent. The SpaceX CEO has in recent months often tweeted in support of Dogecoin, but rarely for Bitcoin. In a recent tweet, Musk put out a statement from Tesla that it was concerned about the rapidly increasing use of fossil fuels for Bitcoin (price in India) mining and transaction, and hence was suspending vehicle purchases using the cryptocurrency. \n\nA day later he again tweeted saying, To be clear, I strongly believe in crypto, but it can't drive a massive increase in fossil fuel use, especially coal. It triggered a downward spiral for Bitcoin value but the cryptocurrency has stabilised since. A number of Twitter users welcomed Musk's statement. One of them said it's time people started realising that Dogecoin is here to stay and another referred to Musk's previous assertion that crypto could become the world's future currency.");
const [outputValue, setOutputValue] = useState(null);


const [wordCount, setWordCount] = useState(0);
const [submitting, setIsSubmitting] = useState(false);

const [numImages, setNumImages] = useState(2);
const [thumbsSwiper, setThumbsSwiper] = useState(null);




const handleNumImagesChange = (event) => {
setNumImages(parseInt(event.target.value, 10));
};

const handleDownload = (imageIndex) => {
// Fetch the image from the outputValue state using the index
const base64Image = outputValue[imageIndex];
// Create a link and set the URL using the base64Image data
const link = document.createElement("a");
link.href = `data:image/png;base64,${base64Image}`;
link.download = `illustration-${imageIndex}.png`;
// Append the link to the body, click it, and then remove it
document.body.appendChild(link);
link.click();
document.body.removeChild(link);
};


const DownloadButton = ({ onDownload, imageIndex }) => (
<button
onClick={() => onDownload(imageIndex)}
className="absolute top-2 right-2 bg-gray-200 bg-opacity-50 rounded-full p-2 hover:bg-gray-400 transition duration-300 ease-in-out"
aria-label="Download Image"
>
<DownloadIcon className="text-gray-600 hover:text-gray-800" /> {/* Use the MUI Icon */}
</button>
);

const calculateWordCount = (text) => {
// Filter out empty strings so an empty textarea counts as 0 words
const words = text.trim().split(/\s+/).filter(Boolean);
return words.length;
};

useEffect(() => {
const count = calculateWordCount(textBoxValue);
setWordCount(count);
}, []); // Empty dependency array to run the effect only once



const handleChange = (event) => {
setSelectedOption(event.target.value);
};



const handleInputChange = (event) => {
const inputValue = event.target.value;
setTextBoxValue(inputValue);
const count = calculateWordCount(inputValue);
setWordCount(count);
};


const handleClick = async () => {

console.log("entered handleclick")


// Reset states
setOutputValue(null);
setIsSubmitting(false);

// setTotalCount(0);

setIsLoading(false);
try {



setIsLoading(true);

const requestOptions = {
method: 'POST',
headers: { "Content-Type": "application/json" },
body: JSON.stringify({
"text": textBoxValue,
"style": selectedOption,
"imagecount": numImages
}),
cache: 'no-cache'
};

console.log("requestOptions ", requestOptions)

const response = await fetch("/api/generate-story-illustrations", requestOptions);
// console.log('response ',response);

const newData = await response.json();
console.log('newData ', newData);

// If the server responded with 429 (rate limited), surface the message
if (response && response.status === 429) {
toast.error(newData.message);
return; // Exit from the function
}

setOutputValue(newData.images);



} catch (error) {
setOutputValue("An error occurred, please try again later.");
} finally {
setIsLoading(false);
}
};

return (
<div className="
bg-white-900
rounded-lg
h-full
w-full
overflow-hidden
overflow-y-auto
p-4
ml-8
mr-8
">


<div className="flex flex-col items-center justify-center text-xl font-bold mb-4 text-black-500 md:text-2xl mr-4">
Generate Storybook Illustrations for children's stories using AI.
</div>

<div className="flex flex-col items-center justify-center text-l font-bold mb-4 text-black-500 md:text-xl mr-4">
Use the dropdown selector to pick different illustration styles like Line Art, Watercolor, Comic Book, etc.
</div>

<div className="flex flex-col items-center justify-center text-md mb-4 text-black-500 md:text-lg mr-4">
Generate HD images at a 1024 x 1024 pixel resolution, ready for commercial use.
</div>


<div className="flex w-full">

<div className="w-1/2 pr-2">


<div className={`text-md text-black-500`}>
Suggested text length: Up to 10,000 words. Supports English.
</div>

<div className={`text-sm font-semibold mb-2 ${wordCount >= 1 && wordCount <= 10000 ? 'text-green-700' : 'text-red-500'}`}>
<br />
Word Count: {wordCount}
</div>

<textarea
className="w-full h-[35vh] p-2 border rounded"
value={textBoxValue}
onChange={handleInputChange}
/>

<div className="w-full mt-2">
<label htmlFor="Style" className="block text-sm font-medium text-gray-700">Illustration Style</label>
<select
id="Style"
name="Style"
className="w-full p-2 mt-2 border rounded"
value={selectedOption}
onChange={handleChange}
style={{ maxWidth: '100%' }}
>
<option value="watercolor">Watercolor</option>
<option value="comic book">Comic Book</option>
<option value="line art">Line Art</option>
<option value="kawaii">Kawaii</option>

</select>
</div>

<div className="w-full mt-2">
<label htmlFor="Style" className="block text-sm font-medium text-gray-700">Image Count</label>
<select
id="numImages"
name="numImages"
className="w-full p-2 mt-2 border rounded"
value={numImages}
onChange={handleNumImagesChange}
style={{ maxWidth: '100%' }}
>
{[...Array(3).keys()].map((_, i) => (
<option key={i} value={i + 1}>
{i + 1}
</option>
))}

</select>
</div>

<button
className="w-full mt-2 p-2 text-white rounded"
onClick={
handleClick
}
style={{
backgroundColor: '#1a8a2a',
cursor: 'pointer',
opacity: '1',
}}
>
Submit
</button>




</div>
<div className="w-1/2 pl-2 relative overflow-y-auto">
{isLoading && (
<div
style={{
position: 'absolute',
top: '50%',
left: '50%',
transform: 'translate(-50%, -50%)',
}}
>
<ReactLoading type="spin" color="#A9A9A9" />
</div>
)}



{outputValue && (
<div className="mt-8 mx-4">
{/* Thumbnail */}
<Swiper
onSwiper={setThumbsSwiper}
loop={true}
spaceBetween={12}
slidesPerView={4}
freeMode={true}
watchSlidesProgress={true}
modules={[FreeMode, Navigation, Thumbs]}
className='thumbs mt-3 h-32 w-full rounded-lg'
>
{outputValue.map((base64Image, index) => (
<SwiperSlide key={index}>
<button className='flex h-full w-full items-center justify-center'>
<Image
src={`data:image/png;base64,${base64Image}`}
alt={`Generated Illustration ${index}`}
className='block h-full w-full object-cover'
width={120} // specify the width
height={120}

/>
</button>
</SwiperSlide>
))}
</Swiper>

<div className="mt-4" />

<Swiper
loop={true}
spaceBetween={10}
navigation={true}
thumbs={{
swiper:
thumbsSwiper && !thumbsSwiper.destroyed ? thumbsSwiper : null
}}
modules={[FreeMode, Navigation, Thumbs]}
className='w-full rounded-lg'
>
{outputValue.map((base64Image, index) => (
<SwiperSlide key={index}>

<div key={index} className="relative mb-4">
<img
src={`data:image/png;base64,${base64Image}`}
alt={`Generated Illustration ${index}`}
className="mb-4"
// width={512}
// height={512}
/>
<DownloadButton onDownload={handleDownload} imageIndex={index} />
</div>
</SwiperSlide>
))}

</Swiper>


</div>



)}



</div>
</div>

</div>
)
}

The API Route (route.ts)

The API route code is located at app/api/generate-story-illustrations/route.ts

This code is a server-side function written for a Next.js application. It defines an API endpoint that receives a request containing text, style, and image count, and then uses this information to generate briefs for illustrations.

These briefs are created by calling the OpenAI Chat Completions API (using the gpt-3.5-turbo-16k model), which receives specific prompts and returns ideas for illustrations. Once the briefs are obtained, the code makes requests to an external image generation API (Segmind), passing each brief along with specific parameters to generate images with the Stable Diffusion XL 1.0 model. The images are returned in base64 format in the response. The function includes error handling and retry logic for the API requests.

import { NextResponse, NextRequest } from 'next/server';
export const maxDuration = 100; // This function can run for a maximum of 100 seconds
export const dynamic = 'force-dynamic';



const texttoimageURL = "https://api.segmind.com/v1/sdxl1.0-txt2img";
const texttoimageAPIKEY = process.env.SEGMIND_API_KEY ?? ""; // fall back to an empty string so the headers object stays type-safe

const chat_gpt3_url = "https://api.openai.com/v1/chat/completions"
const gpt3_headers = {
"Content-Type": "application/json",
"Authorization": `Bearer ${process.env.OPENAI_API_KEY}`,
}



export const POST = async (req: NextRequest) => {

const { text, style, imagecount } = await req.json();
console.log("text: ", text);
console.log("style: ", style);
console.log("imagecount: ", imagecount);


async function fetchIllustratorPrompts(prompt: string) {
let currentAttempt = 0;
const maxRetryAttempts = 3;
let success = false;

while (currentAttempt < maxRetryAttempts && !success) {

try {
// Define the payload for the chat model
const messages = [
{
role: "system",
content:
"You are an expert author who can read children's stories and create short briefs for an illustrator, providing specific instructions, ideas, or guidelines for the illustrations you want them to create.",
},
{ role: "user", content: prompt },
];

const chatgptPayload = {
model: "gpt-3.5-turbo-16k",
messages: messages,
temperature: 1.3,
max_tokens: 2000,
top_p: 1,
stop: ["###"],
};




const response = await fetch(chat_gpt3_url, {
headers: gpt3_headers,
method: "POST",
body: JSON.stringify(chatgptPayload),
cache: 'no-cache'
});

// console.log("responseJson ",await response.json());

if (!response.ok) throw new Error("GPT-3 API fetch error");

const responseJson = await response.json();

console.log("responseJson ", responseJson)
console.log("message choices ",responseJson.choices[0].message.content.trim())

const output = JSON.parse(
responseJson.choices[0].message.content.trim()
);

success = true;

return output;

} catch (err) {
currentAttempt++;
console.log(`Error on retry ${currentAttempt} parsing response or fetching data: `, err);
if (currentAttempt === maxRetryAttempts) {
console.log("Max retries reached");
return null;
}
}
}
// If max retries are reached without a successful response, return null
return null;
}


function generatePrompt(text: string, count: number) {
const promptPrefix = `${JSON.stringify(text)}

------------------------------------

Generate ${count} short briefs as a list from the above story to give as input to an illustrator to generate relevant children's story illustrations.

Strictly add no common prefix to briefs. Strictly generate each brief as a single sentence that contains all the necessary information.

Strictly output your response in a JSON list format, adhering to the following sample structure:`;

const sampleOutput = { illustrations: Array(count).fill("...") };

const promptPostinstruction = `\nOutput:`;

return (
promptPrefix + JSON.stringify(sampleOutput) + promptPostinstruction
);
}
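
// Illustrative example (not from the repo): with count = 2 the assembled
// prompt ends with the sample structure {"illustrations":["...","..."]},
// nudging the model to mirror that shape. A successful call to
// fetchIllustratorPrompts would then parse to something like:
// { illustrations: ["A fox cub finds a glowing lantern in a mossy forest",
//                   "The cub shares the lantern's light with its woodland friends"] }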

const prompt = generatePrompt(text, imagecount);


try {

const illustratorPrompts = await fetchIllustratorPrompts(prompt);
console.log("illustratorPrompts ", illustratorPrompts)

if (!illustratorPrompts || !illustratorPrompts.illustrations) {
throw new Error('Failed to fetch illustrator prompts');
}

const images = [];
for (const illustrationText of illustratorPrompts.illustrations) {
// Generate a random seed between 400000 and 500000
const randomSeed = Math.floor(Math.random() * (500000 - 400000 + 1)) + 400000;
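// Notes on the Segmind SDXL parameters below (descriptions reflect how this
// template uses them): `style` passes the UI dropdown choice through as a
// style preset, `guidance_scale` controls how closely the image follows the
// prompt, `refiner: true` enables the SDXL refiner pass, and `base64: true`
// asks the API to return the image inline as a base64 string.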
const requestBody = {
prompt: illustrationText,
negative_prompt: "ugly, tiling, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, blurry, bad anatomy, blurred, watermark, grainy, signature, cut off, draft",
style: style,
samples: 1,
scheduler: "UniPC",
num_inference_steps: 25,
guidance_scale: "8",
strength: 0.2,
high_noise_fraction: 0.8,
// seed: "468685",
seed: randomSeed.toString(),
img_width: "1024",
img_height: "1024",
refiner: true,
base64: true,
};

console.log(requestBody)

const response = await fetch(texttoimageURL, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'x-api-key': texttoimageAPIKEY,
},
body: JSON.stringify(requestBody),
});

// console.log("response ",response);

if (!response.ok) {
throw new Error("Error with generating Illustrations");
}

// With base64: true set, the API returns JSON containing a base64-encoded image
const responseData = await response.json();
const base64Image = responseData.image;
images.push(base64Image);
}

return NextResponse.json({ images: images }, { status: 200 });



} catch (error) {
console.error(error)
return NextResponse.json({ "message": `ERROR: Could not generate images for the text` }, { status: 500 });
}

}
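
For a quick sanity check outside the UI, you can call the route directly. Here is a minimal sketch with curl, assuming the dev server is running locally on the default port; the body fields match what the handler destructures above, and a successful response is a JSON object with an images array of base64 strings:

curl -X POST http://localhost:3000/api/generate-story-illustrations \
  -H "Content-Type: application/json" \
  -d '{"text": "A small fox finds a glowing lantern in the woods.", "style": "watercolor", "imagecount": 1}'

One design note: the handler generates images sequentially inside the for loop, so total latency grows with the image count. If you need faster responses, the Segmind requests could be fired concurrently (for example with Promise.all), at the cost of reaching rate limits sooner.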

If you want to understand the algorithm in depth, you can also look at my previous blog post, where the core logic is explained in greater detail.

Conclusion

Creating illustrations for children’s stories is now incredibly simple, thanks to the integration of OpenAI’s ChatGPT and Segmind’s SDXL model. This tool allows you to generate custom illustrations that align precisely with your narrative, moving beyond the limitations of standard stock images. This article has demonstrated not only a practical application of AI but also how you can leverage these technologies to build production-ready applications, complete with a functional template that integrates NextJS 14, React, and Tailwind CSS.

Happy AI exploration! If you loved the content, feel free to follow me on Twitter for daily AI content.
