Generative AI is everywhere, and judging by its recent capabilities, it is here to stay. But alongside all the buzz, when it comes to actually using AI-generated artwork in commercial projects, the landscape suddenly doesn't look so optimistic (yet).
The reason is simple - all current mainstream tools operate as standalone services, and some of them are relatively hard to control for the average user (and sometimes even for professionals). Of course, that is bound to change as artificial intelligence technology matures and companies find appropriate use cases to optimize (and monetize) their product offerings.
Adobe is a special player in the generative AI landscape. The company has been experimenting with and implementing simpler AI models in its suite for years, effectively integrating complex AI features into a user-friendly interface that allows intuitive control without the need for elaborate prompts - all within the same suite.
So when Adobe announced the release of its generative AI solution, Adobe Firefly, I was curious how it would stand up against current mainstream solutions such as OpenAI's Dall-E or Midjourney - not only in terms of the end result, but also in terms of its approach to the interface and, ultimately, its ethical credentials.
To be clear - while this article could be written as a full-blown science paper, my intention is simply to give an understandable overview of where these tools are at and how big the potential behind the technology is. So let's dive in the way an average user would - by attempting to generate an image. Let's generate!
First up, I decided to start with a relatively simple task - one that any stock image database could easily cover:
A few image rotations later, the best results I could get without altering the prompt were quite similar across all three tools, but with some clear differences.
Midjourney did a solid job, although the image looks a bit illustrative and may lack a deeper level of photorealism.
Dall-E did a mediocre job - sloppy details such as a strange coffee mug, an unrealistically rendered newspaper, and a model who wasn't quite as handsome as expected.
Firefly delivered the most impressive output, generating a near stock-photo-realistic image with perfect anatomy (fingers, ears, body) and going the extra mile on the face and hair details. This was by far the best result.
For the second task, I made things a bit more complicated by combining known objects with unexpected environments - a simple way to add an extra layer of complexity. I used the following prompt:
The results were a bit different.
Midjourney produced some decent images. Although a bit cartoonish, they featured decent capybara anatomy (even if most variations looked more like hamsters or guinea pigs) and nailed most of the prompt, excluding 'legs walking by' and 'jumping' - still a pretty decent result.
Dall-E, on the other hand, struggled with this task - not a single image was generated with proper positioning of the subject on the canvas. The best I could manage still had the capybara's nose cropped off. On top of that, the image was unsharp, with poor anatomy and a general lack of coherence. Unusable.
Now to Firefly - an absolute blast! The level of detail, the composition, and the overall setup were impeccable. And not just this image - all the other generated variations were on point and usable. Very good job.
In the last round, I raised the bar by requesting an unrealistic scenario unlikely to exist in stock data, and combined it with specific details such as gender and ethnicity to push the limits further. The results were a bit surprising (and funny) for the following prompt:
Midjourney did great - slightly illustrative, but still a very atmospheric, cinematic image that nailed the general idea, gender, and ethnicity. The spacesuit color and the two rings were left off, but the result was still above expectations!
Dall-E struggled with the color, composition, and details, generating an overall mediocre result. Across the other variations, I could not fix the color and mood without altering the prompt.
The big surprise at the end was Firefly, which generated a bunch of nonsense. From misshapen human anatomy to distorted faces, everything that could go wrong did go wrong. I suspect this is because its database lacks such images, and also because the system is still in beta and figuring out how to handle more complex prompts.
In my opinion, Dall-E fared worse than the rest, struggling with composition, realism, and details. Midjourney was far more reliable, while Firefly, despite being in beta, demonstrated huge potential by nailing simpler prompts with extraordinary quality. What it comes down to is integration, which, for now, might be Adobe's strong point. Even the barebones UI for manipulating variants proved to be a very intuitive way to alter the image.
Going forward, I presume this logic will be integrated across the Adobe suite and will interact seamlessly with existing images in a very user-centric way. This is a challenge for competitors that lack such customer-facing touchpoints, and it highlights Adobe's significant advantage in the field. In addition, Adobe relies entirely on its own image database and advocates for creator compensation and ethical usage guidelines - a topic that has been widely discussed.
Wrap-up:
Adobe has a good shot at nailing it and delivering the product generative AI has been waiting for - a user-friendly, UI-enhanced product built on solid ethical credentials!
Disclaimer: Generative AI is developing rapidly - even as this article was being written, Midjourney 5.1 launched, offering much more detail and better anatomy. Editing the article midway through felt like the logical thing to do, but this only demonstrates the changing tides within the AI landscape.
Like what you’ve read? Check out our open positions and join the Martian team!