DALL·E mini has a mysterious obsession with women in saris

Like most people who are extremely online, Brazilian screenwriter Fernando Marés has been fascinated by the images generated by the artificial intelligence (AI) model DALL·E mini. Over the last few weeks, the AI system has become a viral sensation by creating images based on seemingly random and quirky prompts from users – such as “Lady Gaga as the Joker,” “Elon Musk being sued by a capybara,” and more.

Marés, a veteran hacktivist, began using DALL·E mini in early June. But instead of entering text for a specific request, he tried something else: he left the field blank. Fascinated by the seemingly random results, Marés ran the empty query over and over again. That was when Marés noticed something strange: almost every time he ran a blank request, DALL·E mini generated portraits of brown-skinned women wearing saris, a type of attire common in South Asia.

Marés queried DALL·E mini thousands of times with the blank input to find out if it was just a coincidence. He then invited friends over to take turns on his computer, generating images simultaneously across five browser tabs. He said he continued for almost 10 hours without a break. He built an extensive repository of over 5,000 unique images and shared 1.4 GB of raw DALL·E mini data with Rest of World.
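Marés worked through the model’s web interface by hand, but the same experiment is easy to picture as a loop. The sketch below is purely illustrative: the endpoint URL and the JSON response shape are assumptions, not the actual DALL·E mini API, and this is not how Marés collected his data.

```python
# A minimal sketch of automating blank-prompt queries the way Marés did by
# hand. API_URL and the {"images": [...]} response format are hypothetical,
# assumed only for illustration.
import base64
import pathlib

import requests

API_URL = "https://example.com/dalle-mini/generate"  # hypothetical endpoint

out = pathlib.Path("blank_prompt_images")
out.mkdir(exist_ok=True)

for run in range(100):  # Marés ran thousands of these over ~10 hours
    resp = requests.post(API_URL, json={"prompt": ""}, timeout=120)
    resp.raise_for_status()
    # Assume the service returns a batch of base64-encoded images per query.
    for i, img_b64 in enumerate(resp.json()["images"]):
        (out / f"run{run:04d}_{i}.png").write_bytes(base64.b64decode(img_b64))
```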

Most of those images show brown-skinned women in saris. Why is DALL·E mini apparently obsessed with this very specific type of image? According to AI researchers, the answer may have something to do with sloppy tagging and incomplete datasets.

DALL·E mini was developed by AI artist Boris Dayma and inspired by DALL·E 2, an OpenAI program that generates hyper-realistic art and images from a text input. From cats meditating to robotic dinosaurs fighting monster trucks in a colosseum, the images blew everyone’s minds, and some called it a threat to human illustrators. Recognizing the potential for abuse, OpenAI limited access to its model to a hand-picked set of 400 researchers.

Dayma was fascinated by the art produced by DALL·E 2 and “wanted an open-source version that could be accessed and improved by anyone,” he told Rest of World. So he went ahead and created a stripped-down, open-source version of the model and called it DALL·E mini. He launched it in July 2021, and the model has been training and refining its outputs ever since.



DALL·E mini is now a viral internet phenomenon. The images it produces are nowhere near as sharp as those from DALL·E 2 and show notable distortion and blur, but the system’s wild renderings – everything from the Demogorgon from Stranger Things holding a basketball to a public execution at Disney World – have given rise to an entire subculture, with subreddits and Twitter handles dedicated to curating its images. It has inspired a cartoon in The New Yorker magazine, and the Twitter handle Weird Dall-E Creations has over 730,000 followers. Dayma told Rest of World that the model serves about 5 million requests a day, and that he is currently working to keep up with extreme growth in user interest. (DALL·E mini is unaffiliated with OpenAI, and, at OpenAI’s insistence, the open-source model was renamed Craiyon as of June 20.)

Dayma admits he is puzzled as to why the system generates images of brown-skinned women in saris for blank requests, but suspects it has something to do with the program’s dataset. “It’s pretty interesting, and I’m not sure why it happens,” Dayma told Rest of World after reviewing the images. “It’s also possible that this type of image was highly represented in the dataset, perhaps even with short captions.” Rest of World also reached out to OpenAI, creator of DALL·E 2, to see if it had any insight, but has yet to hear a response.

AI models like DALL·E mini learn to draw an image by parsing through millions of images from the internet with their associated captions. The DALL·E mini model was developed on three major datasets: the Conceptual Captions dataset, which contains 3 million image-caption pairs; Conceptual 12M, which contains 12 million image-caption pairs; and OpenAI’s corpus of about 15 million images. Dayma and DALL·E mini co-creator Pedro Cuenca noted that their model was also trained on unfiltered data from the internet, which opens it up to unknown and unexplained biases in the datasets that can seep into image generation models.
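Dayma’s “short captions” hypothesis is straightforward to probe on the public portion of that training data. Here is a minimal sketch, assuming the copy of Conceptual Captions hosted on the Hugging Face Hub and a recent version of the datasets library; the three-word threshold is an arbitrary stand-in for “short,” not anything from the DALL·E mini pipeline.

```python
# A minimal sketch: sample captions from Conceptual Captions and count how
# many are very short. Assumes the Hugging Face Hub copy of the dataset.
from datasets import load_dataset

# Stream the dataset so the ~3M rows are not downloaded up front.
ds = load_dataset("conceptual_captions", split="train", streaming=True)

short, total = 0, 0
for row in ds.take(10_000):  # inspect the first 10,000 image-caption pairs
    total += 1
    if len(row["caption"].split()) <= 3:  # "short caption" = 3 words or fewer
        short += 1

print(f"{short}/{total} sampled captions have 3 words or fewer")
```

A skew toward terse or near-empty captions for one kind of image would be one way a blank prompt could land on that image type, though nothing in this sketch establishes that for saris specifically.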

Dayma is not alone in suspecting the underlying dataset and training model. To find answers, Marés turned to the popular machine learning discussion forum Hugging Face, which hosts DALL·E mini. There, the computer science community weighed in, with some members repeatedly offering plausible explanations: the AI could have been trained on millions of images of people from South and Southeast Asia that are “unlabeled” in the training data corpus. Dayma disputed this theory, saying that no image in the dataset lacks a caption.

“Typically, machine learning systems have the reverse problem – they don’t actually include enough images of non-white people.”

Michael Cook, who is currently researching the intersection of artificial intelligence, creativity, and game design at Queen Mary University of London, challenged the theory that the dataset contained too many images of people from South Asia. “Typically, machine learning systems have the reverse problem – they don’t actually contain enough images of non-white people,” Cook said.

Cook has his own theory about DALL·E mini’s puzzling results. “One thing that occurred to me while reading around is that a lot of these datasets strip out text that isn’t English, and they also strip out information about specific people, i.e. proper names,” Cook said.

“What we might be seeing is a strange side effect of some of this filtering or preprocessing, where images of Indian women, for example, are less likely to be filtered by the ban list, or the text describing the images is removed and they’re added to the dataset with no labels attached.” For example, if the captions were in Hindi or another language, it is possible that text gets muddled in the processing of the data, resulting in the image having no caption. “I can’t say that for sure – it’s just a theory that came to me while exploring the data.”
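To make Cook’s theory concrete, here is a hypothetical preprocessing pass of the kind he describes. Everything in it – the record layout, the ban list, and the crude ASCII-based language check – is an illustrative assumption, not the actual pipeline behind DALL·E mini or its datasets.

```python
# A hypothetical caption-filtering pass illustrating Cook's theory: non-English
# captions are stripped while their images survive, so those images reach the
# model with no label attached. All names and heuristics here are invented.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    image_url: str
    caption: Optional[str]


BANNED_NAMES = {"alice", "bob"}  # stand-in for a real proper-name ban list


def looks_english(text: str) -> bool:
    # Crude stand-in for a language detector: captions in Hindi (or any
    # non-Latin script) fail because their characters are not ASCII.
    return all(ord(ch) < 128 for ch in text)


def preprocess(rec: Record) -> Optional[Record]:
    if rec.caption is None:
        return rec  # image kept, already caption-less
    if any(name in rec.caption.lower() for name in BANNED_NAMES):
        return None  # whole image-caption pair dropped by the ban list
    if not looks_english(rec.caption):
        # The side effect Cook hypothesizes: the caption is stripped, but
        # the image stays in the dataset with no label attached.
        return Record(rec.image_url, caption=None)
    return rec


pairs = [
    Record("https://example.com/1.jpg", "a woman in a red sari"),
    Record("https://example.com/2.jpg", "लाल साड़ी में एक महिला"),  # Hindi caption
]
for rec in pairs:
    print(preprocess(rec))
```

In this toy run, the English-captioned image passes through intact while the Hindi-captioned one keeps its image but loses its text – exactly the kind of silent label loss that could leave one category of image overrepresented among caption-less training examples.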

Bias in AI systems is universal, and even well-funded Big Tech initiatives such as Microsoft’s chatbot Tay and Amazon’s AI recruiting tool have succumbed to the problem. In fact, Google’s text-to-image generation model, Imagen, and OpenAI’s DALL·E 2 explicitly disclose that their models have the potential to recreate harmful biases and stereotypes, as does DALL·E mini.

Cook has been a vocal critic of what he sees as growing complacency, and of rote disclosures that brush off biases as an inevitable part of emerging AI models. He told Rest of World that while it is commendable that a new piece of technology allows people to have fun, “I think there are serious cultural and social issues with this technology that we don’t really appreciate.”

Dayma, the creator of DALL·E mini, acknowledges that the model is still a work in progress, and that the extent of its biases is not yet fully documented. “The model has generated a lot more interest than I expected,” Dayma told Rest of World. He wants the model to remain open source so that his team can study its limitations and biases faster. “I think it’s interesting for the public to be aware of what’s possible so they can develop a critical mind towards the media they receive as images, just as much as the media they receive as news articles.”

Meanwhile, the mystery remains unsolved. “I’m learning a lot just by seeing how people use the model,” Dayma told Rest of World. “When it’s blank, it’s a gray area, so [I] still need to investigate in more detail.”

Marés said it is important for people to learn about the possible harms of seemingly fun AI systems like DALL·E mini. The fact that even Dayma is unable to figure out why the system spits out these images reinforces his concerns. “That’s what [experts have been] saying for years: that these things are unpredictable and they can’t control it.”
