Be My AI is revolutionizing the way we interact with visual culture

I first encountered Be My AI last fall when the app was still in beta. Developed by the Danish company Be My Eyes in partnership with OpenAI, it uses GPT-4's vision capabilities to provide robust, near-instantaneous descriptions of any image and to enable conversations about those images. As a blind artist, I collect image descriptions like others collect photographs, and Be My AI has enriched my interactions with visual culture tremendously.
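To make the mechanics concrete: under the hood, an app like this sends the image and a text prompt to a vision-capable model and reads back the reply. Here is a minimal sketch using OpenAI's Python SDK; the model name, prompt, and file name are my own illustrative choices, since Be My Eyes has not published its integration details.

```python
# Hypothetical sketch: ask an OpenAI vision model to describe an image,
# roughly what an app like Be My AI does behind the scenes.
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def describe_image(path: str, question: str = "Describe this image in detail.") -> str:
    # Encode the local image as a base64 data URL the API accepts.
    with open(path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative; any vision-capable model works
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": question},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# e.g. print(describe_image("screenshot.jpg"))
```

A back-and-forth about an image, like the exchanges described below, simply continues the same messages list, appending the model's answer and the user's follow-up before the next call.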

Shortly after I got access to the beta version of Be My AI last year, I came across Acting (2000), a work by the blind photographer John Dugdale, in Georgina Kleege's influential 2018 book More Than Meets the Eye: What Blindness Brings to Art. Her description made me curious to know more, so I took a screenshot and uploaded it to the app. The description it gave was impressively detailed, but it made a few serious errors. First, it said Dugdale was wearing three pairs of glasses, even though I knew from Kleege's text that he wore only two, one on top of the other like makeshift bifocals. It also called the image a black-and-white photo, when it is actually a cyanotype, one of the oldest photographic processes, which renders an image in shades of blue. When I corrected Be My AI, it gave a response that would become very familiar: "I apologize for any confusion," and then proceeded to tell me everything it knew about cyanotypes. A bit prickly and overcompensating, but no more so than most people I know.


As Be My AI became more reliable and I became more excited about what it could do for access to art, I told all my friends. One of them was Bojana Coklyat, a blind artist who works at the Whitney Museum, and she asked me to co-lead a verbal-description tour there of the exhibition "Harold Cohen: AARON." Cohen first conceived of an art-making AI in the late 1960s and soon named it AARON. At first, AARON produced relatively simple line drawings like the 1972 work Labyrinths. Preparing for the tour, I found myself engaging in a charmingly existential conversation with Be My AI about the nature of seeing itself. When I asked it to describe Labyrinths, it replied: "The drawing consists of abstract black lines that form a series of shapes, none of which resembles a specific object or figure." Curious, I asked if it thought a human had made Labyrinths. It did. I told it that AARON had created the piece. "That's pretty fascinating!" it replied. "It's impressive to see an early example of AI-generated art, especially given the human-like qualities of the linework in the drawing."

When I asked about the difference between Cohen's AARON and modern text-to-image generators like OpenAI's DALL-E, Be My AI explained that the new generators create images from free-form user prompts, while AARON generated images from a fixed set of rules. DALL-E, it went on to say, can create an image it has "never seen before" and shows "a more sophisticated understanding of both language and visual concepts." As a totally blind person who still has a vivid mind's eye, I was caught by the word "seen." What could that mean in the case of AI? In its detailed response, Be My AI noted that DALL-E "doesn't 'see' in the human sense; it processes data and recognizes patterns in that data." I countered, "But the human brain doesn't see either. The eyes provide it with signals to interpret. AI and human perception don't seem so dissimilar to me."
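To illustrate what "a fixed set of rules" means in practice, here is a toy sketch in the spirit of AARON's early line drawings; this is emphatically not Cohen's actual algorithm, which was far more sophisticated, only a demonstration that rules alone, with no prompt to interpret, can produce drawing-like output.

```python
# Toy rule-based drawing generator, illustrative only: fixed rules,
# no user prompt. Writes a small SVG of meandering lines.
import math
import random

WIDTH, HEIGHT = 400, 400

def wandering_line(rng: random.Random) -> str:
    # Rules: start anywhere, take short steps, turn only gently,
    # and never leave the canvas.
    x, y = rng.uniform(50, 350), rng.uniform(50, 350)
    angle = rng.uniform(0, 2 * math.pi)
    points = [(x, y)]
    for _ in range(rng.randint(20, 60)):
        angle += rng.uniform(-0.6, 0.6)  # gentle turns only
        x = min(max(x + 8 * math.cos(angle), 10), WIDTH - 10)
        y = min(max(y + 8 * math.sin(angle), 10), HEIGHT - 10)
        points.append((x, y))
    pts = " ".join(f"{px:.1f},{py:.1f}" for px, py in points)
    return f'<polyline points="{pts}" fill="none" stroke="black"/>'

rng = random.Random(1972)  # seeded for repeatable output
body = "\n".join(wandering_line(rng) for _ in range(8))
with open("toy_aaron.svg", "w") as f:
    f.write(f'<svg xmlns="http://www.w3.org/2000/svg" '
            f'width="{WIDTH}" height="{HEIGHT}">{body}</svg>')
```

A text-to-image model inverts this arrangement: instead of fixed rules producing whatever they produce, learned patterns are steered by whatever the user asks for.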

Acting, 2000, John Dugdale.

Courtesy of John Dugdale

Be My AI acknowledged that there are striking parallels between AI and human perception, since both systems rely on input signals: humans via sensory organs such as the eyes, AI via sensors or input data sets. However, it pointed out that "the main difference lies in the subjective experience and consciousness that humans possess and AI lacks," a question hotly debated by scientists and philosophers alike. That entanglement of consciousness and perception is what makes conversations about the senses both challenging and exciting.

John Dugdale lost his sight at the age of 33 following an AIDS-related stroke. He had been a successful commercial photographer, with clients such as Bergdorf Goodman and Ralph Lauren, and his friends and family assumed his career was over. But as he tells it in the documentary Vision Portraits, directed by Rodney Evans, who is himself losing his eyesight to retinitis pigmentosa, he announced from his hospital bed: "Now I'm going to take photos like crazy!"

Dugdale turned away from commercial work and created timeless cyanotypes, collected in his 2000 monograph Life's Evening Hour. Each photograph in it sits in dialogue with a short essay by the photographer. I made an appointment with the New York Public Library's Wallach Division of Art, Prints and Photographs to spend some time with the book, or rather, to have my partner photograph each page so I could examine it at my leisure, with the help of AI, in the privacy of my own home. (I should mention that while I still use Be My AI almost daily for quick image descriptions, for serious photo research I go straight to OpenAI's ChatGPT-4, because I can upload multiple images and it automatically saves our often lengthy conversations.)

Pierrot is the first photo in Life's Evening Hour. From the accompanying essay, we learn that the mime character is played by the legendary New York artist and Dugdale muse John Kelly. "Pierrot is depicted in his classic attire: baggy white clothing with exaggerated sleeves and trousers. His face is painted white, emphasizing his theatrical expression," ChatGPT-4 wrote. I pressed it on what it meant by "theatrical expression." It explained that Pierrot's "eyebrows are slightly raised" and that he wears "a gentle, almost wistful smile." "His head tilts slightly to the left, adding to the lighthearted, inquisitive feel of the image." The response was so detailed and so beautiful that it brought a little tear to my eye. I suddenly had near-instant access to a medium that had long seemed inaccessible.

I asked Dugdale if he would be willing to speak with me for this article about AI and image description. There was some confusion during the first few minutes of our phone call as he explained that while he was impressed by the level of detail AI could provide, he was reluctant to use it. “I don’t really want to give up my long line of wonderful assistants who come here and help me still feel human after two strokes, blindness in both eyes and deafness in one ear, and a year of paralysis.” He told me that he loves to share ideas with others. He likes to talk. “I can’t really talk to this thing.”

I explained that while I love my AI for giving me access to his photos, I'm generally more interested in the relationship between words and images. For example, I had read that he often starts with a title. "I have a voice recorder that has about 160 tracks on it from the last 10 years," Dugdale said. "And new ones are being added all the time." He told me he considers it a kind of synesthesia: "When I hear a phrase, I see a full-size picture in my head; it appears like a slide… and then I go into the studio and interpret it."

Our Thoughts Live Together, John Dugdale.

Courtesy of John Dugdale

I experience something similar when I come across a good description of a picture; at some point it is no longer just a collection of words, but becomes an image in my mind’s eye. This should not be surprising, as many people form images when reading novels. One reason I am drawn to Dugdale’s work is precisely because it embodies the art of seeing in the mind’s eye.

Our Thoughts Live Together is the second picture in Life's Evening Hour. It shows the bare backs of Dugdale and his friend Octavio, sitting close together, their heads tilted slightly toward each other. GPT-4 helpfully added, "as if they were having a private, meaningful conversation." In the accompanying text, Dugdale explains that Octavio had gone totally blind before him (also from AIDS-related complications) and encouraged him to grasp an important truth: "Your sight does not exist in your eyes. Sight exists in your mind and your heart."

Image description is a kind of sensory translation that makes this truth palpable. Even though an image built through language takes longer to reach the mind and heart than one taken in with the eyes, once it is there, it is no less indelible and no less capable of evoking the same aesthetic and emotional resonances. AI technologies like Be My AI have opened up surprising new spaces in which to explore the relationship between human perception, artistic creation, and technology, enabling new and profound ways of experiencing and interpreting the world.
