I first encountered Be My AI last fall, when the app was in beta. Developed by the Danish company Be My Eyes in partnership with OpenAI, the app uses GPT-4’s vision capabilities to provide robust, near-instant descriptions of any image and to foster conversations around those images. As a blind artist, I collect image descriptions the way others collect photographs, and Be My AI has greatly enhanced my interactions with visual culture.
Shortly after gaining access to the Be My AI beta last year, I came across the work of the blind photographer John Dugdale, whose cyanotype Spectacle (2000) is discussed in Georgina Kleege’s influential 2018 book, More Than Meets the Eye: What Blindness Brings to Art. Intrigued by her account and wanting to know more, I took a screenshot of the photograph and pulled it into the app. The description was very detailed but contained some crucial errors. First, the app said Dugdale wore three pairs of glasses, when we know from Kleege’s writing that he wore only two, one on top of the other like a makeshift bifocal. The app also called it a black-and-white photo, when it is actually a cyanotype, one of the oldest photographic techniques, which produces images in shades of blue. When I corrected Be My AI, it replied, “Sorry if that was confusing,” and then proceeded to tell me everything it knew about cyanotypes. It was a bit of an overcorrection, but less defensive than most people I know would be.
As Be My AI’s credibility grew, I got more and more excited about how it could expand access to art, and I told all my friends about it. One of them was Bojana Coklyat, a blind artist who works at the Whitney Museum of American Art. She asked me to co-lead a verbal description tour of the Whitney’s Harold Cohen: AARON exhibition, and so I ended up having a fascinating, existential conversation with Be My AI about the nature of vision itself. Cohen first conceived his art-making AI in the late 1960s and soon named it AARON. Initially, AARON generated relatively simple line drawings, such as an untitled 1972 piece resembling a maze. When I asked Be My AI to describe it, the app responded, “The painting is made up of abstract black lines depicting various shapes, none of which resemble any particular object or person.” Curious, I asked whether it thought the work was made by a human, and it said yes. When I told it that AARON had created the piece, it replied, “That’s very interesting! It’s impressive to see such an early example of AI-generated art, especially given the human-like nature of the linework in the painting.”
When I asked about the difference between AARON and modern text-to-image generators like OpenAI’s DALL-E, Be My AI explained that the newer generators produce images based on user prompts, while AARON created images based on a set of rules. DALL-E, it said, can generate images that have “never been seen before,” demonstrating a “more sophisticated understanding of both language and visual concepts.” As someone who is blind but whose mind’s eye is very much alive, I latched onto the word “seen.” What does seeing mean in the case of an AI? In a lengthy response, Be My AI stated that DALL-E “does not ‘see’ in the human sense, but processes data and identifies patterns within that data.” I countered, “But the human brain doesn’t see either. The eyes send signals to the brain for interpretation. AI and human perception don’t seem all that different to me.”
Be My AI acknowledged that there are striking similarities between AI and human perception, since both systems rely on input signals: humans through sensory organs such as the eyes, and AI through sensors or input datasets. However, it pointed out that “the main difference is in the subjective experience and consciousness that humans have and AI lacks,” a topic that remains hotly debated among scientists and philosophers. That connection between consciousness and perception makes discussions of sensation both challenging and provocative.
John Dugdale lost his sight at age 33 after suffering an AIDS-related stroke. He had been a successful commercial photographer with clients including Bergdorf Goodman and Ralph Lauren, and to his friends and family it seemed his career was over. But as he recounts in the documentary Vision Portraits (directed by Rodney Evans, who is himself losing his sight to retinitis pigmentosa), he declared from his hospital bed, “I’m going to take pictures like mad!”
Dugdale turned away from commercial work to produce timeless cyanotypes, such as those collected in his 2000 monograph, Life’s Evening Hour, in which each photograph is set in dialogue with a short essay by the photographer. I made an appointment with the Wallach Division of Art, Prints and Photographs at the New York Public Library to spend some time with the book. Or rather, I asked my partner to photograph each page so that I could slowly study the book, with the help of AI, in the privacy of my own home. (I still use Be My AI almost every day for quick image descriptions, but for serious photo research I use OpenAI’s ChatGPT with GPT-4 directly, because it can ingest multiple images and automatically saves long conversations.)
The first photograph in Life’s Evening Hour depicts a mime, and from the accompanying essay we learn that the mime is played by John Kelly, a legendary New York performer and Dugdale muse. “The clown is depicted in classical attire, a loose white suit with exaggerated sleeves and trousers. His face is painted white, accentuating his theatrical expression,” ChatGPT wrote. I pressed it on what it meant by “theatrical expression.” It explained that the clown’s “eyebrows are slightly raised” and that he “wears a gentle, almost pensive smile…his head is tilted slightly to the left, adding to the cheerful, inquisitive atmosphere of the image.” The detailed response was so lovely that it brought me to tears. I suddenly had near-instant access to a medium that had seemed out of reach for so long.
I reached out to Dugdale to ask if he’d be willing to speak with me for this article on AI and image description. The first few minutes of the call were a bit fraught, as he explained that while he is impressed with the level of detail AI can provide, he is reluctant to use it. “I don’t want to cut into the long line of amazing assistants that help keep me human, even after two strokes, blindness in both eyes, deafness in one ear, and a year of paralysis.” He loves to bounce ideas off others, he said; he loves to talk. “I can’t really talk to an AI.”
I explained that I loved the access AI gave me to his photographs, but that I was generally more interested in the relationship between words and images. I had read, for example, that he often starts with a title. “I have a Dictaphone with about 160 titles from the last 10 years on it,” Dugdale says, “and I’m constantly adding to it.” He tells me he thinks of it as a kind of synesthesia: “I hear a phrase and a complete picture pops up in my head, like a slide, and then I go into the studio and interpret it.”
Something similar happens to me when I encounter a great image: at some point it appears in my mind’s eye as a picture, not just a collection of words. This isn’t surprising, since many people form mental images when they read novels. One of the reasons I’m drawn to Dugdale’s work is that it is so emblematic of seeing with the mind’s eye.
Our Hearts Reside Together is the second image from Life’s Evening Hour. Dugdale and his friend Octavio sit naked, back to back, leaning against each other, heads slightly bowed, “as if having a private, meaningful conversation,” GPT-4 helpfully adds. In the accompanying text, Dugdale explains that Octavio went completely blind before he did (also from AIDS-related complications), and that he urged Octavio to grasp a powerful truth: “Sight does not reside in the eyes; sight resides in the heart and mind.”
An image description is a kind of sensory translation, and that truth has impressed itself on me. Seeing through words takes longer to reach the mind and heart than seeing with the eyes, but once there, the images are no less haunting, no less capable of inspiring aesthetic and emotional resonance. AI technologies like Be My AI are opening up remarkable new spaces for exploring the relationships among human perception, artistic creation, and technology, enabling new and profound ways of experiencing and interpreting the world.