
THEir Room

Time: Fall 2021
Team Size: 2
Role: Machine Learning Specialist, Front-End Developer
Tools: BigGAN, GPT-2, TTS (text-to-speech synthesis),
and other image synthesis machine learning algorithms

Artificial Intelligence Application

The website is an imagined memoir of the human species, reconstructed by artificial intelligence in 3023. I narrated a dystopian future in which humans have gone extinct and artificial intelligence has to reconstruct their daily lives from machine-learning-generated pictures, audio, and text. My partner and I utilized several machine learning algorithms, such as BigGAN, StyleGAN, and GPT-2. By presenting the AI-generated content on the website in an exhibition manner, we hope it can provoke discussion and reflection on the future of artificial intelligence and pose critical questions about how we should move forward with its advancement.

Inspiration

ASI: Artificial Superintelligence

After reading The AI Revolution: The Road to Superintelligence, I was intrigued by the concept of artificial superintelligence (ASI) brought up by Urban. Oxford philosopher and leading AI thinker Nick Bostrom defines superintelligence as “an intellect that is much smarter than the best human brains in practically every field, including scientific creativity, general wisdom and social skills.” Urban emphasizes that we are on the edge of an exponential growth in artificial intelligence and proposes several approaches to creating ASI.

The article got me thinking about what would happen if ASI were created, and how the inaccuracy and bias of current machine learning algorithms would influence the greater society and human history. Hence, I wanted to tell the story of an imagined dystopian future in which humanity has gone extinct and its history is represented by these "inaccurate and biased" images generated by artificial intelligence.

[Image: the exponential growth of computing]

Process

BigGAN, GPT-2 and TTS Speech Synthesis

BigGAN: Intentionally "Flawed" Image Synthesis
[Image: example of the Generative Adversarial Network model architecture]

A Generative Adversarial Network, or GAN, is an approach to generative modeling using deep learning methods such as convolutional neural networks. It frames the problem as a supervised learning problem with two sub-models: a generator model trained to produce new examples, and a discriminator model that tries to classify examples as either real or fake. BigGAN is an approach that pulls together a suite of recent best practices for training class-conditional image generators at scale.

My partner and I used BigGAN to generate 106 objects and chose 16 of them to create the room. I observed that objects with simpler backgrounds came out more realistic, while others were completely distorted and unreal, such as the generated pictures containing humans. The inaccuracy reflects the data the algorithm was trained on and the complexity of the task. I intentionally kept the inconsistency in those synthesized pictures to convey a sense of absurdity.
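For reference, here is a minimal sketch of how class-conditional generation with a pretrained BigGAN can look, using the open-source pytorch-pretrained-biggan package; the class name, truncation value, and file handling below are illustrative rather than the exact settings we used.

```python
import torch
from pytorch_pretrained_biggan import (BigGAN, one_hot_from_names,
                                       truncated_noise_sample, save_as_images)

# Load a pretrained 256x256 BigGAN-deep generator
model = BigGAN.from_pretrained('biggan-deep-256')

# Lower truncation -> more "typical" samples; higher -> more variety (and more artifacts)
truncation = 0.4
class_vector = one_hot_from_names(['table lamp'], batch_size=1)  # must map to an ImageNet class
noise_vector = truncated_noise_sample(truncation=truncation, batch_size=1)

with torch.no_grad():
    output = model(torch.from_numpy(noise_vector),
                   torch.from_numpy(class_vector),
                   truncation)

# Writes output_0.png (and so on) to the working directory
save_as_images(output)
```

Because the class-conditional model only covers the ImageNet label set, object names outside those categories have to be approximated by the nearest available class.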

GPT-2 and TTS: "Subjective" Text and Speech Synthesis

When hovering over each object's image, the website presents a short object description in a subjective tone. This is achieved using the GPT-2 implementation in InferKit. I input “This is *object name*” and let the algorithm auto-complete the paragraph, with some minor manual tweaking of the sentences. The descriptions themselves have a striking impact in how coherent and subjective they are (see right). The speech was generated using the Online Microsoft Sam TTS Generator.
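For illustration, a rough sketch of the same prompt-then-speak workflow is shown below, using the open-source GPT-2 checkpoint in Hugging Face transformers and the offline pyttsx3 engine; InferKit and the Microsoft Sam generator are web tools, so the model, sampling parameters, and object name here are stand-ins rather than our exact setup.

```python
from transformers import pipeline
import pyttsx3

# Auto-complete a description from a seed in the form "This is <object name>"
generator = pipeline('text-generation', model='gpt2')
prompt = "This is a table lamp."  # illustrative object name
result = generator(prompt, max_length=60, do_sample=True,
                   top_p=0.9, temperature=0.8, num_return_sequences=1)
description = result[0]['generated_text']
print(description)

# Read the generated description aloud and save it as an audio file
engine = pyttsx3.init()
engine.save_to_file(description, 'table_lamp_description.wav')
engine.runAndWait()
```

Seeding every description with the same "This is ..." pattern keeps the generated paragraph anchored to the object while leaving the tone entirely to the model.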

Generative Pre-trained Transformer 2, or GPT-2, implements a deep neural network, specifically a transformer model, which uses attention in place of the earlier recurrence- and convolution-based architectures. The subjective tone also derives from the data it was trained on: WebText, a corpus of text scraped from millions of web pages linked on Reddit, much of which is informal, first-person writing.


 

[Image: a generated object description shown on hover]
THEir Room: three layers of meaning

1. The project is about a reconstructed human room, made by aliens using a database found on the post-apocalyptic Earth. So the room is “their room,” through which the aliens tell humans' stories;

2. Because it is actually a room reconstructed by A.I. of irregular quality, the room is also “The Irregular Room,” full of the irrationality and inconsistency of A.I.;

3. Because A.I. is mainly in control of creating this room, and the aliens choose to fully believe in it, it is also “The Room,” the jailed room of a shaped reality. The aliens in our project could also be us: they choose to fully believe because they have no context or background knowledge about humans. This urges us to reflect on whether, in reality, when we use A.I., we become these aliens, neglecting other contextual information and blindly trusting the “room” A.I. creates for us.

User Test and Feedback

I user-tested the project before showcasing the work. I invited a few audience members to explore the website, observed the process, and collected their feedback. Below is what I observed and learned from the experience.

[Positive]

1. The intro video is aesthetically pleasing and provides the background for the story;

2. The concept is thought-provoking and inspiring; the content is intriguing and users are willing to view it.

[Negative]

1. Most of the content is static. Even though this is an exhibition website, most people lose interest after viewing a few objects.

Implementation

After user testing, I realized that more interaction methods and more diverse forms of content could be added to enrich the storyline. My partner and I decided to incorporate more forms of image synthesis algorithms into the storytelling.

3D Ken Burns Effect and VQGAN-CLIP: depth video animations and paintings
[GIF: 3D Ken Burns depth animation of an object image]
[Image: VQGAN-CLIP generated painting]

On the "item entry" page, when the mouse hovers over an object, its image starts to present a 3D visual effect. The pseudo-3D animations for the objects' images were made using the Colab notebook for the 3D Ken Burns Effect. But due to the images' lack of background and diversified content, the effects did not turn out as expected (see left, first image).
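As a rough sketch, rendering one of these depth clips might look as follows, assuming the autozoom.py entry point of the public sniklaus/3d-ken-burns repository that the Colab notebook is based on; the script invocation and file paths are assumptions, not our exact cells.

```python
# Hypothetical wrapper around the public 3D Ken Burns repository (sniklaus/3d-ken-burns).
# The autozoom.py script and its --in/--out flags follow that repo's README;
# the file paths are illustrative.
import subprocess

def make_depth_clip(input_image: str, output_video: str) -> None:
    """Render a pseudo-3D 'Ken Burns' zoom video from a single still image."""
    subprocess.run(
        ["python", "autozoom.py", "--in", input_image, "--out", output_video],
        check=True,
    )

make_depth_clip("objects/table_lamp.png", "animations/table_lamp.mp4")
```

Because the depth estimator has to hallucinate parallax from a single flat image, object pictures with little or no background give it very few depth cues, which is likely why the effect fell short here.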

 

We also used a Colab notebook of VQGAN-CLIP trained on WikiArt to help visualize some lines of the story. As you can see on the "Our Goal" page, the abstract but thematic paintings were generated with it (see left, second image).
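For reference, a text-to-image run of this kind can be sketched as below, assuming the command-line interface of the public nerdyrodent/VQGAN-CLIP implementation rather than the exact Colab cells we ran; the prompt text, flags, and checkpoint choice are illustrative assumptions.

```python
# Sketch of a VQGAN-CLIP text-to-image run via the nerdyrodent/VQGAN-CLIP CLI.
# generate.py and its -p (prompt) / -o (output) flags follow that repo's README;
# the WikiArt checkpoint is selected through the repo's model options (not shown here).
import subprocess

prompt = "an empty room where humans used to live, in the style of an oil painting"

subprocess.run(
    [
        "python", "generate.py",
        "-p", prompt,                   # text prompt guiding the optimization
        "-o", "our_goal_painting.png",  # output image path
    ],
    check=True,
)
```

Under the hood, the notebook iteratively optimizes a VQGAN latent code so that CLIP scores the decoded image as increasingly similar to the text prompt, which is why the results read as thematic but abstract.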

© Jannie Zhou, all rights reserved.
