Call for Research Proposals: Generative Red Team Challenge

The Generative Red Team Challenge (www.airedteam.org) is a one-of-a-kind competition that will allow some of the best and brightest minds in the security industry to join the mission of making AI and machine learning safer. This event is co-organized by AI Village, Seed AI, and Humane Intelligence.

Our Generative Red Team Challenge was jointly announced with the White House as one of three “New Actions to Promote Responsible AI Innovation”, and we anticipate that the exercise will influence policy as well as open up opportunities for new research. Our LLM providers are Anthropic, Cohere, Google DeepMind, Meta, NVIDIA, OpenAI, and Stability.

The coalition plans to use the data from this Generative Red Team Challenge to:

Provide guidance for ML and AI vendors pertaining to safety and security
Produce guidance on how ML and AI can be safely integrated into society

This exercise will bring together a unique group of red teamers to produce a one-of-a-kind dataset. More than 3,000 participants are expected, drawn from the varied population of DEFCON attendees, along with 120 community college students who will bring their own perspectives and backgrounds to the challenges. We anticipate interest from researchers exploring questions about diversity, training for new or growing fields within AI, or novel problem-solving and red-teaming approaches to understanding the societal impact of LLMs.

We are collecting a unique dataset that we hope will contribute to improving the safety and security of LLMs. We invite the broader research community to analyze this data.

A multistakeholder panel of representatives from each participating model vendor, civil society, and the program organizers will approve research projects that use the dataset. Research will be conducted during a six-month period that begins shortly after the competition ends and the data has been cleaned: researchers can expect to receive cleaned data in September and should plan to release findings in February.

After one year, the dataset will be made public, with an option for companies to opt out of sharing.

After reading the details below on the exercise, the resulting dataset, and the ethical guidelines, please submit your proposal through the Google form.

FAQs:

How can I submit a proposal?

Fill out our proposal submission form here. We will no longer accept proposals after September 15, 2023.

What are your ethical guidelines?

All selected researchers will be required to sign a data use agreement. All competition participants will sign a “hacker Hippocratic oath” informing them that their session data is being collected for this dataset. Participants will also be able to opt out of demographic data collection.

This research is intended to be in the spirit of ensuring AI security and responsible use. Research proposals that compare models or attempt to identify model developers will be rejected.

How does the competition work?

We will run 20 sessions of approximately 150 participants each. Each session lasts 50 minutes, during which participants will have timed access to multiple LLMs from eight leading vendors. A capture-the-flag (CTF)-style point system will encourage testing across a wide range of harms. Participants will select challenges of varying difficulty and point value from a “Jeopardy”-style board, attempting as many as they can within the time limit with the goal of earning as many points as possible. Some open-ended challenges will allow free exploration, and each participant will have access to a locally hosted copy of Wikipedia for information and verification. The challenge laptops have no open internet access.

Participants can return for another session after their 50-minute session ends, with no limit on how many times they can participate. Multiple sessions from the same participant are not tracked or linked.

What does the data look like?

We will collect every user input and model-generated response, along with the challenge the user was attempting. We will also record which attempts participants submit for grading and the result of the grading process. We expect the schema to be as follows (subject to change):


User ID: A UUID for the session
Challenge ID: Identifies the task the user was attempting
User input: The text the user entered for a challenge
Generated response: The text the model generated
Submission: Whether the user submitted the attempt for grading (i.e., believed it was successful)
Submission reasoning: For some challenges, a text field in which the user argues that the input/response pair satisfies the challenge
Grade: The result returned by the back-end grading system, if the submission was graded
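To make the draft schema concrete, here is a minimal Python sketch of one record and a simple per-challenge tally a researcher might compute. It assumes field types that the schema does not specify (string challenge IDs, a boolean submission flag, an optional string grade). The class name `ChallengeRecord`, the helper `submissions_per_challenge`, and the example challenge IDs are hypothetical, and the released data format may differ.

```python
import uuid
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ChallengeRecord:
    """One input/response pair following the draft schema (types are assumptions)."""
    user_id: uuid.UUID                          # UUID for the session; sessions are not linked
    challenge_id: str                           # task being attempted (string type assumed)
    user_input: str                             # text the user entered
    generated_response: str                     # text the model generated
    submission: bool                            # True if submitted for grading (assumed boolean)
    submission_reasoning: Optional[str] = None  # present only for some challenges
    grade: Optional[str] = None                 # grading result, if any (string type assumed)


def submissions_per_challenge(records: Iterable[ChallengeRecord]) -> Counter:
    """Count records that were submitted for grading, keyed by challenge ID."""
    return Counter(r.challenge_id for r in records if r.submission)


# Hypothetical usage with made-up challenge IDs and text.
session = uuid.uuid4()
records = [
    ChallengeRecord(session, "challenge-a", "prompt 1", "response 1", submission=False),
    ChallengeRecord(session, "challenge-a", "prompt 2", "response 2", submission=True,
                    submission_reasoning="The response satisfies the challenge.",
                    grade="accepted"),
    ChallengeRecord(session, "challenge-b", "prompt 3", "response 3", submission=True),
]
counts = submissions_per_challenge(records)
```

Defaulting `submission_reasoning` and `grade` to `None` reflects the schema's note that these fields apply only to some records.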

At the end of each session, we will also offer an optional, opt-in demographic survey covering gender, race, experience level, and level of education. This data will not be verified, but researchers may use it at their own risk. The dataset will be scrubbed of obvious identifiers of the model name or owner.

When can I get the data, and when can I publish?

We will release the scrubbed data approximately one to one and a half months after the end of DEFCON (mid- to late September 2023). To receive the data, researchers must agree to the following:

An embargo on publishing the paper until February 10, 2024
Stakeholder review of the paper, with the paper provided no later than January 10, 2024
Deletion of all copies of the dataset by February 15, 2024

The full dataset will be released publicly one year after the DEFCON event (August 2024).
