Asking OpenAI Advanced Data Analysis to Analyse Itself 🤯

Eduard Ruzga
5 min read · Sep 7, 2023


Recently, OpenAI renamed the Code Interpreter to “Advanced Data Analysis” (ADA) and launched ChatGPT for Enterprise. I found myself pondering:

What’s changed?

This simple question led me on an intriguing adventure, accessing ADA’s file system, exploring its README, examining library lists, and even guiding it to construct a taxonomy of its own capabilities.

The File System Exploration

My initial idea was simply to ask ADA about the files in the current directory 🤨
What I found first was this README:

Thanks for using the code interpreter plugin!
Please note that we allocate a sandboxed Unix OS just for you, so it’s expected that you can see and modify files on this system.

Further exploration revealed:

  • A whopping ~7 GB of files in the file system.
  • A frozen set of Python libraries, though file uploads (including scripts) are still permissible.
  • An SQLite database that logs Python commands.
  • A comprehensive breakdown of the system’s specs: 60 GB of RAM and a 16-core, 64-bit processor.
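For anyone who wants to repeat this kind of exploration, here is a minimal sketch of the sort of script one could ask ADA to run. It is not what ADA actually executed for me; the function names are made up for illustration, and the RAM lookup assumes a Linux-style /proc/meminfo file.

```python
import os
import platform
import shutil


def total_ram_gb(meminfo_path="/proc/meminfo"):
    """Return approximate total RAM in GB, or None if it cannot be read.

    Assumes a Linux-style meminfo file where MemTotal is reported in kB.
    """
    try:
        with open(meminfo_path) as fh:
            for line in fh:
                if line.startswith("MemTotal:"):
                    return round(int(line.split()[1]) / 1e6, 1)
    except (OSError, ValueError, IndexError):
        return None
    return None


def sandbox_overview(path="."):
    """Collect a few basic facts about the machine and the given directory."""
    usage = shutil.disk_usage(path)  # filesystem that holds `path`
    return {
        "system": platform.system(),
        "machine": platform.machine(),
        "cpu_count": os.cpu_count(),
        "ram_gb": total_ram_gb(),
        "disk_total_gb": round(usage.total / 1e9, 1),
        "disk_used_gb": round(usage.used / 1e9, 1),
        "entries": sorted(os.listdir(path)),
    }


if __name__ == "__main__":
    for key, value in sandbox_overview().items():
        print(f"{key}: {value}")
```

Everything here is standard library, which matters in a sandbox where nothing new can be installed.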

This answered a question I had had for a while: can it install new libraries? Nope. There is no internet access of any kind. Of the many networking utilities, only curl was installed, and even that was not working. Still, the entire endeavour gave me the thrill of hacking into an unfamiliar system, even though it was not hacking.
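A quick way to check for network access from Python, rather than relying on curl, is to attempt a plain TCP connection. This is only a sketch of the idea, and can_connect is a name I made up for it.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, unreachable networks and timeouts.
        return False
```

Calling can_connect("pypi.org", 443) inside the sandbox is one way to see whether pip could ever reach PyPI; the host and port are just an illustrative target.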

Key Takeaways so far

  1. ADA has access to 352 libraries with code included, though there is no documentation.
  2. Network access and additional installations are restricted.
  3. SQLite databases can be both manipulated and transferred.
  4. ADA can execute Python scripts from uploaded files.
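A library count like the one above can be reproduced with the standard library alone. The sketch below is my assumption of a reasonable approach, not necessarily how ADA counted; installed_libraries is an illustrative name.

```python
from importlib import metadata


def installed_libraries():
    """Return a sorted list of (name, version) pairs for installed distributions."""
    seen = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installs with no readable metadata
            version = dist.metadata.get("Version") or ""
            seen[name.lower()] = (name, version)  # de-duplicate case-insensitively
    return [seen[key] for key in sorted(seen)]


if __name__ == "__main__":
    libraries = installed_libraries()
    print(f"{len(libraries)} libraries installed")
    for name, version in libraries[:10]:
        print(f"{name}=={version}")
```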

Going deeper

While my initial exploration of ADA slightly satisfied my curiosity, reading about another enthusiast’s deep dive reminded me of the vast possibilities and reignited my hunger to learn more. With renewed enthusiasm, I aimed to catalog the libraries ADA had preinstalled. This endeavour entailed:

  • Requesting a comprehensive list of libraries. (chat link)
  • Instructing ADA to script and subsequently download a Python program capable of scraping PyPI for the README files of each respective library.
  • Watching ADA fail to produce any useful visualisation on its own, be it with word clouds or otherwise.
  • Collaboratively working with ADA to summarize the READMEs, ensuring they fit within Claude AI’s 100k context limit.
  • Engaging Claude AI to construct a detailed taxonomy based on these summaries.
  • Iteratively working with both ADA and Claude AI to categorize all libraries within the established taxonomy.
  • Finalizing a file that encapsulated library READMEs, their concise summaries, and their respective taxonomy categories.
  • Presenting ADA with an interactive D3 Treeview code and directing it to adapt both the data and code to visualize the taxonomy.
  • Further instructing ADA to create a grid view HTML page to display the categorized library information in a more structured format.
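The README-collection step above can be sketched roughly as follows. This is not the script ADA wrote for me. Instead of scraping HTML pages, it uses PyPI's JSON endpoint, where the long description lives under info.description. It has to run on a machine with internet access, since ADA has none, and the function names are illustrative.

```python
import json
import urllib.request

PYPI_JSON_URL = "https://pypi.org/pypi/{name}/json"


def readme_from_payload(payload):
    """Pull the long description (usually the README) out of a PyPI JSON response."""
    info = payload.get("info") or {}
    return (info.get("description") or "").strip()


def fetch_readme(name, timeout=10.0):
    """Return the README text for `name`, or None if the lookup fails."""
    url = PYPI_JSON_URL.format(name=name)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return readme_from_payload(json.load(response))
    except (OSError, ValueError):
        # OSError covers HTTP/URL errors and timeouts; ValueError covers bad JSON.
        return None


def collect_readmes(names):
    """Map each library name to its README text (None where the lookup failed)."""
    return {name: fetch_readme(name) for name in names}
```

The resulting dictionary can be dumped to a JSON file and uploaded back to ADA for summarising.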

This journey was not without its challenges:

  1. Iterative Processes: The collaboration between Claude and ADA required extensive iteration. Summarizing and categorizing was not straightforward and at times demanded repeated back-and-forth adjustments, while at other times all I needed to say was “proceed and repeat”.
  2. Save Frequently: ADA occasionally loses both in-memory data and files. To circumvent this challenge, it’s essential to request download links from ADA and save progress at every step. This precautionary measure resulted in the accumulation of dozens of files you can see in this git repo.
  3. Grounding Generative AI: Pairing a generative AI like Claude with a structured system such as a database can counteract hallucinations. Both AIs occasionally “imagined” libraries that didn’t exist in the original set, but because I made ADA update the original file, any library not present in it was discarded.
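The grounding idea from the last point can be sketched in a few lines: treat the original library list as the source of truth and only accept AI-proposed categories for names that appear in it. The function name and data shapes below are assumptions for illustration, not the code ADA produced.

```python
def ground_categories(proposed, known_libraries):
    """Split AI-proposed {library: category} pairs against a trusted library list.

    Returns (kept, discarded, missing):
      kept      - categories for libraries that really exist in the list
      discarded - proposed names that are not in the list (likely hallucinated)
      missing   - known libraries that received no category yet
    """
    known = {name.lower(): name for name in known_libraries}
    kept, discarded = {}, []
    for library, category in proposed.items():
        canonical = known.get(library.lower())
        if canonical is None:
            discarded.append(library)
        else:
            kept[canonical] = category
    missing = sorted(set(known.values()) - set(kept))
    return kept, discarded, missing
```

The missing list is handy for the “proceed and repeat” rounds: it tells you which libraries still need to be sent back for categorisation.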

Results and Unexpected Discoveries

Here is a link to a CodePen with the visualisation, and here is an HTML page with the grid view.
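A D3 tree view typically expects nested data in a name/children shape, so the flat list of categorised libraries has to be converted first. Below is a minimal sketch of that conversion; the "Category/Subcategory" path format and the function name are my assumptions for illustration, not the exact format used in the repo.

```python
import json


def build_tree(categorised, root_name="libraries"):
    """Nest {library: "Category/Subcategory"} pairs into a name/children tree."""
    root = {"name": root_name, "children": []}
    for library in sorted(categorised):
        node = root
        for part in categorised[library].split("/"):
            # Reuse an existing category node with this name, if there is one.
            child = next(
                (c for c in node["children"] if c["name"] == part and "children" in c),
                None,
            )
            if child is None:
                child = {"name": part, "children": []}
                node["children"].append(child)
            node = child
        node["children"].append({"name": library})  # leaf node for the library
    return root


if __name__ == "__main__":
    example = {"flask": "Web/Frameworks", "django": "Web/Frameworks", "numpy": "Data"}
    print(json.dumps(build_tree(example), indent=2))
```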

A notable chunk of the libraries surprisingly catered to web development (e.g., Django, Flask) and networking. Given ADA’s lack of network access, this seemed redundant. It led me to speculate that OpenAI might possess a more potent internal version with network capabilities, which might be too risky to release publicly. However, such advanced tools are undoubtedly on the horizon.


On a lighter note, I stumbled upon a rather unexpected library: kerykeion. Yes, an Astrology-related library! 🤪 Who would’ve imagined using AI for generating astrology reports? However, a deeper dive revealed that even this quirky library couldn’t be fully utilized, as it requires network access to convert locations into coordinates.

kerykeion output

Conclusions

My exploration of ADA provided both insights and surprises. Some key takeaways include:

  1. ADA operates on a Linux instance and, while it boasts a rich array of libraries, it lacks network access. This makes some of its libraries redundant, hinting at a potentially more powerful internal version used by OpenAI.
  2. The process of creating the taxonomy was intricate and time-consuming. While enlightening, it’s a task I’d think twice about undertaking again.
  3. Despite a vast number of installed libraries, not all are functional due to network restrictions.
  4. ADA’s propensity to lose both in-memory and filesystem data was unexpected and challenging. Coupled with the inability to retrieve files after sessions, it underscores the system’s fragility.
  5. Future Prospects: The potential of ADA, especially when combined with other AI systems, is vast. It’s early days.
  6. Many of the above points leave me with a strong desire to get a locally running copy of ADA!

And what did you learn from this article?

