I tried to write an academic paper in four weeks using AI… and finished in four days.[1] The paper, titled Investing in Artificial General Intelligence, is on SSRN and on GitHub.
I’m not writing to brag about how fast I was able to write the paper. My motivation is simple: I was blown away by the result, so I want to share my experience and insights. To be clear, I think it is a good paper[2] that looks nothing like what I would categorize as “obviously written by AI”. Yes, there are em-dashes, but not more than in my pre-ChatGPT papers. I would be very proud of that paper if I had written it myself without AI. Now, I don’t know how to feel about it, and I wonder how academic research will evolve in the coming years as AI becomes more powerful and accessible.
A second reason for writing this post is transparency. The use of AI in the research process, while very common, is still a gray area ethically. I went all in on AI for this paper, so it feels like full disclosure is in order, especially since I will be submitting this paper to conferences and (eventually) journals.
The initial trigger to go all in on AI for a research paper was the call for papers for the Human × AI Finance conference:
An original research exercise: write a finance paper using as much AI assistance as possible — in just four weeks. All papers reviewed by AI agents. Top four presented and discussed at the Fink Center Conference on Financial Markets at UCLA Anderson.
This was a perfect opportunity to test the limits of AI in research. To be clear, I consider myself an aggressive early adopter of AI, and I had attempted to use it to speedrun research papers in the past with limited success. However, the capabilities of AI have improved significantly since then, and I was curious to see how far I could push it (plus I just finished teaching, so it was time to have some fun).
So the challenge was set. I had four weeks to write a research paper using as much AI assistance as possible, i.e. to “vibe research”. Here is how it went down.
Day -1: The Idea
The idea for the paper came to me last Friday while listening to Dwarkesh Patel’s interview with Dario Amodei, the CEO of Anthropic. There was some discussion about his views on investing in infrastructure, and why Anthropic does not invest as aggressively in infrastructure as it could. I thought that was an interesting angle to explore, especially given the recent surge in investment in AI infrastructure. However, this was just a vague idea at this point, and clearly one that falls under “theoretical corporate finance”, which is not my area of expertise. In fact, I haven’t studied corporate finance models since my PhD (I graduated in 2013). This is where I find AI most exciting: it can help you pursue ideas that you would not have explored otherwise because of the time and effort required to get up to speed on the relevant literature and tools.
Day 0: The Outline
On Saturday, I spent my afternoon at a tennis club watching my son play in a tournament. I had my phone with me, and I decided to use that time to start fleshing out the idea. I see this as the starting point of my work on the paper. Using Claude, I started with a simple prompt:
I just listened to an interview with Anthropic’s CEO (transcript here: https://www.dwarkesh.com/p/dario-amodei-2). I found the discussion on the economics of investment in compute infrastructure very interesting and thought that it would be nice to write down a real option model of investment that would embed the idea of limited competition and default risk discussed in the interview. Can you first summarize the relevant parts of the interview related to investment in compute and how Anthropic is approaching those issues?
I then turned on “research” mode in Claude, and asked it to “search the academic literature for existing models that would have similar features (even if not all features are present). Focus on top finance and economics journals.”. After ~30 minutes, I got a nice (but incomplete) literature review, then asked it “from this literature review, what approach do you suggest to build a model that incorporates all of these features?”. It suggested two possible approaches: one for a paper and a simpler one appropriate for a blog post. For my final prompt of the afternoon, I asked it to write a plan to implement the paper:
approach A is the way to go; i’m looking to write a fully fledged academic paper on this. i think it would be nice to solve a simplified version of the model anatically (ie duopoly) to illustrate the key mechanism insights, then we could solve a mire complex version numerically. we could even calibrate a version of the model using publicly stated numbers from the frontier labs and other sources. the main questions I want to tackle is how to think about optimal investment in compute for these firms. it would be great if we could also use it the model to tackle (even partially) the problem of valuing these firms and provide insights as to how external observers (ie market participants) could infer the probability of a regime switch (the lambda) from frontier labs investment decisions. In other words, will we be able to tell the agi is near (assuming AI labs will know first that it is near). For the numerical exercises, I will use python with numpy (unless something else more appropriate exists). For the anslytical solutions, I would like to try using python with simpy[3], but I might revert to solving the model « by hand ». Can you make a detailed plan of action for deriving the model, implementing the numerical solutions and writing the full paper? Assume that we are aiming for a top journal such as AER, econometrica, or the journal of finance.
I had no idea if using code (sympy and numpy) to solve the model would help. From my experience doing data analysis, AI agents are great at writing and reviewing code, and at writing tests for that code, so I figured this would add another layer of confidence in the results.
I got a very detailed plan of action, and my focus was back on tennis. Later that evening, I created a private GitHub repository for the project (now public, so feel free to snoop around), and set it up using a template of mine (note that the template is currently broken: a blank project needs some manual fixes, so I wouldn’t recommend using it for now). The key components: Python with uv for the code, Quarto for the paper, Claude Code (CC) for most of the work, and VS Code with a Devcontainer for improved security when letting CC work with --dangerously-skip-permissions (aka YOLO mode).
I added the lit review and the plan of action to the repository, and I was ready to start working on the paper. Or, to be exact, to let CC work on the paper. I asked it to read the plan, create an issue for each phase of the plan (you can see the first issue here), and then to work on all issues in sequence.[4] I activated YOLO mode and made sure it was working hard on my paper. Then I went to bed.
Day 1: The First Draft
I woke up on Sunday morning to find that CC was done. There it was, I had a first draft of the paper, with all the sections filled in. It was complete; not in the sense that it was ready to be submitted, but in the sense that it had all the components of a research paper. I was amazed. I wanted to read the paper, but I didn’t have time: it was Sunday morning, and there was an Olympic gold-medal hockey game to watch. But that was no reason to let AI rest, so I decided to iterate on the paper while watching the game.
Following an approach I used for coding projects, I launched a fresh instance of CC, an instance of Codex CLI, and an instance of OpenCode (with Gemini), and I asked each agent to review the paper, the code, and the proofs and write a report. You can have a look at the full instructions (written by Claude, of course). When they were done, I asked a fresh instance of CC to read the reports, write a consolidated report with suggested fixes, and then to implement those fixes. When it was done, I repeated the whole process once more.
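For the curious, the fan-out step can be sketched in a few lines of Python. This is an illustrative sketch, not my exact setup: the CLI entry points (`claude -p`, `codex exec`, `opencode run`) and the report-file convention are assumptions you should check against each tool’s documentation.

```python
import subprocess

# Illustrative CLI invocations for each reviewer agent; the exact
# commands and flags are assumptions, check each tool's documentation.
AGENTS = {
    "claude": ["claude", "-p"],
    "codex": ["codex", "exec"],
    "opencode": ["opencode", "run"],
}

def build_review_commands(instructions, report_dir="reports"):
    """Build one command per agent, all given the same review
    instructions, each asked to write its report to its own file so a
    later consolidation pass can read them all."""
    commands = []
    for name, base in AGENTS.items():
        prompt = f"{instructions}\nWrite your report to {report_dir}/{name}.md"
        commands.append((name, base + [prompt]))
    return commands

def run_reviews(instructions):
    # Run the reviewers one after the other (they could also run in parallel).
    for name, cmd in build_review_commands(instructions):
        subprocess.run(cmd, check=True)
```

The consolidation step is then just one more agent call pointed at the collected reports.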
After watching a very entertaining hockey game (with an unfortunate overtime loss for Team Canada), I was back at the tennis club, but this time with my laptop so I could work between matches (my son made the final, so I spent a good chunk of my weekend there). Looking at the AI feedback, I noticed that one area the AI wouldn’t be able to fix by itself was getting better references on public statements by tech executives related to the topic under study. For that, more research was needed, so I tasked Claude with it:
I want to better understand how executive of AI frontier labs make decisions regarding their investments in compute infrastructure, how they choose to allocate their computing power between training and inference, and when they believe they will achieve AGI (or a similar inflexion point in performance). Focus of news articles, interviews, podcasts, etc., from the last 12 to 18 months.
That was enough to get a much better set of media references for the introduction. I asked CC to read the document and update the introduction, and then loaded the PDF on my iPad to have a first read of the introduction. Later that evening, I gave my feedback to CC and asked it to incorporate it in the paper. At that point, I still hadn’t written a single word of the paper directly myself. All changes to the writeup were made through prompts to CC.
I continued to iterate on the paper using the coding agents to review, but I also started asking for feedback from the web versions of ChatGPT 5.2 (with thinking activated) and Claude.ai with Opus 4.6, combining their feedback and asking CC to implement the relevant suggestions. These reviews were different because the chat agents didn’t have access to the code and the Quarto markdown files, only to the rendered PDF. Still, I found that having multiple agents with different capabilities and perspectives was very helpful in improving the paper.
Day 2: Iterating
I spent most of my work time (besides a few meetings) the next day iterating on the paper with the same loop (either the coding agents or the chat-based agents). However, at that point I also started paying more attention to the reviews and asking for more specific feedback on the model and the results. These interventions, while not time consuming for me, were significant in terms of the changes they generated in the paper. The human in the loop was not doing the heavy lifting, but was providing the guidance and feedback needed to steer the paper in the right direction. I also started making more significant changes to the model and the writing, but I still didn’t write anything directly myself; changes were made through CC.
I don’t know exactly how many rounds of iteration I did on the paper, but looking at my chat history, I had Claude.ai and ChatGPT do seven reviews each over the four days, and I had the coding agents do a similar number of “full repository” reviews.
I eventually realized that my Day 0 typo of “simpy” instead of “sympy” had caused the AI to completely ignore the use of sympy. That got me worried that the derivation of the model could be a complete fabrication, so I asked CC to write the code to solve the model using sympy, which it managed to do without issues. I also had it write a Jupyter notebook with the model derivation to make it easier for me to review the math.
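To give a concrete flavor of this kind of symbolic check (using a textbook example, not the actual model from the paper): sympy can solve the fundamental quadratic that pins down the option-value exponent in a standard real-options setup. The parameter values below are purely illustrative.

```python
import sympy as sp

beta = sp.symbols("beta")
r, delta, sigma = sp.symbols("r delta sigma", positive=True)

# Fundamental quadratic of a textbook (Dixit-Pindyck style) real-options
# model: (1/2) sigma^2 beta (beta - 1) + (r - delta) beta - r = 0
quadratic = sp.Rational(1, 2) * sigma**2 * beta * (beta - 1) + (r - delta) * beta - r
roots = sp.solve(sp.Eq(quadratic, 0), beta)

# Plug in illustrative numbers (exact rationals keep the result symbolic)
# and keep the root greater than 1, which governs the investment threshold.
params = {r: sp.Rational(1, 20), delta: sp.Rational(3, 100), sigma: sp.Rational(1, 5)}
beta_plus = max(root.subs(params) for root in roots)
print(beta_plus)  # sqrt(10)/2, about 1.58
```

Having the derivation in code like this is what makes the “write tests for the math” idea possible: the symbolic solution can be checked numerically at many parameter values.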
Another concern I had was the accuracy of the references. The literature review was done by Claude.ai using its research feature which is supposed to give accurate references, but I still don’t fully trust it. Verifying the references was thus a must, but I didn’t want to do it manually. Instead, I tried having CC check the references for me (CC has a --chrome mode which allows it to connect to an instance of Claude in Chrome). It didn’t quite work (probably because of the Devcontainer setup), so I reverted to using Claude Cowork instead (which can also connect to a Chrome instance).
I don’t have the exact prompt I used, but it was something along the lines of: “Go over all the academic references in the bibtex file, and search them one at a time on Google Scholar. For each reference, check that the title, authors, journal, and year are correct. If you find any discrepancies, write them in a report. Also note down the references you can’t find.” It took a while, but the next day I woke up to find a nice report waiting for me. Besides minor fixes, the main issue was that one of the references was completely fabricated, and another had the wrong title and journal.
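The mechanical half of that check doesn’t actually need an AI at all. Here is a minimal stdlib sketch (a naive regex-based parser, nowhere near a full BibTeX implementation, and the sample entry is just for illustration) that extracts the fields worth verifying from a .bib file:

```python
import re

def parse_bib_entries(bib_text):
    """Extract (key, fields) pairs from a BibTeX string.

    Naive regex parsing: assumes brace-delimited field values with no
    nested braces, which is enough for a verification checklist.
    """
    entries = []
    for match in re.finditer(r"@\w+\{([^,]+),(.*?)\n\}", bib_text, re.S):
        key, body = match.group(1).strip(), match.group(2)
        fields = {
            name.lower(): value.strip()
            for name, value in re.findall(r"(\w+)\s*=\s*\{([^{}]*)\}", body)
        }
        entries.append((key, fields))
    return entries

sample = """@article{black1973pricing,
  title = {The Pricing of Options and Corporate Liabilities},
  author = {Black, Fischer and Scholes, Myron},
  journal = {Journal of Political Economy},
  year = {1973}
}"""

for key, fields in parse_bib_entries(sample):
    # These are exactly the fields to check against Google Scholar.
    print(f"{key}: {fields.get('title')} | {fields.get('journal')} | {fields.get('year')}")
```

A script like this could feed the agent a clean per-reference checklist, leaving only the actual Google Scholar lookup to the browser-enabled agent.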
Day 3: Simplifying the Model
Day 3 was mostly a continuation of the iteration process, but I also started asking CC to make changes to the figures and tables. My AI reviews kept complaining that the data for my calibration exercise was too limited, so I had Claude.ai do another targeted search for the relevant data and used the resulting report to update the calibration section.
At the end of the day, I actually read the full paper for the first time, and found the model to be very complex, maybe even too complex. I thought of a potential simplification: take out the option to invest in the post-AGI regime. This was a substantial change to the model and the code, so I asked CC to implement it and waited until the next day to see the results.
Day 4: Finishing Touches
CC implemented the simplification, but also left me a note. It turns out that the simplification made it impossible to solve the model analytically. I took the explanation to Claude.ai and had a good chat with it about the issues before concluding that the AI was right. The nice thing about using git to track your changes is that when you try things in a branch and they don’t work, you can just leave it there (or delete it) and go back to where you were before, so this is what I did.
I spent the rest of the day looking for more ways to trim the paper and make it more concise, to simplify the model as much as possible without losing the key insights, and to polish the figures and the introduction. I also validated the non-academic references in the introduction and added a few more (manually this time because I wanted only credible media sources, which I found to be a bit hit or miss when I asked Claude to do it).
I also received an email from a colleague of mine who pointed me to Refine, a tool for “AI-powered peer review”. I had heard of it before, but I had never tried it. I signed up and used my free credit to review the paper. The free review is somewhat limited, but the report was nonetheless helpful in catching some issues the other agents had missed.[5] The report was detailed enough that CC was able to read it, understand the issues, and implement the relevant changes. After some more edits and polishing, I was finally happy with the paper.
At 9:55 PM on Wednesday (4 days and 8 hours after starting work on the paper), I created the first “release” of the paper on GitHub and submitted the paper to SSRN. I was done (for now). I had a research paper that I was proud of, and that I thought was a good contribution to the literature, and it only took me four days. This is only the beginning of the story for this paper as I am now moving on to conference submissions (for those not aware, the publication process in economics and finance is quite long, with multiple rounds of revisions and resubmissions, so it will be a while until this paper is “done”).
Concluding Remarks
I wanted to share my experience and insights from this experiment, as I think it has implications for how we will do research in the future. If I can write a decent paper so quickly, what does that mean for the future of academic research? Will we write (and review) more papers? Will we set a higher bar for what constitutes a “good” paper? I don’t have the answers to these questions, but one thing I’m sure of is that we can’t put the genie back in the bottle.
One thing that reassured me is how much the human in the loop still matters, even when AI is doing almost all of the writing and coding. The key decisions that shaped the paper — the initial idea, the simplification attempts, the choice to verify references independently, the judgment calls on which reviewer suggestions to accept and which to ignore — were mostly mine. AI made me faster, not smarter. It lowered the cost of exploring ideas, but the research taste and direction still had to come from me.
I also want to be honest about the limitations. I don’t fully understand every line of the math in my own paper, and that is uncomfortable. I have verified the key results using sympy and reviewed the derivations, but I would be lying if I said I could reproduce them on a whiteboard from memory. The fabricated reference that slipped through the literature review is another reminder that trust but verify is not optional when working with AI. These are not reasons to stop using AI for research, but they are reasons to be thoughtful and transparent about how we use it.
Software Stack
Here is a list of the main platforms and software tools I used for this project:
- The workhorse was Claude Code. I’m on the Max plan (USD 100/month) but still managed to hit the 5-hour limit a few times. This also lets me use Claude.ai, Claude Cowork and Claude in Chrome, which I used for specific tasks as described above.
- I use GitHub for version control and collaboration in most of my projects. As an academic, I get a free GitHub Copilot Pro subscription, which I used to access Gemini Pro in OpenCode.
- Quarto for writing the paper in markdown and rendering it to PDF. Very convenient when working with AI agents as markdown is as close to a natural language as you can get while still being structured enough for the AI to understand the document structure and make specific edits.
- VS Code with a Devcontainer.
- Codex CLI and ChatGPT. I’m on the Plus plan, which was plenty for the use I made of it.
- Python and uv for the code. I also used sympy for the analytical solutions and numpy for the numerical solutions.
- Refine (using a free review) for an additional round of feedback on the paper.
Footnotes
[1] Less than five days would be more accurate, but four days makes for a better headline.
[2] As I explain below, the paper falls in a subfield of finance that is not my core expertise, so I can’t rule out that the paper is actually very bad and will get destroyed at conferences, if it gets accepted at all.
[3] This was a typo on my part; I actually meant sympy. I only realized on Day 2, when I inspected the code and noticed that sympy was not even installed.
[4] My actual prompt was a bit more complex because I wanted to make sure that each issue was worked on in a separate branch and then merged with a pull request, to line up with my usual workflow.
[5] If you end up subscribing to Refine, you can use my referral code, which will give me (but, unfortunately, not you) free credits for future reviews.