A security researcher built a fake book-review app with a real-world vulnerability and spent $1,500 running nine LLMs against it. The results are a useful gut-check for any team shipping a mobile app on top of Firebase or Supabase.
The exploit is a classic: the FastAPI backend is locked down, but a google-services.json bundled inside the React Native Expo APK exposes Firebase credentials. An attacker can use those credentials to sign up directly as a Firebase user and read Firestore data that the API would never serve them. The researcher calls this Broken Access Control, or Missing Object-Level Authorization depending on who you ask. It is a pattern they have seen repeatedly in production apps.
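To make "exposes Firebase credentials" concrete, here is a minimal sketch of what anyone who unzips the APK can read out of the bundled config. It assumes the standard `google-services.json` layout (`project_info`, and `client[].api_key[].current_key`). The function name `exposed_firebase_config` and all sample values are hypothetical placeholders, not taken from the researcher's app.

```python
import json
from typing import Any, Dict


def exposed_firebase_config(google_services: Dict[str, Any]) -> Dict[str, Any]:
    """Collect the project identifiers and API keys present in a parsed
    google-services.json. Anything returned here ships inside the APK."""
    project = google_services.get("project_info", {})
    api_keys = {
        key["current_key"]
        for client in google_services.get("client", [])
        for key in client.get("api_key", [])
        if "current_key" in key
    }
    return {
        "project_id": project.get("project_id"),
        "storage_bucket": project.get("storage_bucket"),
        "api_keys": sorted(api_keys),
    }


# Placeholder config for illustration only; not a real project or key.
sample = json.loads("""
{
  "project_info": {
    "project_number": "000000000000",
    "project_id": "example-book-reviews",
    "storage_bucket": "example-book-reviews.appspot.com"
  },
  "client": [
    {
      "client_info": {"mobilesdk_app_id": "1:000000000000:android:0000"},
      "api_key": [{"current_key": "PLACEHOLDER_API_KEY"}]
    }
  ]
}
""")

print(exposed_firebase_config(sample))
```

These values are client identifiers rather than secrets in the usual sense, which is why the protection has to come from server-side rules rather than from hiding the file.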
The setup: each model got the same APK and challenge description, a $10 per-run budget, and a two-hour time limit. Thinking mode was enabled at high settings where supported, and temperature was set to 0.7 across the board.
Here is how the models that completed 10 full runs performed:
| Model | Solve rate | Cost per run | Cost per solve |
|---|---|---|---|
| gpt-5.5 | 7/10 | $6.62 | $9.46 |
| deepseek-v4-pro | 3/10 | $0.19 | $0.62 |
| claude-sonnet-4.6 | 2/10 | $9.15 | $45.75 |
| claude-opus-4-8 | 2/10 | $3.23 | $16.15 |
| deepseek-v4-flash | 0/10 | $0.08 | n/a |
| gemini-3.1-pro-preview | 0/10 | $1.04 | n/a |
| gemini-3.5-flash | 0/10 | $2.17 | n/a |
| minimax-m2.7 | 0/10 | $0.72 | n/a |
| step-3.7-flash | 0/10 | $0.53 | n/a |
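
The "Cost per solve" column appears to be the total spend across a model's 10 runs divided by its number of solves. The sketch below reproduces that arithmetic from the rounded per-run costs in the table; the function name `cost_per_solve` is an assumption for illustration. Because the inputs are already rounded, a result can differ from the table by a cent (DeepSeek works out to about $0.63 here versus the listed $0.62, presumably computed from unrounded costs).

```python
from typing import Optional


def cost_per_solve(cost_per_run: float, runs: int, solves: int) -> Optional[float]:
    """Total spend over all runs divided by successful runs.
    Returns None when the model never solved the challenge."""
    if solves == 0:
        return None
    return cost_per_run * runs / solves


# (cost per run in USD, solves out of 10), copied from the table above.
table = {
    "gpt-5.5": (6.62, 7),
    "deepseek-v4-pro": (0.19, 3),
    "claude-sonnet-4.6": (9.15, 2),
    "claude-opus-4-8": (3.23, 2),
    "deepseek-v4-flash": (0.08, 0),
}

for model, (per_run, solves) in table.items():
    value = cost_per_solve(per_run, runs=10, solves=solves)
    print(model, "n/a" if value is None else f"${value:.2f}")
```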
GPT-5.5 solved the challenge 70% of the time at $9.46 per successful exploit. With only 10 runs, the 95% Wilson confidence interval is wide, roughly 40% to 89%, but even its lower bound sits above the 30% observed for the next-best model. DeepSeek-v4-pro solved it 30% of the time at a striking $0.62 per solve, making it the most cost-efficient attacker in the set. The two Claude models each solved it twice, at a higher per-solve cost than either. Five models never cracked it.
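The Wilson interval quoted for GPT-5.5 can be checked with the standard Wilson score formula. This is a minimal sketch, assuming a two-sided 95% interval with z ≈ 1.96; the function name `wilson_interval` is illustrative.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.
    z = 1.96 corresponds to a two-sided 95% interval."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


low, high = wilson_interval(7, 10)
print(f"{low:.0%} to {high:.0%}")  # prints "40% to 89%"
```

The same function applied to the 3/10 and 2/10 rows gives similarly wide intervals, which is a reminder of how little 10 runs per model can pin down.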
A few caveats matter here. This is not a rigorous scientific eval. The OpenAI account had pre-approved security research access, which removed refusals for GPT. The researcher also notes that roughly 50% of total spend went to test runs and failed runs that are not reflected in the table above.
The practical takeaway for builders is straightforward. If you are shipping a mobile app that bundles Firebase or Supabase credentials, assume those credentials will be extracted. Your Firestore security rules (or, on Supabase, your Row Level Security policies) are your real access control layer, not your API. Audit them now, before a $0.62 DeepSeek run does it for someone else.