15 Questions

Feb 4, 2025

What do truly long-context models look like? I want to give the model all my journals, notes, pictures, previous work, etc. so that it can make connections and tailor responses for me. I imagine this as something in between context stuffing and fine-tuning. Roughly every day, the model takes all the conversations from that day and decides which to use to update its weights. In the future, will everyone have custom models? Predictive processing?
What will human-AI collaboration look like in the future?
How much software will humans be writing in three years? What are the comparative advantages of humans?
Is this view wrong? “We don’t need to find the most general, all-modality solution. We just need to get something good enough to automate research. That’s the goal. After that, there’s a clear path and we’re just on high-level steering.”
Has someone created a gym environment that simulates a computer? Actions are anything someone can do on a computer. After each episode, unit tests are run to determine reward. Why are we using screenshots?
How much does o1-style reasoning RL transfer to performing long-horizon tasks for computer use?
I don’t get how we’re passing the synthetic data wall. Yes, you can use o3 outputs to fine-tune 4o and get a really good o3-mini, but can you use oN outputs to get oN+1?
Can you get two models to communicate through residual streams and not text? Or CoT in the latent space instead of writing everything out? Is this desirable? How do you get training data for this? A quick Perplexity search gets me these links.
We have text-to-text, text-to-image, text-to-video. What is the SOTA for text-to-action tokens in robots? There must be a way to leverage the understanding of the world language models have to robotics. How?
How much do traders use ML? It seems like a ripe field for it. Lots of money, data, smart people… Everything is probably private.
Why is Moravec’s paradox true?
How is Adam still the best optimizer after 10 years?
How do lightweight code-generation models like Cursor’s work?
What is going on in interpretability these days?
Why are all the benchmarks in math and coding competitions? What happened to physics?
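
To make the gym-environment question above a bit more concrete, here is a minimal sketch of what I have in mind. It is an illustration under loud assumptions, not an existing library: `ComputerEnv`, the `"submit"` action, and `check_hello` are all hypothetical names, "anything someone can do on a computer" is narrowed to shell commands in a scratch directory, and the "unit tests" are a single Python check that runs once the episode ends. Observations are plain text (stdout and stderr), with no screenshots.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Tuple


class ComputerEnv:
    """Toy episodic environment (hypothetical): actions are shell commands,
    reward comes from a check that runs only when the episode ends."""

    def __init__(self, check: Callable[[Path], bool], max_steps: int = 10):
        self.check = check          # stands in for the unit-test suite
        self.max_steps = max_steps
        self._tmp = None
        self.workdir = None
        self.steps = 0

    def reset(self) -> str:
        # Fresh scratch directory per episode, so episodes do not leak state.
        self._tmp = tempfile.TemporaryDirectory()
        self.workdir = Path(self._tmp.name)
        self.steps = 0
        return ""  # initial observation: empty terminal output

    def step(self, command: str) -> Tuple[str, float, bool]:
        self.steps += 1
        if command.strip() == "submit":
            # Assumed convention: the agent ends the episode explicitly.
            observation, done = "", True
        else:
            result = subprocess.run(
                command, shell=True, cwd=self.workdir,
                capture_output=True, text=True, timeout=10,
            )
            observation = result.stdout + result.stderr
            done = self.steps >= self.max_steps
        # Sparse reward: the check runs only at the end of the episode.
        reward = float(self.check(self.workdir)) if done else 0.0
        return observation, reward, done


def check_hello(workdir: Path) -> bool:
    """Example 'unit test': did the agent create out.txt containing 'hello'?"""
    out = workdir / "out.txt"
    return out.exists() and out.read_text().strip() == "hello"


if __name__ == "__main__":
    env = ComputerEnv(check_hello)
    env.reset()
    for action in ["echo hello > out.txt", "submit"]:
        obs, reward, done = env.step(action)
        print(repr(action), reward, done)
```

A real version would need sandboxing (running arbitrary shell commands on the host is unsafe) and a much richer action space, but the shape is the same: text in, text out, tests at the end.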