Sounds like everyone was ok. Apparently they were in the middle of repairing it; I guess they started too late. NIMBYs had been holding up repairs since 2016. Their CEQA lawsuit aimed to block changes (cached) to the restaurant at the end of the pier, and it looks like they failed at that. CEQA is not progressive; it's nothing but stasis (cached), which is not good policy.
For some context, the ARC Prize is meant to test AI on tasks that humans can do easily but AIs can't. The cost per question for o3-high is wild: about a thousand dollars' worth of inference for each question. My estimates are very rough back-of-the-napkin math, but a relatively expensive OpenAI model is billed at two cents per thousand tokens, which works out to very roughly 1,000 books' worth of text per question. I really enjoyed this write-up on o3’s performance on the ARC-AGI benchmark. An even simpler explanation of how it works is that it’s searching for a set of words that represent a series of steps that reach a valid solution. Check out the problems it couldn’t solve at the end to get an idea of the kinds of problems in the set.
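To make that napkin math concrete, here's a minimal Python sketch. The $1,000 per question and two cents per thousand tokens come from the paragraph above; the figure of about 50,000 tokens per book is an assumption added here (roughly a short book), so the book count moves with whatever per-book figure you pick. It uses integer cents to avoid floating-point noise.

```python
# Back-of-the-napkin math in integer cents, so the division stays exact.
COST_PER_QUESTION_CENTS = 100_000   # about $1,000 of inference per question
PRICE_CENTS_PER_1K_TOKENS = 2       # two cents per thousand tokens
ASSUMED_TOKENS_PER_BOOK = 50_000    # assumption: a short book; not from the original estimate

# $1,000 buys 50,000 blocks of 1,000 tokens.
tokens_per_question = COST_PER_QUESTION_CENTS // PRICE_CENTS_PER_1K_TOKENS * 1_000
books_per_question = tokens_per_question // ASSUMED_TOKENS_PER_BOOK

print(f"{tokens_per_question:,} tokens ~ {books_per_question:,} books per question")
```

This prints `50,000,000 tokens ~ 1,000 books per question`. Halve the assumed tokens per book and the book count doubles, which is why the estimate is only good to an order of magnitude.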
If you'll indulge a bit more discussion of the ARC Prize: here's a thread (cached) arguing that a key reason LLMs have trouble with ARC Prize tasks is that it’s extremely difficult for them to perceive large grids (cached). Since o1 is publicly available, someone was able to check whether o1 could solve larger versions of questions it had already solved in the benchmark. It fails, suggesting that o3's sudden jump in ability would probably also disappear if the grids were larger, even though the underlying question is the same. The idea behind these questions is to find out whether the AI can generalize from just a few examples, but Mikel Bober-Irizar asks (cached): “How much are we testing the LLMs ability to generalise from 3 examples, and how much are we testing its ability to de-linearise grids?”
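Here's a hypothetical Python sketch of what "de-linearising" a grid means. It is not how o1 or o3 actually ingests ARC tasks; it just assumes each cell is written as a single digit and each row ends with a newline. Flattened that way, a cell's right-hand neighbor is one character away, but the cell directly below it is a whole row away, and that gap grows with grid width. Real tokenizers may split the text differently.

```python
def serialize_grid(grid: list[list[int]]) -> str:
    """Flatten a grid of single-digit cells into text, one line per row."""
    return "\n".join("".join(str(cell) for cell in row) for row in grid)


def text_distance(width: int, a: tuple[int, int], b: tuple[int, int]) -> int:
    """Character distance between cells a and b (row, col) in the serialized text.

    Assumes one character per cell and one newline between rows,
    so each row occupies width + 1 characters.
    """
    (r1, c1), (r2, c2) = a, b
    return abs((r2 * (width + 1) + c2) - (r1 * (width + 1) + c1))


if __name__ == "__main__":
    print(serialize_grid([[0, 1], [2, 3]]))
    for width in (3, 10, 30):
        right = text_distance(width, (0, 0), (0, 1))
        below = text_distance(width, (0, 0), (1, 0))
        print(f"width {width}: right neighbor {right} chars away, "
              f"below neighbor {below} chars away")
```

At width 30 (ARC grids top out at 30x30) the cell below is 31 characters away, so answering "what's under this cell?" means lining up positions across long stretches of text rather than looking at something adjacent.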
I don’t think I’ve ever seen a chart of “growth velocity”, but the reasoning for the dip and then rise in velocity makes sense to me.
One commenter pointed out (cached) that he “outlived one of the reporters who wrote his NYT obituary”. Much of the deregulation credited to Reagan actually took place under Carter. Yglesias argues (cached) that Carter's deregulation was mostly for the best; here's a whole article on the subject.
Longer Reads
• A thread of highlights from Dan Frommer’s yearly roundup of Internet trends. Highlights include that TikTok Shop is huge and that, despite what commenters say, Americans have super positive opinions of Amazon. (src)(cached)
Flotsam and Jetsam
– More on Waymo's safety stats. When comparing with human drivers, one objection to the data was that Waymos don’t drive on the highway, supposedly where more accidents happen. But of course it’s the opposite: crashes are measured per mile, and there are fewer crashes per mile on the highway, so if the human baseline includes highway miles, this study actually underestimates Waymo’s improvement over humans. While we're at it, it seems a Waymo AV hit a delivery robot that was running a red light a day or two ago. There's video, and the delivery robot was unharmed enough to drive away immediately. (src)(cached)
– A State Department meeting with the head of Syria’s HTS went so well that the US revoked the $10 million bounty on his head. Must’ve been a good meeting. (src)(cached)
– An argument that the success of o1 and Gemini tuning indicates that open-source LLMs will be competitive with closed-source ones from OpenAI and others (src)(cached)
– Maybe everyone knew this, but California did gain population in 2024, it’s just that it was outpaced by other states. Only three states shrank, and US population was up 1% this year (src)
– Ever since work started on AVs, one restriction blocking serious change has been that no one could make AVs without steering wheels. But now the federal regulator is lifting that restriction as long as companies turn over detailed data. (src)
– Marco Arment’s data shows that 100x as many people share podcast clips privately as publicly (src)
– A GOP congresswoman who had been missing for six months was found in a dementia care home. This raises the obvious question: why are we only just now finding out that a member of Congress was missing for six months? (src)(cached)
– A short example of what expert FPV drone piloting looks like. This one is flying through a Dollar General. Terrifying as a weapon. (src)(cached)
– You don’t really want to read about Hollywood harassment lawsuits, but long story short: Lively’s lawsuit against Baldoni is very well crafted, and he’s probably screwed. Never send a text about how you’d better not put XYZ in writing. (src)(cached)
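Back to the Waymo item above: the per-mile argument is easier to see with numbers. Here's a toy Python calculation in which every rate is invented for illustration, not taken from the Waymo study. It assumes the human baseline blends city and highway miles while the AV drives only city streets.

```python
# All figures are hypothetical, chosen only to illustrate the mix effect.
# Rates are crashes per million miles.
HUMAN_CITY_RATE = 4.0
HUMAN_HIGHWAY_RATE = 1.0
HUMAN_CITY_SHARE = 0.5   # assumed fraction of human miles driven on city streets
AV_CITY_RATE = 1.0       # the AV only drives city streets


def blended_rate(city_rate: float, highway_rate: float, city_share: float) -> float:
    """Overall crashes per million miles for a mix of city and highway driving."""
    return city_share * city_rate + (1 - city_share) * highway_rate


def reduction(av_rate: float, human_rate: float) -> float:
    """Fractional crash reduction of the AV relative to a human baseline."""
    return 1 - av_rate / human_rate


human_all_roads = blended_rate(HUMAN_CITY_RATE, HUMAN_HIGHWAY_RATE, HUMAN_CITY_SHARE)

print(f"vs. all-roads human baseline: "
      f"{reduction(AV_CITY_RATE, human_all_roads):.0%} fewer crashes")
print(f"vs. city-only human baseline: "
      f"{reduction(AV_CITY_RATE, HUMAN_CITY_RATE):.0%} fewer crashes")
```

With these made-up numbers the blended baseline shows 60% fewer crashes, while the like-for-like city comparison shows 75%. Mixing in the safer highway miles lowers the human rate, which shrinks the apparent gap.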