Developers are increasingly publishing model specs. They should also describe their processes for ensuring that models adhere to those specs, and report on the degree of adherence.
Given the large number of model actions in widespread deployments, you could have a "spec violation bounty" program that incentivizes external researchers to find and report violations. Over time, this could build up a body of "case law" for what counts as a violation.
Re: hardware-based attestation, a simpler intermediate step is just keeping a hashed/Merkle log of the production system prompt, and revealing the preimage if there's a dispute.
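To make the idea concrete, here's a minimal sketch of what such a log could look like: each system prompt is committed to by a hash, the hashes are folded into a Merkle root the provider publishes, and on a dispute the provider reveals the prompt so anyone can check it against the published commitments. The function names and the two-prompt example are illustrative, not a real provider's scheme.

```python
import hashlib

def leaf_hash(sysprompt: str) -> str:
    # Commit to a prompt without revealing it: publish only this hash.
    return hashlib.sha256(b"leaf:" + sysprompt.encode()).hexdigest()

def merkle_root(leaves: list[str]) -> str:
    # Fold the leaf hashes into a single root the provider can publish
    # periodically as a tamper-evident commitment to all prompts so far.
    level = leaves[:]
    while len(level) > 1:
        if len(level) % 2:              # duplicate the last node on odd levels
            level.append(level[-1])
        level = [
            hashlib.sha256((a + b).encode()).hexdigest()
            for a, b in zip(level[::2], level[1::2])
        ]
    return level[0]

# Provider side: commit and publish.
prompts = ["You are a helpful assistant. v1", "You are a helpful assistant. v2"]
leaves = [leaf_hash(p) for p in prompts]
root = merkle_root(leaves)

# Dispute side: provider reveals prompts[1]; anyone can re-hash and
# check it against the published leaf and root.
assert leaf_hash(prompts[1]) == leaves[1]
assert merkle_root(leaves) == root
```

A production version would publish the root somewhere append-only (e.g. a transparency log in the Certificate Transparency style) and include timestamps/deployment IDs in the leaves, but the core commit-then-reveal check is just this.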
I wonder what to do about internal deployment specs. It seems pretty cheap/easy to ask companies to either publish their internal deployment specs or attest that they're approximately the same as the public ones, just to have some assurance that there aren't shenanigans going on internally as models get better.