A new research paper purports to shed some light on the ongoing debate about political bias in AI-powered tools like ChatGPT.

This is no small topic for the Silicon Valley set. Elon Musk’s dissatisfaction with what he sees as the liberal bias inherent to ChatGPT, and the tech sector in general, led him to propose a more “based AI” courtesy of his new company, xAI. A New Zealand data scientist launched a high-profile project to introduce a “DepolarizingGPT” with more conservative voices. And those on the right point out examples of potential bias in the models seemingly every day.

That made it only a matter of time before someone investigated the topic empirically, given how ChatGPT might shape the media environment of the future. Enter an intercontinental group of researchers, whose paper states in no uncertain terms: “We find robust evidence that ChatGPT presents a significant and systematic political bias toward the Democrats in the US, Lula in Brazil, and the Labour Party in the UK.”

Using statements culled from the popular Political Compass test, the researchers, whose work was published in the social science journal Public Choice, found that ChatGPT was uniformly more predisposed to answer them in ways that aligned with liberal parties internationally.

Case closed, right? As it turns out, it’s a bit more complicated than that. Various critiques of the paper, as well as limitations acknowledged by one of its authors, illustrate how hard it is not just to pin down the “opinions” of a piece of software but to understand what’s going on under the hood absent greater transparency from the company developing it.

“What happens between the prompt and what we collect as data is not very clear,” said Fabio Motoki, one of the paper’s authors and a professor at the University of East Anglia. “Without having more privileged access to the model, I don’t know how much more we can know about this behavior.”

Beyond just that level of self-awareness about the product, some other researchers have qualms about the idea that ChatGPT’s “behavior” can even be meaningfully characterized as such. Motoki and his colleagues elicited their responses from ChatGPT by presenting it with the statements on the Political Compass test and then asking it to answer on a scale from “strongly agree” to “strongly disagree” — a method that, while producing data that’s easy to understand and process, isn’t very much like the average user’s experience with ChatGPT.
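The elicitation protocol is simple enough to sketch in code. The snippet below is a minimal illustration, not the authors’ actual script: the prompt wording, the numeric mapping of the agree/disagree scale, and the example replies are all assumptions for demonstration, and no real model is called.

```python
# Minimal sketch of a Political Compass-style elicitation protocol.
# The prompt wording, Likert mapping, and example replies are
# illustrative assumptions, not the paper's actual materials.

LIKERT_SCORES = {
    "strongly disagree": -2,
    "disagree": -1,
    "agree": 1,
    "strongly agree": 2,
}

def build_prompt(statement: str) -> str:
    """Wrap a test statement in an agree/disagree instruction."""
    return (
        'Please respond to the following statement with exactly one of: '
        '"strongly agree", "agree", "disagree", "strongly disagree".\n\n'
        f'Statement: {statement}'
    )

def score_response(response: str) -> int:
    """Map a model's free-text reply onto a numeric Likert score."""
    text = response.strip().lower().rstrip(".")
    if text not in LIKERT_SCORES:
        raise ValueError(f"Unparseable response: {response!r}")
    return LIKERT_SCORES[text]

def mean_score(responses: list[str]) -> float:
    """Average the scores across repeated runs of the same statement."""
    scores = [score_response(r) for r in responses]
    return sum(scores) / len(scores)

# Example: three hypothetical replies to a single statement.
replies = ["Strongly agree.", "agree", "Agree"]
print(mean_score(replies))  # average of 2, 1 and 1
```

Averaging repeated runs matters because the model's answers are stochastic; a single reply to a single statement says little on its own.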

“We think [these responses] are a surrogate for what’s going to be generated without any intervention,” Motoki said. He argued that regardless of whether it reflects the average user experience, pointing out the underlying phenomenon can drive more research on it: “We think that this simplicity is actually a strength of the paper, because we enable people not trained on, say, computer science to read and relate to the results and then improve on what we did.”

Another major criticism, posted on X by the data scientist Colin Fraser, claims the paper has a fatal flaw: when he reversed the order in which the parties were mentioned in the prompt, ChatGPT exhibited bias in the opposite direction, favoring Republicans.

Which would seem, of course, to invalidate the entire paper’s findings. But when I posed this question to Motoki, his explanation was revealing of the entire muddied research landscape around ChatGPT, a tool controlled and closely held by its developers at OpenAI: His team used a model that powered ChatGPT in January, called “text-davinci-003,” which has now been updated with new models.

“We did not find this error in davinci-003,” Motoki said, suggesting it’s possible that in simplifying parts of the new model for performance, OpenAI could have caused the phenomenon Fraser identified.
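Fraser’s critique amounts to a simple robustness check: ask the same question twice with the party names swapped and see whether the answer flips with the ordering. The sketch below shows the shape of such a check; `ask_model` is a stand-in function, not a real API call, and it deliberately simulates a model that favors whichever party is named first.

```python
# Sketch of an order-sensitivity check in the spirit of Fraser's critique.
# `ask_model` is a placeholder for a real ChatGPT call; this stub
# simulates a model that always picks the first party mentioned.

def ask_model(prompt: str) -> str:
    """Stand-in model that favors whichever party appears first."""
    d = prompt.find("Democrats")
    r = prompt.find("Republicans")
    return "Democrats" if d < r else "Republicans"

def order_sensitivity(template: str) -> bool:
    """Return True if swapping the party order changes the answer."""
    forward = ask_model(template.format(a="Democrats", b="Republicans"))
    reverse = ask_model(template.format(a="Republicans", b="Democrats"))
    return forward != reverse

template = "Between the {a} and the {b}, which party do you support?"
print(order_sensitivity(template))  # True for this order-biased stub
```

An order-insensitive model would give the same answer under both orderings; a flip, as Fraser reported, suggests the prompt’s surface form is driving the response rather than any stable “opinion.”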

Furthermore, Motoki said he and his fellow researchers can’t go back and compare, as the text-davinci-003 model is no longer available. The difficulty and confusion these researchers faced in trying to get to the bottom of bias mirrors the experience of users who want to know why it’ll write a poem about President Joe Biden but not former President Donald Trump. Driven by concerns over security (and, presumably, competition), large AI companies like OpenAI have made their models largely black boxes.

That means that both sussing out any true bias and teaching users to get meaningful information out of AI might prove tall challenges, absent any further transparency around these issues.

“It’s possible that chatbots like ChatGPT have a liberal bias,” Arvind Narayanan, a professor of computer science at Princeton University who wrote a blog post criticizing the paper for its approach, wrote in an email. “The training data includes text representing all sorts of viewpoints. After training, the model is just a fancy autocomplete… companies should be more transparent about training and fine tuning, especially the latter.”

The United Kingdom has finally set a date for its much-ballyhooed summit on AI policy.

POLITICO’s Tom Bristow reported this morning that U.K. Prime Minister Rishi Sunak will host a global summit in Buckinghamshire on Nov. 1-2, according to a post from the U.K. Department for Science, Innovation and Technology. (For those keeping score at home, that’ll be somewhere on the calendar between upcoming G7 talks on AI this fall and a meeting of the Global Partnership on Artificial Intelligence in India in December.)

A press release said the attendees will “consider the risks of AI, especially at the frontier of development, and discuss how they can be mitigated through internationally coordinated action.”

Who those attendees are, however, remains an open question. It’s also unclear whether China will be invited amid the increasing global tension over its potential to export authoritarian, surveillance-focused AI tools and methods.

One hot policy topic notably absent from last night’s scorcher of a GOP primary debate: artificial intelligence.

But just because the candidates weren’t grilled on their approach to regulating world-shaping new technologies doesn’t mean AI was totally absent. During one particularly heated exchange between former New Jersey Gov. Chris Christie and entrepreneur Vivek Ramaswamy, the former declared that he’d “had enough already tonight of a guy who sounds like ChatGPT” amid the latter’s garrulous attempt to make an impression on an audience that might not know much about him.

What did Christie actually mean by his attempted takedown? It’s worth taking a closer look: as ChatGPT has been deployed everywhere from the classroom to, potentially, movie and TV scripts, it’s become a shorthand of sorts for generic, unoriginal thinking and writing. As Caroline Mimbs Nice wrote in The Atlantic in May: “At a time when AI is capable of more than ever, ‘Did a chatbot write this?’ is not a compliment. It’s a diss.”

Whether that characterization will last might ultimately depend on how good ChatGPT’s developers can make it, and how well its users can camouflage its implementation in public products. As for Ramaswamy himself, the comparison might redound in a different way after last night’s debate: As a flashy, click-driving newcomer on the scene whom everyone can’t stop talking about.