Humans + AI

Diyi Yang on augmenting capabilities and wellbeing, levels of human agency, AI in the scientific process, and the ideation-execution gap (AC Ep24)

November 26, 2025

“Our vision is that for well-being, we really want to prioritize human connection and human touch. We need to think about how to augment human capabilities.”

–Diyi Yang

About Diyi Yang

Diyi Yang is Assistant Professor of Computer Science at Stanford University, with a focus on how LLMs can augment human capabilities across research, work, and well-being. Her awards and honors include the NSF CAREER Award, the Carnegie Mellon Presidential Fellowship, IEEE AI’s 10 to Watch, Samsung AI Researcher of the Year, and many more.

Website:

Future of Work with AI Agents:

The Ideation-Execution Gap:

How Do AI Agents Do Human Work?

Human-AI Collaboration:

LinkedIn Profile: Diyi Yang

University Profile: Diyi Yang

What you will learn
  • How large language models can augment both work and well-being, moving beyond mere automation
  • Practical examples of AI-augmented skill development for communication and counseling
  • Insights from large-scale studies on AI’s impact across diverse job roles and sectors
  • Understanding the human agency spectrum in AI collaboration, from machine-driven to human-led workflows
  • The importance of workflow-level analysis to find optimal points for human-AI augmentation
  • How AI can reveal latent or hidden human skills and support the emergence of new job roles
  • Key findings from experiments using AI agents for research ideation and execution, including the ideation-execution gap
  • Strategies for designing long-term, human-centered collaboration with AI that enhances productivity and well-being
Episode Resources

Transcript

Ross Dawson: It is wonderful to have you on the show.

Diyi Yang: Thank you for having me.

Ross Dawson: So you focus substantially on how large language models can augment human capabilities in our work and also in our well-being. I’d love to start with this big frame around how you see that AI can augment human capabilities.

Diyi Yang: Yeah, that’s a great question. It’s something I’ve been thinking about a lot—work and well-being. I’ll give you a high-level description of that. With recent large language models, especially in natural language processing, we’ve already seen a lot of advancement in tasks we used to work on, such as machine translation and question answering. I think we’ve made a ton of progress there. This has led me, and many others in our field, to really think about this inflection point moving forward: How can we leverage this kind of AI or large language models to augment human capabilities?

My own work takes the well-being perspective. Recently, we’ve been building systems to empower counselors or even everyday users to practice listening skills and supportive skills. A concrete example is a framework we proposed called AI Partner and AI Mentor. The key idea is that if someone wants to learn communication skills, such as being a really good listener or counselor, they can practice with an AI partner or a digitalized AI patient in different scenarios. The process is coached by an AI mentor. We’ve built technologies to construct very realistic AI patients, and we also do a lot of technical enhancement, such as fine-tuning and self-improvement, to build this AI coach.

With this kind of sandbox environment, counselors or people who want to learn how to be a good supporter can talk to different characters, practice their skills, and get tailored feedback. This is one way I’m envisioning how we can use AI to help with well-being. This paradigm is a bit in contrast to today, where many people are building AI therapists. Our vision is that for well-being, we really want to prioritize human connection and human touch. We need to think about how to augment human capabilities. We’re really using AI to help the helper—to help people who are helping others. That’s the angle we’re thinking about.

Going back to work, I get a lot of questions. Since I teach at universities, students and parents ask, “What kind of skills? What courses? What majors? What jobs should my kids and students think about?” This is a good reflection point, as AI gets adopted into every aspect of our lives. What will the future of work look like? Since last year, we’ve been thinking about this question. With my colleagues and students, we recently released a study called The Future of Work with AI Agents. The idea is straightforward: In current research fields like natural language processing and large language models, a lot of people are building agentic benchmarks or agents for coding, research, or web navigation—where agents interact with computers. Those are great efforts, but it’s only a small fraction of society.

If AI is going to be very useful, we should expect it to help with many job applications, not just a few. With this mindset, we did a large-scale national workforce audit, talking to over 1,500 workers from different occupations. We first leveraged the O*NET database from the Department of Labor to access occupations that use computers in some part of their work. Then we talked to 10 to 15 workers from each occupation about the tasks they do, how technology can help, in what ways they want technology to automate or augment their work, and so on. Because workers may not know concretely how AI can help, we gave summaries to AI experts, who helped us assess whether, by 2025, AI technology would be ready for automation or augmentation.

We got a very interesting audit. To some extent, you can divide the space into four regions: one where AI is ready and workers want automation; another where AI is not ready but workers want automation; a third where AI is ready but workers do not want automation; and a low-priority zone. Our work shows that today’s investment is pretty uniformly distributed across these four regions, whereas research is focused on just one. We also see potential skill transitions. If you look at today’s highly paid skills, the top one is analyzing data and information. But if you ask people what kind of agency they want for different tasks, moving forward, tasks like prioritizing and organizing information are ranked at the top, followed by training and teaching others.
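For readers who want a concrete handle on this audit, here is a minimal sketch, with hypothetical tasks and zone labels of my own rather than the study's wording, of how crossing expert-assessed AI readiness with workers' desire for automation yields the four regions described above.

```python
# Minimal illustrative sketch of the desire-capability audit described above.
# The zone labels and example tasks are hypothetical, not quoted from the study.

def audit_zone(ai_ready: bool, workers_want_automation: bool) -> str:
    """Place a task in one of the four regions of the audit."""
    if ai_ready and workers_want_automation:
        return "AI ready, automation wanted"
    if not ai_ready and workers_want_automation:
        return "Automation wanted, AI not yet ready"
    if ai_ready and not workers_want_automation:
        return "AI ready, automation not wanted"
    return "Low priority"

# Hypothetical task assessments: (AI readiness, worker desire for automation)
tasks = {
    "schedule routine appointments": (True, True),
    "summarize long case files": (False, True),
    "write performance feedback": (True, False),
    "maintain legacy paper archives": (False, False),
}

for task, (ready, wanted) in tasks.items():
    print(f"{task:32s} -> {audit_zone(ready, wanted)}")
```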

To summarize, thinking about how AI can concretely augment our capabilities, especially from a work and well-being perspective, is something I get really excited about.

Ross Dawson: Yeah, that’s fantastic. There are a few things I want to come back to. Particularly, this idea of where people want automation or augmentation. The reality is that people only do things they want, and we’re trying to build organizations where people want to be there and want to flourish. To your point, some occupations don’t yet understand AI capabilities. With some change management, or by bringing it to them, they might come to see value in things they were initially reluctant to do.

The Future of Work with AI Agents paper was a real landmark and got a lot of attention this year. One of the real focuses was the human agency scale. We talk about agents, but the key point is agency: who is in control? There’s a spectrum from one to five of different levels of how much agency humans have in combination with AI. We’re particularly interested in the higher levels, where we have high human agency and high potential for augmentation. Are there any particular examples, or ways we can architect or structure work so that we get those high-agency, high-augmentation roles?

Diyi Yang: Yeah, that’s a very thoughtful question. Going back to the human agency you mentioned, I want to just provide a brief context here. When we were trying to approach this question, we found there was no shared language for how to even think about this. A parallel example is autonomous driving, where there are standards like L0 to L5, which is an automation-first perspective—L0 is no automation, L5 is full automation. Similarly, now we need a shared language to think about agency, especially with more human-plus-AI applications.

So, H1 to H5 is the human agency scale we proposed. H1 refers to the machine taking all the agency and control. H5 refers to the human taking all the agency and control. H3 is an equal partnership between human and AI. H2 is the AI taking the majority lead, and H4 is the human taking the majority lead. This framework makes it possible to approach the question you’re asking.
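As a rough illustration of the scale, here is a minimal sketch, not code from the paper, that encodes H1 through H5 and flags the higher-agency levels discussed next; the task annotations are hypothetical.

```python
from enum import IntEnum

# Illustrative encoding of the H1-H5 human agency scale described above.
# The one-line glosses paraphrase the conversation; they are not quoted text.

class HumanAgency(IntEnum):
    H1 = 1  # the machine holds essentially all agency and control
    H2 = 2  # AI takes the majority lead, the human supports
    H3 = 3  # equal partnership between human and AI
    H4 = 4  # the human takes the majority lead, AI supports
    H5 = 5  # the human holds essentially all agency and control

def is_high_human_agency(level: HumanAgency) -> bool:
    """The augmentation-oriented end of the scale: the human retains
    at least the majority of the agency."""
    return level >= HumanAgency.H4

# Hypothetical workflow-level annotations for one job, not data from the study
workflow = {
    "fix a reported bug": HumanAgency.H2,
    "write design documentation": HumanAgency.H3,
    "sync with the team on priorities": HumanAgency.H5,
}

for task, level in workflow.items():
    print(f"{task:32s} {level.name}  high human agency: {is_high_human_agency(level)}")
```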

One misunderstanding many people have about AI for work is that they think, “Oh, that’s software engineering. If they can code, we’ve solved everything.” The reality is that even in software engineering, there are so many tasks and workflows involved in people’s daily jobs. We can’t just view agency at the job level; we need to go into very specific workflow and task levels. For example, in software engineering, there’s fixing bugs, producing code, writing design documentation, syncing with the team, and so on.

When we think about agency and augmentation, the first key step is finding the right granularity to approach it. Sometimes AI adoption fails because the granularity isn’t there. An interesting question is, how do we find where everyone wants to use AI in their work for augmentation? Recently, we’ve been thinking about this, and we’re building a tool called workflow induction. Imagine if I could sit next to you and watch how you do your tasks—look at your screen, see how you produce a podcast, edit and upload it, add captions, etc. I observe where you struggle, where it’s very demanding, and where current AI could help. If we can understand the process, we can find those moments where augmentation can happen.

This is an ongoing effort, thinking about how we can bring in more modalities—not just code, but looking at your surrounding computer use—to see where we can find those right moments for the right intervention.

Ross Dawson: So what stage is that research or project at the moment?

Diyi Yang: We just released a preprint called “How Do AI Agents Do Human Work?”, which is directly related to the Future of Work article. We sampled some occupations from O*NET, hired professionals, assembled a set of AI agents, and recorded the process of how they do tasks. Then we compared how AI agents make slides, write code, and so on, against how professionals do the same. We observed step by step where agents are doing things really well, where humans can learn from them, where humans are struggling, and where the better solution might come from the human or from the AI.

With this workflow induction tool, you can really see what’s exactly happening and where you should augment.

Ross Dawson: I looked at that paper, and in the opportunities for collaboration section, it had different workflows. It turned out that where the machine struggled and the human could do something was in finding and downloading a file. So it suggested that the human should download the file and the AI should do the rest, because it could do a lot more, faster—pretty accurately, but not necessarily accurately enough.

So there’s this point: where can humans help machines, and where can AI help humans? But I think there can also be an intent to maximize the human roles, so that where we can augment capabilities, the AI assists, making the workflow more human rather than more AI. That’s one of the problems—call it Silicon Valley or just a lot of current development—it’s about bringing in agents as much as possible. How can we take an approach where we’re always seeking to incorporate and augment the humans, as opposed to just finding where the agent is equivalent or faster, but where the human could benefit by being more involved?

Diyi Yang: That’s a very interesting question. I want to say that I never view this as a competition of humans versus AI or humans versus agents. I view it more as an opportunity: can human plus AI help us do things we couldn’t do before? The set of tasks we could take on may be much bigger than what we have today. It’s not just about bringing more augmentation or automation to current tasks; it’s about finding more tasks relevant to society that human plus AI can work on together.

Going back to the terms you mentioned—automation versus augmentation—this is a key construct today. But I want to point out something amazing: emergence. It’s not only about automation versus augmentation, because that concept assumes we only have a fixed set of tasks. But what if there are more tasks? What if we solve many existing routine workflows and realize humans can work on higher-value things? That’s the opportunity and emergence we’re thinking about.

From a research perspective, we’re looking at how the technology feels today and how we should think about augmentation, though some of this is constrained by current AI agent capabilities. I’m sure they’ll get much better in the next six months. If we’re just thinking about one task, then maybe models aren’t doing very well for that task, so let’s bring in people to collaborate and get better performance. But from a counter-argument perspective, by observing how humans work with AI, we get more training data, which can be used to train better AI. That means, for that specific task, automation could take a bigger part of the pie, which might not be what we want.

There are both short-term and long-term considerations in human-AI collaboration. Personally, I’m very excited about using current insights and empirical evidence to find more emergence—new areas and discoveries we can do together as a team, rather than framing it as a competition between humans and AI.

Ross Dawson: Yeah, absolutely. I completely agree. As we’re both saying, a lot of the mindset is about getting humans and AI to work together so AI learns to do it better and better, eventually taking the human out. But I think there’s another frame: my belief is that every time humans and AI interact, the human should be smarter as a result, rather than just cognitive offloading.

To your point about emergence, this goes to the fallacy around the future of work being fixed demand. As we can do more things, there’s more demand to do more things—software development is an obvious example. I love this idea of emergence: the emergence of new roles to perform and new ways to create value for society. Is there anything specific you can point to about how you’re trying to draw out that emergence of roles, capabilities, or functions?

Diyi Yang: I think this is a really hard question—can you forecast what new jobs will occur in society? The reality is, I cannot. But I can share some insights. For example, there’s a meme or joke on LinkedIn about coding agents: because coding agents can produce a lot of code, now the burden is more on review or verification. So there’s this new job called “code cleanup specialist.” The skill is shifting from producing things to verification.

I’m not predicting that as a job, but we do have some empirical methods or methodologies that can help. Of course, there are many societal and non-technical factors involved. One thing we’ve been thinking about is identifying hidden skills demonstrated in work that even people themselves aren’t aware of. The workflow induction tool is one lens for that.

All of us find certain parts of our jobs very challenging or cognitively demanding, or sometimes we think, “I could find a different way to approach this,” or “This method could be used for something else,” or “Maybe it inspires a new idea.” There are many non-static dimensions in current workflows. If we could have a tool to audit how we’re doing things—how I’m doing my work, how you’re doing yours, what’s different—we might be able to abstract shared dimensions, pain points, or missing gaps. That could be a very interesting way to think about new opportunities.

For example, if you’re thinking about coding-related skills or jobs, maybe this is one way to reflect on where engineers spend most of their time struggling, and whether we should provide more training or augmentation. I prefer an evidence-based approach. That’s our current thinking on how we can help with that.

The last point I want to add—this is also why I really love this podcast, Humans + AI. Over time, I’ve realized that talking to people is becoming more valuable, because you get to hear how people approach problems and the unique perspectives they bring, especially domain experts. It’s hard to capture domain knowledge, and much of it is undocumented. That’s the part AI doesn’t have. But if you talk to people and hear how they view their work and new possibilities, that’s how many new AI applications emerge—because people keep reflecting on their work. So I think a more qualitative approach to understanding the workforce today is going to be very valuable.

Ross Dawson: Yeah, absolutely. I believe conversations are becoming more valuable, and conversations are, by their nature, emergent—you don’t know where you’ll end up. In fact, I find the value of conversations is often as much in the things I say, which I find interesting, as in what the other person says. That’s the emergent piece.

Going back to what you said, of course you can’t say what will come out of emergence—that’s the nature of it. But what you can do is create the conditions for emergence. If we’re looking at latent capabilities in humans—and I believe everyone is capable of far more than they imagine, though we don’t know what those things are—how do we create the conditions by which latent capabilities can emerge? Now, AI can assist us in various ways to surface that, maybe through the way it interacts, suggesting things to try. Can you envisage something where AI allows our latent capabilities to become more visible or expressed?

Diyi Yang: That’s also a hard question. Maybe I’ll just use some personal experience. I definitely think that now, when I think about how AI is influencing my own work—as a professor, teaching and doing research—there are many dimensions. For example, I teach a course on human-centered large language models, and I really want to make the human-plus-AI concept clear to my students. Sometimes I’m frustrated because I want to find a really good example or metaphor to make the idea clear, and it’s hard. But AI can help me generate contextualized memes, jokes, or scenarios to explain a complicated algorithm to a broader audience.

On the other side, it helps me reveal capabilities I wasn’t aware of—maybe not capabilities, but desires. The desire to be creative in my teaching, to engage with people and make things clear. I wouldn’t say those are latent skills, but AI helps make my desires more concrete, and certain skills shift in the process.

Earlier, I mentioned that in the future of work, we observe skill shifting in the general population—from information processing to more prioritizing work and similar tasks. I hope we can have more empirical evidence of that. In terms of research, right now it’s more about bi-directional use, rather than helping me discover hidden skills. But we’ve been doing a lot of work to think about how AI can be a co-pilot in our research process.

Ross Dawson: Oh, right. I’d love to hear more about AI in the scientific process. I think it’s fascinating—there are many levels, layers, or phases of the scientific process. Is there any specific way you’re most excited about how AI can augment scientific progress?

Diyi Yang: Yes, happy to talk about this. When we were working on the Future of Work study, I was thinking about scientists or researchers as one job category—how we could benefit or think about this process. One dimension we’ve approached is whether large language models can help generate research ideas for people working on research. This is a process that can sometimes take months.

We built an AI agent to generate research ideas in natural language processing, such as improving model factuality, reducing hallucination, dealing with biases, or building multilingual models—very diverse topics. We gave AI agents access to Google Scholar, Semantic Scholar, and built a pipeline to extract ideas. The interesting part is our large-scale evaluation: we recruited around 50 participants, each writing ideas on the same topic. Then we had a parallel comparison of AI-generated and human-produced ideas. We merged them together, normalized the style, and gave the set to a third group of human reviewers, without telling them which was which. In fact, they couldn’t differentiate based on writing.

We found that, after review, the LLM-generated research ideas were perceived as more novel, with overall higher quality and similar feasibility. This was very surprising. We did a lot of control and robustness checks to make sure there were no artifacts, and the conclusion remained. It was surprising—think about it, natural language processing is a big field. If AI can generate research ideas, should I still do my own research?

So we did a second study: what if we just implemented those ideas? We took a subset of ideas from the first study, recruited research assistants to work on them for about three months, and they produced a final paper and codebase. We gave these to third-party reviewers to assess quality and novelty. Surprisingly, we found an ideation-execution gap: when the ideas were implemented, the human condition scores didn’t change much, but the AI condition scores for novelty and overall quality dropped significantly. So, when you turn AI-generated ideas into actual implementations, there’s a significant drop.

Now we’re thinking about approaches to supervise the process of generating novel research ideas, leveraging reinforcement learning and other techniques.

Ross Dawson: I was just going to say, that paper—the ideation-execution gap—is extremely interesting. Why do you think that’s the case, where humans assess the LLM ideas to be better, but when you put them into practice, they weren’t as good as the human ideas? Why do you think that is?

Diyi Yang: I think there are multiple dimensions. First, with the ideas themselves, you can’t see how well an idea works until you try it. An idea could be great, but in practice, it might not work. In written form, LLMs can access thousands or millions of papers, so they bring a lot of concepts together. Many times, if you read the ideas, they sound fancy, with different techniques and combinations, and look very attractive. So the ideas produced by LLMs look very plausible and sound novel, probably because of cross-domain inspiration.

But when you put them into practice, it’s more about implementation. Sometimes the ideas are just not feasible. Sometimes they violate common sense. The idea isn’t just a two-sentence description; it also has an execution plan, the dataset to use, and so on. Sometimes the datasets suggested by the AI are out of date, or it will say, “Do a human study with 1,000 participants,” which is really hard to implement. That’s our current explanation or hypothesis. Of course, there are other dimensions, but so far, I’d say AI for research idea generation is still at an early stage. It’s easy and fast to generate many ideas, but very challenging to validate them.

Ross Dawson: Yeah, which goes to the human role, of course. I love the way you think about things—your attitude and your work. What are you most excited about now? Where do you think the potential is? Where do we need to be working to move toward as positive a humans-plus-AI world as possible?

Diyi Yang: This is a question that keeps me awake and excited most of the time. Personally, I am very optimistic about the future. We need to think about how AI can help us in our work, research, and well-being. We see a lot of potential negative influences of this wave of AI on people’s relationships, critical thinking, and many skills. But on the other side, it provides opportunities to do things we couldn’t do before. That’s the broader direction I’m excited about.

On the technical side, we need to advance human-AI interaction and collaboration with long-term benefits. Today, we train AI with objectives that are pretty local—satisfaction, user engagement, etc. I’m curious what would happen if we brought in more long-term rewards: if interacting with AI improved my well-being, productivity, or social relationships. How can we bring those into the ecosystem? That’s the space I’m excited about, and I’m eager to see what we can achieve in this direction.

Ross Dawson: Well, no doubt the positive directions will be very much facilitated and supported by your work. Is there anywhere people should go to look at your work? I think you mentioned you have an online course. Is there anything else people should be aware of?

Diyi Yang: If anyone’s interested, feel free to visit the Human-Centered Large Language Model course on the Stanford website, or just search for any of the papers we have chatted about.

Ross Dawson: Yeah, we’ll put links to all of those in the show notes. Thank you so much for your time, your insights, and your work. I really enjoyed the conversation.

Diyi Yang: Thank you. I also really enjoyed the conversation.
