Transcript: Preston So - Voice Content and Usability part 1

This is a transcript. For the video, see A fireside chat with Preston So, author of Voice Content and Usability.

[00:00:00] Michael Meyers: Hello, and welcome to Tag1TeamTalks, the podcast and blog of Tag1 Consulting. Today we have a really special episode for you.

[00:00:07] I'm going to be talking with Preston So, Tag1's Editor in Chief, about his new book, Voice Content and Usability, which was just published by A Book Apart. I'm Michael Meyers, the Managing Director of Tag1 Consulting, and I'm really excited about today's talk. Preston is one of the leading subject matter experts in voice content, and this is the first book ever written on the topic. If you create content, if you're interested in voice communication with computers, you're going to love today's episode.

[00:00:35] We have so much to cover that we broke this down into two segments. This is part one. We're going to talk about voice content and usability in general. We're going to give you an overview. We're going to talk about content strategy, information architecture as it applies to voice content, usability testing, how you deploy it, and get into the future, where all this is going.

[00:00:57] Please make sure you check out part two. We're going to talk about georgia.gov and give you a mini case study on the first voice interface built for the residents of Georgia and how all of this is put into practice. Preston, welcome. So great to have you as a guest. Congratulations on your, you know, your latest book.

[00:01:16] I know it's a tremendous effort to write these and, and thank you so much for joining us.

[00:01:21] Preston So: Hey, thanks, Michael, for having me. It's a real pleasure to be on the other side of the conversation this time, and a real pleasure to talk about my book today.

[00:01:28] Michael Meyers: Yep. Usually you're doing the interviewing. It's great to mix things up.

[00:01:31] So what is voice content and, and why is spoken content so unique and different from, from what we're used to doing?

[00:01:40] Preston So: That's a great question to kick off things. And, you know, the very first thing I will say is one of the biggest issues that we've had with the way that we think about content, the way we design content and the way that we approach the whole problem of content is that many times it's really rooted in a lot of the paradigms and a lot of the trappings of the web.

[00:02:01] And one of the examples of this is that when you think about the ways in which the web and websites have completely revolutionized the way that we deliver content, one of the biggest things that we notice about content these days is it's linked, it's related. You've got a lot of breadcrumbs, site maps, all of these motifs of the web that really emulate and exemplify some of the most important facets of the web and some of the most important features of the web. But one of the problems with that whole approach is that in many ways, we've gone away from the most natural form of content or the most human form of content, which is a simple conversation.

[00:02:40] Like the one you and I are having right now, Michael. And one of the things that I think is really tough about content these days is that when it's trapped in this written form, when it's trapped in these documents, these pages that are really focused on these fixtures of the web, a lot of organizations and a lot of folks in the content world have not really given a whole lot of thought to how we can actually recast and reformulate and refactor this content

[00:03:07] so it makes sense in a more human, organic, free-flowing, natural, and ultimately more context-less form in the case of voice content and spoken content. And voice content is a really good example of exactly this kind of approach that we mean, which is how do we actually take a lot of the content that we have on the web, that we have in newspaper archives, that we have in encyclopedias, that we have all over, that are fixtures of our written world, and actually transform those into voice interfaces and conversational interfaces that can have a meaningful, authentic, and ultimately bona fide conversation with us, just as we would have a conversation with somebody like a docent at the museum or a tour guide at a tourist attraction, or even, let's say, you know, our favorite person at the deli counter, who's going to tell us about the new sobrasada that he's got.

[00:04:05] Michael Meyers: I'm glad you brought this up at the start because this is something that I was thinking a lot about coming into this conversation. You know, we've created so much content, we've got all these great Tag1TeamTalks, and, you know, voice communication is so different. And I was wondering, you know, are there ways that we can leverage the content that we've created? Is that possible?

[00:04:26] Is voice communication so different that, like, you know, it just so loosely applies to what we have?

[00:04:34] Preston So: Yeah. You know, the answer of course is really challenging because the answer is yes and no. Right. I think one of the things that's really important to recognize, and I just wrote a pretty long article about this very problem in my blog, Preston.So. One of the things that's a really challenging paradox of voice content versus written content is that the way that we speak is very different from the way that we write. One of the examples of this is, you know, you don't really say "to whom it may concern" when you talk to somebody; you don't really write the word "literally" as often as you say the word "literally" in conversation with your friends. And one of the really big challenges, I think, that a lot of folks still have not quite gotten a handle on, and this is still very much, let's say, unexplored territory, is:

[00:05:19] Well, we've got all this written content, but it's written fundamentally in this very formal register, in this format that is really conducive to reading, but not necessarily conducive to being recited or being spoken out loud. And I think a lot of the blog posts that we write, a lot of the articles that we write, a lot of the marketing content that a lot of organizations produce is still firmly rooted and still firmly planted in this notion of, oh, this is going to be read by somebody who's scrolling on their phone or scrolling on their browser.

[00:05:50] And not necessarily somebody who wants to actually consume that content on an Alexa device or on a Google Home device or on a smart speaker, like a Sonos One. So I think one of the challenges is that, but also the good news is that a lot of the ways that we approach content today are actually very conducive and very amenable to spoken content. And one example of this is, you know, in chapter two of my book, Voice Content and Usability, I talk at length about this notion of voice-friendly content and how we can actually discover and find the kind of voice-friendly content that lends itself to some of these aspects of spoken content that we really want to point towards.

[00:06:31] One thing that we talk about very often at Oracle, for example, is the notion that spoken content is part of this new paradigm of pageless content, content beyond the web, content off the website, content off the screen. And it's very important for a lot of folks to think about content management systems like Drupal, like Oracle Content Management, in the sense of how can we actually deliver this content in a way that could be reusable in a lot of different settings.

[00:06:56] How do we engage with this concept of an omni-channel content strategy that keeps the content completely agnostic to where it's going to end up, but is still eminently maintainable while being eminently appropriate for all of the settings that you might actually encounter that content in? So one piece of good news about a lot of the content that we have is that quite a lot of our content is actually in a conversational cadence. A lot of our content is already dialogic in structure or largely, you know, conversational in how it actually manifests itself. And one really good example of this is the really often maligned frequently asked questions pages that are often on websites, where you ask a question, you get an answer, and these questions are usually phrased in a more informal or colloquial, let's say, register. And a lot of these questions are things that you could feasibly ask in a conversation with somebody who is, you know, sitting on the other side of the phone hotline or sitting on the other side of the counter that you're walking up to. So one of the challenges that I think is solved these days is actually finding that voice-friendly content; that is really easy. Now, transforming that voice-friendly content into voice-ready content, which means, you know, it's ready to go,

[00:08:07] it's structured semantically and well defined for a conversational interface or for a voice interface, that's a whole different story altogether. And a lot of that involves some of the techniques that I outlined not only in chapter two of my book, Voice Content and Usability, but also in my recent A List Apart article from last year, "Usability Testing for Voice Content," where I discuss and explore some of these

[00:08:30] interesting insights that you can glean when you take the concept that's already been a fixture of content strategy and content design for years now, the content audit, and apply that content audit beyond its, you know, usual settings of, let's say, you know, making sure that you're adhering to GDPR or making sure that you've got content that's going to be appropriate for SEO, and pointing that more towards, okay, how does this piece of content, or how does this series of content, actually sound when we put it through an Alexa and listen to it and it becomes oral as opposed to something that's strictly visual?

[00:09:04] So there's a lot of things to unpack there. I think one piece of good news is that a lot of folks who are already on some of these content management systems like Drupal, like Oracle Content Management, that have these underpinnings and this foundation of how to actually deliver content in a headless fashion, in a fashion that's really amenable to these experiences beyond the web, those folks are already in a very good position. And it really behooves a lot of these organizations to think about: okay, now that we're seeing a lot of the pandemic reveal some of the really very striking sales numbers of immersive headsets, gaming headsets, and conversational interfaces and voice interfaces and smart speakers and smart home systems, how will your content strategies, your content design approaches, your content management workflows actually adjust to this new reality?

[00:09:54] Michael Meyers: We have a lot to think about in how we create content moving forward and how content is being consumed. I really want to dive into usability, but before we get to that, at a basic level, you know, how do you create content for voice? Is it essentially a dialogue decision tree? You know, is there a particular approach or is it contextual?

[00:10:16] Preston So: Yeah, it's a great question. And I think a lot of people have this misconception, which is a perfectly valid misconception, which is that voice interface design is really just about kind of writing dialogues and writing what are in essence screenplays, right? What are, in essence, these scripts that you follow.

[00:10:32] And that is very true. You know, I have an article on my blog called "Voice design is about good writing," and it's absolutely true, because you have to write these dialogues that make a lot of sense, that sound great. And a lot of times I think one of the things that's missing from a lot of the discourse surrounding voice interface design and conversation design at large is that a lot of this work really involves not just writing and not just kind of thinking about how it's written, but also reciting this out loud, doing table reads, doing a lot of this kind of work that you might imagine to be more of something you might find at a Saturday Night Live production meeting, but is really important to the spirit of voice content. Now that's only part of the equation, though. There's several other aspects of voice content that are very important. And a lot of these things have to do with the technology that undergirds voice content. A lot of it has to do also with some of the unique ways in which we have to think about building voice interfaces.

[00:11:26] And I think today, especially as we think about moving some of this content off of the web, really expanding our horizons so that some of this copy, some of this media, some of this, you know, some of this text can actually find its way into, let's say, a VR headset or into a voice interface, one of the things that a lot of organizations have to think about, right, is the fact that you've got the written content, but now how do you actually piece that together into something that makes sense? And how do you take the existing content that you've now found, that you've made voice ready, and actually situate that in something that makes sense? And there's two very crucial structural, or let's say, you know, systematic approaches to voice content and artifacts in voice content design that are very important.

[00:12:08] And that is dialogues, which is, you know, basically those screenplays where you have to basically, you know, connect the dots between your content and some of the interface text, some of the feedback that you want to deliver to the user, some of the ways in which the interface actually, you know, forms the trusses and the latticework around those pieces of content that you're actually connecting. And then of course, the most important part that a lot of new conversation design tools have really started to pursue these days is that visual flow or that user journey that really constitutes the information architecture of your voice content.

[00:12:45] One of the most crucial things to recognize is that this is a kind of multi-step process. And all of this involves voice content design. You've got to take your dialogues that are basically this really interesting fusion between your content and interface text and help text and feedback, and all of these things that the user wants to hear to better navigate the interface, together with this mental map and this mental model of your voice content itself, which is very similar to a site map on a website, but it couldn't be further in terms of how it appears morphologically.

[00:13:16] One of the things that I think is really interesting about voice interfaces right now, and this is what I alluded to earlier when I talked about the fact that we've been really biased towards the web in a lot of ways, is that the web has really changed a lot of our perceptions of how content should work.

[00:13:30] One of the problems of our mental models when it comes to the web is that they're very visual. You know, we can navigate websites just like Hansel and Gretel, with breadcrumbs. We can navigate websites through sitemaps, using this kind of road atlas that gives us the website structure. We use nav bars. But nav bars and links and breadcrumbs and some of these really important things that we use on a daily basis to navigate the web are not something that exists in any situation on a voice interface, especially when you have no visual component in front of you.

[00:14:01] There's no screen sitting in front of you, and all you've got is your ear and all you've got is your tongue. So one of the most important things about information architecture and content design, when it comes to actually structuring these mental models in ways that make sense for users, really has to do with more linear, as opposed to networked,

[00:14:20] and more unidirectional, as opposed to hub-and-spoke, models that allow the user to feel like they're taking a pathway that doesn't necessarily leave them at many forks in the road or take them down many different paths, in many directions, that can end up confusing them. And this is one of the things that's really interesting: we have a big privilege with the visual nature of the web, in that the fact that we can see content and structure content within the ways in which our optic nerve actually transmits those things to our brain is a really big luxury, because when we actually talk to people in the form of the voice interface, we're not actually, you know, navigating through a network of different pieces of information. We're in a very tunnel-vision kind of mindset.

[00:15:03] And it's very important for us to follow this linear, you know, let's say unidirectional approach to structuring our content and structuring our information architecture in ways that really make sense to the user. And I talk about this quite a lot in my new book.

[00:15:18] Michael Meyers: I mean, so much about what we do changes. It's really a brave new world.

[00:15:23] You know, I think about big calls to action and emails and, you know, that's like basic fundamentals of what we think about in our approach to writing content, and it doesn't always translate well or has to be done really differently when you're creating voice content. I would imagine, like, not all content translates well to voice. Like, you know, you said the FAQs are a clear match. You know, are there things in particular that translate well, and are there things in particular that don't? Like, you know, we're going to talk later about a case study, like,

[00:15:57] how can that match a voice world? You know, or does it really just lend itself to very specific kinds of things?

[00:16:05] Preston So: Yeah. This is a really interesting question and a very tough quandary, right? Because I think one of the biggest challenges that we face today is that most of our content is long form.

[00:16:15] Right? Most of our content is what I have called in the past macro content, which means that it's not very granular; it's paragraphs and paragraphs and paragraphs of content. It's these Russian novels of content that are really difficult to, you know, split up into different chunks that are actually much more amenable to a voice interface.

[00:16:34] And it's really a tough decision because a lot of organizations, they want that easy-peasy, let me just, like, take this existing content I have and just put it through some kind of a machine, like a woodchipper, that just gives me all of these pieces of content that I can now just plunk into my voice interface.

[00:16:52] Unfortunately, it doesn't really work that way. And I think a lot of content strategists and content designers really do understand this notion of the fact that one of the biggest problems and paradoxes that we face with this whole new world of voice content is it requires a lot of potentially new content or a lot of new writing of content.

[00:17:10] But it's important to recognize as well, and I know we'll get to it in the next installment of this, but one of the things that's important to recognize as well is that we don't have the luxury of writing, let's say, 15 different versions of content: one that's destined for a voice interface,

[00:17:22] one that's destined for a website, one that's destined for, you know, a VR headset over here. So a lot of organizations are kind of stuck. And one of the things that I find is very helpful is to think about voice content as potentially a subset of the content you've already got on the website, because, let's be, you know, let's face it.

[00:17:38] A lot of the content that we've currently got on our websites is not something that we can very easily make available through a voice interface. Maybe things like teasers, or maybe things like summary versions of our content or things that are more appropriate. But ultimately when it comes to a lot of users who are going to be interacting with this content, many of them are still going to feel more comfortable with interacting with this long form macro content in the context of a browser, as opposed to a voice interface.

[00:18:03] However, as you said, there are plenty of examples of, let's say, voice-friendly content or voice-friendly applications of content that we can talk about right now. And some of those are things like, you know, lists of content or things like FAQs, as we mentioned earlier, which is a big focus of what we did for Georgia.

[00:18:21] But I think the, you know, the other kind of, let's say, dark horse in a lot of this is that a lot of the kinds of content that we work with today involve progressive disclosure, right? Involve this situation where we have various pieces of content that are really, really small and really atomic that actually reveal themselves down across the page and kind of follow this hierarchical order.

[00:18:45] So when you talk about things like, for example, you know, accordion-based content, where each time you go down the page, you've got some kind of, you know, new piece of content that opens up based on a previous piece of content you've read. A lot of those kinds of structures are really convenient for applying to voice content.

[00:19:03] But it is one of those things where a lot of organizations have gone both ways. Right? I think a lot of organizations have chosen to go in the direction of writing completely new content, writing parallel content that's really solely destined for the voice interface. And a lot of times that is the right approach, but the big question that a lot of these teams really need to answer and come to terms with is, okay, what does that do for long-term maintenance?

[00:19:25] What does that do for content planning? What does that do for content design and content strategy where you have potentially these two versions of content that might be going out of sync very quickly, and who's going to manage those. Do you have a budget to potentially manage the, you know, these two different content silos and what's involved in that sort of very, very tough calculus.

[00:19:43] I talk about this quite a bit in my talk, actually, on June 11th called "How to make the move from headless CMS to true omnichannel" at OmnichannelX, which was really about this problem of, when we talk about CMSs and content management, we really want to focus on a single source of truth for our content.

[00:20:01] But what happens when we need to think about content in all of its different rich manifestations, this kaleidoscope of content that really is potentially very context heavy, but also very context light?

[00:20:14] Michael Meyers: You're giving me a lot of anxiety, man, thinking about, you know, having to maintain this body of content. Like, you know, it really resonates with me because that's what I've been thinking about in the back of my mind is limited budgets, limited resources.

[00:20:28] You know, it's hard enough for many organizations to create quality content to begin with. You know, now this is an entirely separate thing that we want to consider that, you know, it's a lot to manage and it has to derive a lot of value. I'm wondering, are there even more permutations? Like, is it device specific or dependent, you know, like within voice content, is it somewhat fractured or, you know, does the voice content I create translate well across the board?

[00:21:00] Preston So: Yeah. So this is a really, really interesting question. And, you know, I'll use this also to pivot into a different topic that I think is very important to cover as well before we,

[00:21:08] you know, close the books on this today. The first is that one of the problems that I think we have is that a lot of this voice technology and a lot of this spoken technology has been either very, very tough to implement, very tough to architect, or very tough to build for because it's very hardware specific. But in recent years in particular, we've seen a huge proliferation of some really exciting tools, some low-code, no-code platforms that are out there, and even platform-agnostic tools that are out there, like Botsociety or Dialogflow, that allow for designers to create these flows and dialogues and various structures that actually translate at the end of the day into these versions of their voice interfaces or conversational interfaces.

[00:21:50] That could be a chat bot, a Slack bot, a Facebook Messenger bot, you know, a Google Home assistant or an Alexa skill, all of these different things. This does call into question a lot of the things that I mentioned earlier around, okay, but, you know, written conversational interfaces, are they the same as spoken conversational interfaces?

[00:22:10] And is it really a good idea to kind of allow for these things to just kind of be merged into a single, let's say, set of dialogues or a single set of information flows? And I think it's a really debatable question. One of the things about the fracturing of the voice technology world, though, is that just as we've seen with social media, just as we've seen with a lot of the mobile world, for example, there is this, let's say, very problematic, but, you know, very, let's say, challenging oligopoly that's emerged when it comes to the fact that, you know, fundamentally, when it comes to the main players in the voice space, you've got Apple, Amazon, Google, you know, Samsung, you could say, IBM, you could say as well.

[00:22:48] And I think one of the biggest issues with that as well, when you have this really strong concentration and this, you know, very, you know, high degree of power and this high degree of control all bundled together within these companies, is what does that do for the sorts of voice assistants and the sorts of conversational interfaces that people can design and that people can hear themselves in?

[00:23:09] One of the big concerns that I shared yesterday on another podcast is that I think one of the biggest issues with a lot of this concentration of power in these very small, these very large technology companies is that we're building oftentimes a lot of these assistants and a lot of speech synthesizers and a lot of these, you know, pieces of hardware that really don't reflect the richness of the ways that we speak.

[00:23:32] And one of the key questions that I ask in my book is, you know, when you actually hear someone like, you know, Amazon Alexa or Microsoft Cortana or Apple Siri, or, you know, some of these assistants speaking, who is the person that you're actually picturing in your head? And is that something that is potentially different from, or impacting your ability to trust in, others who don't share, let's say, that same exact voice?

[00:24:00] So I think one of the biggest issues around this whole notion of conversational interfaces and voice interfaces in particular is the fact that the more that we let these voice assistants not reflect the world that we actually live in, the harder it gets to actually achieve, let's say, language equity or dialect equity, where we're hearing a lot of the same toggles that we use when we code switch between different languages or different dialects actually reflected in the same modes of speech that our voice assistants, that our voice interfaces use.

[00:24:32] So when it comes to a lot of these companies, I think there's a lot of interest in these technologies. There's a lot of really interesting approaches that will happen. But one of the big concerns I have, especially with the approaching conversational singularity, which is this notion that someday we'll have the ability for voice interfaces to have a conversation with us

[00:24:54] that's indistinguishable from a conversation that we'd have about, let's say, you know, some kind of a new product at the deli counter with our favorite person there. That's something that is very worrying, because the question is, that's going to be a state, you know, this kind of conversation, that's perfect and optimal, but for whom? And whom does this, you know, realm of new conversation, these new capabilities, actually privilege?

[00:25:19] And who does this whole potential and this whole visionary realm of voice interfaces actually potentially devalue and end up oppressing at the end of the day? So it's a very, very tough problem. And I talk about this quite a lot in my book because it's a very important question that we have to keep asking, especially as some of the most problematic, you know, aspects of big technology continue to kind of bare themselves and show themselves to the world.

[00:25:45] Michael Meyers: Yeah. We need to think a lot more about diversity, inclusion, equity and how these technologies reflect that. That's a really important point. We want to get onto, you know, the case study, but before we wrap this up, where do you see things going? You know, you talked a little bit about the, you know, the singularity. You know, that's a little bit longer term. You know, what do you see as sort of the, you know, the more immediate future of things?

[00:26:10] Like, what should we be looking forward to?

[00:26:13] Preston So: Yeah. There's a really interesting paradox right now. And it's very interesting because on my recent podcast recording with CMSWire I had this very conversation about this really interesting paradox that we see about the ways in which voice interfaces are growing.

[00:26:26] One of the problems right now is that every organization has their own voice interface. If you think about the fact that Capital One has one, Domino's Pizza has one, you know, XYZ brand has one, ABC brand has one, well, what that's actually doing is that's causing some of the fragmentation, some of the fracturing that we talked about just a little bit earlier.

[00:26:45] And it's a really interesting conversation because, well, the whole goal of a lot of these companies, these large companies, especially Google and Amazon and Apple, they actually compete with each other on the basis of being able to melt away some of these differences and blur the lines between some of these distinct organizations

[00:27:03] and make available the entirety of the web, right? And that's the whole idea of this notion of conversation-centric design. This whole notion of the conversational singularity is, hey, you know, the fact that I can't ask Capital One about how to order a pizza is something that is arbitrary, right? It's an arbitrary line between these voice interfaces.

[00:27:22] So I think one of the most important things we'll see in the future of voice content and the ways in which these organizations deliver their content and handle these transactions is this tension. That is, you know, how is this tension going to resolve itself, where you've got all these different organizations who are building their own chat bots, building their own voice interfaces?

[00:27:40] But at the end of the day, Amazon, Apple, Google, they want to melt away all of those differences, those distinctions, those lines in the sand and make their own core interface the way in which you interact with the world at all. And that means, you know, potentially the way you call an Uber is through Apple Siri or through Amazon Alexa, not through, let's say, a voice interface that Uber has created or an interface that is separate to, or installable on, some of these interfaces. It's a tension that is really being talked about quite a bit these days. I'm very interested to see how it's going to actually manifest in the next few years.

[00:28:15] Michael Meyers: That's fascinating. I hadn't considered that, but, you know, I do so much Google searching to find everything. It makes total sense what you're talking about, how things are gonna consolidate and be more homogenous. And that has some positives, but also a lot of concerning factors. And there is so much I wish we could talk about, so many topics, but we're at time.

[00:28:35] Thank you so much for joining us. For the folks that are listening, make sure to stick around for part two. We're going to talk about georgia.gov and a little mini case study about how they're using a voice interface for their residents and putting all this into practice. There's a ton of links in the show notes.

[00:28:51] Please check them out. Preston has written so much about this content, and I love your domain name, Preston. Preston.So is phenomenal. You really killed it on that front. Folks, you can check out our past Tag1TeamTalks at Tag1.com/TTT. As always, we'd love your feedback and input on this episode

[00:29:10] and topic suggestions for the future. You can write to us at Tag1TeamTalks@Tag1.com. That's T A G, the number one, dot com. Thank you so much for tuning in. Until next time, take care.