This is a transcript. For the full video, see eLearning with Kids at Home featuring HTML5 Speech-to-Text: Tag1 TeamTalk #013.
Preston So: [00:00:00] Hello and welcome to another episode of Tag1 Team Talks. I'm Preston So, your host and moderator, and today I'm speaking with our dear friend Laslo Horvath, a senior developer at Tag1 based in Vienna, and of course our good friend Michael Meyers, managing director at Tag1.
Welcome to a special edition of our Tag1 TeamTalk series. We're taking a slightly different angle today. This month we are all dealing with the coronavirus and this awful pandemic. The impact that social distancing is having on our lives is palpable, and for those of us who have children, it's an incredible challenge to work with our kids at home, make sure they're not interrupting us while we're trying to work, and make sure that they're staying healthy, physically and mentally, during this really tough time.
One of the biggest things we know is happening for a lot of parents around the world is that everyone has to homeschool now, whether with distance learning programs, e-learning tools, or their own homeschooling curricula. We wanted to share a little bit about how we on the Tag1 team are using technology to make learning more interesting for the kids
and also easier on us. Joining us today is Laslo, and I want to talk a little bit about the challenges that led you down this interesting path toward a solution for your own kids' schooling. As I understand it, schools in Vienna have also closed, and kids still have to keep on learning, right?
I mean, what's your situation like right now?
Laslo Horvath: [00:01:26] Yeah. Schools closed in mid-March, actually. They are running emergency schools, basically: if you are staff who have to go to work, like medical staff, grocery store workers, or police,
things like that, you can take your kids to school and they will be taken care of there. But the government asked everyone who is in a position to stay at home with their kids to do so, and I think something like 75% or 80% of people did. For me, of course, that was the obvious choice:
if I'm staying home, I want my kid to be here as well, not at school. So they set up a lot of distance learning programs, but those are focused more on examination: they just want to see whether your kid did the exercises and learned the material, and the learning itself is something you have to do at home with the kids.
So you get a lot of paper, or you get the exercises online, but that's just material. The real teaching role is still on you. Different countries have different approaches to this; some countries have whole television programs dedicated to learning.
Here they decided, okay, let's use the school learning platforms that we have. Unfortunately, they couldn't use Google Classroom because of privacy concerns, as is always the case in the EU, so they have their own platform, called Anton. It's pretty cool, but as I said, it's mostly just for distributing material and checking your kid's work: you take a picture of everything they did, submit it, and that's it.
My kid is in preschool, so what they do is learn the letters, learn how to read, and do basic mathematics like addition and subtraction. So for me, the biggest thing was that my son has to learn how to read and write and has to learn all the letters. And I thought, okay, I have a lot of work; we have a project that should launch soon,
so I have to invest a lot of hours in that. And the most productive hours of the day, when you can actually work, are also the most productive hours for the kid, when he has the most concentration and can perform at his best. So I don't want him to start learning at 7:00 PM; I want him to do it like he did in school, from nine or 10 until noon.
Preschoolers don't have as long a school day as older kids. So I had the idea after talking with a friend, because a while back I did some speech-to-text and speech recognition implementation. I thought, maybe we can use this. Let's see what the current state is
and how good it is, and if we can automate it, why not? We're software developers; we're used to automating a lot of things. So this sounded like a nice challenge, and if it worked, it would actually help save time. That's where the whole thing came from.
Preston So: [00:04:57] Very cool. One thing you mentioned that's very interesting to me is that you've got a background in speech-to-text and text-to-speech. One thing I know about you, Laslo, is that years ago you built an ERP system, back in the early days of speech recognition and speech synthesis.
Could you talk a little bit about what that was like, and what the prototype you ultimately came up with was?
Laslo Horvath: [00:05:19] Of course. Back then, I worked for a pretty big company here that was building an ERP system on top of SAP. SAP has a lot of modules, but if you want something custom built, it costs a fortune.
So they dealt with companies that needed something specific that wasn't in SAP, and then built the whole ERP on top of that. They wanted to do a showcase at a yearly summit in Germany
and show off something cool. So they wanted to do AI, and back then the whole AI movement was going towards chatbots and things like that. So I thought, why not? We can build a sort of chatbot into the front end of the application, where you could just type in sentences and the software would interpret them and do something with them.
There were a couple of platforms active back then: Google published theirs, Facebook published theirs, and Microsoft had just come out with its own, called LUIS. I haven't checked whether they've changed it in the meantime, but it was their AI platform, and it was the best of the bunch.
This is not marketing or anything; it was just really the easiest to use, and the goal was that the management themselves could teach the platform the sentences. You type in sentences in the Microsoft tool itself and teach it: okay, this is what I want to do,
this is the intent, these are the entities in the sentence. After you give it a bit of training, it is able to return the intent. So the goal was: say you want to order 50 hand sanitizers. You could just type in "order me 50 hand sanitizers", and it would go through the backend, because the backend already had the workflows and everything, and you would end up with an order that was ready to be approved. That was it.
So it was pretty cool for a showcase. We weren't sure whether there were real-life use cases for it, because executives tend to be older people who aren't so tech-savvy, but for a showcase it was cool. And then I thought, okay, I know HTML5 has a speech recognition standard,
so why not try that? I built a layer on top of the whole natural language processing that first did the speech-to-text and then passed the sentence on. It was pretty cool, and it actually worked. But that was 2016, so it was just the beginning, and it wasn't that good.
I have a bit of an accent in English, and it was terrible at recognizing what I wanted. If a German with a thick accent tried it, it would never go as planned. And it only had English back then; the available locales also depended on the browser you were using. Chrome Canary had German, Spanish, Italian, and French, but the stable Google Chrome version didn't.
So after the showcase, I tabled it. It was fun, a nice research project, and it stayed in the back of my mind. Now I thought, okay, let's revisit it and see how far it has come. And it really has come a long way. With Siri, with Echo, with all the other tools, they've really learned a lot about speech recognition.
So it's in a much better state now. It's really usable, let's say.
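To make the intent-and-entities idea from the showcase concrete, here is a minimal sketch under stated assumptions: a hand-written regular expression stands in for a trained model such as Microsoft's LUIS (this is not LUIS's API), and the intent name `OrderItem`, the entity keys, and the functions `parseUtterance` and `handleTranscript` are all hypothetical. The input is the sentence the speech-to-text layer produces.

```typescript
// Hypothetical shape of a parsed utterance: one intent plus its named entities.
interface Intent {
  name: string;
  entities: Record<string, string>;
}

// Toy stand-in for a trained intent model (NOT the LUIS API).
// Matches e.g. "order me 50 hand sanitizers" or "Please order 3 boxes of gloves."
const ORDER_PATTERN = /^(?:please\s+)?order(?:\s+me)?\s+(\d+)\s+(.+?)[.!]?$/i;

function parseUtterance(sentence: string): Intent | null {
  const match = ORDER_PATTERN.exec(sentence.trim());
  if (match === null) {
    // Unknown intent: a real assistant would ask a clarifying question here.
    return null;
  }
  return {
    name: "OrderItem",
    entities: { quantity: match[1], item: match[2] },
  };
}

// The speech layer only has to hand its final transcript to the parser.
function handleTranscript(transcript: string): string {
  const intent = parseUtterance(transcript);
  if (intent === null) {
    return "Sorry, I did not understand that.";
  }
  return `Draft order: ${intent.entities.quantity} x ${intent.entities.item}`;
}
```

Here `parseUtterance("Order me 50 hand sanitizers")` yields the intent `OrderItem` with `quantity` "50" and `item` "hand sanitizers". A trained model generalizes to phrasings a fixed pattern would miss, which is why a tool like LUIS was used in the first place.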
Preston So: [00:09:18] Very cool. I have a very strong interest in voice and chatbots myself, and I think the problem you just pointed out, speech recognition being very difficult for people who don't speak the expected dialects or languages, is a really big one.
I'm very curious: could you show us how it works? This tool you built is really fascinating. It's a language pedagogy tool that helps students learn by reciting sentences written on the page. For those of us who are listening instead of watching this episode of Tag1 TeamTalks, either one of us could describe what's going on. Actually, maybe it's easier for you to do that, Laslo, since you've got the controls. If you could describe what's going on as you go, we can make sure we're doing this for our listeners and anyone using assistive devices.
Laslo Horvath: [00:10:10] Just to note, this is in German because we are in Austria and my kid goes to a German school, but we will switch to English on the letters page. Let's start with the default one. You should be able to see my screen now. Basically, we have a very simple user interface with a text area where you input the text that you want the kid to read.
And this is something the kid does himself. First he has to type in the text, practicing letters, and after that he has to read it, and the software checks whether he's reading correctly. I'll just go through a short sentence, a normal sentence that a preschooler should be able to read here.
So that's "Das Monster wird ganz blass vor Glück." "Bis morgen!", ruft es. ("The monster goes completely pale with happiness." "See you tomorrow!" it calls.) Okay. And then they get a nice fireworks animation so they know they did their job correctly. Normally preschoolers read a bit slower, which is good, because it's easy for the software to check one word at a time.
And yeah, that would be it for a short demo. We are actually using it daily. Kids are used to touch screen technology, kids are used to computers, and it's something they are eager to do. I'm really fortunate:
my son loves books as well, so it's not a problem to give him a book; he'll read it for half an hour or just look at it. But I know a lot of parents who have good kids, and still, if you have a tablet, if you have an iPad, why would you pick up a book?
It's interactive; you can have video; you have everything. A book is kind of lame compared to that. So interactive ways are, I think, much better for the kids themselves.
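The word-by-word check from the demo can be sketched as a small state machine. This is a minimal illustration, not Laslo's code (his repository was not public at the time of recording): the class `ReadingSession`, its members, and the punctuation list are hypothetical. The typed text is split into normalized words, and the session only advances when the next heard word equals the expected one, which is the moment the UI would turn that word green.

```typescript
// Punctuation stripped before comparing words (an assumed, non-exhaustive list).
const PUNCTUATION = /[.,!?;:"'„“”‚‘’«»()]/g;

function normalizeWord(word: string, locale: string): string {
  return word.replace(PUNCTUATION, "").toLocaleLowerCase(locale);
}

// Split the text the child typed into the sequence of words to read aloud.
function tokenize(text: string, locale: string): string[] {
  return text
    .split(/\s+/)
    .map((word) => normalizeWord(word, locale))
    .filter((word) => word.length > 0);
}

// Hypothetical reading session: tracks which word the child should read next.
class ReadingSession {
  private readonly words: string[];
  private readonly locale: string;
  private position = 0;

  constructor(text: string, locale: string) {
    this.locale = locale;
    this.words = tokenize(text, locale);
  }

  get expectedWord(): string | undefined {
    return this.words[this.position];
  }

  get finished(): boolean {
    return this.position >= this.words.length;
  }

  // Returns true (word turns green) only if the heard word is the expected one.
  hear(heardWord: string): boolean {
    if (this.finished) {
      return false;
    }
    if (normalizeWord(heardWord, this.locale) !== this.words[this.position]) {
      return false;
    }
    this.position += 1;
    return true;
  }
}
```

Requiring an exact, in-order match keeps the preschool level strict; a more forgiving comparison is a design choice better left to higher levels.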
Preston So: [00:12:33] A question about the solution you built. I think it's a really amazing tool,
and I can see you're going up against Duolingo very soon. The big question I have is: is there a third language available beyond German and English? Can you do it in Hungarian as well?
Laslo Horvath: [00:12:49] Yeah, it's possible. I haven't tried it in Hungarian yet, but you can do it in every language that's supported by the browser itself.
Basically, it just involves changing one line of code. We can do it in English: if you just change the locale and type a sentence in that language, it works.
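Switching languages can indeed come down to one assignment. In the browser, the recognizer is the Web Speech API's `SpeechRecognition` object (exposed as `webkitSpeechRecognition` in Chrome), and its `lang` property takes a BCP 47 language tag such as `de-AT`, `en-US`, or `hu-HU`. The sketch below is hypothetical glue code: `RecognizerLike` mirrors only the four real properties it sets, so the function also runs outside a browser, and the value chosen for `maxAlternatives` is an assumption.

```typescript
// Structural mirror of the SpeechRecognition properties used here.
// In a Chrome page: configureRecognizer(new webkitSpeechRecognition(), "de-AT")
interface RecognizerLike {
  lang: string;
  continuous: boolean;
  interimResults: boolean;
  maxAlternatives: number;
}

function configureRecognizer<T extends RecognizerLike>(recognizer: T, locale: string): T {
  recognizer.lang = locale;          // the one line that selects the language
  recognizer.continuous = false;     // stop after the first final result
  recognizer.interimResults = false; // deliver only final results
  recognizer.maxAlternatives = 3;    // several candidate transcripts per result
  return recognizer;
}
```

Which tags actually work depends on the browser's recognition service, which is why Hungarian has to be tried rather than assumed. In Chrome the audio is processed by a server-side recognition service, so even a page opened from a local file still needs a network connection.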
Preston So: [00:13:10] That's one of the benefits of using this underlying HTML5 API: you can rely on the browser instead of having to supply all of the speech recognition algorithms yourself, which is how people used to do it back in the early 2000s and before. It was a wild time.
Laslo Horvath: [00:13:28] I was actually at university back then. When I was doing my master's, I was part of a team that was doing it in Serbian, because I studied in Serbia, and they had to do everything from scratch. They built an automated system for when you call the bus line and want to know when the bus is leaving,
so that blind people could do it with just their voice. And it was so much work. I think I also did a Skype game back then. Skype had an API for building games, and we made a game, I think a mahjong puzzle or something like that, that blind people could play by voice.
It was really cool, but none of that was as easy as the HTML standard. This whole thing is about a hundred lines of code, so it's really simple. It's nothing major.
Preston So: [00:14:18] Yeah, let's dig into that a little bit. We talked about the HTML5 standard and how speech recognition is now part of it, and it's come a really long way in the last few years; I know that as well. But I understand that you put together about a hundred lines of code, and that's HTML5, which is probably a fair amount of those lines, plus JavaScript and SVG animations.
So how did you set up the JavaScript to be so simple here?
Laslo Horvath: [00:14:43] Yeah, it's basically the speech recognition itself. All the logic in the code is just parsing out which word was read and comparing it to the results that the speech recognition API gives back. After it hears something, it listens for a bit;
you can configure that as well, and then it returns a result. Once it is certain that it heard the final result, after you pause in your speech, it tells you, okay, this is the final result. Normally it gives back multiple candidate transcripts, each with a certain probability; it says, okay, I'm 98% certain that this is what the user said.
So for me it was simple, because I don't need to do any parsing. I just compare the word with the highest probability with the word I was expecting to hear, and if it matches, I mark it as green. The kid knows, okay, now I can read the next word. It was easy, but in more sophisticated software, you might go through the whole list and check whether one of the other candidates
is what the user actually meant. Or you can do what chatbots do and ask back: did you mean this? Even Alexa asks, maybe you meant this and not that. If you have a workflow where you go through steps, you can backtrack one step, ask a question, get it right, and then continue.
But all of this is integrated in the standard itself, so it's really easy to work with. My job was comparing strings.
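The comparison Laslo describes, plus the more forgiving "did you mean" variant, can be sketched as follows. The interfaces mirror the real shape of a Web Speech API result (a `SpeechRecognitionResult` has `isFinal` and is an array-like list of alternatives, each with `transcript` and `confidence`), but the `judge` function, the `Verdict` type, and its four outcomes are hypothetical. If the top-confidence candidate matches, the word is correct; if only a lower-ranked candidate matches, the app asks back instead of failing the reader.

```typescript
// Structural mirrors of SpeechRecognitionAlternative / SpeechRecognitionResult.
interface AlternativeLike {
  readonly transcript: string;
  readonly confidence: number; // 0..1
}
interface ResultLike {
  readonly isFinal: boolean;
  readonly length: number;
  readonly [index: number]: AlternativeLike;
}

type Verdict =
  | { kind: "correct" }                     // mark the word green
  | { kind: "confirm"; suggestion: string } // ask back: "Did you mean ...?"
  | { kind: "retry" }                       // nothing matched; read it again
  | { kind: "pending" };                    // interim result; wait for the final one

function normalize(text: string, locale: string): string {
  return text.trim().replace(/[.,!?;:"'„“”]/g, "").toLocaleLowerCase(locale);
}

function judge(result: ResultLike, expected: string, locale: string): Verdict {
  if (!result.isFinal) {
    return { kind: "pending" };
  }
  // Copy the array-like result and sort candidates by descending confidence
  // (browsers usually order them already; sorting is a cheap safeguard).
  const candidates: AlternativeLike[] = [];
  for (let i = 0; i < result.length; i++) {
    candidates.push(result[i]);
  }
  candidates.sort((a, b) => b.confidence - a.confidence);

  const target = normalize(expected, locale);
  for (let i = 0; i < candidates.length; i++) {
    if (normalize(candidates[i].transcript, locale) === target) {
      if (i === 0) {
        return { kind: "correct" }; // the most likely reading is right
      }
      return { kind: "confirm", suggestion: candidates[i].transcript.trim() };
    }
  }
  return { kind: "retry" };
}
```

In the page, `judge` would be called from the recognizer's `onresult` handler with `event.results[event.resultIndex]`.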
Michael Meyers: [00:16:25] Is this all built into the browser? Would it work offline?
Laslo Horvath: [00:16:28] Yeah, actually I'm running it offline now. Well, not offline, but I just opened the HTML file, and that's it.
Michael Meyers: [00:16:38] Wow. That's really amazing.
Laslo Horvath: [00:16:40] No server, nothing. I didn't want to do any Laravel backend or anything. That's why I kept it vanilla JS and HTML: I can just send it to anyone, they open it, and that's it.
Preston So: [00:16:52] Yeah, and I think that's one of the real benefits of doing this in the browser. By the way, the telltale sign for those of us watching the demo was the URL bar:
you could see that it was not on the web. You said no server, nothing, just HTML and JavaScript in the browser, and I love that you built the architecture in such a way that you don't have to set up Laravel, and the people you share it with don't have to use Laravel either.
I think this is really one of the key things that makes this application so compelling for those of us working with a lot of folks who are dealing with social distancing and the other effects of the pandemic. We want something simple, something that's quick and easy.
Now, I want to talk a little bit about your plans for this, because you just shared with us that you're potentially going to share this with other people. What are your next steps on the roadmap? How many years will it take you to beat Duolingo to a pulp?
Is there any sort of a prediction you have on that front?
Laslo Horvath: [00:17:54] I don't have any major plans for now. I would like to publish it, because I shared it with a couple of friends, so there are three of us using it now, and they said, yeah, why not, it's cool, let's share it with more people.
So I want to clean up the code a bit, because I have some logging and things like that where I experimented. Then I'll make the GitHub repository public and share it with people so they can take a look. Maybe somebody will want to contribute as well.
I do have a couple of minor tweaks I want to make, because this is for really basic, preschool-level kids. I would like to introduce levels, because a kid in third grade should be reading fluently, so maybe the software should check whole sentences and not just words.
So the matching algorithm has to be specific to each level. Even more advanced kids probably won't need to use it, but maybe for them whole paragraphs should be read in one go, and then you just compare whether it was correct. That was one thing.
And I would like to introduce lessons. Right now you have to type the text in, which was done on purpose, because the kid first types it and then reads it. But maybe you could just choose a lesson with predefined text, so you don't have to copy and paste it from somewhere; you just have it in there.
So these would be interesting things for the future.
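For the sentence- and paragraph-level idea, exact word-by-word matching becomes brittle, since one misrecognized word would fail a whole paragraph. A common measure, swapped in here as an assumption rather than anything from the original app, is word error rate: the word-level edit distance (substitutions, deletions, insertions) between the expected text and the transcript, divided by the number of expected words. The level names, thresholds, and function names below are hypothetical.

```typescript
type Level = "sentence" | "paragraph";

// Hypothetical tolerances: sentences must be read perfectly; paragraphs may
// contain up to 10% word errors to absorb occasional recognizer slips.
const MAX_WORD_ERROR_RATE: Record<Level, number> = { sentence: 0, paragraph: 0.1 };

function tokenize(text: string, locale: string): string[] {
  return text
    .replace(/[.,!?;:"'„“”]/g, " ")
    .toLocaleLowerCase(locale)
    .split(/\s+/)
    .filter((word) => word.length > 0);
}

// Word-level Levenshtein distance, computed one dynamic-programming row at a time.
function wordEditDistance(reference: string[], hypothesis: string[]): number {
  let previous: number[] = [];
  for (let j = 0; j <= hypothesis.length; j++) {
    previous.push(j); // j insertions turn an empty reference into hypothesis[0..j)
  }
  for (let i = 1; i <= reference.length; i++) {
    const current: number[] = [i]; // i deletions turn reference[0..i) into nothing
    for (let j = 1; j <= hypothesis.length; j++) {
      const substitution = reference[i - 1] === hypothesis[j - 1] ? 0 : 1;
      current.push(
        Math.min(
          previous[j] + 1,                // deletion
          current[j - 1] + 1,             // insertion
          previous[j - 1] + substitution  // match or substitution
        )
      );
    }
    previous = current;
  }
  return previous[hypothesis.length];
}

function wordErrorRate(expected: string, heard: string, locale: string): number {
  const reference = tokenize(expected, locale);
  const hypothesis = tokenize(heard, locale);
  if (reference.length === 0) {
    return hypothesis.length === 0 ? 0 : 1;
  }
  return wordEditDistance(reference, hypothesis) / reference.length;
}

function passesLevel(level: Level, expected: string, heard: string, locale: string): boolean {
  return wordErrorRate(expected, heard, locale) <= MAX_WORD_ERROR_RATE[level];
}
```

For example, expecting "a b c d" and hearing "a x c" costs one substitution and one deletion, a word error rate of 2 / 4 = 0.5.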
Preston So: [00:19:35] Absolutely. I actually think there's a lot of potential here, not just for preschool students or those who are just learning to read and write, but also for language learners in general. If you're learning German, or if you're learning Hungarian, let's say, these are really great approaches.
Laslo Horvath: [00:19:51] Yeah, of course.
Preston So: [00:19:52] Very cool. Now, I want to dig a little bit into the speech recognition, because I think that's really interesting here. One of the challenges you just called out is that there's a very big difference between analyzing just a single word and matching it to a string, versus, let's say, an entire paragraph.
Do you see any challenges in scaling up to that advanced level, where you can read an entire paragraph and have it interpreted by the parser as well?
Laslo Horvath: [00:20:24] It's actually possible. With the API itself, you can configure it; you have to play around with it
and tweak it for that case. What you see here is the basic default settings. If you tweak the timeout, meaning how long it waits for the person to read, and manipulate the results a bit, it's fairly easy. It won't take just 90 minutes; it could take a bit more work, but it's not too complicated.
When you have a challenge and don't have an idea how to do it, it's complicated. If you actually have a pattern in your head and know what to do, it's not that complicated. In this case, it's doable. It's not something extraordinary or something that takes special skill. It's built into the API; it's just leveraging what's already in there.
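Reading a whole paragraph means keeping the recognizer listening across pauses instead of stopping after one phrase. In the Web Speech API the relevant switch is `continuous = true`; the specification does not appear to define a silence-timeout property, so the waiting behavior Laslo mentions may need browser-specific tuning or a timer of your own. In continuous mode, each result event carries the session's full `results` list plus a `resultIndex` marking the first entry that changed. The hypothetical `TranscriptCollector` below keeps the top alternative of every final result, keyed by its index so a result is never counted twice.

```typescript
// Structural mirrors of the Web Speech API result types used here.
interface AlternativeLike {
  readonly transcript: string;
  readonly confidence: number;
}
interface ResultLike {
  readonly isFinal: boolean;
  readonly length: number;
  readonly [index: number]: AlternativeLike;
}
interface ResultListLike {
  readonly length: number;
  readonly [index: number]: ResultLike;
}
interface ResultEventLike {
  readonly resultIndex: number;
  readonly results: ResultListLike;
}

// Hypothetical helper: assemble a paragraph from a continuous recognition session.
class TranscriptCollector {
  // Sparse array: slot i holds the final transcript of results[i].
  private readonly finals: string[] = [];

  handle(event: ResultEventLike): void {
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i];
      if (result.isFinal && result.length > 0) {
        this.finals[i] = result[0].transcript.trim();
      }
    }
  }

  text(): string {
    // filter() skips the holes left by results that are not final yet.
    return this.finals.filter((part) => part.length > 0).join(" ");
  }
}
```

In a page this would be wired up as `recognizer.continuous = true; recognizer.onresult = (event) => collector.handle(event);`, and once the child stops reading, `collector.text()` could be scored against the expected paragraph.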
Preston So: [00:21:20] And I think that points to how much of this is becoming part of the foundation of the browsers we already have. There's a really good benefit to living in an era where we can rely on APIs that already exist; we're standing on the shoulders of giants.
We don't have to rebuild all this stuff from scratch, and I think it points to a great, exciting future for voice in this context.
Laslo Horvath: [00:21:42] Of course. For me, it's amazing what you can build by combining things. You have tools like IFTTT (If This Then That) that you can leverage and combine with your own implementation
for something like this; you can trigger things. It's amazing if you have the time it requires, and people have time now: you're sitting at home, and you can do something besides your normal work. And I like to learn things. I'm sure most developers are like this.
You like to tinker with things and try things out, and this is something you can do. It's really crazy how much you can automate right now. I'm already starting to build a command interface for myself where I can just tell my computer, okay, start my Tag1 customer X project and start the timer.
I don't have to click manually every time, and it will do it. Then, at the end, I'll just tell it, please stop. You can play around with things like this and see where it goes. That's the interesting part, and as you said, it's amazing how much is on offer that you can leverage.
I think a lot of people aren't even aware of how easy it is. You look at Alexa and you think, wow, this must be so complicated. No, it's not. Just try it and you'll see.
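A voice command layer like the one Laslo describes mostly reduces to mapping a final transcript to an action. The sketch below is hypothetical: the command phrasings, the `Command` type, and `interpretCommand` are invented for illustration, and the actual timer integration (for example through IFTTT or a time-tracking tool's API) is left out.

```typescript
type Command =
  | { action: "startTimer"; project: string }
  | { action: "stopTimer" }
  | { action: "unknown"; heard: string };

// Hypothetical phrasing, e.g. "Start my Tag1 customer X project and start the timer".
const START_PATTERN = /^start (?:my )?(.+?) project(?: and start the timer)?$/i;
// Hypothetical phrasing, e.g. "Please stop" or "stop the timer".
const STOP_PATTERN = /^(?:please )?stop(?: the timer)?$/i;

function interpretCommand(transcript: string): Command {
  // Recognizers may add surrounding spaces or trailing punctuation.
  const text = transcript.trim().replace(/[.!?]+$/, "");
  const start = START_PATTERN.exec(text);
  if (start !== null) {
    return { action: "startTimer", project: start[1] };
  }
  if (STOP_PATTERN.test(text)) {
    return { action: "stopTimer" };
  }
  return { action: "unknown", heard: text };
}
```

Returning an explicit `unknown` command, rather than guessing, leaves room for the "did you mean" follow-up question discussed earlier.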
Preston So: [00:23:05] Definitely. I'm really looking forward to seeing that GitHub repository, because I'm certainly going to pull it down.
What's funny is that I started learning Hungarian recently, and I have a copy of Süsü, which is a very popular children's book in Hungarian. It would be great to put some of those sentences into this and see if I can get my Hungarian pronunciation to be a little better.
Laslo Horvath: [00:23:29] You chose Hungarian? Hungarian is my native language; I'm Serbian-Hungarian, from a mixed marriage. You chose the hardest language. Everybody says Hungarian is really hard, but feel free to try. I'll try it out in Hungarian too and let you know. You can just configure it and then try it.
Preston So: [00:23:50] That'll be a fun experiment. We'll report back on another episode on how it went. All right, we are running out of time here, so I want to take a little bit of time for the segment we do every single episode, the Aside Tag. We're still looking for a better name; if you have one, we're open to suggestions. It's not a very good name, but I like it.
So, for maybe one minute each, let's share something we've been interested in recently: something cool that's been going on, or something we're doing right now that we find interesting. Laslo, let's start with you.
Laslo Horvath: [00:24:21] Yeah. I was looking forward to the Laravel 7 release because of the technology we talked about last time, Livewire, but it also came out with a lot of really cool features.
One of them is that they changed the way you can manipulate strings in Laravel: you can now chain calls together. They took what was already a really great developer experience to a whole other level. It now offers so many things that, even though I'm experienced in Laravel and always follow everything,
it took me about two hours to go through the release log and see, okay, now I can use this, and this. For me, the biggest news in the last two or three weeks was the release of Laravel 7. I've been slowly migrating projects to it. There's just one breaking change, or breaking-ish, so to say.
They really do an amazing job of making migration to a new version a pleasant experience. So that's what has been my preoccupation over the last couple of weeks.
Preston So: [00:25:42] That's amazing. I haven't actually tried out the new version of Laravel, so I'm definitely going to pull that down.
Laslo Horvath: [00:25:47] We have the new x- Blade components now, so you don't have to use the old Blade templating style; you can use tag syntax like the one you would use in Vue.js. That's pretty cool. And with Livewire, you have seamless integration between front end and back end. Definitely try it out; it's cool.
Preston So: [00:26:05] That's awesome.
Wonderful. Thanks for that. I'll share something that's going on with me. Speaking of voice, speech recognition, and all the things that relate to conversational interfaces like chatbots, last week I published my first-ever article for A List Apart, called "Usability Testing for Voice Content."
It's at alistapart.com, if you're interested in anything to do with what we talked about today from more of the user experience side: the designer side, the usability researcher side, or general user experience practitioner ideas about voice design. It's all there, so please feel free to check out that article.
And Meyers, what's going on in your world? Oh, he's frozen.
Michael Meyers: [00:26:48] I'm up in the boonies, out in the country in the mountains, and I'm loving it up here. It's kind of fortuitous given all the social distancing challenges. But we have no cell reception and almost no internet connection, and now, in addition to my wife and me, we have other people working from home all the time.
So our internet connection is horrific. I started looking into how I could solve this problem, because I promised my wife I would, and I came across this amazing solution called OpenMPTCProuter. It leverages Multipath TCP, which is the ability to communicate over multiple IP paths simultaneously.
I've set up multiple modems: I was able to get a cell connection with an antenna array, plus our DSL line, which unbelievably gets less than two megabits down and barely one up. And it does true aggregation, so I can glue these lines together and get the aggregate bandwidth.
It communicates with a private server I have on Google Cloud: the modems talk to that VPS through a VPN, and it talks to the internet and sends the traffic back down to us. The previous solutions I looked at were typical load balancing, like least-used connection
or round-robin between modems, and this has blown me away. I've also tested another solution called Speedify, which is a little more sane: it does all of this for you out of the box. But the OpenMPTCProuter system has been performing better. I'd love to do a Tag1 TeamTalk on it, because it's one of the most fascinating things I've worked on in a while.
It was really cool to get back to system administration and some coding and testing. I'm really excited about it, and it seems to be working. In addition to being slow, our connections are really unreliable, and this also adds a high-availability component, because any one of the modems can drop out seamlessly in the background and the connection keeps working.
So I'm thrilled, and we're close to a viable solution for the absolute basics.
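The difference Michael describes between per-connection load balancing and true aggregation can be shown with an idealized model. This is a back-of-the-envelope sketch, not how OpenMPTCProuter is implemented: it ignores latency differences, packet reordering, and protocol overhead, and the link speeds in the example are made-up numbers. With round-robin balancing, each TCP connection is pinned to one link, so a single download can never exceed that link's speed; with Multipath TCP, one connection spreads its subflows across all links and can approach their sum.

```typescript
// Idealized per-connection throughput (Mbps) when connections are assigned to
// links round-robin and share each link's bandwidth equally.
function roundRobinRates(linksMbps: number[], connections: number): number[] {
  const perLink = linksMbps.map(() => 0);
  for (let c = 0; c < connections; c++) {
    perLink[c % linksMbps.length] += 1;
  }
  const rates: number[] = [];
  for (let c = 0; c < connections; c++) {
    const link = c % linksMbps.length;
    rates.push(linksMbps[link] / perLink[link]);
  }
  return rates;
}

// Idealized per-connection throughput with multipath aggregation: every
// connection can use all links, so the total bandwidth is shared equally.
function aggregatedRates(linksMbps: number[], connections: number): number[] {
  const total = linksMbps.reduce((sum, mbps) => sum + mbps, 0);
  const rates: number[] = [];
  for (let c = 0; c < connections; c++) {
    rates.push(total / connections);
  }
  return rates;
}

// Hypothetical links: a 2 Mbps DSL line and a 10 Mbps cellular connection.
const links = [2, 10];
console.log(roundRobinRates(links, 1)); // [2]: the single download is pinned to the DSL link
console.log(aggregatedRates(links, 1)); // [12]: the single download uses both links
```

The same model hints at the high-availability benefit: if one link drops, an aggregated connection keeps the remaining bandwidth instead of losing every connection pinned to the failed link.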
Preston So: [00:28:57] This has been a long-running saga for you across many episodes. I do wish you luck, and I hope you can get back to the reception we all know and love.
We are just about at time. Thank you so much for joining us for this latest episode of Tag1 Team Talks. All the links we mentioned today, whether that's the HTML5 speech recognition API or the other things we talked about, and also Laslo's repository whenever it goes live,
are going to be posted online with this talk; when the code is up, we'll link to it there too. If you liked this episode of Tag1 Team Talks, please remember to share, upvote, and subscribe. Send it to your grandparents and your loved ones. Check out past talks at tag.com/tagteamtalks.
As always, we'd love your feedback and any topic suggestions. If you want to bring Laslo back to talk more about Laravel, or bring some of our old guests back to talk more about their projects, write to us at tagteamtalks@tag1consulting.com. A big thanks to our friends Laslo and Michael Meyers for being here today, and thank you so much for joining us.
Until next time.
Laslo Horvath: [00:30:00] Thanks guys.