Blind Bargains

#CSUNATC18 Audio: A New NVDA Add-on Can Describe Website Images, Plus Accessible Approaching Buses


DescribeIt is an NVDA add-on which uses Microsoft's Computer Vision API to identify graphics on web pages. J.J. speaks with Manshul Belani, Project Associate at the Indian Institute of Technology, who explains how the add-on works, as well as a system for identifying buses that is being tested in India.
Blind Bargains audio coverage of CSUN 2018 is generously sponsored by the American Foundation for the Blind.

Transcript

We strive to provide an accurate transcription, though errors may occur.

Transcribed by Grecia Ramirez

Almost live from beautiful San Diego, it’s blindbargains.com coverage of CSUN 2018, featuring team coverage from across the Exhibit Hall and beyond, brought to you by the American Foundation for the Blind.
On the American Foundation for the Blind website, you’ll find everything you need to know about blindness and visual impairment. Search our national job bank, discover the history of Helen Keller, read our blog on current issues, find professional resources, and even more. Our site is completely accessible. Check it out at www.afb.org.
Now, here’s J.J. Meddaugh.
J.J. MEDDAUGH: We’re in the Blind Bargains suite, CSUN 2018. I’m with Manshul Belani, who is working with the Indian Institute of Technology and has some really cool new ideas and concepts and working prototypes and products that they’re developing and wanted to tell us a little bit about them.
Welcome to the podcast.
MANSHUL BELANI: Hi.
JM: Thank you so much for coming on. We brought you on for a couple different reasons. One of them – and the one that caught my attention when I saw it in the session list – is you’ve developed, or are working on, an NVDA add-on to help describe images on websites and other places. Is that an accurate description?
MB: Yes. Yes.
JM: Cool. Tell us a little bit about it.
MB: So, like, we came across this problem – we started with the solution for defining, basically, custom labels in web elements for NVDA. So while developing them, we came across a point where we realized that, for images, it would be really helpful if the blind person, instead of taking help from an outside person to add the custom label to the image, can directly generate a description for the image himself or herself and add it as a custom label for future reference.
So this is when we started working on this problem, and then we basically went through a lot of research, found out about the different kind of technologies which we can use to basically provide these descriptions.
JM: And you’ve, kind of, picked the perfect time for this, because a lot of major companies have come out with APIs and means for development now, so you’ve chosen the Microsoft – the Computer Vision API --
MB: Exactly. Yeah.
JM: -- which of course is the same one that Seeing AI –
MB: Yes.
JM: -- and also the descriptions that are being done in Office, et cetera, et cetera, so – pretty powerful; right? Over the past couple of years, the image recognition –
MB: Yeah. Yeah.
JM: This wouldn’t be possible even, maybe, two or three years ago.
MB: Yeah. And in fact, while developing the add-on, I have seen the API improving over time. So I think it’s going to get better with time.
JM: So how would it work? If you’re on a website, would you click – right-click on an image, or what’s the means that you would describe?
MB: So – like, when you’re navigating through a webpage –
JM: Uh-huh.
MB: -- and you come across an image, you simply have to use the NVDA key shortcut, NVDA plus G. So whatever your NVDA key is, Caps Lock or Insert –
JM: Yeah.
MB: So NVDA plus G gives you the one-line description of the image, and the confidence level from the API – how confident the API is about the description. Because sometimes the API gives a description, but the confidence is only 30 or 40 percent, and then probably it’s not a very appropriate description.
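To make the exchange above concrete, here is a minimal sketch, in Python, of the kind of request involved: asking the Computer Vision service’s "describe" endpoint for a one-line caption and its confidence score. The region, API version, subscription key, and example URL are illustrative assumptions, not code taken from the DescribeIt add-on.

# Minimal sketch (not the add-on's actual code): fetch a one-line caption and
# its confidence from Microsoft's Computer Vision "describe" endpoint.
# The region, API version, and key below are placeholder assumptions.
import json
import urllib.request

ENDPOINT = "https://westus.api.cognitive.microsoft.com/vision/v2.0/describe"
SUBSCRIPTION_KEY = "YOUR_KEY_HERE"  # hypothetical placeholder

def describe_image(image_url):
    body = json.dumps({"url": image_url}).encode("utf-8")
    request = urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        result = json.loads(response.read().decode("utf-8"))
    # The API may return several captions; take the highest-confidence one.
    captions = result.get("description", {}).get("captions", [])
    if not captions:
        return "No description available", 0.0
    best = captions[0]
    return best["text"], best["confidence"]

if __name__ == "__main__":
    text, confidence = describe_image("https://example.com/photo.jpg")
    print("%s (confidence: %d percent)" % (text, round(confidence * 100)))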
JM: Sure. That makes a lot of sense. Is it in English, or is it in different languages, or –
MB: So right now, we are only fetching it in English. But I think, if we can pass it the NVDA language, then, of course, it will automatically be –
JM: Sure.
MB: Yeah.
JM: And of course, I guess, somebody, later on, could try to do translation.
MB: Yeah. And also, you can have a detailed description by the shortcut, NVDA plus G, G. So it gives you the foreground and background color as well.
JM: Is that coming from the same API, or is it –
MB: Yup. That – yeah. Yeah. From the same API.
JM: Oh. Okay. I didn’t realize they provided that information.
MB: Yeah. And you have it in a virtual box – a virtual message box. Since it’s a lot of information, with all four things together – description, confidence, and foreground and background color – we are showing it in a virtual box and not just as the speech output, so you can navigate up and down again and again using the screen reader.
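For the detailed view, the same service’s "analyze" endpoint can return a description and dominant foreground and background colors in a single call, which can then be formatted into the four-item summary described above. The sketch below illustrates that shape; the endpoint version, region, key, and formatting are assumptions, not the add-on’s actual code.

# Illustrative sketch of the detailed request: one "analyze" call asking for
# Description and Color, formatted as the four items a detailed view could show.
import json
import urllib.request

ENDPOINT = ("https://westus.api.cognitive.microsoft.com/vision/v2.0/analyze"
            "?visualFeatures=Description,Color")
SUBSCRIPTION_KEY = "YOUR_KEY_HERE"  # hypothetical placeholder

def detailed_description(image_url):
    body = json.dumps({"url": image_url}).encode("utf-8")
    request = urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(request) as response:
        result = json.loads(response.read().decode("utf-8"))
    captions = result.get("description", {}).get("captions", [])
    caption = captions[0] if captions else {"text": "No description", "confidence": 0.0}
    color = result.get("color", {})
    # Description, confidence, and foreground/background color in one message,
    # suitable for showing in a browseable (virtual) box rather than speech only.
    return ("Description: %s\nConfidence: %d percent\n"
            "Foreground color: %s\nBackground color: %s") % (
        caption["text"],
        round(caption["confidence"] * 100),
        color.get("dominantColorForeground", "unknown"),
        color.get("dominantColorBackground", "unknown"),
    )

if __name__ == "__main__":
    print(detailed_description("https://example.com/photo.jpg"))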
JM: So a lot of graphics on webpages are a little harder to find because they may not even have any sort of alt text, so they’re sort of –
MB: Alt text.
JM: -- or they’re hidden from view. Is there a way to navigate to graphics where, maybe, you aren’t even sure – I’m thinking, say, like, on Twitter, where they have images posted, but – sometimes, it’s hard to actually locate where the actual image is in the virtual buffer. Is there a way to account for that, or how would you do that?
MB: So right now, we have not accounted for that. What you can simply do is use the shortcut G for moving from one graphic to another.
JM: Ah.
MB: Yeah. So –
JM: And that’s going to stop it – right. And that will actually move the focus.
MB: Yeah. Yeah. So as soon as you know that you are on a graphic, you can use the shortcut.
JM: What’s the time that it takes to do one of these?
MB: So, like, when you press it, it says, fetching image description. Please wait. And then it takes about a second or a second and a half or so to return and give the output.
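On the NVDA side, the flow just described – announce "Fetching image description. Please wait.", make a roughly one-second network call, then speak the result – might look something like the global plugin skeleton below. The gesture binding and queue calls use standard NVDA add-on APIs, but the overall structure and the helpers describe_image, detailed_description, and current_image_url are hypothetical stand-ins for the HTTP calls sketched earlier, not DescribeIt’s source.

# Illustrative NVDA global plugin skeleton, not DescribeIt's code. It binds
# NVDA+G, speaks a "please wait" message, and fetches the description on a
# worker thread so NVDA stays responsive during the network round trip.
import threading

import globalPluginHandler
import queueHandler
import scriptHandler
import ui

class GlobalPlugin(globalPluginHandler.GlobalPlugin):

    def script_describeImage(self, gesture):
        # Press NVDA+G once for the one-line caption; press it twice quickly
        # (NVDA+G, G) for the detailed view with colors.
        detailed = scriptHandler.getLastScriptRepeatCount() > 0
        ui.message("Fetching image description. Please wait.")
        worker = threading.Thread(target=self._fetch, args=(detailed,))
        worker.daemon = True
        worker.start()

    def _fetch(self, detailed):
        # describe_image, detailed_description, and current_image_url are
        # hypothetical helpers standing in for the HTTP calls sketched above.
        if detailed:
            text = detailed_description(current_image_url())
        else:
            caption, confidence = describe_image(current_image_url())
            text = "%s, confidence %d percent" % (caption, round(confidence * 100))
        # Hand the result back to NVDA's main queue to be spoken.
        queueHandler.queueFunction(queueHandler.eventQueue, ui.message, text)

    __gestures = {
        "kb:NVDA+g": "describeImage",
    }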
JM: Have you thought about inserting the description into the virtual buffer at the point – at the cursor when that’s been done?
MB: Yes. We have been thinking about that, but then again, the problem comes – what if the description is not accurate?
JM: That’s true.
MB: Yeah.
JM: Maybe – there would almost have to be a way for someone to –
MB: Yeah.
JM: -- discard –
MB: Because alt text, again, has its own guidelines and a way in which it has to be written. So then, we need some intelligent natural language processing techniques to put it together in a way that matches the guidelines for alternate text, and then, of course, we can put it in the virtual buffer of NVDA.
JM: Sure. And alternate text and an image description are not necessarily the same thing –
MB: Yeah. True.
JM: -- as well. You know, you might – alt text might just be the text of something, whereas the image description would say, this is a logo with text, et cetera, for instance.
What about collaboration to try to say, you know, okay. If one person already identified something, would you think about doing sharing? Or do you think the API is going to improve over time to where you’d want to always get a current description?
MB: So, yes. I believe the API is going to improve over time. It’s like, a self-learning kind of API.
JM: Uh-huh.
MB: But, like, today in the morning, in the session, I also got the suggestion that maybe you can have collaboration where, once one person has identified something, they can put it at a place where other people can directly access it. So I think that kind of a solution is something we can work upon.
JM: Sure. And of course, a lot of these things –
MB: Yeah.
JM: -- could be options. Is there a way to report back to Computer Vision your confidence level, or say, they messed this one up, to let Microsoft know?
MB: So Microsoft – I haven’t checked this feature. I know they have a feature where you can go and annotate your images. Like, you put images, annotate them, give their descriptions, and you can pass them along for the machine learning process to take into account. But I’m not sure if you can actually tell it, okay, this is my confidence level for a particular image.
JM: Sure. I know on caption – if you do that captionbot.ai, which I think uses the same system, it will ask you, is this correct --
MB: Oh. Okay. Uh-huh.
JM: -- or something like that.
So it’s on webpages only right now, or can it be expanded to other areas?
MB: So right now, it’s on webpages. But yeah. We can definitely do it for documents as well, as long as the image is retrievable from the document.
JM: Now, is this something that’s available now for people to download and use on NVDA?
MB: Yes. So, yeah. It is available now.
JM: Okay. And I think – I think I saw it on GitHub; right? Or where is the – how do people get ahold of this?
MB: Yeah. So it is there on my GitHub account for sure. And we – I will soon be posting it on the Assistech website as well. So as soon as I get back, I’ll get it posted on the Assistech website.
JM: And we’ll definitely link to that in our notes for this podcast. What’s the website for Assistech?
MB: So that’s www.assistech.iitd.ac.in.
JM: And can you spell Assistech?
MB: A-s-s-i-s-t-e-c-h.
JM: Okay. Now, this is just one project. This is the one that you’ve been working on, but there are others as well. Let’s just kind of briefly mention a couple of these.
Hey. Another low-cost refreshable braille display. You guys are working on that. So obviously, lots of people are in this space right now, so there’s lots of exciting upgrades and involvement in this area.
MB: Uh-huh.
JM: What makes this braille display that you guys are working on special? What’s unique about –
MB: So like, the Assistech lab – our aim here is to basically create affordable technologies. So this braille display is, again, going to be an affordable braille display, much lower in cost than the other braille displays in the market right now.
JM: Uh-huh.
MB: Also, we have – we are coming out with two versions: The 20-cell display and the 40-cell display. The 20-cell display will be having a Perkins-style keyboard, but for the 40-cell display, we are actually providing a full QWERTY keyboard with the braille display device.
JM: So where are you in the process right now? How far away from release is this going to be?
MB: So we are planning to launch it in the next six to eight months, hopefully; if everything goes well.
JM: And when you say much less expensive, are you thinking around the cost of some of the other ones that are out there, or will it be cheaper than them, even?
MB: Oh. I am not very sure about the cost right now, but yeah. Definitely much cheaper than the most famous ones out there now.
JM: Sure. And it doesn’t even have a name; right? It’s just – it’s on the –
MB: Yeah. Yeah. So – we have to, like – we are in the finalizing stage of the product.
JM: Very exciting time for braille, especially.
One other thing you mentioned to me, which I thought was really interesting, on a completely different problem – dealing with navigation and trying to figure out which bus is pulling up.
MB: Yeah. So this was one problem which we came across – it is very common in India. When a bus approaches, a visually impaired person does not know which bus is at the bus stop. Also, the buses do not stop at a very precise location. They can stop right – just behind the bus stop or a little farther away from the bus stop.
So the problem – this problem we are solving, we have a small, handheld device for the user, which basically connects to a bigger device, which is there in every bus of the fleet –
JM: Uh-huh.
MB: -- of the city. And both of them are connected through IR. So the person can basically use a Scan button to scan all the buses which are within a vicinity of 30 meters of the user. Like, I’m at a bus stop, and I press the Scan button, so for any bus which is around 30 meters from me, the device will tell me the list of all the buses.
JM: Oh. So if there was more than one, the IR would recognize –
MB: Yeah. So it will give me the list of all the buses which are there. Whichever bus I want to select, when the device announces that bus, I press this Select button. And now, this response goes to the bus device, which is there, mounted on the bus. And a speaker on the bus would start speaking the bus number. So basically –
JM: When you activate it. Yes. You activate the bus from the button –
MB: Yeah.
JM: -- or whatever on the device.
MB: So this has two or three advantages. One: The person knows that, okay, my bus is at the bus stop. Two: Since the speaker is just beside the entry door of the bus, the person basically gets an idea of, okay, this is where the bus has stopped right now, and this is from where I have to enter. That’s an audio cue. And the third is that the driver of the bus also gets to know that a VI person is trying to board the bus.
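As a software-level illustration of the scan-and-select handshake described above, the sketch below models a handheld device that lists buses within roughly 30 meters and asks the selected bus to announce itself at its entry door. Every class name, route number, and distance is invented for the example; the real system runs on dedicated IR hardware, not Python objects.

# Rough sketch of the scan/select exchange; all names and values are invented.
class Bus:
    def __init__(self, route, distance_m):
        self.route = route
        self.distance_m = distance_m

    def announce(self):
        # On the real bus, a speaker beside the entry door speaks the number,
        # which also tells the driver a visually impaired rider is boarding.
        print("Bus speaker: route %s is stopping here" % self.route)

class HandheldDevice:
    SCAN_RANGE_M = 30  # buses within roughly 30 meters respond to a scan

    def __init__(self, nearby_buses):
        self.nearby_buses = nearby_buses

    def scan(self):
        # Scan button: collect every bus in range and read the list to the user.
        in_range = [b for b in self.nearby_buses if b.distance_m <= self.SCAN_RANGE_M]
        for bus in in_range:
            print("Handheld: route %s is at the stop" % bus.route)
        return in_range

    def select(self, bus):
        # Select button: ask the chosen bus to announce itself.
        bus.announce()

if __name__ == "__main__":
    stop = [Bus("522", 12), Bus("764", 25), Bus("620", 45)]  # made-up data
    handheld = HandheldDevice(stop)
    found = handheld.scan()    # announces routes 522 and 764; 620 is out of range
    handheld.select(found[0])  # route 522 announces itself at its entry door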
JM: Okay. Why IR versus smartphones? Is it because of the smartphone usage over there, or was this cheaper?
MB: Yeah. So that is one question that we have come across a lot, because a lot of people say that, you know, we already carry a smartphone. We don’t want to carry another device. So we are also in the process of creating on-boarding, as well as off-boarding, apps for your mobile as well.
JM: Okay. So you can –
MB: -- within the same solution.
JM: Sure. Do the same idea – an idea –
MB: Yeah.
JM: What’s the cost per device and for, like, a bus fleet to integrate –
MB: So the bus device is a little more expensive, but that is something which has to be taken care of by the government. The handheld device -- again, the price is not fixed, but I think around 600 to 800 Indian rupees per device.
JM: Which really isn’t that much, if you were to convert it, so –
MB: Yeah. Yeah.
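(For reference, at early-2018 exchange rates of roughly 65 rupees to the US dollar, 600 to 800 rupees works out to about 9 to 12 dollars per device.)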
JM: Okay. Great. Lots of really cool projects and ideas over there, so definitely – thank you so much for sharing. And again, the – give that website one more time.
MB: It’s www.assistech.iitd.ac.in.
JM: Awesome. And if people want to send comments to you or anybody else in the team, what’s the best way to Email?
MB: Yeah. So it’s assistech.iitd@gmail.com.
JM: Awesome. Thank you so much. We really appreciate you coming by.
MB: Thank you so much. Thank you for having me.
For more exclusive audio coverage, visit blindbargains.com or download the Blind Bargains app for your iOS or Android device. Blind Bargains audio coverage is presented by the A T Guys, online at atguys.com.
This has been another Blind Bargains audio podcast. Visit blindbargains.com for the latest deals, news, and exclusive content. This podcast may not be retransmitted, sold, or reproduced without the expressed written permission of A T Guys.
Copyright 2018.


File size: 18.7MB
Length: 12:58


This content is the property of Blind Bargains and may not be redistributed without permission. If you wish to link to this content, please do not link to the audio files directly.



J.J. Meddaugh is an experienced technology writer and computer enthusiast. He is a graduate of Western Michigan University with a major in telecommunications management and a minor in business. When not writing for Blind Bargains, he enjoys travel, playing the keyboard, and meeting new people.


Copyright 2006-2024, A T Guys, LLC.