Mind Readings: What Will Make Or Break Spatial Computing

Warning: this content is older than 365 days. It may be out of date and no longer relevant.

In today’s episode, we dive into the concept of spatial computing and its potential implications. Apple’s new goggles, the Apple Vision Pro, introduce the company’s take on virtual reality and spatial computing. However, the success of these devices relies heavily on addressing latency issues, both in motion and in input. Latency can break the immersive experience and hurt the device’s usability. While Apple has a track record of delivering seamless user interfaces, the crucial factor will be how well it handles motion and touch interactions within the virtual space. Join me as we explore the significance of latency and its impact on the future of spatial computing. Don’t forget to hit that subscribe button if you’re interested in this evolving technology.

Machine-Generated Transcript

What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for watching the video.

Today let’s talk about an uncomfortable reality with large language models.

That uncomfortable reality is this.

The safer you make them, the less creative and useful they are.

Yep.

Here’s why. And a trigger warning: there will be profanity in this video.

As humans, we use language in all sorts of ways.

Profanity, for example, is part of our language.

The more you have to censor a model, the more you have to tell a piece of software not to do these things, the more you constrain what it can do, and in the process, it loses context.

Here’s what I mean.

Remember that these models are nothing more than prediction engines, even the most sophisticated ones, the biggest models like GPT-4 from OpenAI or PaLM 2 from Google, with, what, hundreds of billions of parameters.

They’re just prediction engines.

If I say, “I pledge allegiance to the,” the prediction engine is going to come up with a list of anywhere between five and 40 different alternatives for the next word and score them by probability, and almost certainly the probability is going to be near 100% for the word “flag,” because “I pledge allegiance to the flag” is a very common sentence.
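To make the “prediction engine” idea concrete, here is a minimal Python sketch of the final step: turning raw scores for a few candidate next words into probabilities with a softmax, then picking the most likely one. The candidate words and scores are invented for illustration; a real model scores every token in its vocabulary, not a hand-picked list.

```python
import math


def softmax(scores: dict[str, float]) -> dict[str, float]:
    """Turn raw candidate scores (logits) into probabilities that sum to 1."""
    # Subtract the largest score before exponentiating to avoid overflow.
    top = max(scores.values())
    exps = {word: math.exp(score - top) for word, score in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}


# Hypothetical scores for the word after "I pledge allegiance to the".
# These numbers are made up; they are not output from any real model.
candidate_scores = {"flag": 12.0, "republic": 4.0, "king": 2.5, "team": 1.0}

probs = softmax(candidate_scores)
best = max(probs, key=probs.get)
print(best, round(probs[best], 4))
```

With these invented scores, “flag” ends up with nearly all of the probability mass, which mirrors the near-100% case described above.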

When these models are trained on data scraped from around the web, among other sources, they build statistical relationships from one word to the next.

So, for example, if I say, “I’m brewing the,” then depending on the context, the words it was trained on, and the words I’ve mentioned in my prompt, it’ll choose a word like coffee, or tea, or beer, or the fall of capitalism.

And in doing so, it’s relying on the patterns in language that it learned on input.

Look at a lot of the training datasets from big companies, whether explicitly or implicitly declared. For example, in an interview with Lex Fridman, Mark Zuckerberg said that Facebook’s LLaMA model was trained on data from Facebook’s many services, right? Facebook, WhatsApp, Instagram, etc.

What’s embedded in most people’s language? Yeah, profanity, racism, bias, you name it.

Particularly if you’re drawing from Facebook. I mean, there’s a whole bunch of people there who think the world is still flat, which is mind-blowing.

And because it’s ingesting those patterns in language, if you then have to go back and say, don’t say X, or Y, or Z, you’re essentially handicapping the model.

And it’s not just going to stop using the words you don’t want it to use; it’s also going to have to adapt and figure out how to use words in less creative ways that don’t evoke those topics.

So say you tell it, understandably and correctly, “Hey, don’t be racist,” and there’s a good chunk of racist text that was fed into the model.

Suppressing that not only suppresses racist language, as you would want it to, but it also impacts all the other words that are used in that context.

And it impacts their overall probabilities.

If I say, “Go fuck yourself,” all right?

And then we say, “You know what, let’s not use the word ‘fuck.’”

There is a statistical relationship in that sentence between the word “go,” the word “fuck,” and the word “yourself.”

And you see that a lot; it’s a very common phrase, right? GFY. These words are associated with each other.

Now, if I suppress, remove, or censor the word “fuck,” you’re left with “Go yourself,” which makes no sense, right? And it breaks the probabilities around those words.

So the words “go” and “yourself” are actually going to be negatively impacted by the suppression of the profanity.

It forgets how these words are related to each other.

And the more you censor words, the more you have to come up with alternatives that may not be as good.
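One simple way to “censor” a word at generation time is to mask it: give the banned word a score of negative infinity so its probability becomes zero, then renormalize the rest. The Python sketch below assumes that mechanism (vendors also use training-time techniques, which this does not model) and uses made-up scores for “Go ___ yourself.” It shows the effect described here: once the dominant word is removed, its probability mass is spread across weaker alternatives, and the model is far less confident about any of them.

```python
import math


def next_word_probs(scores: dict[str, float], banned: frozenset[str] = frozenset()) -> dict[str, float]:
    """Softmax over candidate scores after masking out banned words.

    Dropping a banned word is equivalent to setting its score to -inf:
    it gets probability 0, and the remaining words are renormalized to sum to 1.
    """
    allowed = {word: s for word, s in scores.items() if word not in banned}
    if not allowed:
        raise ValueError("every candidate word is banned")
    top = max(allowed.values())
    exps = {word: math.exp(s - top) for word, s in allowed.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}


# Hypothetical scores for the blank in "Go ___ yourself"; invented for illustration.
scores = {"fuck": 9.0, "find": 3.0, "enjoy": 2.5, "be": 2.0}

before = next_word_probs(scores)
after = next_word_probs(scores, banned=frozenset({"fuck"}))

print("before:", {w: round(p, 3) for w, p in before.items()})
print("after: ", {w: round(p, 3) for w, p in after.items()})
```

With these invented numbers, the masked word had over 99% of the probability mass, while the best remaining choice, “find,” gets only about half.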

Now, let’s be very, very clear.

It’s a good idea, if you’re going to be using any of these models, particularly in any kind of professional context, to censor things like racism, bigotry, hate speech, and substantial profanity.

But you also have to understand that it will cause computational problems in these models.

How do you get around this? The short answer is if it wasn’t in the training data to begin with, it wouldn’t be a problem on the output side, but we don’t have control over how these models are trained.

And there are very few companies, like Google, Facebook, or OpenAI, that have enough data to actually build and train these things.

And so we have to essentially handicap the models on their outputs.

Now, I believe there are probably some technological solutions to do this better that the industry isn’t talking about enough yet. There are some interesting things being done with adversarial models, which basically say, you know, “Here’s what I’m looking for you not to do,” and sort of get into arguments, semantically and metaphorically, with the language model to help it avoid those things.

But if you want maximum creativity, you would have to use a model that also has problematic concepts and text in it, right?

If you want to maximize what a model can do, you will probably have to accept a model that has a higher potential to say things you don’t want it to say, right? So you’ll have to build some gatekeeping into its outputs, to inspect them, so that the model can be as creative as it wants to be and then get smacked down later on in the pipeline.
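The gatekeeping-on-outputs idea can be sketched as a generate-then-check loop: let the model produce whatever it wants, inspect each draft, and only pass along drafts that clear the check. In this minimal Python sketch, `generate` stands in for any model call, and the keyword blocklist (`BLOCKED_WORDS`, with placeholder entries) is a hypothetical stand-in; a production pipeline would more likely use a trained moderation classifier.

```python
import re
from typing import Callable, Optional

# Hypothetical placeholder blocklist; swap in a real moderation check in practice.
BLOCKED_WORDS = {"badword1", "badword2"}


def passes_gate(text: str) -> bool:
    """Return True if the draft contains none of the blocked words."""
    words = set(re.findall(r"\w+", text.lower()))
    return words.isdisjoint(BLOCKED_WORDS)


def gated_generate(generate: Callable[[str], str], prompt: str, max_tries: int = 3) -> Optional[str]:
    """Call the model up to max_tries times and return the first draft that passes the gate."""
    for _ in range(max_tries):
        draft = generate(prompt)
        if passes_gate(draft):
            return draft
    return None  # Every draft was rejected; the caller decides what to do next.


# Usage with a fake "model" that returns a blocked draft first, then a clean one.
drafts = iter(["here is badword1 again", "here is a clean draft"])
result = gated_generate(lambda prompt: next(drafts), "write something")
print(result)
```

The model itself is left unconstrained here; all of the restriction lives in the check that runs afterward.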

So yeah, anything that fails that inspection gets sent back for regeneration.

Over time, I suspect some companies will go a different route. If I had to guess which company, my guess would be IBM, because they’re not known for being first to market.

But they’re typically known for being best to market, particularly on the enterprise stuff.

I would expect companies like IBM to say, hey, we’re going to build a custom model that doesn’t include profanity, racism, bigotry, or homophobia. We’re going to exclude those things from the source training data to begin with, so they can’t show up on the output side, because they never existed on the input side.

And that’s what we’ll have to do if we want models that have not had their creativity handicapped but also have not absorbed problematic text and concepts.
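The input-side alternative, excluding problematic material from the training data so the model never learns it, amounts to a filtering pass over the corpus before training. This is a minimal Python sketch that assumes a simple keyword check is good enough for illustration; `EXCLUDED_TERMS` and its entries are hypothetical placeholders, and real data-curation pipelines usually combine classifiers with human review.

```python
import re
from typing import Iterable, Iterator

# Hypothetical placeholder terms to exclude from the training corpus.
EXCLUDED_TERMS = {"slurplaceholder", "profanityplaceholder"}


def is_clean(document: str) -> bool:
    """True if the document contains none of the excluded terms."""
    tokens = set(re.findall(r"\w+", document.lower()))
    return tokens.isdisjoint(EXCLUDED_TERMS)


def filter_corpus(documents: Iterable[str]) -> Iterator[str]:
    """Lazily yield only the clean documents, so excluded text never reaches training."""
    return (doc for doc in documents if is_clean(doc))


corpus = [
    "I'm brewing the coffee this morning.",
    "This post contains profanityplaceholder.",
    "I pledge allegiance to the flag.",
]
kept = list(filter_corpus(corpus))
print(f"kept {len(kept)} of {len(corpus)} documents")
```

Filtering up front means nothing has to be suppressed at output time; the cost is that whatever the filter removes, including anything it flags by mistake, never reaches the model at all.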

So the bottom line is if you want creativity, you also have to accept a model that has problematic text.

If you want a model to behave professionally, you’re going to have to handicap it significantly, and the outputs may be lower quality as a result. That’s the current trade-off as of mid-2023, when I’m recording this.

It’s entirely possible this could all change tomorrow, by the way, so it’s worth keeping your ear to the ground to see what else is likely to happen to help these models become smarter and more professional.

Thanks for tuning in.

We’ll talk to you next time.

If you liked this video, go ahead and hit that subscribe button.

Christopher Penn 0:00

Today, let’s talk about spatial computing.

So this is the term Apple is using with the launch of its new goggles, the Apple Vision Pro: the idea of spatial computing.

Now, this is not a new field by any means.

If you’re familiar with Google Cardboard, the Oculus headset, the HTC Vive, and all the other companies that make these virtual reality environments, spatial computing is basically Apple’s spin on virtual reality.

Now, here’s the thing I’m wondering about, and this is purely wondering, because I haven’t had a chance to test these devices.

The local Apple Store obviously does not have them yet.

So there’s no way to know.

But the issue with spatial computing hasn’t necessarily been image quality or immersion so much as latency.

Latency is one of the biggest problems in virtual reality and spatial computing.

And there are a couple of different kinds. There’s motion latency: if you’re wearing this thing on your head and you turn your head, does the image move in sync with your body? If there’s even the slightest delay, you notice. You can feel it: oh, that’s kind of weird.

So that’s certainly one aspect.

But the other aspect is input latency.

And input latency is the lag when you are doing stuff.

Apple’s device supposedly uses eye tracking and hand tracking; the Oculus can use hand tracking too.

It also has these controllers, right? You use these controllers to control what you’re doing.

And even these have a little bit of lag, not a ton.

But it’s enough to break the illusion, enough to sort of throw you out. Now, the controllers are pretty fast.

That’s why the Oculus, for example, had great games like Beat Saber: the controllers were very, very fast and highly responsive.

And so you could pretend to have lightsabers, cutting up these little objects flying at you on screen.

But when you got into using its vision-based tracking, where it uses the onboard cameras, it was really terrible.

There was a substantial amount of latency, to the point where it was almost unusable.

So those issues of latency are really what will make or break a device like the Apple Vision Pro, or the next-generation Oculus headset, or whatever.

You’ve got to get the latency right.

And the good news, at least for Apple, is that they have a good history of getting stuff like that correct in terms of user interface.

If you’ve ever used the Apple Pencil on the iPad, you look at it and think, that’s $139 for a stylus? That seems excessive.

But when you try it, you’re like, okay, this actually behaves like a real pencil on paper.

And no other stylus feels the way it does on Apple hardware.

It is seamless. When you’re painting in something like Adobe Fresco, you’re like, okay, this is pretty realistic; the pressure sensors, you know, let you draw.

So I’m optimistic that Apple will get that right.

But what really matters is going to be motion in a space, being able to say, okay, I’m going to touch this thing.

And you reach out and grab or touch it, and you move these things around in this virtual space.

They’ve got to get that right, because that’s what’s going to set the device apart and justify the enormous price tag, even once they come out with lower-cost models, because I’m sure they will.

It’s called Apple Vision Pro for a reason; there will probably be a plain Apple Vision.

And knowing Apple, there will be an Apple Vision Air, an Apple Vision Pro Plus, an Apple Vision Ultra, or an Apple Vision Max. Apple will come up with stuff like that.

But it is the latency that will really define how this thing feels and whether or not it’s a success, especially if you’re going to use it in any kind of industrial or enterprise application where you want somebody to wear one of these things for four, five, or six hours a day.

There are virtual workspaces available for the Oculus. They suck.

They are terrible.

The resolution is not great, but the latency of moving around, moving screens around and stuff in there, is just unusable.

So this is why this is now just part of my backdrop and not something I use on a regular basis.

So those are some initial thoughts about the spatial computing thing.

Some things for you to think about as you evaluate these tools.

Yes, there are plenty of use cases. We saw demos during Apple’s announcement, and we saw all sorts of conversation. But when it comes time to try these out, measure the latency. Measure how it feels to move around in that environment.

If it feels good, it might be worth the price tag. If it feels even the slightest bit janky, it’s probably not worth the price tag, and it’s going to give you a headache.
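If you want to put a number on “measure the latency,” one rough approach is to record when each action happens (a head turn, a pinch, a tap) and when the display visibly responds, for example by filming both with a high-speed camera, and then summarize the gaps. The Python sketch below assumes you already have those paired timestamps in milliseconds; the sample values are invented, and `latency_summary` is a hypothetical helper, not part of any headset SDK.

```python
import statistics


def latency_summary(events: list[tuple[float, float]]) -> dict[str, float]:
    """Summarize input-to-display latency from (action_ms, response_ms) pairs."""
    if not events:
        raise ValueError("need at least one event")
    delays = [response - action for action, response in events]
    if any(d < 0 for d in delays):
        raise ValueError("a response timestamp precedes its action timestamp")
    return {
        "median_ms": statistics.median(delays),
        "mean_ms": statistics.fmean(delays),
        "max_ms": max(delays),
    }


# Invented sample: four actions and when the display responded to each.
samples = [(0, 18), (100, 121), (200, 215), (300, 340)]
summary = latency_summary(samples)
print(summary)
```

Looking at the worst case (`max_ms`) alongside the median matters here, because a device that is usually smooth but occasionally stutters is exactly the “slightest bit janky” feel that makes a headset tiring to wear.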

So, that’s it for today.

Thanks for tuning in.

We’ll talk to you next time.

If you liked this video, go ahead and hit that subscribe button.
