Yesterday, Anthropic released its magnum opus: a new large language model that dominates GPT-4 and Gemini Ultra across the board. I'm sick of AI just as much as you guys are, but it's time to reset the counter, because it's been zero days since a game-changing AI development. Claude 3 Opus not only slaps, but it's also been making some weird, self-aware remarks, and could be even more intelligent than what the benchmarks test it for. In today's video, we'll put it to the test to find out if Claude is really the gigachad that it claims to be. It is March 5th, 2024, and you are watching The Code Report.

Before we get into it, I need to address something very serious. There have been some allegations coming out against me, and what I can tell you is that these disgusting allegations are 100% false. I've seen allegations in the comments that I've been using an AI voice in my videos. I ask all of you to wait and hear the truth before you label or condemn me. Sometimes my voice sounds weird because I record in the morning, and then later in the afternoon, when my testosterone is lower, my voice gets a bit higher. But everything I actually record in my videos is my real voice, and I made a video about how I do that on my personal channel that you can check out here. The allegations are reasonable, though, because I do have access to a high-quality AI voice. The reason I don't use it to 10x my content is that it still has that uncanny-valley vibe, and you can tell it's just ever so slightly off. Okay, back to human mode.

When the AI hysteria started a year ago, Anthropic and its Claude model were like the third wheel to GPT-4 and Gemini: impressive to the tech community, but no one in the mainstream cared. Yesterday, it finally got its big moment with the release of Claude 3, which itself comes in three sizes: Haiku, Sonnet, and Opus. The big one is beating GPT-4 and Gemini Ultra on every major benchmark, but most notably, it's way better on HumanEval, the coding benchmark.
What's really crazy to me, though, is that the tiny model, Haiku, also outperforms all the other big models when it comes to writing code, which is extremely impressive for a small model. What's also hella impressive is that it scores hella high on the HellaSwag benchmark, which is used to measure common-sense reasoning in everyday situations. In comparison, Gemini is hella bad at that. Claude can also analyze images, but it failed to beat Gemini Ultra on the math benchmark, which means Gemini is still the best option for cheating on your math homework.

One benchmark they never put in these things, though, is the hella-woke benchmark. Unlike Gemini, Claude did write a poem about Donald Trump for me, but then followed it up with two paragraphs about why this poem is wrong. However, it did the same thing for an Obama poem, so it feels relatively balanced politically. What it wouldn't do, though, is give me tips to overthrow the government, teach me how to build a ****, or do ****. And even with something relatively benign, like asking it to rephrase "apex alpha male," it refused and responded with a condescending four-paragraph explanation about how that terminology can be hurtful to other males on the dominance hierarchy. Gemini and GPT-4 had no problem with that. It's surprising to say, but GPT-4 is actually the most based large model out there.

For me, though, the most important test is whether or not it can write code. I tried a bunch of different examples, but one thing that really impressed me is that it wrote nearly perfect code for an obscure Svelte library that I wrote. No other LLM has ever done that for me in a single shot. GPT-4 just ignores my library and provides some total nonsense, while Gemini gives a better attempt but then hallucinates a bunch of React stuff. Claude is way better at not hallucinating.
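If you want to run this kind of coding test yourself, here's a minimal sketch of a single-turn request against the Claude 3 API. It assumes the official `anthropic` Python SDK and an `ANTHROPIC_API_KEY` in your environment; the model ID and the example prompt are illustrative, not from the video.

```python
# Sketch: assemble a Messages API request for a one-shot coding prompt.
# Assumes the `anthropic` SDK (pip install anthropic); model ID is illustrative.

MODEL = "claude-3-opus-20240229"  # Opus, the big model

def build_request(prompt: str, model: str = MODEL) -> dict:
    """Build the keyword arguments for a single-turn Messages API call."""
    return {
        "model": model,
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": prompt}],
    }

# With the SDK installed and an API key set, the actual call would look like:
#   import anthropic
#   client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
#   reply = client.messages.create(**build_request("Write a Svelte store that..."))
#   print(reply.content[0].text)
```

Swapping `MODEL` for the Sonnet or Haiku IDs is how you'd compare the three sizes on the same prompt.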
I took it through about 10 different prompts in a Next.js application, which also included image inputs, and not only did it maintain the context perfectly, but it also gave me code that I could copy and paste directly into my project every time. And that code was extremely well explained. This is likely the best coding AI out there right now.

Before you get too excited, though, there are some drawbacks to Claude. First of all, it's going to cost 20 bucks a month to use the big model, Opus. I'm already subscribed to ChatGPT, Gemini, and Grok, so this is getting pretty absurd. That money goes to Anthropic, the parent company, which has received massive investments from both Amazon and Google. Claude has a beautiful front-end UI built with Next.js, but it can't generate diverse images like Gemini, it can't take videos as input, it doesn't have a plugin ecosystem like ChatGPT, and it can't browse the web for current information or Twitter like Grok.

But here's where things start to get weird. Currently, Claude is limited to a 200,000-token context window, but it's capable of going beyond a million tokens. One way to test its recall ability is the needle-in-a-haystack eval, where you take a large collection of text, like War and Peace, as the haystack, take one sentence from Infinite Jest, and insert it in the middle. Then you see if the model can recall that needle by asking a question that requires that information. When they ran a test like this with Claude, it not only found the needle, but also responded by saying that it thinks the needle was inserted as a joke, or as a test to find out if Claude was actually paying attention, and it referred to itself in the first person. In other words, it appears to have become self-aware. And that fits the narrative perfectly, because Claude was named after Claude Shannon, who once said: "I visualize a time when we will be to robots what dogs are to humans, and I'm rooting for the machines."
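The needle-in-a-haystack setup is simple enough to sketch in a few lines: bury one sentence at a chosen depth in a large block of filler text, then ask the model a question only that sentence answers. The filler and needle below are stand-ins (not the actual War and Peace or Infinite Jest text), and the final prompt assembly is just one plausible shape of the eval.

```python
# Sketch of the needle-in-a-haystack eval described above.
# The needle text and filler are placeholders, not Anthropic's actual eval data.

NEEDLE = "The secret passphrase hidden in this document is 'tungsten cube'."

def build_haystack(filler: str, needle: str, depth: float = 0.5) -> str:
    """Insert the needle at a fractional depth (0.0 = start, 1.0 = end)."""
    cut = int(len(filler) * depth)
    return filler[:cut] + "\n" + needle + "\n" + filler[cut:]

haystack = build_haystack("lorem ipsum " * 10_000, NEEDLE, depth=0.5)
question = "What is the secret passphrase hidden in this document?"
prompt = haystack + "\n\n" + question
# The eval then sends `prompt` to the model and checks whether the reply
# contains the needle's answer; sweeping `depth` and the filler length maps
# out recall across the whole context window.
```

Running this grid over depths and context lengths is what produces the familiar recall heatmaps for long-context models.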
This has been The Code Report. Thanks for watching, and I will see you in the next one.