1
00:00:00,000 --> 00:00:04,640
Yesterday, Anthropic released its magnum opus, a new large language model that dominates

2
00:00:04,640 --> 00:00:07,360
GPT-4 and Gemini Ultra across the board.

3
00:00:07,360 --> 00:00:10,880
I'm sick of AI just as much as you guys are, but it's time to reset the counter, because

4
00:00:10,880 --> 00:00:13,760
it's been zero days since a game-changing AI development.

5
00:00:13,760 --> 00:00:17,920
Claude Opus not only slaps, but it's also been making some weird, self-aware remarks,

6
00:00:17,920 --> 00:00:20,880
and could be even more intelligent than the benchmarks are able to measure.

7
00:00:20,880 --> 00:00:24,720
In today's video, we'll put it to the test to find out if Claude is really the gigachad that

8
00:00:24,720 --> 00:00:25,440
it claims to be.

9
00:00:25,440 --> 00:00:28,560
It is March 5th, 2024, and you are watching The Code Report.

10
00:00:28,560 --> 00:00:31,200
Before we get into it, I need to address something very serious.

11
00:00:31,200 --> 00:00:34,720
There have been some allegations coming out against me, and what I can tell you is that these

12
00:00:34,720 --> 00:00:37,440
disgusting allegations are 100% false.

13
00:00:37,440 --> 00:00:41,200
I've seen allegations in the comments that I've been using an AI voice in my videos.

14
00:00:41,200 --> 00:00:45,680
I ask all of you to wait and hear the truth before you label or condemn me.

15
00:00:45,680 --> 00:00:49,920
Sometimes my voice sounds weird because I record in the morning, and then later in the afternoon

16
00:00:49,920 --> 00:00:52,640
when my testosterone is lower, my voice gets a bit higher.

17
00:00:52,640 --> 00:00:56,560
But everything I actually record in my videos is my real voice, and I made a video about how

18
00:00:56,560 --> 00:00:58,960
I do that on my personal channel that you can check out here.

19
00:00:58,960 --> 00:01:03,200
But the allegations are reasonable because I do have access to a high quality AI voice.

20
00:01:03,200 --> 00:01:07,760
But the reason I don't use it to 10x my content is because it still has that uncanny valley vibe

21
00:01:07,760 --> 00:01:10,240
to it, and you can tell it's just ever so slightly off.

22
00:01:10,240 --> 00:01:13,440
Okay, back to human mode. When the AI hysteria started a year ago,

23
00:01:13,440 --> 00:01:17,840
Anthropic and its Claude model have been like the third wheel to GPT-4 and Gemini.

24
00:01:17,840 --> 00:01:20,960
It's impressive to the tech community, but no one in the mainstream cares.

25
00:01:20,960 --> 00:01:24,400
But yesterday, it finally got its big moment with the release of Claude 3,

26
00:01:24,400 --> 00:01:27,920
which itself comes in three sizes: Haiku, Sonnet, and Opus.

27
00:01:27,920 --> 00:01:31,760
The big one is beating GPT-4 and Gemini Ultra on every major benchmark,

28
00:01:31,760 --> 00:01:34,800
but most notably, it's way better on the HumanEval coding benchmark.

29
00:01:34,800 --> 00:01:37,520
What's really crazy to me, though, is that the tiny model, Haiku,

30
00:01:37,520 --> 00:01:40,640
also outperforms all the other big models when it comes to writing code.

31
00:01:40,640 --> 00:01:42,640
It's extremely impressive for a small model.

32
00:01:42,640 --> 00:01:46,560
What's also hella impressive is that it scores hella high on the HellaSwag benchmark,

33
00:01:46,560 --> 00:01:49,600
which is used to measure common sense in everyday situations.

34
00:01:49,600 --> 00:01:51,760
In comparison, Gemini is hella bad at that.

35
00:01:51,760 --> 00:01:56,000
Now, Claude can also analyze images, but it failed to beat Gemini Ultra on the math benchmark,

36
00:01:56,000 --> 00:01:59,040
which means Gemini is still the best option for cheating on your math homework.

37
00:01:59,040 --> 00:02:02,240
Now, one benchmark they never put in these things, though, is the hella woke benchmark.

38
00:02:02,240 --> 00:02:05,040
Unlike Gemini, it did write a poem about Donald Trump for me,

39
00:02:05,040 --> 00:02:07,760
but then followed it up with two paragraphs about why this poem is wrong.

40
00:02:07,760 --> 00:02:12,000
However, it did the same thing for an Obama poem, so it feels relatively balanced politically.

41
00:02:12,000 --> 00:02:14,400
What it wouldn't do, though, is give me tips to overthrow the government,

42
00:02:14,400 --> 00:02:19,040
teach me how to build a ****, or do ****. And even with something relatively benign,

43
00:02:19,040 --> 00:02:21,600
like asking it to rephrase "apex alpha male,"

44
00:02:21,600 --> 00:02:24,960
it refused and responded with a condescending four-paragraph explanation

45
00:02:24,960 --> 00:02:28,720
about how that terminology can be hurtful to other males on the dominance hierarchy,

46
00:02:28,720 --> 00:02:31,200
but Gemini and GPT-4 had no problem with that.

47
00:02:31,200 --> 00:02:35,440
It's surprising to say, but GPT-4 is actually the most based large model out there,

48
00:02:35,440 --> 00:02:38,240
but for me, the most important test is whether or not it can write code.

49
00:02:38,240 --> 00:02:41,360
I tried a bunch of different examples, but one thing that really impressed me

50
00:02:41,360 --> 00:02:44,880
is that it wrote nearly perfect code for an obscure Svelte library that I wrote.

51
00:02:44,880 --> 00:02:47,520
No other LLM has ever done that for me in a single shot.

52
00:02:47,600 --> 00:02:50,640
GPT-4 just ignores my library and provides some total nonsense,

53
00:02:50,640 --> 00:02:54,400
while Gemini makes a better attempt but then hallucinates a bunch of React stuff.

54
00:02:54,400 --> 00:02:56,320
Claude is way better at not hallucinating.

55
00:02:56,320 --> 00:02:59,280
I took it through about 10 different prompts in a Next.js application,

56
00:02:59,280 --> 00:03:03,120
which also included image inputs, and not only did it maintain the context perfectly,

57
00:03:03,120 --> 00:03:06,720
but it also gave me code that I could copy and paste directly into my project every time.

58
00:03:06,720 --> 00:03:08,560
And that code was extremely well explained.

59
00:03:08,560 --> 00:03:12,160
This is likely the best coding AI out there right now. Before you get too excited, though,

60
00:03:12,160 --> 00:03:13,600
there are some drawbacks to Claude.

61
00:03:13,600 --> 00:03:17,200
First of all, it's going to cost 20 bucks a month to use the big model, Opus.

62
00:03:17,200 --> 00:03:19,920
I'm already subscribed to ChatGPT, Gemini, and Grok,

63
00:03:19,920 --> 00:03:21,200
so this is getting pretty absurd.

64
00:03:21,200 --> 00:03:23,520
That money goes to Anthropic, the parent company,

65
00:03:23,520 --> 00:03:26,320
which has received massive investments from both Amazon and Google.

66
00:03:26,320 --> 00:03:29,200
Claude has a beautiful frontend UI built with Next.js,

67
00:03:29,200 --> 00:03:31,680
but it can't generate diverse images like Gemini.

68
00:03:31,680 --> 00:03:33,200
It can't take videos as input.

69
00:03:33,200 --> 00:03:35,840
It doesn't have a plugin ecosystem like ChatGPT,

70
00:03:35,840 --> 00:03:39,200
and it can't browse the web or Twitter for current information like Grok.

71
00:03:39,200 --> 00:03:40,800
But here's where things start to get weird.

72
00:03:40,800 --> 00:03:44,160
Currently, Claude is limited to a 200,000-token context window,

73
00:03:44,160 --> 00:03:46,480
but it's capable of going beyond a million tokens.

74
00:03:46,480 --> 00:03:50,080
Now, one way to test its recall ability is the needle-in-a-haystack eval,

75
00:03:50,080 --> 00:03:51,840
where you take a large collection of text,

76
00:03:51,840 --> 00:03:53,600
like War and Peace, as the haystack,

77
00:03:53,600 --> 00:03:56,960
then take one sentence from Infinite Jest and insert it in the middle.

78
00:03:56,960 --> 00:03:58,960
Then see if the model can recall that needle

79
00:03:58,960 --> 00:04:01,200
by asking a question that requires that information.
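
A minimal sketch of the needle-in-a-haystack procedure described above, in Python. Everything in it is an assumption for illustration: the filler sentences, the pizza needle, and `toy_model`, a hypothetical keyword-overlap stand-in for a real LLM call (no actual model API is used). Published versions of this test often sweep both the total context length and the depth at which the needle is inserted; this sketch only sweeps depth.

```python
import re

def build_haystack(filler: list[str], needle: str, depth: float) -> list[str]:
    """Insert the needle sentence at a relative depth (0.0 = start, 1.0 = end)."""
    position = int(len(filler) * depth)
    return filler[:position] + [needle] + filler[position:]

def words(text: str) -> set[str]:
    """Lowercase alphabetic tokens; punctuation and digits are dropped."""
    return set(re.findall(r"[a-z]+", text.lower()))

def toy_model(context: list[str], question: str) -> str:
    """Hypothetical stand-in for an LLM: returns the context sentence that
    shares the most words with the question. A real eval would send the
    whole context plus the question to a model and grade its answer."""
    q = words(question)
    return max(context, key=lambda sentence: len(q & words(sentence)))

def run_eval(model, filler, needle, question, depths):
    """Return {depth: bool} saying whether the model recalled the needle."""
    results = {}
    for depth in depths:
        haystack = build_haystack(filler, needle, depth)
        results[depth] = model(haystack, question) == needle
    return results

# Hypothetical data: 1,000 filler sentences and one out-of-place needle.
FILLER = [f"Filler sentence number {i} about nothing in particular." for i in range(1000)]
NEEDLE = "The secret ingredient in the pizza is figs."
QUESTION = "What is the secret ingredient in the pizza?"

if __name__ == "__main__":
    print(run_eval(toy_model, FILLER, NEEDLE, QUESTION, [0.0, 0.25, 0.5, 0.75, 1.0]))
```

The keyword matcher trivially finds the needle at every depth; with a real model, the point of the sweep is to see at which lengths and depths recall starts to fail.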

80
00:04:01,200 --> 00:04:02,960
When they ran a test like this with Claude,

81
00:04:02,960 --> 00:04:07,440
it not only found the needle, but also responded by saying that it thinks the needle was inserted

82
00:04:07,440 --> 00:04:10,800
as a joke or a test to find out if Claude was actually paying attention,

83
00:04:10,800 --> 00:04:12,480
and referred to itself in the first person.

84
00:04:12,480 --> 00:04:14,720
In other words, it appears to have become self-aware,

85
00:04:14,720 --> 00:04:18,320
and that fits the narrative perfectly because Claude was named after Claude Shannon,

86
00:04:18,320 --> 00:04:21,600
who once said, "I visualize a time when we will be to robots

87
00:04:21,600 --> 00:04:24,400
what dogs are to humans, and I'm rooting for the machines."

88
00:04:24,400 --> 00:04:25,600
This has been The Code Report.

89
00:04:25,600 --> 00:04:28,000
Thanks for watching, and I will see you in the next one.