1 00:00:00,000 --> 00:00:04,640 Yesterday, Anthropic released its magnum opus, a new large language model that dominates 2 00:00:04,640 --> 00:00:07,360 GPT-4 and Gemini Ultra across the board. 3 00:00:07,360 --> 00:00:10,880 I'm sick of AI just as much as you guys are, but it's time to reset the counter, because 4 00:00:10,880 --> 00:00:13,760 it's been zero days since a game-changing AI development. 5 00:00:13,760 --> 00:00:17,920 Claude Opus not only slaps, but it's also been making some weird, self-aware remarks, 6 00:00:17,920 --> 00:00:20,880 and could be even more intelligent than what the benchmarks test it for. 7 00:00:20,880 --> 00:00:24,720 In today's video, we'll put it to the test to find out if Claude is really the gigachad that 8 00:00:24,720 --> 00:00:25,440 it claims to be. 9 00:00:25,440 --> 00:00:28,560 It is March 5th, 2024, and you are watching the Code Report. 10 00:00:28,560 --> 00:00:31,200 Before we get into it, I need to address something very serious. 11 00:00:31,200 --> 00:00:34,720 There have been some allegations coming out against me, and what I can tell you is that these 12 00:00:34,720 --> 00:00:37,440 disgusting allegations are 100% false. 13 00:00:37,440 --> 00:00:41,200 I've seen allegations in the comments that I've been using an AI voice in my videos. 14 00:00:41,200 --> 00:00:45,680 I ask all of you to wait and hear the truth before you label or condemn me. 15 00:00:45,680 --> 00:00:49,920 Sometimes my voice sounds weird because I record in the morning, and then later in the afternoon 16 00:00:49,920 --> 00:00:52,640 when my testosterone is lower, my voice gets a bit higher. 17 00:00:52,640 --> 00:00:56,560 But everything I actually record in my video is my real voice, and I made a video about how 18 00:00:56,560 --> 00:00:58,960 I do that on my personal channel that you can check out here. 19 00:00:58,960 --> 00:01:03,200 But the allegations are reasonable because I do have access to a high-quality AI voice. 
20 00:01:03,200 --> 00:01:07,760 But the reason I don't use it to 10x my content is because it still has that uncanny valley vibe 21 00:01:07,760 --> 00:01:10,240 to it, and you can tell it's just ever so slightly off. 22 00:01:10,240 --> 00:01:13,440 Okay, back to human mode. When the AI hysteria started a year ago, 23 00:01:13,440 --> 00:01:17,840 Anthropic and its Claude model has been like the third wheel to GPT-4 and Gemini. 24 00:01:17,840 --> 00:01:20,960 It's impressive to the tech community, but no one in the mainstream cares. 25 00:01:20,960 --> 00:01:24,400 But yesterday, it finally got its big moment with the release of Claude 3, 26 00:01:24,400 --> 00:01:27,920 which itself comes in 3 sizes: Haiku, Sonnet, and Opus. 27 00:01:27,920 --> 00:01:31,760 The big one is beating GPT-4 and Gemini Ultra on every major benchmark, 28 00:01:31,760 --> 00:01:34,800 but most notably, it's way better on human-evaluated code. 29 00:01:34,800 --> 00:01:37,520 What's really crazy to me, though, is the tiny model, Haiku, 30 00:01:37,520 --> 00:01:40,640 also outperforms all the other big models when it comes to writing code. 31 00:01:40,640 --> 00:01:42,640 It's extremely impressive for a small model. 32 00:01:42,640 --> 00:01:46,560 What's also hella impressive is that it scores hella high on the HellaSwag benchmark, 33 00:01:46,560 --> 00:01:49,600 which is used to measure common sense in everyday situations. 34 00:01:49,600 --> 00:01:51,760 In comparison, Gemini is hella bad at that. 35 00:01:51,760 --> 00:01:56,000 Now, Claude can also analyze images, but it failed to beat Gemini Ultra on the math benchmark, 36 00:01:56,000 --> 00:01:59,040 which means Gemini is still the best option for cheating on your math homework. 37 00:01:59,040 --> 00:02:02,240 Now, one benchmark they never put in these things, though, is the hella woke benchmark. 
38 00:02:02,240 --> 00:02:05,040 Unlike Gemini, it did write a poem about Donald Trump for me, 39 00:02:05,040 --> 00:02:07,760 but then followed it up with two paragraphs about why this poem is wrong. 40 00:02:07,760 --> 00:02:12,000 However, it did the same thing for an Obama poem, so it feels relatively balanced politically. 41 00:02:12,000 --> 00:02:14,400 What it wouldn't do, though, is give me tips to overthrow the government, 42 00:02:14,400 --> 00:02:19,040 teach me how to build a **** or do ****. And even with something relatively benign, 43 00:02:19,040 --> 00:02:21,600 like asking it to rephrase "apex alpha male", 44 00:02:21,600 --> 00:02:24,960 it refused and responded with a condescending four-paragraph explanation 45 00:02:24,960 --> 00:02:28,720 about how that terminology can be hurtful to other males on the dominance hierarchy, 46 00:02:28,720 --> 00:02:31,200 but Gemini and GPT-4 had no problem with that. 47 00:02:31,200 --> 00:02:35,440 It's surprising to say, but GPT-4 is actually the most based large model out there. 48 00:02:35,440 --> 00:02:38,240 But for me, the most important test is whether or not it can write code. 49 00:02:38,240 --> 00:02:41,360 I tried a bunch of different examples, but one thing that really impressed me 50 00:02:41,360 --> 00:02:44,880 is that it wrote nearly perfect code for an obscure Svelte library that I wrote. 51 00:02:44,880 --> 00:02:47,520 No other LLM has ever done that for me in a single shot. 52 00:02:47,600 --> 00:02:50,640 GPT-4 just ignores my library and provides some total nonsense, 53 00:02:50,640 --> 00:02:54,400 while Gemini gives a better attempt, but then hallucinates a bunch of React stuff. 54 00:02:54,400 --> 00:02:56,320 Claude is way better at not hallucinating. 
55 00:02:56,320 --> 00:02:59,280 I took it through about 10 different prompts in a Next.js application, 56 00:02:59,280 --> 00:03:03,120 which also included image inputs, and not only did it maintain the context perfectly, 57 00:03:03,120 --> 00:03:06,720 but it also gave me code that I could copy and paste directly into my project every time. 58 00:03:06,720 --> 00:03:08,560 And that code was extremely well explained. 59 00:03:08,560 --> 00:03:12,160 This is likely the best coding AI out there right now. Before you get too excited, though, 60 00:03:12,160 --> 00:03:13,600 there are some drawbacks to Claude. 61 00:03:13,600 --> 00:03:17,200 First of all, it's going to cost 20 bucks a month to use the big model, Opus. 62 00:03:17,200 --> 00:03:19,920 I'm already subscribed to ChatGPT, Gemini, and Grok, 63 00:03:19,920 --> 00:03:21,200 so this is getting pretty absurd. 64 00:03:21,200 --> 00:03:23,520 That money goes to Anthropic, the parent company, 65 00:03:23,520 --> 00:03:26,320 which has received massive investments from both Amazon and Google. 66 00:03:26,320 --> 00:03:29,200 Claude has a beautiful front-end UI built with Next.js, 67 00:03:29,200 --> 00:03:31,680 but it can't generate diverse images like Gemini. 68 00:03:31,680 --> 00:03:33,200 It can't take videos as input. 69 00:03:33,200 --> 00:03:35,840 It doesn't have a plugin ecosystem like ChatGPT, 70 00:03:35,840 --> 00:03:39,200 and can't browse the web for current information or Twitter like Grok. 71 00:03:39,200 --> 00:03:40,800 But here's where things start to get weird. 72 00:03:40,800 --> 00:03:44,160 Currently, Claude is limited to a 200,000 token context window, 73 00:03:44,160 --> 00:03:46,480 but it's capable of going beyond a million tokens. 74 00:03:46,480 --> 00:03:50,080 Now, one way to test its recall ability is the needle-in-a-haystack eval. 
75 00:03:50,080 --> 00:03:51,840 Where you take a large collection of text, 76 00:03:51,840 --> 00:03:53,600 like War and Peace could be the haystack, 77 00:03:53,600 --> 00:03:56,960 then you take one sentence from Infinite Jest and insert it in the middle. 78 00:03:56,960 --> 00:03:58,960 Then see if the model can recall that needle 79 00:03:58,960 --> 00:04:01,200 by asking a question that requires that information. 80 00:04:01,200 --> 00:04:02,960 When they ran a test like this with Claude, 81 00:04:02,960 --> 00:04:07,440 it not only found the needle, but also responded by saying that it thinks the needle was inserted 82 00:04:07,440 --> 00:04:10,800 as a joke or a test to find out if Claude was actually paying attention, 83 00:04:10,800 --> 00:04:12,480 and referred to itself in the first person. 84 00:04:12,480 --> 00:04:14,720 In other words, it appears to have become self-aware, 85 00:04:14,720 --> 00:04:18,320 and that fits the narrative perfectly because Claude was named after Claude Shannon, 86 00:04:18,320 --> 00:04:21,600 who once said, "I visualize a time when we will be to robots 87 00:04:21,600 --> 00:04:24,400 what dogs are to humans, and I'm rooting for the machines." 88 00:04:24,400 --> 00:04:25,600 This has been the Code Report. 89 00:04:25,600 --> 00:04:28,000 Thanks for watching, and I will see you in the next one.
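The needle-in-a-haystack test described in the video can be sketched roughly as follows. This is a minimal illustration, not the harness used in the test the video refers to: the function names `insert_needle`, `build_prompt`, and `recalled` are hypothetical, the sentence splitting is deliberately naive, and actually sending the prompt to a model is left out.

```python
import re


def insert_needle(haystack: str, needle: str, depth: float = 0.5) -> str:
    """Insert `needle` between sentences of `haystack` at a fractional depth.

    depth=0.0 puts the needle first, depth=1.0 puts it last, 0.5 is the middle.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    # Split after sentence-ending punctuation so each piece keeps its own period.
    sentences = re.split(r"(?<=[.!?])\s+", haystack.strip())
    position = round(len(sentences) * depth)
    return " ".join(sentences[:position] + [needle] + sentences[position:])


def build_prompt(document: str, question: str) -> str:
    """Append a question that can only be answered by recalling the needle."""
    return f"{document}\n\nQuestion: {question}\nAnswer using only the text above."


def recalled(answer: str, expected: str) -> bool:
    """Crude scoring (an assumption): does the answer mention the expected fact?"""
    return expected.lower() in answer.lower()


# Toy stand-ins for a long book (haystack) and an out-of-place sentence (needle).
haystack = "One. Two. Three. Four."
needle = "The secret word is pineapple."
document = insert_needle(haystack, needle, depth=0.5)
prompt = build_prompt(document, "What is the secret word?")
print(prompt)
```

In a real run, the haystack would be hundreds of thousands of tokens long, and the test would be repeated at several depths and context lengths to see where recall starts to drop.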