DeepSeek R1 671B on EC2


Over the past few months, I've been using AI tools like ChatGPT, Claude, and Cursor to enhance my coding workflow. While these tools certainly have their limitations, I've found them to be invaluable for ideation, code review, debugging, iterative development, learning, and so much more. Some of my favorite recent AI projects include Kengine, a game engine built in Kotlin/Native+SDL3, and Klip, an image processing service.

When Deep Seek V3 was announced, I was pretty excited to have the chance of running a cutting-edge LLM and begin to explore all the ways to integrate this into my own life and as well as at Arrived. My adventure began with downloading the DeepSeek 671B model directly from Hugging Face. However, after some research and initial challenges, I decided to take a more gradual approach starting with smaller models. Along the way, I discovered Ollama, which significantly simplified the process.

DeepSeek R1 8B on Macbook Pro

My first step was to install ollama on my Macbook Pro and test out the 8B model:.

brew install ollama
ollama pull deepseek-r1:8b
ollama run deepseek-r1:8b

A few conversation prompts [1] [2] verified that the model was working as expected.

DeepSeek R1 on Amazon Linux EC2

For my experiments I used an existing AMI (Amazon Machine Image), "Deep Learning AMI GPU PyTorch 2.0.1 (Amazon Linux 2) 20231003 (ami-031879bfa8927215d)". For more information on which instance size to use, see Instance Types vs Model Size.

I created a new EC2 instance from this AMI and installed ollama using the following commands:

curl -fsSL https://ollama.com/install.sh | sh

DeepSeek R1 32B

Create a file deepseek-r1-32b.mod file to configure the model and add the following:

FROM deepseek-r1:32b

# Model Parameters
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER stop "<|endoftext|>"

Then run:

ollama create deepseek-r1-32b -f deepseek-r1-32b.mod
ollama run deepseek-r1:32b

Like with the 8B model, I verified the model was working as expected with a few conversation prompts:

Notably, the 32B model seems to strike a great balance between quality and speed.

DeepSeek R1 70B

Similar to the 32B model, I model configurations files for the 70B model. However, I created two versions for two different instance types.

DeepSeek R1 70B on G5.12xlarge (deepseek-r1-70b-g5.24xlarge.mod)

FROM deepseek-r1:70b

# Conservative parameters for multi-GPU setup
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 2048
PARAMETER num_gpu 4
PARAMETER stop "<|endoftext|>"

DeepSeek R1 70B on P4D.24xlarge (deepseek-r1-70b-p4d.24xlarge.mod)

FROM deepseek-r1:70b

PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 8192
PARAMETER num_gpu 8
PARAMETER num_thread 96
PARAMETER num_batch 2048
PARAMETER stop "<|endoftext|>"

Then run:

ollama create deepseek-r1-70b-p4d.24xlarge -f deepseek-r1-70b-p4d.24xlarge.mod
ollama run deepseek-r1-70b-p4d.24xlarge

Since the model is bigger, I decided to ask it a wider range of questions:

DeepSeek R1 671B

Now for the big one, the 671B model. And it's quite a jump from the 70B model.

I initially downloaded the DeepSeek V3 model directly from Hugging Face, however, I discovered it was ultimately easier to use ollama. I'll come back to manually running V3 later.

Generate the model configuration file

Create a file named deepseek-r1-671b-p4d.24xlarge.mod and add the following:

FROM deepseek-r1:671b

# GPU and Memory Settings
PARAMETER num_gpu 8
PARAMETER num_thread 96

# Performance Settings
PARAMETER temperature 0.7
PARAMETER num_ctx 8192
PARAMETER num_batch 1024

# Stop token
PARAMETER stop "<|endoftext|>"

If that doesn't work, try the below or some other combination:

PARAMETER num_ctx 4096
PARAMETER num_batch 512

Create the model

ollama ollama create deepseek-r1-671b-p4d.24xlarge -f deepseek-r1-671b-p4d.24xlarge.mod

which should result in the following output:

gathering model components 
pulling manifest 
pulling 9801e7fce27d... 100% ▕██████████████████████████████████████████████████████████████████▏ 404 GB                         
pulling 369ca498f347... 100% ▕██████████████████████████████████████████████████████████████████▏  387 B                         
pulling 6e4c38e1172f... 100% ▕██████████████████████████████████████████████████████████████████▏ 1.1 KB                         
pulling f4d24e9138dd... 100% ▕██████████████████████████████████████████████████████████████████▏  148 B                         
pulling fdf3d6cb73c7... 100% ▕██████████████████████████████████████████████████████████████████▏  497 B                         
verifying sha256 digest 
writing manifest 
success 
using existing layer sha256:9801e7fce27dbf3d0bfb468b7b21f1d132131a546dfc43e50518631b8b1800a9 
using existing layer sha256:369ca498f347f710d068cbb38bf0b8692dd3fa30f30ca2ff755e211c94768150 
using existing layer sha256:6e4c38e1172f42fdbff13edf9a7a017679fb82b0fde415a3e8b3c31c6ed4a4e4 
creating new layer sha256:c3caaa714b3c48636d7cf161797cf6ccba75a41fcfb4cb0281d10a7de7de9f36 
writing manifest 
success 

Run DeepSeek R1 671B

ollama run deepseek-r1-671b-p4d.24xlarge

I'm not going to go through all the prompts I used for the 671B model, but I will share a few interesting ones:

First Conversation with DeepSeek R1 671B

In kudos to the Chinese engineers, I wanted to mix Chinese into my first conversation. Like ChatGPT/Claude, DeepSeek is a very capable polyglot.

$ ollama run deepseek-r1-671b-p4d.24xlarge
>>> hi my name is kenny. what is your name? I'm very interested in exploring your knowledge base. 听说你还理解中文和其他语言。
<think>

</think>

你好,Kenny!很高兴认识你。你可以叫我小智,我支持使用多种语言进行交流哦~无论是中文、英文还是其他语言,我都乐意为你提供帮助。
有什么想了解的,随时告诉我吧!😊

>>>你好,小智!我中文名叫小龍,不过你可以叫我 Kenny 😊。除了中文以外,我还会讲日语。我的母语是英语,但因为我老婆是中国人,我在家
... 每天都用中文。我在大学学了四年中文,再加上自学,一共学了快二十年。不过,我还是有口音,有时候语法也不太自然。所以,我很崇拜你能
... 懂这么多语言!你还能理解所有人讲的语言,还懂所有的计算机语言、逻辑、数学等等,真的很厉害。你觉得自己最理解的话题是什么?比如,
... 我可能最喜欢的是电脑科学/数学。
<think>

</think>

Kenny你好呀~哇,你的语言学习经历太让人佩服了!坚持二十年真的超有毅力呢!(竖起大拇指)和太太日常用中文交流真是温馨又高效的学
习方式呢~关于口音问题完全不用纠结哦,这可是你独特文化背景的可爱印记✨ 

说到我的知识领域嘛,确实覆盖比较广泛:从量子物理到古典诗词,再到编程语言的底层逻辑都会涉及。不过要论"最理解",可能更偏向于**结
构化知识体系**的处理——比如帮用户梳理复杂概念的关联性、建立跨学科的知识图谱,或是用算法思维拆解实际问题~ 

如果你对计算机科学/数学特别感兴趣的话,我们以后可以多聊聊这些话题呀!最近在研究什么有趣的课题吗?(☆▽☆)

Levenshtein Distance in Kotlin Native with DeepSeek R1 671B

Next, I wanted to see how well the 671B model handled a more complex task, like writing a Kotlin Native program to compute the Levenshtein distance between two strings.

The immediate difference is that for this question DeepSeek spent a lot more time thinking. It was very interesting watching it work through the problem for almost 20 minutes, and to ultimately return a working solution.

$ ollama run deepseek-r1-671b-p4d.24xlarge
>>> Hi could you write an implementation of levenshtein distance in kotlin native with a couple tests with two sequeneces [A,T,C,G].

And it's 450+ line chain of thought. Take note how it will suddenly stop mid-thought, updating and changing course as needed.

<think>
Okay, I need to implement the Levenshtein distance algorithm in Kotlin Native. Let me start by recalling what the 
Levenshtein distance is. It's the minimum number of single-character edits (insertions, deletions, or substitutions) 
required to change one sequence into another.

First, I should structure my code. Since it's Kotlin Native, maybe a function that takes two lists or arrays of some 
type—since the sequences are made up of [A, T, C, G], perhaps using characters or an enum. But for simplicity, maybe just 
use Char since those letters can be represented as such.

Wait, but in the problem statement, they mentioned "two sequences [A,T,C,G]". So each sequence is composed of these 
elements. I need to handle sequences of these nucleotides. Hmm, perhaps using a List<Char> where each element is one of 'A', 
'T', 'C', or 'G'. Or maybe an enum class Nucleotide with those values? But the Levenshtein algorithm doesn't really care 
about what the elements are, just whether they're equal. So treating them as characters should be fine.

So the function signature could be something like fun levenshteinDistance(a: List<Char>, b: List<Char>): Int { ... }

But wait, maybe more efficient with arrays? Not sure if performance is a concern here. The problem says "Kotlin Native", 
which compiles to native binaries, but for the purpose of this task, probably any standard implementation would work.

The standard dynamic programming approach uses a 2D array where dp[i][j] represents the distance between the first i 
characters of sequence a and the first j characters of sequence b. The steps are:

Initialize a (n+1) x (m+1) matrix where n is the length of a, m the length of b.

The base cases: if one string is empty, the distance is the length of the other. So dp[0][j] = j and dp[i][0] = i.

Then fill in the rest by checking each pair:

dp[i][j] = min(
    dp[i-1][j] + 1 (deletion),
    dp[i][j-1] + 1 (insertion),
    dp[i-1][j-1] + cost (substitution, where cost is 0 if a[i-1] == b[j-1], else 1)
)

So in Kotlin, how to implement this efficiently? For space optimization, since each row only depends on the previous one. So 
instead of storing the entire matrix, just keep two rows: current and previous.

That would reduce the space from O(nm) to O(min(n,m)), which is better for longer sequences.

Wait but in Kotlin Native, handling arrays might be straightforward. Let's outline the code steps:

1. Get lengths lenA and lenB of a and b.
2. If one of them is zero, return the length of the other.
3. Initialize previous row (size lenB + 1) with values from 0 to lenB.
4. Iterate over each element in a:
   - Create current row starting with i+1 (current index in a).
   - For each j from 1 to lenB:
       - Check if the characters at positions (i-1 of a and j-1 of b) are equal, cost is 0 or 1.
       - Calculate substitutionCost = previous[j-1] + cost
       - deletionCost = previous[j] + 1
       - insertionCost = currentRow[j-1] + 1
       - Take the minimum of these three and set to current row j position.
5. After processing each i, swap previous with current.

At the end, previous[lenB] will have the result.

Wait but in space optimization steps:

Actually, when using two rows (previous and current), for each element a[i], we create a new current array where each entry 
depends on the previous row's entries above-left (substitution) or left (insertion from j-1 in currentRow?), so maybe need 
to correct this.

Alternatively, use a single array and update it in place. Let me think again about space optimization.

The standard approach for optimizing space is using two rows because each cell only depends on the cells diagonally above 
and to the left.

Wait no: when filling the matrix row by row, you can overwrite the previous values since once we compute current[j], we 
don't need previous[j-1] again. Wait maybe not. Let me think:

The standard way is that for each i (row), j iterates from 1 to lenB.

So in order to minimize space usage, using a single array and updating it as you go might be possible by keeping track of 
the diagonal value before overwriting it.

Alternatively, use two separate arrays: previousRow (for i-1) and currentRow (i).

Wait here's an example. Let's say we're building row i based on row i-1.

For each j in 0..lenB:

current[j] = min(previous[j]+1, // deletion
                  current[j-1]+1, // insertion
                  previous[j-1] + substitutionCost)

Wait but when using two arrays, the first array is the previous (i-1) row. So for each i in 0..lenA:

currentRow[0] = i+1? Because if we are building currentRow as the ith iteration.

Hmm, maybe I need to adjust the code structure.

Let me outline with variables:

val lenA = a.size
val lenB = b.size

if (a.isEmpty()) return lenB
if (b.isEmpty()) return lenA

// create previous array (size lenB+1)
var prevRow = IntArray(lenB + 1) { it } // first row when i=0 is [0,1,...,lenB]

for (i in 1..lenA) {
    val currRow = IntArray(lenB + 1).apply {
        this[0] = i
    }
    for (j in 1..lenB) {
        val substitutionCost = if (a[i-1] == b[j-1]) prevRow[j-1] else prevRow[j-1] + 1
        currRow[j] = minOf(
            prevRow[j] + 1, // deletion cost: remove from a to reach i-1, j 
            currRow[j - 1] + 1, // insertion cost: insert into b's position j (so take previous in current row)
            substitutionCost
        )
    }
    prevRow = currRow
}

return prevRow[lenB]

Yes that seems right. So for each i starting from 1 to lenA (since a is of length lenA, and we need up to index lenA-1), the 
first element in current row is i. Then, j starts at 1.

So substitution cost comes from previous[j-1], plus whether the characters are equal or not.

In this setup, prevRow holds the values for i-1, so when processing i-th character of a (a[i-1]), we compute currRow which 
will be lenB+1 elements. Then prevRow is updated to currRow.

This uses O(n) space where n is the length of b (assuming lenB >= lenA). Wait no: if you process from 0 to lenA, then for 
each i, it's an array of lenB +1 size. But in code above we create a new IntArray(lenB+1), so total O(n) space where n is 
the length of b.

But perhaps even better by using one row and updating in place? Let me think again: that requires saving the previous 
diagonal value before overwriting it. For example:

Use a single array, but for each j from 1 to lenB, we need to track the current top-left (previous i-1,j-1) which is being 
updated as we go.

But perhaps using two rows makes the code clearer and avoids that complexity.

So back to Kotlin: writing this function with two arrays. Then testing it on some examples.

Now for tests: The user wants a couple of tests using sequences composed of A, T, C, G.

Test case 1: Empty sequence versus another. For example, levenshteinDistance(emptyList(), listOf('A')) should return 1.

Another test case: Identical sequences. E.g., "ATCG" and "ATCG"—distance is zero.

Third test case: Substitution needed. Like "ATGC" vs "ATCC": the third character G to C, so substitution cost is 1, total 
distance 1.

Fourth test case: Insertion/deletion. For example, "ATC" versus "ATGTC". The edit steps could be insert G and T after ATC? 
So wait:

Original sequence is A-T-C

Target is A-T-G-T-C

So to go from ATC (length 3) to ATGTC (length5), need two inserts. Or the Levenshtein distance would be 2.

Wait, but according to the algorithm: Let's see:

The path could be inserting G and T between T and C in "ATC" → becomes "ATGC", then insert T? No wait maybe another 
approach.

Alternatively, comparing ATC (3) vs ATGTC (5):

If we compute using dynamic programming.

But perhaps I can write a test case where the expected distance is known. For instance:

Test 1: s1 = ["A"], s2 = ["A"] → distance 0.

s1= [A, T], s2=[A] → deletion of 'T' (cost 1). So distance 1.

Another example: AT vs AG. Substitution cost 1. Distance is 1.

Also, a more complex case where insertions and substitutions are involved:

Test case where the sequences are "GATTACA" and "GCATGCU". But maybe that's too long. Let me think of two sequences like 
"ACGT", "AGTACG".

The distance here might be calculated as follows: Insert A after G (so AGTA...?), but perhaps it's easier to use examples 
where we know the answer.

So, let's code a couple of tests:

Test 1:
a = listOf('A', 'T'), b= listOf('A')
expected distance is 1.

Test2:
a and b are empty. Distance zero? Or if both have length zero?

But for Levenshtein between two empty strings, the distance is 0.

Another test: a is "ATCG", b is "ATC". The distance would be 1 (delete 'G').

Also, substitution case:

Test3:
a = listOf('A', 'T'), b= listOf('A', 'C'). So substitutions. Distance should be 1.

Then another where it's a combination of insert and substitute: e.g., "AT" vs "AGTC". Let's compute this step by step.

But maybe the test cases are okay as long as they cover edge cases like empty sequences, substitution, insertion, deletion.

Now in Kotlin Native tests. Since I'm writing code for Kotlin native, perhaps using kotlin.test with some simple assertions.

So putting it all together: create a function called levenshteinDistance that takes two List<Char>s and returns the 
distance.

Wait but maybe sequences can be passed as strings? But user specified sequences of [A,T,C,G], which are characters. So 
passing lists is okay, or even pass arrays if needed.

But in tests, creating listOf('A', 'T') etc., that's straightforward.

Implementing this function in Kotlin Native (which is different from regular Kotlin/JVM?), but perhaps for the code here, 
standard Kotlin functions are acceptable unless there's a specific requirement.

Wait, user mentioned "Kotlin native", which implies compiling to native binaries. But the code structure should be similar 
except that some libraries may not be available. However, since we're only using basic data structures and no 
platform-specific APIs, this shouldn't matter here.

So for tests: in Kotlin Native, testing frameworks like kotlin.test can be used. So including @Test annotations from 
kotlin.test, with assertEquals statements.

Now putting it all together:

Function code:

fun levenshteinDistance(a: List<Char>, b: List<Char>): Int {
    val lenA = a.size
    val lenB = b.size

    if (lenA == 0) return lenB
    if (lenB == 0) return lenA

    var prevRow = IntArray(lenB + 1) { it }

    for (i in 1..lenA) {
        val currentRow = IntArray(lenB + 1)
        currentRow[0] = i

        for (j in 1..lenB) {
            val substitutionCost = if (a[i - 1] == b[j - 1]) prevRow[j - 1] else prevRow[j - 1] + 1
            currentRow[j] = minOf(
                prevRow[j] + 1,          // Deletion cost
                currentRow[j - 1] + 1,   // Insertion cost
                substitutionCost         // Substitution cost
            )
        }

        prevRow = currentRow
    }

    return prevRow[lenB]
}

Tests:

import kotlin.test.Test
import kotlin.test.assertEquals

class LevenshteinTest {

    @Test
    fun `empty sequences`() {
        assertEquals(0, levenshteinDistance(listOf(), listOf()))
    }

    @Test
    fun `one empty sequence`() {
        val seq = listOf('A', 'T')
        assertEquals(2, levenshteinDistance(seq, listOf()))
        assertEquals(2, levenshteinDistance(listOf(), seq))
    }

    @Test
    fun `identical sequences`() {
        val seq1 = listOf('A', 'T', 'C', 'G')
        val seq2 = listOf('A', 'T', 'C', 'G')
        assertEquals(0, levenshteinDistance(seq1, seq2))
    }

    @Test
    fun `substitution needed`() {
        val seq1 = listOf('A', 'T', 'G')
        val seq2 = listOf('A', 'C', 'G')
        assertEquals(1, levenshteinDistance(seq1, seq2))
    }

    @Test
    fun `insertion and deletion`() {
        val seq1 = listOf('A', 'T', 'C', 'G')
        val seq2 = listOf('A', 'T', 'G')
        assertEquals(1, levenshteinDistance(seq1, seq2)) // Delete 'C' -> distance 1
    }

    @Test
    fun `complex case`() {
        val seq1 = listOf('G', 'T', 'A', 'C')
        val seq2 = listOf('G', 'C', 'A', 'T', 'C')
        // Convert GTAC to GCATC:
        // G -> same (0)
        // T vs C: substitute (cost 1) → or insert C and delete T?
        // Let's see step by step. But according to the algorithm, it should compute the minimal steps.
        // The correct distance here is 2? Because inserting 'C' after G would require insertion, then substitution of T to 
A might be needed.

        // Alternatively:
        // GTAC → insert C (so cost +1) between G and T: GC TAC?
        Wait perhaps this example needs better analysis. Let me compute it properly:

seq1 is ["G", "T", "A", "C"]; seq2 is ["G","C","A","T","C"]

Let's build the matrix.

But maybe I should use known steps or an online calculator to find the correct distance for this example. Alternatively, 
perhaps skip this test if unsure and stick to simpler tests where we can compute manually.

Alternatively, let me think of another approach: seq1 is G T A C (length 4), seq2 is G C A T C (length5). So steps needed:

Optionally convert GTAC into GCATC. Let's see:

One possible path:
- Insert 'C' after the first G → G C T A C
- Then substitute second T in original with A? Wait maybe not.

Alternatively, substitution of T to C at position 2 (cost +1), then insertion of T between positions 4 and 5 (original 
becomes GCAC; insert T: GCA TC. So cost is 1 for substitution and 1 for insertion → total distance 2.

So the Levenshtein distance here should be 3? Wait, I need to compute it properly using the algorithm.

Let me manually calculate:

For i=0 (seq1 empty), prevRow = [0,1,2,3,4] (since seq2 length is 5).

Then for each row up to i=4 (positions 0-3 in seq1):

i=1: currentChar G. Compute j from 1 to 5:

j=1: compare with C (no?), wait no.

Wait the actual sequences are:

seq1 = ['G', 'T','A','C']

seq2 =['G','C','A','T','C']

So for each i and j in code steps. Let's try building matrix row by row.

But perhaps it's time-consuming, but given that this is a test case to include maybe the correct distance is 3?

Alternatively, use an online Levenshtein calculator:

Input strings as "GTAC" vs "GCA TC".

Assuming spaces are ignored (they shouldn't be), input GTAC and GCATC.

The levenshtein between GTAC and G CATC would have steps: insert C after G → positions 1 in seq2, then substitute T to A at 
position3? Or other combinations. Let's check:

Using online calculator for "GTAC" vs "GCATC":

G - G (cost 0)

T is compared with C → substitution cost +1.

Then next characters A and A: no cost.

C versus T in seq2? Hmm, perhaps not sure.

Alternatively, the Levenshtein distance between these two would be:

Operations:
Insert 'C' after first G (G→GC), then substitute T to A at position 3. Wait maybe I'm getting confused here. Let's use a 
proper method.

Wait GTAC: indices 0-3 are G(0),T(1),A(2), C(3).

seq2 is GCATC (indices 0-4: G,C,A,T,C)

So the alignment:

G T A C  
| | \ \
G C A T C

No, that's not possible. Let me see.

An alternative approach:

We can model the DP table for "GTAC" and "GCATC".

The rows represent GTAC (i from 0 to4), columns GCATC (j from0 to5).

Initialize row 0 as [0,1,2,3,4].

Row i=1 (G):

current row starts with j=0: currentRow[0] =1.

For each j in 1-5:

At j=1 (char C), compare G vs C. Substitute cost is prev[j-1](prevRow[0]=0) +1 → substitutionCost=1. Insertion cost is 
currRow[j-1]+1: currentRow[0] was set to 1, so insertion cost is 2. Deletion cost is prevRow[j (j=1)]+1 = prevRow[1](which 
in previous row after i=0) which would be 1 +1 →2. So min(2,2,1) → currentRow[1] is 1.

Then j=2: compare T and C?

Wait no:

i=1 (row for the first character of a, which is 'G').

At i=1 in code loop (processing a[i-1], since a has indices from0 to lenA-1. So when i runs from 1..4 (since seq1 length is 
4), then a[0] (i=1) corresponds to the first character G, and so on.

Wait maybe I'm overcomplicating. Let's proceed step by step for this test case:

But perhaps it's better to avoid including such a complex example in tests unless certain of the correct answer.

So instead, let me use another approach: create sequences where insertions are needed. For example, seq1 is "A", seq2 is 
"AT": distance 1 (insert T).

Another Test case with insertion and substitution:

For example, seq1 = listOf('A', 'G'), seq2 = listOf('T','C'). The Levenshtein distance here would be 2 substitutions: A→T 
and G→ C. So total cost is 2.

Alternatively, inserting T and deleting both A and G but that's more costly. So the minimal is substitution x2 → cost 2.

But perhaps in code:

prevRow starts as [0,1,2]

i=1 (a[0] = 'A')

current row first element is i (=1).

j runs from 1 to lenB(=2). 

For j=1: comparing A with T. substitutionCost = prev[j-1](0) +1 →1.

Deletion cost: prevRow[1]+1 (1+1=2), insertion currentRow[0] +1 (1+1=2).

currentRow[1]= min(2, 2, 1)=1

j=2: comparing A with C. substitutionCost = prev[1](prev row for j-1=1 was in previous row as 1 → but wait when i is 1 and 
seq2 has lenB=2.

Wait perhaps this is getting too time-consuming, so it's better to code the function properly and have tests that cover key 
cases.
</think>

Followed by the solution:

Here's an implementation of Levenshtein distance in Kotlin for nucleotide sequences (`A`, `T`, `C`, `G`) along with test 
cases:
fun levenshteinDistance(a: List<Char>, b: List<Char>): Int {
    val lenA = a.size
    val lenB = b.size

    if (lenA == 0) return lenB
    if (lenB == 0) return lenA

    var prevRow = IntArray(lenB + 1) { it } // Initialize previous row with 0..lenB

    for (i in 1..lenA) {
        val currentRow = IntArray(lenB + 1)
        currentRow[0] = i
        
        for (j in 1..lenB) {
            val substitutionCost = if (a[i - 1] == b[j - 1]) 
                prevRow[j - 1]
            else 
                prevRow[j - 1] + 1
            
            currentRow[j] = minOf(
                    prevRow[j] + 1,        // Deletion
                    currentRow[j - 1] + 1,   // Insertion
                    substitutionCost       // Substitution
            )
        }
        
        prevRow = currentRow
    }

    return prevRow[lenB]
}

class LevenshteinTest {
    
    @Test
    fun `empty sequences`() {
        assertEquals(0, levenshteinDistance(listOf(), listOf()))
    }

    @Test
    fun `one empty sequence`() {
        val seq = listOf('A', 'T')
        assertEquals(2, levenshteinDistance(seq, listOf()))
        assertEquals(2, levenshteinDistance(listOf(), seq))
    }

    @Test
    fun `identical sequences`() {
        val seq1 = listOf('A', 'T', 'C', 'G')
        val seq2 = listOf('A', 'T', 'C', 'G')
        assertEquals(0, levenshteinDistance(seq1, seq2))
    }

    @Test
    fun `single substitution`() {
        val seq1 = listOf('A', 'T', 'G')
        val seq2 = listOf('A', 'C', 'G')
        assertEquals(1, levenshteinDistance(seq1, seq2))
    }

    @Test
    fun `insertion and deletion`() {
        val longer = listOf('A', 'T', 'C', 'G')
        val shorter = listOf('A', 'T', 'G')
        assertEquals(1, levenshteinDistance(longer, shorter))
    }
}
This implementation:
- Uses dynamic programming with space optimization (O(n) space)
- Works for any character sequences but specifically handles nucleotide characters (`A`, `T`, `C`, `G`)
- Includes tests covering:
  - Empty sequence cases
  - Identical sequences
  - Single substitution case
  - Insertion/deletion scenario

The algorithm calculates the minimum number of operations (insertions, deletions, or substitutions) needed to transform one 
sequence into another. Each test case verifies a different aspect of the implementation's correctness.

You can run these tests using any Kotlin testing framework like JUnit (shown here with `@Test` annotations). The 
space-optimized approach ensures it works efficiently even for longer sequences typical in bioinformatics applications.

I loaded this solution right into IntelliJ and the tests passsed. Source Code.

Bonus Question While I had DeepSeek 671B online, I had to ask it one final question. :)

What is the origin of the universe?

DeepSeek's response here.

DeepSeek V3-671B on EC2

Coming Soon.

EC2 Instance Types vs Model Size

I didn't do much benchmarking, but made note of which instnace types I've ran the larger models on. My typical workflow is to prep everything on smaller instances and then scale up to larger instances at run time.

Additionaly, be sure you have enough disk. The 671B model requires 404GB of disk space. But you need more so you can copy/transform the data, and support other models.

g5.24xlarge
  • Models tested: deepseek-r1-70b, deepseek-r1-32b, coder-33b-instruct
  • 96 vCPUs, 384 GB RAM, 4x A10G GPUs (24GB each = 96GB total)
  • $8.00/hr $192.00/day $5,760/mo
g5.12xlarge
  • Models tested: deepseek-r1-32b, coder-33b-instruct
  • 48 vCPUs, 192 GB RAM, 4x A10G GPUs (24GB each = 96GB total)
  • $4.00/hr $96.00/day $2,880/mo
p4d.24xlarge
  • Models tested: deepseek-r1-671b, deepseek-r1-70b, deepseek-r1-32b
  • 96 vCPUs, 1,152 GB RAM, 8x A100 GPUs (40GB each = 320GB total)
  • $32.77/hr $786.48/day $23,594/mo

GPU Monitoring on P4D.24xlarge

I found this tool helpful to monitor GPU utilization and available memory.

$ nvidia-smi
      
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-40GB          On  | 00000000:10:1C.0 Off |                    0 |
| N/A   31C    P0              53W / 400W |      5MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM4-40GB          On  | 00000000:10:1D.0 Off |                    0 |
| N/A   30C    P0              53W / 400W |      5MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-SXM4-40GB          On  | 00000000:20:1C.0 Off |                    0 |
| N/A   31C    P0              55W / 400W |      5MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA A100-SXM4-40GB          On  | 00000000:20:1D.0 Off |                    0 |
| N/A   30C    P0              53W / 400W |      5MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA A100-SXM4-40GB          On  | 00000000:90:1C.0 Off |                    0 |
| N/A   31C    P0              54W / 400W |      5MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   5  NVIDIA A100-SXM4-40GB          On  | 00000000:90:1D.0 Off |                    0 |
| N/A   30C    P0              52W / 400W |      5MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   6  NVIDIA A100-SXM4-40GB          On  | 00000000:A0:1C.0 Off |                    0 |
| N/A   32C    P0              61W / 400W |      5MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   7  NVIDIA A100-SXM4-40GB          On  | 00000000:A0:1D.0 Off |                    0 |
| N/A   30C    P0              52W / 400W |      5MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+