DeepSeek R1 671B on EC2
Over the past few months, I've been using AI tools like ChatGPT, Claude, and Cursor to enhance my coding workflow. While these tools certainly have their limitations, I've found them to be invaluable for ideation, code review, debugging, iterative development, learning, and so much more. Some of my favorite recent AI projects include Kengine, a game engine built in Kotlin/Native+SDL3, and Klip, an image processing service.
When Deep Seek V3 was announced, I was pretty excited to have the chance of running a cutting-edge LLM and begin to explore all the ways to integrate this into my own life and as well as at Arrived. My adventure began with downloading the DeepSeek 671B model directly from Hugging Face. However, after some research and initial challenges, I decided to take a more gradual approach starting with smaller models. Along the way, I discovered Ollama, which significantly simplified the process.
DeepSeek R1 8B on Macbook Pro
My first step was to install ollama
on my Macbook Pro and test out the 8B model:.
brew install ollama
ollama pull deepseek-r1:8b
ollama run deepseek-r1:8b
A few conversation prompts [1] [2] verified that the model was working as expected.
DeepSeek R1 on Amazon Linux EC2
For my experiments I used an existing AMI (Amazon Machine Image), "Deep Learning AMI GPU PyTorch 2.0.1 (Amazon Linux 2) 20231003 (ami-031879bfa8927215d)"
. For more information on which instance size to use, see Instance Types vs Model Size.
I created a new EC2 instance from this AMI and installed ollama
using the following commands:
curl -fsSL https://ollama.com/install.sh | sh
DeepSeek R1 32B
Create a file deepseek-r1-32b.mod
file to configure the model and add the following:
FROM deepseek-r1:32b
# Model Parameters
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER stop "<|endoftext|>"
Then run:
ollama create deepseek-r1-32b -f deepseek-r1-32b.mod
ollama run deepseek-r1:32b
Like with the 8B model, I verified the model was working as expected with a few conversation prompts:
- Write a Kotlin Native program to compute the Levenshtein distance between two strings. [code]
- Write an Assembly program to print "Hello, World!", [code]
Notably, the 32B model seems to strike a great balance between quality and speed.
DeepSeek R1 70B
Similar to the 32B model, I model configurations files for the 70B model. However, I created two versions for two different instance types.
DeepSeek R1 70B on G5.12xlarge (deepseek-r1-70b-g5.24xlarge.mod)
FROM deepseek-r1:70b
# Conservative parameters for multi-GPU setup
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 2048
PARAMETER num_gpu 4
PARAMETER stop "<|endoftext|>"
DeepSeek R1 70B on P4D.24xlarge (deepseek-r1-70b-p4d.24xlarge.mod)
FROM deepseek-r1:70b
PARAMETER temperature 0.7
PARAMETER top_p 0.9
PARAMETER num_ctx 8192
PARAMETER num_gpu 8
PARAMETER num_thread 96
PARAMETER num_batch 2048
PARAMETER stop "<|endoftext|>"
Then run:
ollama create deepseek-r1-70b-p4d.24xlarge -f deepseek-r1-70b-p4d.24xlarge.mod
ollama run deepseek-r1-70b-p4d.24xlarge
Since the model is bigger, I decided to ask it a wider range of questions:
- Generate a Chinese lesson for beginners.
- Generate a sample essay about using ML for protein folding prediction and how it's helped improve the speed at innovating vaccines. Feel free to tie in information about the recent mRNA vaccines as well. (Assume HSK6)
- Write John Conway's Game of Life in C.
DeepSeek R1 671B
Now for the big one, the 671B model. And it's quite a jump from the 70B model.
I initially downloaded the DeepSeek V3 model directly from Hugging Face, however, I discovered it was ultimately easier to use ollama
. I'll come back to manually running V3 later.
Generate the model configuration file
Create a file named deepseek-r1-671b-p4d.24xlarge.mod
and add the following:
FROM deepseek-r1:671b
# GPU and Memory Settings
PARAMETER num_gpu 8
PARAMETER num_thread 96
# Performance Settings
PARAMETER temperature 0.7
PARAMETER num_ctx 8192
PARAMETER num_batch 1024
# Stop token
PARAMETER stop "<|endoftext|>"
If that doesn't work, try the below or some other combination:
PARAMETER num_ctx 4096
PARAMETER num_batch 512
Create the model
ollama ollama create deepseek-r1-671b-p4d.24xlarge -f deepseek-r1-671b-p4d.24xlarge.mod
which should result in the following output:
gathering model components
pulling manifest
pulling 9801e7fce27d... 100% ▕██████████████████████████████████████████████████████████████████▏ 404 GB
pulling 369ca498f347... 100% ▕██████████████████████████████████████████████████████████████████▏ 387 B
pulling 6e4c38e1172f... 100% ▕██████████████████████████████████████████████████████████████████▏ 1.1 KB
pulling f4d24e9138dd... 100% ▕██████████████████████████████████████████████████████████████████▏ 148 B
pulling fdf3d6cb73c7... 100% ▕██████████████████████████████████████████████████████████████████▏ 497 B
verifying sha256 digest
writing manifest
success
using existing layer sha256:9801e7fce27dbf3d0bfb468b7b21f1d132131a546dfc43e50518631b8b1800a9
using existing layer sha256:369ca498f347f710d068cbb38bf0b8692dd3fa30f30ca2ff755e211c94768150
using existing layer sha256:6e4c38e1172f42fdbff13edf9a7a017679fb82b0fde415a3e8b3c31c6ed4a4e4
creating new layer sha256:c3caaa714b3c48636d7cf161797cf6ccba75a41fcfb4cb0281d10a7de7de9f36
writing manifest
success
Run DeepSeek R1 671B
ollama run deepseek-r1-671b-p4d.24xlarge
I'm not going to go through all the prompts I used for the 671B model, but I will share a few interesting ones:
First Conversation with DeepSeek R1 671B
In kudos to the Chinese engineers, I wanted to mix Chinese into my first conversation. Like ChatGPT/Claude, DeepSeek is a very capable polyglot.
$ ollama run deepseek-r1-671b-p4d.24xlarge
>>> hi my name is kenny. what is your name? I'm very interested in exploring your knowledge base. 听说你还理解中文和其他语言。
<think>
</think>
你好,Kenny!很高兴认识你。你可以叫我小智,我支持使用多种语言进行交流哦~无论是中文、英文还是其他语言,我都乐意为你提供帮助。
有什么想了解的,随时告诉我吧!😊
>>>你好,小智!我中文名叫小龍,不过你可以叫我 Kenny 😊。除了中文以外,我还会讲日语。我的母语是英语,但因为我老婆是中国人,我在家
... 每天都用中文。我在大学学了四年中文,再加上自学,一共学了快二十年。不过,我还是有口音,有时候语法也不太自然。所以,我很崇拜你能
... 懂这么多语言!你还能理解所有人讲的语言,还懂所有的计算机语言、逻辑、数学等等,真的很厉害。你觉得自己最理解的话题是什么?比如,
... 我可能最喜欢的是电脑科学/数学。
<think>
</think>
Kenny你好呀~哇,你的语言学习经历太让人佩服了!坚持二十年真的超有毅力呢!(竖起大拇指)和太太日常用中文交流真是温馨又高效的学
习方式呢~关于口音问题完全不用纠结哦,这可是你独特文化背景的可爱印记✨
说到我的知识领域嘛,确实覆盖比较广泛:从量子物理到古典诗词,再到编程语言的底层逻辑都会涉及。不过要论"最理解",可能更偏向于**结
构化知识体系**的处理——比如帮用户梳理复杂概念的关联性、建立跨学科的知识图谱,或是用算法思维拆解实际问题~
如果你对计算机科学/数学特别感兴趣的话,我们以后可以多聊聊这些话题呀!最近在研究什么有趣的课题吗?(☆▽☆)
Levenshtein Distance in Kotlin Native with DeepSeek R1 671B
Next, I wanted to see how well the 671B model handled a more complex task, like writing a Kotlin Native program to compute the Levenshtein distance between two strings.
The immediate difference is that for this question DeepSeek spent a lot more time thinking. It was very interesting watching it work through the problem for almost 20 minutes, and to ultimately return a working solution.
$ ollama run deepseek-r1-671b-p4d.24xlarge
>>> Hi could you write an implementation of levenshtein distance in kotlin native with a couple tests with two sequeneces [A,T,C,G].
And it's 450+ line chain of thought. Take note how it will suddenly stop mid-thought, updating and changing course as needed.
<think>
Okay, I need to implement the Levenshtein distance algorithm in Kotlin Native. Let me start by recalling what the
Levenshtein distance is. It's the minimum number of single-character edits (insertions, deletions, or substitutions)
required to change one sequence into another.
First, I should structure my code. Since it's Kotlin Native, maybe a function that takes two lists or arrays of some
type—since the sequences are made up of [A, T, C, G], perhaps using characters or an enum. But for simplicity, maybe just
use Char since those letters can be represented as such.
Wait, but in the problem statement, they mentioned "two sequences [A,T,C,G]". So each sequence is composed of these
elements. I need to handle sequences of these nucleotides. Hmm, perhaps using a List<Char> where each element is one of 'A',
'T', 'C', or 'G'. Or maybe an enum class Nucleotide with those values? But the Levenshtein algorithm doesn't really care
about what the elements are, just whether they're equal. So treating them as characters should be fine.
So the function signature could be something like fun levenshteinDistance(a: List<Char>, b: List<Char>): Int { ... }
But wait, maybe more efficient with arrays? Not sure if performance is a concern here. The problem says "Kotlin Native",
which compiles to native binaries, but for the purpose of this task, probably any standard implementation would work.
The standard dynamic programming approach uses a 2D array where dp[i][j] represents the distance between the first i
characters of sequence a and the first j characters of sequence b. The steps are:
Initialize a (n+1) x (m+1) matrix where n is the length of a, m the length of b.
The base cases: if one string is empty, the distance is the length of the other. So dp[0][j] = j and dp[i][0] = i.
Then fill in the rest by checking each pair:
dp[i][j] = min(
dp[i-1][j] + 1 (deletion),
dp[i][j-1] + 1 (insertion),
dp[i-1][j-1] + cost (substitution, where cost is 0 if a[i-1] == b[j-1], else 1)
)
So in Kotlin, how to implement this efficiently? For space optimization, since each row only depends on the previous one. So
instead of storing the entire matrix, just keep two rows: current and previous.
That would reduce the space from O(nm) to O(min(n,m)), which is better for longer sequences.
Wait but in Kotlin Native, handling arrays might be straightforward. Let's outline the code steps:
1. Get lengths lenA and lenB of a and b.
2. If one of them is zero, return the length of the other.
3. Initialize previous row (size lenB + 1) with values from 0 to lenB.
4. Iterate over each element in a:
- Create current row starting with i+1 (current index in a).
- For each j from 1 to lenB:
- Check if the characters at positions (i-1 of a and j-1 of b) are equal, cost is 0 or 1.
- Calculate substitutionCost = previous[j-1] + cost
- deletionCost = previous[j] + 1
- insertionCost = currentRow[j-1] + 1
- Take the minimum of these three and set to current row j position.
5. After processing each i, swap previous with current.
At the end, previous[lenB] will have the result.
Wait but in space optimization steps:
Actually, when using two rows (previous and current), for each element a[i], we create a new current array where each entry
depends on the previous row's entries above-left (substitution) or left (insertion from j-1 in currentRow?), so maybe need
to correct this.
Alternatively, use a single array and update it in place. Let me think again about space optimization.
The standard approach for optimizing space is using two rows because each cell only depends on the cells diagonally above
and to the left.
Wait no: when filling the matrix row by row, you can overwrite the previous values since once we compute current[j], we
don't need previous[j-1] again. Wait maybe not. Let me think:
The standard way is that for each i (row), j iterates from 1 to lenB.
So in order to minimize space usage, using a single array and updating it as you go might be possible by keeping track of
the diagonal value before overwriting it.
Alternatively, use two separate arrays: previousRow (for i-1) and currentRow (i).
Wait here's an example. Let's say we're building row i based on row i-1.
For each j in 0..lenB:
current[j] = min(previous[j]+1, // deletion
current[j-1]+1, // insertion
previous[j-1] + substitutionCost)
Wait but when using two arrays, the first array is the previous (i-1) row. So for each i in 0..lenA:
currentRow[0] = i+1? Because if we are building currentRow as the ith iteration.
Hmm, maybe I need to adjust the code structure.
Let me outline with variables:
val lenA = a.size
val lenB = b.size
if (a.isEmpty()) return lenB
if (b.isEmpty()) return lenA
// create previous array (size lenB+1)
var prevRow = IntArray(lenB + 1) { it } // first row when i=0 is [0,1,...,lenB]
for (i in 1..lenA) {
val currRow = IntArray(lenB + 1).apply {
this[0] = i
}
for (j in 1..lenB) {
val substitutionCost = if (a[i-1] == b[j-1]) prevRow[j-1] else prevRow[j-1] + 1
currRow[j] = minOf(
prevRow[j] + 1, // deletion cost: remove from a to reach i-1, j
currRow[j - 1] + 1, // insertion cost: insert into b's position j (so take previous in current row)
substitutionCost
)
}
prevRow = currRow
}
return prevRow[lenB]
Yes that seems right. So for each i starting from 1 to lenA (since a is of length lenA, and we need up to index lenA-1), the
first element in current row is i. Then, j starts at 1.
So substitution cost comes from previous[j-1], plus whether the characters are equal or not.
In this setup, prevRow holds the values for i-1, so when processing i-th character of a (a[i-1]), we compute currRow which
will be lenB+1 elements. Then prevRow is updated to currRow.
This uses O(n) space where n is the length of b (assuming lenB >= lenA). Wait no: if you process from 0 to lenA, then for
each i, it's an array of lenB +1 size. But in code above we create a new IntArray(lenB+1), so total O(n) space where n is
the length of b.
But perhaps even better by using one row and updating in place? Let me think again: that requires saving the previous
diagonal value before overwriting it. For example:
Use a single array, but for each j from 1 to lenB, we need to track the current top-left (previous i-1,j-1) which is being
updated as we go.
But perhaps using two rows makes the code clearer and avoids that complexity.
So back to Kotlin: writing this function with two arrays. Then testing it on some examples.
Now for tests: The user wants a couple of tests using sequences composed of A, T, C, G.
Test case 1: Empty sequence versus another. For example, levenshteinDistance(emptyList(), listOf('A')) should return 1.
Another test case: Identical sequences. E.g., "ATCG" and "ATCG"—distance is zero.
Third test case: Substitution needed. Like "ATGC" vs "ATCC": the third character G to C, so substitution cost is 1, total
distance 1.
Fourth test case: Insertion/deletion. For example, "ATC" versus "ATGTC". The edit steps could be insert G and T after ATC?
So wait:
Original sequence is A-T-C
Target is A-T-G-T-C
So to go from ATC (length 3) to ATGTC (length5), need two inserts. Or the Levenshtein distance would be 2.
Wait, but according to the algorithm: Let's see:
The path could be inserting G and T between T and C in "ATC" → becomes "ATGC", then insert T? No wait maybe another
approach.
Alternatively, comparing ATC (3) vs ATGTC (5):
If we compute using dynamic programming.
But perhaps I can write a test case where the expected distance is known. For instance:
Test 1: s1 = ["A"], s2 = ["A"] → distance 0.
s1= [A, T], s2=[A] → deletion of 'T' (cost 1). So distance 1.
Another example: AT vs AG. Substitution cost 1. Distance is 1.
Also, a more complex case where insertions and substitutions are involved:
Test case where the sequences are "GATTACA" and "GCATGCU". But maybe that's too long. Let me think of two sequences like
"ACGT", "AGTACG".
The distance here might be calculated as follows: Insert A after G (so AGTA...?), but perhaps it's easier to use examples
where we know the answer.
So, let's code a couple of tests:
Test 1:
a = listOf('A', 'T'), b= listOf('A')
expected distance is 1.
Test2:
a and b are empty. Distance zero? Or if both have length zero?
But for Levenshtein between two empty strings, the distance is 0.
Another test: a is "ATCG", b is "ATC". The distance would be 1 (delete 'G').
Also, substitution case:
Test3:
a = listOf('A', 'T'), b= listOf('A', 'C'). So substitutions. Distance should be 1.
Then another where it's a combination of insert and substitute: e.g., "AT" vs "AGTC". Let's compute this step by step.
But maybe the test cases are okay as long as they cover edge cases like empty sequences, substitution, insertion, deletion.
Now in Kotlin Native tests. Since I'm writing code for Kotlin native, perhaps using kotlin.test with some simple assertions.
So putting it all together: create a function called levenshteinDistance that takes two List<Char>s and returns the
distance.
Wait but maybe sequences can be passed as strings? But user specified sequences of [A,T,C,G], which are characters. So
passing lists is okay, or even pass arrays if needed.
But in tests, creating listOf('A', 'T') etc., that's straightforward.
Implementing this function in Kotlin Native (which is different from regular Kotlin/JVM?), but perhaps for the code here,
standard Kotlin functions are acceptable unless there's a specific requirement.
Wait, user mentioned "Kotlin native", which implies compiling to native binaries. But the code structure should be similar
except that some libraries may not be available. However, since we're only using basic data structures and no
platform-specific APIs, this shouldn't matter here.
So for tests: in Kotlin Native, testing frameworks like kotlin.test can be used. So including @Test annotations from
kotlin.test, with assertEquals statements.
Now putting it all together:
Function code:
fun levenshteinDistance(a: List<Char>, b: List<Char>): Int {
val lenA = a.size
val lenB = b.size
if (lenA == 0) return lenB
if (lenB == 0) return lenA
var prevRow = IntArray(lenB + 1) { it }
for (i in 1..lenA) {
val currentRow = IntArray(lenB + 1)
currentRow[0] = i
for (j in 1..lenB) {
val substitutionCost = if (a[i - 1] == b[j - 1]) prevRow[j - 1] else prevRow[j - 1] + 1
currentRow[j] = minOf(
prevRow[j] + 1, // Deletion cost
currentRow[j - 1] + 1, // Insertion cost
substitutionCost // Substitution cost
)
}
prevRow = currentRow
}
return prevRow[lenB]
}
Tests:
import kotlin.test.Test
import kotlin.test.assertEquals
class LevenshteinTest {
@Test
fun `empty sequences`() {
assertEquals(0, levenshteinDistance(listOf(), listOf()))
}
@Test
fun `one empty sequence`() {
val seq = listOf('A', 'T')
assertEquals(2, levenshteinDistance(seq, listOf()))
assertEquals(2, levenshteinDistance(listOf(), seq))
}
@Test
fun `identical sequences`() {
val seq1 = listOf('A', 'T', 'C', 'G')
val seq2 = listOf('A', 'T', 'C', 'G')
assertEquals(0, levenshteinDistance(seq1, seq2))
}
@Test
fun `substitution needed`() {
val seq1 = listOf('A', 'T', 'G')
val seq2 = listOf('A', 'C', 'G')
assertEquals(1, levenshteinDistance(seq1, seq2))
}
@Test
fun `insertion and deletion`() {
val seq1 = listOf('A', 'T', 'C', 'G')
val seq2 = listOf('A', 'T', 'G')
assertEquals(1, levenshteinDistance(seq1, seq2)) // Delete 'C' -> distance 1
}
@Test
fun `complex case`() {
val seq1 = listOf('G', 'T', 'A', 'C')
val seq2 = listOf('G', 'C', 'A', 'T', 'C')
// Convert GTAC to GCATC:
// G -> same (0)
// T vs C: substitute (cost 1) → or insert C and delete T?
// Let's see step by step. But according to the algorithm, it should compute the minimal steps.
// The correct distance here is 2? Because inserting 'C' after G would require insertion, then substitution of T to
A might be needed.
// Alternatively:
// GTAC → insert C (so cost +1) between G and T: GC TAC?
Wait perhaps this example needs better analysis. Let me compute it properly:
seq1 is ["G", "T", "A", "C"]; seq2 is ["G","C","A","T","C"]
Let's build the matrix.
But maybe I should use known steps or an online calculator to find the correct distance for this example. Alternatively,
perhaps skip this test if unsure and stick to simpler tests where we can compute manually.
Alternatively, let me think of another approach: seq1 is G T A C (length 4), seq2 is G C A T C (length5). So steps needed:
Optionally convert GTAC into GCATC. Let's see:
One possible path:
- Insert 'C' after the first G → G C T A C
- Then substitute second T in original with A? Wait maybe not.
Alternatively, substitution of T to C at position 2 (cost +1), then insertion of T between positions 4 and 5 (original
becomes GCAC; insert T: GCA TC. So cost is 1 for substitution and 1 for insertion → total distance 2.
So the Levenshtein distance here should be 3? Wait, I need to compute it properly using the algorithm.
Let me manually calculate:
For i=0 (seq1 empty), prevRow = [0,1,2,3,4] (since seq2 length is 5).
Then for each row up to i=4 (positions 0-3 in seq1):
i=1: currentChar G. Compute j from 1 to 5:
j=1: compare with C (no?), wait no.
Wait the actual sequences are:
seq1 = ['G', 'T','A','C']
seq2 =['G','C','A','T','C']
So for each i and j in code steps. Let's try building matrix row by row.
But perhaps it's time-consuming, but given that this is a test case to include maybe the correct distance is 3?
Alternatively, use an online Levenshtein calculator:
Input strings as "GTAC" vs "GCA TC".
Assuming spaces are ignored (they shouldn't be), input GTAC and GCATC.
The levenshtein between GTAC and G CATC would have steps: insert C after G → positions 1 in seq2, then substitute T to A at
position3? Or other combinations. Let's check:
Using online calculator for "GTAC" vs "GCATC":
G - G (cost 0)
T is compared with C → substitution cost +1.
Then next characters A and A: no cost.
C versus T in seq2? Hmm, perhaps not sure.
Alternatively, the Levenshtein distance between these two would be:
Operations:
Insert 'C' after first G (G→GC), then substitute T to A at position 3. Wait maybe I'm getting confused here. Let's use a
proper method.
Wait GTAC: indices 0-3 are G(0),T(1),A(2), C(3).
seq2 is GCATC (indices 0-4: G,C,A,T,C)
So the alignment:
G T A C
| | \ \
G C A T C
No, that's not possible. Let me see.
An alternative approach:
We can model the DP table for "GTAC" and "GCATC".
The rows represent GTAC (i from 0 to4), columns GCATC (j from0 to5).
Initialize row 0 as [0,1,2,3,4].
Row i=1 (G):
current row starts with j=0: currentRow[0] =1.
For each j in 1-5:
At j=1 (char C), compare G vs C. Substitute cost is prev[j-1](prevRow[0]=0) +1 → substitutionCost=1. Insertion cost is
currRow[j-1]+1: currentRow[0] was set to 1, so insertion cost is 2. Deletion cost is prevRow[j (j=1)]+1 = prevRow[1](which
in previous row after i=0) which would be 1 +1 →2. So min(2,2,1) → currentRow[1] is 1.
Then j=2: compare T and C?
Wait no:
i=1 (row for the first character of a, which is 'G').
At i=1 in code loop (processing a[i-1], since a has indices from0 to lenA-1. So when i runs from 1..4 (since seq1 length is
4), then a[0] (i=1) corresponds to the first character G, and so on.
Wait maybe I'm overcomplicating. Let's proceed step by step for this test case:
But perhaps it's better to avoid including such a complex example in tests unless certain of the correct answer.
So instead, let me use another approach: create sequences where insertions are needed. For example, seq1 is "A", seq2 is
"AT": distance 1 (insert T).
Another Test case with insertion and substitution:
For example, seq1 = listOf('A', 'G'), seq2 = listOf('T','C'). The Levenshtein distance here would be 2 substitutions: A→T
and G→ C. So total cost is 2.
Alternatively, inserting T and deleting both A and G but that's more costly. So the minimal is substitution x2 → cost 2.
But perhaps in code:
prevRow starts as [0,1,2]
i=1 (a[0] = 'A')
current row first element is i (=1).
j runs from 1 to lenB(=2).
For j=1: comparing A with T. substitutionCost = prev[j-1](0) +1 →1.
Deletion cost: prevRow[1]+1 (1+1=2), insertion currentRow[0] +1 (1+1=2).
currentRow[1]= min(2, 2, 1)=1
j=2: comparing A with C. substitutionCost = prev[1](prev row for j-1=1 was in previous row as 1 → but wait when i is 1 and
seq2 has lenB=2.
Wait perhaps this is getting too time-consuming, so it's better to code the function properly and have tests that cover key
cases.
</think>
Followed by the solution:
Here's an implementation of Levenshtein distance in Kotlin for nucleotide sequences (`A`, `T`, `C`, `G`) along with test
cases:
fun levenshteinDistance(a: List<Char>, b: List<Char>): Int {
val lenA = a.size
val lenB = b.size
if (lenA == 0) return lenB
if (lenB == 0) return lenA
var prevRow = IntArray(lenB + 1) { it } // Initialize previous row with 0..lenB
for (i in 1..lenA) {
val currentRow = IntArray(lenB + 1)
currentRow[0] = i
for (j in 1..lenB) {
val substitutionCost = if (a[i - 1] == b[j - 1])
prevRow[j - 1]
else
prevRow[j - 1] + 1
currentRow[j] = minOf(
prevRow[j] + 1, // Deletion
currentRow[j - 1] + 1, // Insertion
substitutionCost // Substitution
)
}
prevRow = currentRow
}
return prevRow[lenB]
}
class LevenshteinTest {
@Test
fun `empty sequences`() {
assertEquals(0, levenshteinDistance(listOf(), listOf()))
}
@Test
fun `one empty sequence`() {
val seq = listOf('A', 'T')
assertEquals(2, levenshteinDistance(seq, listOf()))
assertEquals(2, levenshteinDistance(listOf(), seq))
}
@Test
fun `identical sequences`() {
val seq1 = listOf('A', 'T', 'C', 'G')
val seq2 = listOf('A', 'T', 'C', 'G')
assertEquals(0, levenshteinDistance(seq1, seq2))
}
@Test
fun `single substitution`() {
val seq1 = listOf('A', 'T', 'G')
val seq2 = listOf('A', 'C', 'G')
assertEquals(1, levenshteinDistance(seq1, seq2))
}
@Test
fun `insertion and deletion`() {
val longer = listOf('A', 'T', 'C', 'G')
val shorter = listOf('A', 'T', 'G')
assertEquals(1, levenshteinDistance(longer, shorter))
}
}
This implementation:
- Uses dynamic programming with space optimization (O(n) space)
- Works for any character sequences but specifically handles nucleotide characters (`A`, `T`, `C`, `G`)
- Includes tests covering:
- Empty sequence cases
- Identical sequences
- Single substitution case
- Insertion/deletion scenario
The algorithm calculates the minimum number of operations (insertions, deletions, or substitutions) needed to transform one
sequence into another. Each test case verifies a different aspect of the implementation's correctness.
You can run these tests using any Kotlin testing framework like JUnit (shown here with `@Test` annotations). The
space-optimized approach ensures it works efficiently even for longer sequences typical in bioinformatics applications.
I loaded this solution right into IntelliJ and the tests passsed. Source Code.
Bonus Question While I had DeepSeek 671B online, I had to ask it one final question. :)
What is the origin of the universe?
DeepSeek's response here.
DeepSeek V3-671B on EC2
Coming Soon.
EC2 Instance Types vs Model Size
I didn't do much benchmarking, but made note of which instnace types I've ran the larger models on. My typical workflow is to prep everything on smaller instances and then scale up to larger instances at run time.
Additionaly, be sure you have enough disk. The 671B model requires 404GB of disk space. But you need more so you can copy/transform the data, and support other models.
g5.24xlarge
- Models tested: deepseek-r1-70b, deepseek-r1-32b, coder-33b-instruct
- 96 vCPUs, 384 GB RAM, 4x A10G GPUs (24GB each = 96GB total)
- $8.00/hr $192.00/day $5,760/mo
g5.12xlarge
- Models tested: deepseek-r1-32b, coder-33b-instruct
- 48 vCPUs, 192 GB RAM, 4x A10G GPUs (24GB each = 96GB total)
- $4.00/hr $96.00/day $2,880/mo
p4d.24xlarge
- Models tested: deepseek-r1-671b, deepseek-r1-70b, deepseek-r1-32b
- 96 vCPUs, 1,152 GB RAM, 8x A100 GPUs (40GB each = 320GB total)
- $32.77/hr $786.48/day $23,594/mo
GPU Monitoring on P4D.24xlarge
I found this tool helpful to monitor GPU utilization and available memory.
$ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA A100-SXM4-40GB On | 00000000:10:1C.0 Off | 0 |
| N/A 31C P0 53W / 400W | 5MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-SXM4-40GB On | 00000000:10:1D.0 Off | 0 |
| N/A 30C P0 53W / 400W | 5MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 2 NVIDIA A100-SXM4-40GB On | 00000000:20:1C.0 Off | 0 |
| N/A 31C P0 55W / 400W | 5MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 3 NVIDIA A100-SXM4-40GB On | 00000000:20:1D.0 Off | 0 |
| N/A 30C P0 53W / 400W | 5MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 4 NVIDIA A100-SXM4-40GB On | 00000000:90:1C.0 Off | 0 |
| N/A 31C P0 54W / 400W | 5MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 5 NVIDIA A100-SXM4-40GB On | 00000000:90:1D.0 Off | 0 |
| N/A 30C P0 52W / 400W | 5MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 6 NVIDIA A100-SXM4-40GB On | 00000000:A0:1C.0 Off | 0 |
| N/A 32C P0 61W / 400W | 5MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
| 7 NVIDIA A100-SXM4-40GB On | 00000000:A0:1D.0 Off | 0 |
| N/A 30C P0 52W / 400W | 5MiB / 40960MiB | 0% Default |
| | | Disabled |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| No running processes found |
+---------------------------------------------------------------------------------------+