ok, been having some fun with this... it's not finished yet, but I'll post what I have
First, you should create a C:\Huffman folder with 2 .txt files and 1 .vbs file in it.
The first .txt file should be named test.txt and can contain the text you want to encode wih Huffman. For my example, I took the following:
Code: Select all
This is a small test to see if I can encode this with a Huffman tree.
The second .txt file should be named coded_test.txt. This is the file where the coded text will appear. If this file is not created, the script will create it for you.
The .vbs file can have any name and should contain:
Code: Select all
Dim objFSO, objCodedFile, objTextFile
Dim strCharacter
Dim strHuffVal
Dim strCount, tmpCount
Dim arrHuff(255)
Dim arrHuffVal(255)
Dim arrHuffChar(255)
Set objFSO = CreateObject("Scripting.FileSystemObject")
Set objTextFile = objFSO.OpenTextFile ("C:\Huffman\test.txt", 1)
Set objCodedFile = objFSO.OpenTextFile ("C:\Huffman\coded_test.txt", 8, True)
strCount = 0
Do Until objTextFile.AtEndOfStream
strCharacter = objTextFile.Read(1)
strHuffVal = asc(strCharacter)
arrHuff(strHuffVal) = arrHuff(strHuffVal) + 1
strCharacter = ""
strHuffVal = ""
Loop
for i = 0 to 255
strCount = arrHuff(i)
if strCOunt = "" then
'do nothing
else
tmpCount = tmpCount + 1
arrHuffVal(tmpCount) = arrHuff(i)
arrHuffChar(tmpCount) = i
end if
next
for a = tmpCount - 1 To 0 Step -1
for j= 0 to a
if arrHuffVal(j)>arrHuffVal(j+1) then
temp=arrHuffVal(j+1)
tempChar=arrHuffChar(j+1)
arrHuffVal(j+1)=arrHuffVal(j)
arrHUffChar(j+1)=arrHuffChar(j)
arrHuffVal(j)=temp
arrHuffChar(j)=tempChar
end if
next
next
for k = 1 to tmpCount
strBin=""
strBinCalc = arrHuffChar(k)
Do While strBinCalc > 0
If strBinCalc Mod 2 > 0 Then
strBin = "1" & strBin
Else
strBin = "0" & strBin
End If
strBinCalc = Int(strBinCalc / 2)
Loop
Do While len(strBin) < 8
strBin = "0" & strBin
Loop
objCodedFile.Writeline(arrHuffVal(k) & "," & arrHuffChar(k) & "," & strBin)
next
objTextFile.Close
objCodedFile.Close
When launching this script with the correct .txt files in the correct folder, this gives me following result on my test.txt file:
Code: Select all
1,46,00101110
1,72,01001000
1,73,01001001
1,84,01010100
1,100,01100100
1,114,01110010
1,117,01110101
1,119,01110111
2,99,01100011
2,108,01101100
2,109,01101101
2,111,01101111
3,102,01100110
3,104,01101000
3,110,01101110
5,97,01100001
5,105,01101001
6,115,01110011
6,116,01110100
7,101,01100101
15,32,00100000
This result is a sorted list of all characters that appear in my test.txt file. The first column indicates the amount of times a certain character appears in the text. The second column is the decimal value of that character on the ASCII table and the 3rd column is the binary value of that character.
So for example, character 32 appears 15 times in my test.txt file, which is a " " (space bar).
"e" (character 101) appears 7 times in my test.txt file.
And so on...
The next step would be to start building the actual Huffman tree. No idea yet how I'm going to do that... looks a bit tricky.
Also, I may be going about this all wrong... I have no idea, I'm just having fun trying to get it all sorted
