Java: Decrypting Caesar Cipher with Unknown Positions

Description: I am supposed to come up with a program that will break messages encrypted with a Caesar Cipher. Sounds easy enough; the problem is that you have to figure out which shift (number of positions) was used to encrypt the message in order to decrypt it. I have a method called public void train(String trainingFileName) that reads in a file with a lot of text in it and determines the frequency of each lowercase English letter (a - z), storing the results in a double[] array. That method works great, so it's not necessary to look at unless you want to, but I reuse most of its code in the method I'm having trouble with, public int decrypt(String cipherTextFileName, String outputFileName).

The code works beautifully when the Caesar Cipher uses 3 positions, but any other shift gives me a lot of problems. The decrypt method has a do-while loop that decrypts the coded message starting with 0 positions and then uses my "distance" formula (NOTE: I cannot use any other formula) to find the minimum distance between knownFrequencies and observedFreq for the encrypted message. Right now the do-while loop stops once the distance is 0.6 or less. In theory, when I have the correct number of positions, the distance should be below that value.
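
For reference, the distance computed in the decrypt method is a sum of absolute differences (an L1 distance) between the known letter frequencies and the frequencies observed after decrypting with a candidate shift. Because both tables sum to 1, the value always lies between 0 and 2. Picking the shift with the smallest distance over all 26 candidates, rather than stopping at a fixed cut-off, needs no threshold at all. The symbols K_i (known frequency of letter i), O_i(k) (observed frequency of letter i after decrypting with shift k) and k-hat are introduced here only for illustration:

```latex
d(k) = \sum_{i=0}^{25} \left| K_i - O_i(k) \right|, \qquad 0 \le d(k) \le 2

\hat{k} = \operatorname*{arg\,min}_{0 \le k \le 25} d(k)
```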

Problem: The program works great when numberOfPositions is 3, but when I use an encrypted message that does not use 3 positions, the distance never falls below 1. Also, in debug mode, when I set numberOfPositions to the value that should decrypt the message, the message is still encrypted.

Question: How can I implement this method better so that I am not comparing the distance against a hard-coded value to stop the do-while loop? I tried using Math.min(), but that doesn't work. And why can't I decode a message that uses a shift other than 3?

I will show you my code now in case you want to test it on your system. You will need 3 text files: one long file with a bunch of words in it (at least 1000), which is read by the train method; a file with an encrypted message; and another file for the program to write the decrypted message to.

Here is an encrypted message first using 3 positions of the Caesar Cipher and then 5 positions.

Wkh surjudp zdv krvwhg eb dfwru Slhufh Eurvqdq dqg kdg frpphqwdub iurp pdqb Kroobzrrg dfwruv dqg iloppdnhuv Prylh txrwdwlrqv wkdw ylhzhuv xvh lq wkhlu rzq olyhv dqg vlwxdwlrqv

Ymj uwtlwfr bfx mtxyji gd fhytw Unjwhj Gwtxsfs fsi mfi htrrjsyfwd kwtr rfsd Mtqqdbtti fhytwx fsi knqrrfpjwx Rtanj vztyfyntsx ymfy anjbjwx zxj ns ymjnw tbs qnajx fsi xnyzfyntsx

When decrypted it should say: The program was hosted by actor Pierce Brosnan and had commentary from many Hollywood actors and filmmakers Movie quotations that viewers use in their own lives and situations

Alright, here is the class I wrote (you will need all the imports). Thanks in advance to anyone who helps:

import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.util.Scanner;

public class CodeBreaker {
    public final int NUMBER_OF_LETTERS = 26;

    private double[] knownFrequencies = new double[NUMBER_OF_LETTERS];

    public double[] getKnownFrequencies() {
        return knownFrequencies;
    }

    public void setKnownFrequencies(double[] knownFrequencies) {
        this.knownFrequencies = knownFrequencies;
    }

    /**
     * Method reads in a file with a lot of text in it and
     * then uses that to figure out the frequencies of each character
     * @param trainingFileName
     */
    public void train(String trainingFileName) {
        try {
            Scanner fileIO = new Scanner(new File(trainingFileName));

            int total = 0;
            String temp = "";

            while (fileIO.hasNext()) {
                //reading into file and storing it into a string called temp
                temp += fileIO.next().toLowerCase().replaceAll("[ -,!?';:.]+", "");
            }
            fileIO.close();

            //converting temp string into a char array
            char[] c = temp.toCharArray();
            total += c.length; // how many characters are in text
            int k = (int) 'a'; // int value of lowercase letter 'a'
            int[] counter = new int[NUMBER_OF_LETTERS];

            for (int j = 0; j < total; j++) {
                for (int i = k - k; i < knownFrequencies.length; i++) {
                    char[] d = new char[knownFrequencies.length];
                    d[i] = (char) (k + i);
                    if (c[j] == d[i]) {//checking to see if char in text equals char in d array
                        counter[i]++;
                        knownFrequencies[i] = (double) counter[i] / total;
                    }
                }
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }

    /**
     * Main decryption method used to take coded text from a file, figure out the positions in the CaesarCipher
     * and then decode it onto another file.
     * @param cipherTextFileName
     * @param outputFileName
     * @return
     */
    public int decrypt(String cipherTextFileName, String outputFileName) {
        Scanner fileIO;
        int numberOfPositions = 0;
        double distance = 0.000000;

        try {
            fileIO = new Scanner(new File(cipherTextFileName));

            PrintWriter writer = new PrintWriter(new File(outputFileName));

            String temp = "";

            while (fileIO.hasNext()) {
                //reading into file and storing it into a string called temp
                temp += fileIO.next().toLowerCase().replaceAll(" ", "");
            }
            fileIO.close();

            do {
                distance = 0.0;
                int total = 0;
                double[] observedFreq = new double[NUMBER_OF_LETTERS];
                temp = decrypt(temp, numberOfPositions);
                char[] c = temp.toCharArray(); //store decrypted chars into an array
                total += c.length; // how many characters are in text

                int k = (int) 'a'; // int value of lowercase letter 'a'
                int[] counter = new int[NUMBER_OF_LETTERS]; //use to count the number of characters in text

                for (int j = 0; j < total; j++) {
                    for (int i = k - k; i < observedFreq.length; i++) {
                        char[] d = new char[observedFreq.length];
                        d[i] = (char) (k + i);
                        if (c[j] == d[i]) { //checking to see if char in text equals char in d array
                            counter[i]++;
                            observedFreq[i] = (double) counter[i] / total;
                        }
                    }
                }

                //Formula for finding distance that will determine the numberOfPositions in CaesarCipher
                for (int j = 0; j < knownFrequencies.length; j++) {
                    distance += Math.abs(knownFrequencies[j] - observedFreq[j]); //This is the part of the code I am having trouble with
                }

                numberOfPositions = numberOfPositions + 1;

            } while (distance > 0.6); //This is the part of the code I am having trouble with

            Scanner fileIO2 = new Scanner(new File(cipherTextFileName));

            while (fileIO2.hasNextLine()) {
                //reading into file and storing it into a string called temp
                temp = fileIO2.nextLine();
                writer.println(decrypt(temp, numberOfPositions));
            }
            fileIO2.close();
            writer.close();

        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
        return numberOfPositions;
    }

    /**
     * CaesarCipher decrypt and encrypt methods
     * @param ciphertext
     * @param numberOfPositions
     * @return
     */
    public String decrypt(String ciphertext, int numberOfPositions) {
        return encrypt(ciphertext, -numberOfPositions);
    }

    public String encrypt(String msg, int offset) {
        offset = offset % 26 + 26;
        StringBuilder encoded = new StringBuilder();
        for (char i : msg.toCharArray()) {
            if (Character.isLowerCase(i)) {
                int j = (i - 'a' + offset) % 26;
                encoded.append((char) (j + 'a'));
            } else if (Character.isUpperCase(i)) {
                int h = (i - 'A' + offset) % 26;
                encoded.append((char) (h + 'A'));
            } else {
                encoded.append(i); // non-letters are copied unchanged
            }
        }
        return encoded.toString();
    }

    // barebones main method to test your code
    public static void main(String[] args) {
        // args[0] contains the filename of the training file
        // args[1] contains the filename of the cipher text file
        // args[2] contains the filename of the output file
        CodeBreaker cb = new CodeBreaker();
        cb.train(args[0]);
        System.out.println(cb.decrypt(args[1], args[2]));
    }
}
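
A likely explanation for the "only 3 works" behaviour: inside the do-while loop, temp = decrypt(temp, numberOfPositions); stores the shifted text back into temp, so the shifts accumulate. After pass n (counting from n = 0) the text has been shifted by 0 + 1 + ... + n = n(n+1)/2 positions in total, yet the method returns n + 1 when the loop exits. For a key of 3 the two happen to agree: pass 2 tests a total shift of 3 and the method then returns 3. For a key of 5 the running total modulo 26 is never 5, so the correct shift is never tested and the distance stays high. It also explains the debugger observation: temp has already been shifted by earlier passes, so setting numberOfPositions by hand does not give plaintext. Decrypting the original ciphertext with each candidate shift, instead of the previous pass's result, avoids the problem. The hypothetical CumulativeShiftDemo class below only reproduces that arithmetic; it reads no files:

```java
class CumulativeShiftDemo {
    // Total shift applied to temp after each pass of the original do-while loop.
    // Pass n (n = 0, 1, 2, ...) decrypts the already-shifted text by n more positions.
    static int[] totalShiftPerPass(int passes) {
        int[] totals = new int[passes];
        int running = 0;
        for (int n = 0; n < passes; n++) {
            running = (running + n) % 26;
            totals[n] = running;
        }
        return totals;
    }

    // True if any pass ever tests a total shift equal to key (0..25).
    // The totals repeat every 52 passes, so 52 passes cover every case.
    static boolean keyIsEverTested(int key) {
        for (int total : totalShiftPerPass(52)) {
            if (total == key) {
                return true;
            }
        }
        return false;
    }
}
```

For example, totalShiftPerPass(5) gives 0, 1, 3, 6, 10, and keyIsEverTested(5) is false while keyIsEverTested(3) is true.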


The standard method of decoding a Caesar cipher is called "running down the alphabet". It is essentially a brute-force solution: you try all the possibilities. Since there are only 26 possible keys, it is not that difficult.

Taking your example:


wkh surjudp zdv krvwhg ...
xli tvskveq aew lswxih ...
ymj uwtlwfr bfx mtxyji ...
znk vxumxgs cgy nuyzkj ...
aol wyvnyht dhz ovzalk ...
bpm xzwoziu eia pwabml ...
cqn yaxpajv fjb qxbcnm ...
dro zbyqbkw gkc rycdon ...
esp aczrclx hld szdepo ...
ftq bdasdmy ime taefqp ...
gur cebtenz jnf ubfgrq ...
hvs dfcufoa kog vcghsr ...
iwt egdvgpb lph wdhits ...
jxu fhewhqc mqi xeijut ...
kyv gifxird nrj yfjkvu ...
lzw hjgyjse osk zgklwv ...
max ikhzktf ptl ahlmxw ...
nby jlialug qum bimnyx ...
ocz kmjbmvh rvn cjnozy ...
pda lnkcnwi swo dkopaz ...
qeb moldoxj txp elpqba ...
rfc npmepyk uyq fmqrcb ...
sgd oqnfqzl vzr gnrsdc ...
the program was hosted ...
uif qsphsbn xbt iptufe ...
vjg rtqitco ycu jquvgf ...
wkh surjudp zdv krvwhg ...
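
Running down the alphabet is easy to automate. A minimal sketch that produces one candidate per shift, like the table above: candidate k is the ciphertext shifted forward by k, so the plaintext shows up at k = 26 - key (23 for this example). The class RunDownAlphabet and its methods are hypothetical names, and non-letters are copied unchanged:

```java
import java.util.ArrayList;
import java.util.List;

class RunDownAlphabet {
    // Shift every letter forward by 'shift' positions, preserving case.
    static String shift(String text, int shift) {
        int s = ((shift % 26) + 26) % 26;
        StringBuilder out = new StringBuilder(text.length());
        for (char ch : text.toCharArray()) {
            if (ch >= 'a' && ch <= 'z') {
                out.append((char) ('a' + (ch - 'a' + s) % 26));
            } else if (ch >= 'A' && ch <= 'Z') {
                out.append((char) ('A' + (ch - 'A' + s) % 26));
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    // Candidate k (k = 0..25) is the ciphertext shifted forward by k.
    static List<String> allShifts(String cipherText) {
        List<String> candidates = new ArrayList<>();
        for (int k = 0; k < 26; k++) {
            candidates.add(shift(cipherText, k));
        }
        return candidates;
    }
}
```

Printing each entry of RunDownAlphabet.allShifts("wkh surjudp zdv krvwhg") on its own line reproduces the first 26 rows of the table.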

It is simple enough for a human to find the correct line with only 26 to pick from; for a computer it is more difficult. Your idea of counting letter frequencies is good. You might also mark down (penalize) letter pairs like "qx" and mark up (reward) pairs like "th". Calculate the score for all 26 possible results and pick the highest-scoring result. As long as you have tuned your scoring method well, you have a good chance of finding the right solution.
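
A minimal sketch of that scoring idea, assuming the knownFrequencies array produced by train is passed in as known. Here a lower score is better (the frequency distance, minus a bonus for common pairs, plus a penalty for rare ones), so "highest-scoring" in the answer corresponds to the lowest value. The class name ShiftScorer, the five-pair lists and the 0.05 weight are illustrative assumptions, not tuned values:

```java
class ShiftScorer {
    // Hypothetical hand-picked pair lists; real lists would come from bigram counts in training text.
    static final String[] COMMON_PAIRS = {"th", "he", "in", "er", "an"};
    static final String[] RARE_PAIRS = {"qx", "jq", "zx", "vq", "qz"};
    // Assumed weight per pair occurrence; would need tuning.
    static final double PAIR_WEIGHT = 0.05;

    // Caesar decryption: shift each letter back by 'key' positions, keeping case.
    static String decrypt(String text, int key) {
        int back = ((key % 26) + 26) % 26;
        StringBuilder out = new StringBuilder(text.length());
        for (char ch : text.toCharArray()) {
            if (ch >= 'a' && ch <= 'z') {
                out.append((char) ('a' + (ch - 'a' - back + 26) % 26));
            } else if (ch >= 'A' && ch <= 'Z') {
                out.append((char) ('A' + (ch - 'A' - back + 26) % 26));
            } else {
                out.append(ch);
            }
        }
        return out.toString();
    }

    // Relative frequency of each letter a-z (case-insensitive); other characters are ignored.
    static double[] frequencies(String text) {
        double[] freq = new double[26];
        int total = 0;
        for (char ch : text.toLowerCase().toCharArray()) {
            if (ch >= 'a' && ch <= 'z') {
                freq[ch - 'a']++;
                total++;
            }
        }
        if (total > 0) {
            for (int i = 0; i < 26; i++) {
                freq[i] /= total;
            }
        }
        return freq;
    }

    // Number of (possibly overlapping) occurrences of pair in text.
    static int count(String text, String pair) {
        int n = 0;
        for (int i = text.indexOf(pair); i >= 0; i = text.indexOf(pair, i + 1)) {
            n++;
        }
        return n;
    }

    // Lower is better. All candidates have the same length, so raw pair counts compare fairly.
    static double score(String candidate, double[] known) {
        double[] observed = frequencies(candidate);
        double s = 0.0;
        for (int i = 0; i < 26; i++) {
            s += Math.abs(known[i] - observed[i]);
        }
        String lower = candidate.toLowerCase();
        for (String p : COMMON_PAIRS) s -= PAIR_WEIGHT * count(lower, p);
        for (String p : RARE_PAIRS) s += PAIR_WEIGHT * count(lower, p);
        return s;
    }

    // Try all 26 keys and keep the best-scoring one; no fixed threshold is needed.
    static int bestKey(String cipherText, double[] known) {
        int best = 0;
        double bestScore = Double.POSITIVE_INFINITY;
        for (int key = 0; key < 26; key++) {
            double s = score(decrypt(cipherText, key), known);
            if (s < bestScore) {
                bestScore = s;
                best = key;
            }
        }
        return best;
    }
}
```

Used this way, ShiftScorer.bestKey(cipherText, knownFrequencies) could stand in for the do-while loop and its fixed 0.6 cut-off, since every key is tried and the best one kept.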

Taking rossum's suggestion, and realizing that my initial class was a total mess that nobody could understand, I rewrote the class using a bunch of methods this time instead of clumping everything into one or two methods, and now the class works perfectly. I am open to any suggestions to make the code more efficient; to me it seems a little redundant, so any suggestions for improvement are welcome. This was for a class assignment whose due date has passed, so this code is for reference.

import java.io.File;
import java.io.FileNotFoundException;
import java.io.PrintWriter;
import java.util.Scanner;

public class CodeBreaker {

    //Setting up instance variables and setter/getter methods
    public final int NUMBER_OF_LETTERS = 26;
    private int numberOfPositions = 0;

    private double[] knownFrequencies = new double[NUMBER_OF_LETTERS];
    private double[] observedFreq = new double[NUMBER_OF_LETTERS];

    public double[] getKnownFrequencies() {
        return knownFrequencies;
    }

    public void setKnownFrequencies(double[] knownFrequencies) {
        this.knownFrequencies = knownFrequencies;
    }

    //This method reads text from a long file, breaks it down into individual characters, and stores
    //the frequency of each letter a-z in the knownFrequencies array
    public void train(String trainingFileName) {
        StringBuilder tempString = new StringBuilder();
        double totalChars = 0.0;
        try {
            //reading text from a file using the delimiter so we get all of the contents
            Scanner fileIO = new Scanner(new File(trainingFileName)).useDelimiter("[ *-,!?.]+");
            while (fileIO.hasNext()) {
                tempString.append(fileIO.next().toLowerCase()); //storing contents, all lower case
            }
            fileIO.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }

        //Counting each letter a-z, and the total number of English letters used to determine the frequencies
        knownFrequencies = new double[NUMBER_OF_LETTERS];
        for (int k = 0; k < tempString.length(); k++) {
            char ch = tempString.charAt(k);
            if (ch >= 'a' && ch <= 'z') {
                knownFrequencies[ch - 'a']++;
                totalChars++;
            }
        }

        //Divide the individual letter counts by the total to get a decimal number
        //for the frequency and store that into the knownFrequencies array.
        for (int i = 0; i < knownFrequencies.length; i++) {
            if (knownFrequencies[i] > 0) {
                knownFrequencies[i] = knownFrequencies[i] / totalChars;
            }
        }
    }

    //This method does practically the same thing as the train method except it doesn't read from a file;
    //it counts the letters of the given text to find the frequencies that will be used later to determine the key
    public void setObservedFreq(String tempString) { //String parameter takes in the candidate text
        observedFreq = new double[NUMBER_OF_LETTERS]; //start from zero for every candidate so counts don't pile up
        double totalChars = 0.0;
        for (int k = 0; k < tempString.length(); k++) {
            char ch = tempString.charAt(k);
            if (ch >= 'a' && ch <= 'z') {
                observedFreq[ch - 'a']++;
                totalChars++;
            }
        }

        //Re-initializing with a decimal frequency.
        for (int i = 0; i < NUMBER_OF_LETTERS; i++) {
            if (observedFreq[i] > 0) {
                observedFreq[i] = observedFreq[i] / totalChars;
            }
        }
    }

    //This method takes the absolute difference between knownFrequencies and observedFreq for each letter,
    //sums them, and returns the total. The smallest distance means the cipher text has been decoded.
    public double findDistance() {
        double distance = 0.0;
        for (int x = 0; x < NUMBER_OF_LETTERS; x++) {
            distance += Math.abs(knownFrequencies[x] - observedFreq[x]);
        }
        return distance;
    }

    //This method finds an int value that will be used as the key to decipher the cipherText
    public int findNumberOfPositions(String cipherText) {
        int smallestIndex = 0;
        double[] indexArray = new double[NUMBER_OF_LETTERS];

        //Going through all possible shifts (0 to 25) and storing each distance in indexArray.
        for (int i = 0; i < NUMBER_OF_LETTERS; i++) {
            setObservedFreq(decrypt(cipherText, i));
            indexArray[i] = findDistance();
        }

        //Determine which index in the array has the smallest distance
        double currentValue = indexArray[0];
        for (int j = 0; j < NUMBER_OF_LETTERS; j++) {
            if (indexArray[j] < currentValue) { //both statements belong to the if
                currentValue = indexArray[j];
                smallestIndex = j;
            }
        }
        return smallestIndex; //The index is returned and will be used as the key when the message is decrypted
    }

    //Read in a file that contains cipher text, decrypt it using the key that was found in the findNumberOfPositions method,
    //then write the plain text into an output file.
    public int decrypt(String cipherTextFileName, String outputFileName) {
        StringBuilder tempString = new StringBuilder();

        try {
            Scanner fileIO = new Scanner(new File(cipherTextFileName)).useDelimiter("[ *-,!?.]+");
            while (fileIO.hasNext()) {
                tempString.append(fileIO.next().toLowerCase()); //read the file and store the lower case text in tempString
            }
            fileIO.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }

        numberOfPositions = findNumberOfPositions(tempString.toString()); //call our findNumberOfPositions method to find the key

        try {
            Scanner scan = new Scanner(new File(cipherTextFileName));
            PrintWriter writer = new PrintWriter(new File(outputFileName));
            while (scan.hasNextLine()) {
                //key is then used to decrypt the message, which gets printed into another file
                writer.println(decrypt(scan.nextLine(), numberOfPositions));
            }
            scan.close();
            writer.close(); //flushes the buffered output to the file
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }

        return numberOfPositions;
    }

    //Caesar Cipher encrypt and decrypt methods
    public String decrypt(String ciphertext, int numberOfPositions) {
        return encrypt(ciphertext, -numberOfPositions);
    }

    public String encrypt(String msg, int offset) {
        offset = offset % 26 + 26;
        StringBuilder encoded = new StringBuilder();
        for (char i : msg.toCharArray()) {
            if (Character.isLowerCase(i)) {
                int j = (i - 'a' + offset) % 26;
                encoded.append((char) (j + 'a'));
            } else if (Character.isUpperCase(i)) {
                int h = (i - 'A' + offset) % 26;
                encoded.append((char) (h + 'A'));
            } else {
                encoded.append(i); //non-letters are copied unchanged
            }
        }
        return encoded.toString();
    }

    public static void main(String[] args) {
        // args[0] contains the filename of the training file
        // args[1] contains the filename of the cipher text file
        // args[2] contains the filename of the output file
        CodeBreaker cb = new CodeBreaker();
        cb.train(args[0]);
        cb.decrypt(args[1], args[2]);
    }
}
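
On making it less redundant: train and setObservedFreq repeat the same counting logic, and findNumberOfPositions recounts the letters for each of the 26 candidate shifts. One possible alternative, sketched here under the assumption that the same L1 distance is kept, is to count the ciphertext once and, for each key, read those counts at an offset, because plaintext letter i is encrypted to ciphertext letter (i + key) mod 26. RotatingKeyFinder and its methods are hypothetical names:

```java
class RotatingKeyFinder {
    static final int NUMBER_OF_LETTERS = 26;

    // Shared helper for both the training text and the cipher text:
    // relative frequency of each letter a-z, ignoring everything else.
    static double[] letterFrequencies(String text) {
        double[] freq = new double[NUMBER_OF_LETTERS];
        int total = 0;
        for (char ch : text.toLowerCase().toCharArray()) {
            if (ch >= 'a' && ch <= 'z') {
                freq[ch - 'a']++;
                total++;
            }
        }
        if (total > 0) {
            for (int i = 0; i < NUMBER_OF_LETTERS; i++) {
                freq[i] /= total;
            }
        }
        return freq;
    }

    // Frequencies after decrypting with 'key' are cipherFreq read at offset 'key',
    // so the cipher text is counted once instead of being decrypted 26 times.
    static int findKey(double[] known, double[] cipherFreq) {
        int bestKey = 0;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int key = 0; key < NUMBER_OF_LETTERS; key++) {
            double distance = 0.0;
            for (int i = 0; i < NUMBER_OF_LETTERS; i++) {
                distance += Math.abs(known[i] - cipherFreq[(i + key) % NUMBER_OF_LETTERS]);
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                bestKey = key;
            }
        }
        return bestKey;
    }
}
```

With trainingText standing in for the contents of the training file, findKey(letterFrequencies(trainingText), letterFrequencies(cipherText)) returns the key directly, which can then be passed to the existing decrypt(String, int).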

