Thursday, July 26, 2012

Character, Line and Word Counting in Text Files using ANSI C

To follow this tutorial you should know a little about working with text files and command-line arguments. If you don't, read the following articles:

1. Counting Characters
/* Description:
 * Counts the number of characters existing in the file specified by @stream.
 * Parameters:
 * stream - a pointer to a text file
 * Returns:
 *  The number of characters in the file.
 * Postconditions:
 *  The file pointer will be positioned at the end of the stream.
 */
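long CountCharacters(FILE* stream)
{
    long counter = 0L;
    /*fgetc returns an int so that EOF can be told apart from every
      valid character value; storing the result in a char loses that*/
    int c;
    /*Counts until fgetc reports EOF; the EOF result itself is not counted*/
    while((c = fgetc(stream))!=EOF)
    {
        counter++;
    }
    return counter;
}
The function returns a long in order to reduce the risk of counter overflow on large files. The variable c is declared as int rather than char because fgetc returns an int: EOF (End of File) is not a character stored in the file but a special negative value that fgetc returns when nothing is left to read. Reading inside the loop condition ensures that the EOF result ends the loop without being counted as a character.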
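For large files, one fgetc call per character can be slow. The following is a minimal sketch of the same count done with fread in blocks. The name CountCharactersBuffered and the 4096-byte block size are assumptions made for this illustration, not part of the original program.

```c
#include <assert.h>
#include <stdio.h>

#define BUFFER_SIZE 4096 /* arbitrary block size chosen for this sketch */

/* Hypothetical variant of CountCharacters: reads the stream in blocks and
 * adds up the number of bytes that fread reports for each block. */
long CountCharactersBuffered(FILE* stream)
{
    unsigned char buffer[BUFFER_SIZE];
    long counter = 0L;
    size_t bytesRead;

    assert(stream != NULL);
    /* fread returns 0 at end of file or on a read error */
    while ((bytesRead = fread(buffer, 1, sizeof buffer, stream)) > 0)
    {
        counter += (long)bytesRead;
    }
    return counter;
}
```

It is called exactly like CountCharacters and also leaves the file pointer at the end of the stream. Because no individual character is ever compared with EOF, the question of char versus int does not arise here.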

2. Counting Lines
/* Description:
 * Counts the number of lines existing in the file specified by @stream.
 * Parameters:
 * stream - a pointer to a text file
 * Returns:
 *  The number of lines in the file.
 * Postconditions:
 *  The file pointer will be positioned at the end of the stream.
 */
long CountLines(FILE* stream)
{
    /*The counter is initialized to 1 because the first line is not
      preceded by a newline character*/
    long counter = 1L;
    /*int, not char, so that EOF can be detected reliably*/
    int c;
    /*Counts until fgetc reports EOF*/
    while((c = fgetc(stream))!=EOF)
    {
        /*Checks whether a newline character was encountered*/
        if(c=='\n')
        {
            counter++;
        }
    }
    return counter;
}
The same rules apply as above, with the exception that the counter starts at 1 in order to include the first line, and it is only incremented when a '\n' (newline) character is encountered. As a consequence, an empty file is reported as having one line, and a file whose last line ends with a newline is reported with one line more than it visibly contains.
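Starting the counter at 1 is simple, but it reports one line for an empty file and an extra line when the text ends with a newline. If that matters, one possible variation is to count newlines from 0 and add one only when the last line is left unterminated. The name CountLinesExact is hypothetical, and this is a sketch of one convention rather than the only sensible one.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical variant of CountLines: an empty stream has 0 lines, and the
 * final line is counted once whether or not it ends with '\n'. */
long CountLinesExact(FILE* stream)
{
    long counter = 0L;
    int c;
    int last = '\n'; /* treat the start of the stream like a line start */

    assert(stream != NULL);
    while ((c = fgetc(stream)) != EOF)
    {
        if (c == '\n')
        {
            counter++;
        }
        last = c;
    }
    /* A non-empty last line without a terminating newline still counts */
    if (last != '\n')
    {
        counter++;
    }
    return counter;
}
```

With this convention the texts "a\nb" and "a\nb\n" both count as two lines, and an empty file counts as zero.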

3. Counting Words
/* Description:
 * Counts the number of words existing in the file specified by @stream.
 * Parameters:
 * stream - a pointer to a text file
 * Returns:
 *  The number of words in the file.
 * Preconditions:
 *  We assume that after every punctuation mark and word delimiter there is a
 *  whitespace character.
 * Postconditions:
 *  The file pointer will be positioned at the end of the stream.
 */
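long CountWords(FILE* stream)
{
    long counter = 0L;
    /*int, not char, so that EOF can be detected reliably and the value
      can be passed safely to isspace*/
    int c;
    /*No word has started before the first character is read*/
    bool isInsideWord = false;
    while((c = fgetc(stream))!=EOF)
    {
        if(isspace(c))
        {
            /*Whitespace ends the current word, if there was one*/
            isInsideWord = false;
        }
        else if(!isInsideWord)
        {
            /*First non-whitespace character after whitespace: a new
              word starts here*/
            isInsideWord = true;
            counter++;
        }
    }
    return counter;
}
We use the boolean variable isInsideWord to remember whether the last character read was part of a word. The counter is incremented only on the transition from whitespace to a non-whitespace character, that is, at the start of each word; reading a whitespace character resets the flag. This way consecutive spaces, trailing whitespace and an empty file do not produce extra words.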

I assumed that the text is written correctly and that there are no constructions such as "There is no space after this point.Cool!". I tested the function with various text files and it gives results close to those of the word counter provided by LibreOffice.
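If the missing-space assumption does not hold for your input, one possible way to relax it is to treat every character that is not a letter or a digit as a delimiter. The sketch below does that; the name CountWordsAlnum is hypothetical, and the approach assumes plain ASCII text in the default "C" locale.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical variant of CountWords: a word is a maximal run of letters
 * and digits, so punctuation separates words just like whitespace does. */
long CountWordsAlnum(FILE* stream)
{
    long counter = 0L;
    int c;
    bool isInsideWord = false;

    assert(stream != NULL);
    while ((c = fgetc(stream)) != EOF)
    {
        if (isalnum(c))
        {
            if (!isInsideWord)
            {
                /* First letter or digit after a delimiter: a word starts */
                isInsideWord = true;
                counter++;
            }
        }
        else
        {
            /* Whitespace, punctuation and anything else ends the word */
            isInsideWord = false;
        }
    }
    return counter;
}
```

For "There is no space after this point.Cool!" this counts "point" and "Cool" as separate words. The trade-off is that apostrophes and hyphens also split words, so "don't" is counted as two.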

4. Example
#include <stdio.h>
#include <stdbool.h>
#include <ctype.h>

#define NR_ARGS        2
#define FILE_ARG_INDEX 1

long CountCharacters(FILE* stream);
long CountLines(FILE* stream);
long CountWords(FILE* stream);

/*
 * Description:
 *  The program prints the number of characters, lines and words existing
 *  in a text file. It should be called like:
 *   Counter file.txt
 */
int main(int argc, char** argv)
{
 FILE* stream = NULL;
 long characters = 0L, lines = 0L, words = 0L;
 int status = 0;
 if(argc==NR_ARGS)
 {
     /*"r" opens the file for reading in text mode; the 't' flag is not
      part of standard C*/
     stream = fopen(argv[FILE_ARG_INDEX],"r");
     if(stream!=NULL)
     {
         characters = CountCharacters(stream);
         /*Each counting function leaves the cursor at the end of the
          stream, so it must be moved back to the start of the stream
          before the next count*/
         rewind(stream);
         lines = CountLines(stream);
         rewind(stream);
         words = CountWords(stream);
         printf("Characters: %ld\n"
                "Lines     : %ld\n"
                "Words     : %ld\n",
                characters,lines,words);
         fclose(stream);
     }
     else
     {
         /*fopen failed, so perror can append the reason taken from errno*/
         perror("Could not open file");
         status = 1;
     }
 }
 else
 {
     /*No library call failed here, so errno is not meaningful and
      perror would print a misleading reason*/
     fprintf(stderr,"Usage: Counter file.txt\n");
     status = 1;
 }
 return status;
}
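The example reads the file three times, rewinding between passes. As a rough sketch of an alternative, the three counts can be gathered in a single pass. The Counts struct and the CountAll function are hypothetical names introduced for this illustration; the sketch keeps the tutorial's conventions of starting the line counter at 1 and delimiting words by whitespace.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical result type for this sketch */
typedef struct
{
    long characters;
    long lines;
    long words;
} Counts;

/* Hypothetical single-pass counter: reads the stream once and updates
 * all three counters for every character. */
Counts CountAll(FILE* stream)
{
    Counts result = { 0L, 1L, 0L }; /* lines starts at 1, as in CountLines */
    int c;
    bool isInsideWord = false;

    assert(stream != NULL);
    while ((c = fgetc(stream)) != EOF)
    {
        result.characters++;
        if (c == '\n')
        {
            result.lines++;
        }
        if (isspace(c))
        {
            isInsideWord = false;
        }
        else if (!isInsideWord)
        {
            /* Transition from whitespace to text: a new word starts */
            isInsideWord = true;
            result.words++;
        }
    }
    return result;
}
```

In main, the three calls and the rewinds between them could then be replaced by a single call such as `Counts totals = CountAll(stream);`, with the fields passed to printf.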

