Java - Counting words, lines, and characters from a file

Question:

I'm trying to read in words from a file. I need to count the words, lines, and characters in the text file. The word count should only include words (containing only alphabetic letters, no punctuation, spaces, or non-alphabetic characters). The character count should only include the characters inside those words.

This is what I have so far. I'm unsure how to count the characters. Every time I run the program, it jumps to the catch block as soon as I enter the file name (and it should have no issues with the file path, as I've used it before). I tried to write the program without the try/catch to see what the error was, but it wouldn't work without it.

import java.util.Scanner;
import java.util.StringTokenizer;
import java.io.*;

public class WordCount
{
    public static void main(String[] args)
    {
        Scanner userInput = new Scanner(System.in);

        try {
            // Input file
            System.out.println("Please enter the name of the file.");
            String fileName = userInput.next( );
            File newFile = new File("C:/Users/garre/OneDrive/Desktop/" + fileName);

            // Word count, line count, and character count variables; temporary string variable
            int wordC = 0;
            int lineC = 0;
            int charC = 0;
            String tempo;

            // Text file scanner
            Scanner fileScan = new Scanner(newFile);

            while (fileScan.hasNextLine( )) {
                lineC++;

                tempo = fileScan.nextLine( );
                wordC += new StringTokenizer(tempo, "[.,:;()?!\"\\s]+").countTokens( );
                System.out.println("Lines: " + lineC + "\nWords: " + wordC);
            }
        }

        catch (IOException ex1) {
            System.out.println("Error.");
            System.exit(0);
        }
    }
}

Why does it jump to the catch block when I enter the file name? How can I fix this program so that it properly counts the words, lines, and characters in the text file?


Answer 1:

I ran your code and did not get any exception. I suspect that you left off the file extension when you entered the file name.
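One way to check this suspicion is to report why the file cannot be opened instead of printing a bare "Error.". The following is a minimal sketch, not part of the original program: the class name FileCheck, the method diagnose, and the default file name "notes" are all made up for illustration.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

// Hypothetical helper: explains why a file cannot be read
class FileCheck {
    // Returns a short diagnosis for the given file
    static String diagnose(File file) {
        if (!file.exists()) {
            // Typical result when the extension was left off the name
            return "Not found: " + file.getAbsolutePath();
        }
        if (!file.isFile()) {
            return "Not a regular file: " + file.getAbsolutePath();
        }
        try (Scanner scan = new Scanner(file)) {
            return "OK: " + file.getAbsolutePath();
        } catch (FileNotFoundException ex) {
            // Scanner(File) throws this when the file cannot be opened
            return "Cannot open: " + ex.getMessage();
        }
    }

    public static void main(String[] args) {
        // "notes" (without an extension) is an assumed example name
        String name = args.length > 0 ? args[0] : "notes";
        System.out.println(diagnose(new File(name)));
    }
}
```

Printing the absolute path makes it easy to see whether the name you typed, including its extension, matches a file that actually exists in that folder.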


Answer 2:

I don't get any exception with your code if I give it a valid file name. To count the characters, you need to change the logic slightly. Instead of adding the token count to the word count directly, keep the tokenizer in a variable, StringTokenizer st = new StringTokenizer(tempo, "[ .,:;()?!]+");, then iterate over all the tokens and sum the length of each one. That gives you the number of characters. Something like this:

while (fileScan.hasNextLine()) {
    lineC++;
    tempo = fileScan.nextLine();
    // StringTokenizer treats every character of its second argument as a delimiter (it is not a regex)
    StringTokenizer st = new StringTokenizer(tempo, "[ .,:;()?!]+");
    wordC += st.countTokens();
    while (st.hasMoreTokens()) {
        String stt = st.nextToken();
        System.out.println(stt); // Print each token to confirm the line is split as expected
        charC += stt.length();
    }
    System.out.println("Lines: " + lineC + "\nWords: " + wordC + "\nChars: " + charC);
}

Note: Escape sequences such as \s do not work with StringTokenizer, because it treats its delimiter argument as a plain set of characters rather than a regular expression. You would expect \s to delimit on any whitespace character, but it delimits on the literal characters \ and s instead. If you want regular-expression behavior, I suggest using java.util.regex.Pattern and java.util.regex.Matcher, calling matcher.find() to identify the words and count their characters.
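A minimal sketch of that Pattern/Matcher approach, assuming a word is a run of ASCII letters (the class name RegexWordCount and the method count are made up for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts words and their characters with a regex
class RegexWordCount {
    // Assumption: a word is one or more ASCII letters; anything else separates words
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // Returns {wordCount, charCount} for one line of text
    static int[] count(String line) {
        int words = 0;
        int chars = 0;
        Matcher m = WORD.matcher(line);
        while (m.find()) {
            words++;                      // one match is one word
            chars += m.group().length();  // only letters inside words are counted
        }
        return new int[] {words, chars};
    }

    public static void main(String[] args) {
        int[] result = count("Hello, world!");
        System.out.println("Words: " + result[0] + "\nChars: " + result[1]);
    }
}
```

In the loop from the question you would call count(tempo) for each line and add the two values to wordC and charC. Note that with this definition an apostrophe splits a word, so "It's" counts as two words.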


Answer 3:

You probably forgot the file extension while giving input, but there is a much simpler way of doing this. You also mention you don't know how to count the characters. You can try something like this:

import java.io.*;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Scanner;
import java.util.stream.*;

public class WordCount
{
    public static void main(String[] args)
    {
        Scanner userInput = new Scanner(System.in);

        try {
            // Input file
            System.out.println("Please enter the name of the file.");
            // Files.readString and Path.of require Java 11 or later
            String content = Files.readString(Path.of("C:/Users/garre/OneDrive/Desktop/" + userInput.next()));
            System.out.printf("Lines: %d\nWords: %d\nCharacters: %d",content.split("\n").length,Stream.of(content.split("[^A-Za-z]")).filter(x -> !x.isEmpty()).count(),content.length());
        }

        catch (IOException ex1) {
            System.out.println("Error.");
            System.exit(0);
        }
    }
}

Going through the code

import java.util.stream.*;

Note that we import the java.util.stream package, which is used to filter out empty strings when finding the words. Now let's skip forward a bit.

String content = Files.readString(Path.of("C:/Users/garre/OneDrive/Desktop/" + userInput.next()));

The above part gets all of the text in the file and stores it as a string.

System.out.printf("Lines: %d\nWords: %d\nCharacters: %d",content.split("\n").length,Stream.of(content.split("[^A-Za-z]")).filter(x -> !x.isEmpty()).count(),content.length());

Okay, this is a long line. Let's break it down.

"Lines: %d\nWords: %d\nCharacters: %d" is a format string, where each %d is replaced with the corresponding argument to printf. The first %d is replaced by content.split("\n").length, which is the number of lines: splitting the string on newline characters yields one array element per line.

The second %d is replaced by Stream.of(content.split("[^A-Za-z]")).filter(x -> !x.isEmpty()).count(). Stream.of creates a stream from an array; here the array holds the strings you get after splitting on every non-alphabetic character (you said words contain only alphabetic letters). Next, we filter out all the empty strings, since String.split keeps an empty string between two adjacent delimiters. Finally, .count() returns the number of words left after filtering.
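To see why the filter is needed, here is a small sketch of what the split produces for a short input (the class name SplitDemo, the method words, and the sample text are made up for illustration):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo of String.split with a single-character regex
class SplitDemo {
    // Splits on every non-letter and drops the empty strings left between delimiters
    static List<String> words(String text) {
        return Stream.of(text.split("[^A-Za-z]"))
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String text = "Hi, there!";
        // The comma and the space are adjacent delimiters, so an empty string sits between them
        System.out.println(Arrays.toString(text.split("[^A-Za-z]"))); // prints [Hi, , there]
        System.out.println(words(text));                              // prints [Hi, there]
    }
}
```

Without the filter, the array for this input has three elements although the text contains only two words.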

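The third and last %d is the simplest: it is replaced by content.length(), the length of the whole string. Note that this counts every character in the file, including spaces, punctuation, and line breaks, not only the characters inside words.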
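If the character count should cover only the letters inside words, as the question asks, one option is to sum the lengths of the split pieces. This is a minimal sketch under the same assumption that words consist of ASCII letters; the class name LetterCount and the method charsInWords are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical helper: counts only the characters that belong to words
class LetterCount {
    // Sums the lengths of the pieces left after splitting on every non-letter;
    // empty pieces add 0, so no filter is needed here
    static int charsInWords(String content) {
        return Arrays.stream(content.split("[^A-Za-z]"))
                .mapToInt(String::length)
                .sum();
    }

    public static void main(String[] args) {
        String content = "Hi, there!\n";
        System.out.println("All characters: " + content.length());         // 11
        System.out.println("Characters in words: " + charsInWords(content)); // 7
    }
}
```

Passing charsInWords(content) as the third printf argument in place of content.length() would report the count the question describes.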

I left your catch block intact, but the System.exit(0) is redundant, since the program ends right after the catch block anyway.

  • Posted on 2019-03-15 11:51