Sunday, October 20, 2024

How to Read a Large Text File Line by Line Using Java Sample Code

Reading a large text file efficiently is a common task in day-to-day Java development. When a file is too large to load into memory at once, it is essential to read it line by line to avoid running out of memory. The topic also comes up in interviews, specifically in questions about the underlying implementation of the classes below. In this post, we will explore different approaches for reading a large file line by line using Java:

  1. Using BufferedReader (Pre-Java 8)
  2. Using the Files class with Java 8 and above
  3. Using java.util.Scanner
  4. Using Files.newBufferedReader()
  5. Using SeekableByteChannel
  6. Using FileUtils.lineIterator (Apache Commons IO)

We will provide sample code for reading large files using each of these methods, making it easy for you to choose the approach that best fits your project.

1. Reading a Large File Line by Line Using BufferedReader


import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadLargeFileUsingBufferedReader {
    public static void main(String[] args) {
        String samplePath = "example/path/to-large-file.txt";

        try (BufferedReader br = new BufferedReader(new FileReader(samplePath))) {
            String line;
            while ((line = br.readLine()) != null) {
                System.out.println(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

2. Reading a Large File Line by Line Using Java 8 Files.lines()


import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

public class ReadLargeFileUsingJava8 {
    public static void main(String[] args) {
        String samplePath = "example/path/to-large-file.txt";

        try (Stream<String> lines = Files.lines(Paths.get(samplePath))) {
            lines.forEach(System.out::println);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

3. Reading a Large File Line by Line Using java.util.Scanner


import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class ReadLargeFileUsingScanner {
    public static void main(String[] args) {
        String samplePath = "example/path/to-large-file.txt";

        try (Scanner scanner = new Scanner(new File(samplePath))) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                System.out.println(line);
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}

4. Reading a Large File Line by Line Using Files.newBufferedReader()


import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class ReadLargeFileUsingNewBufferedReader {
    public static void main(String[] args) {
        String samplePath = "example/path/to-large-file.txt";

        try (BufferedReader reader = Files.newBufferedReader(Paths.get(samplePath))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

5. Reading a Large File Line by Line Using SeekableByteChannel


import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadLargeFileUsingSeekableByteChannel {
    public static void main(String[] args) {
        Path samplePath = Paths.get("example/path/to-large-file.txt");

        try (SeekableByteChannel sbc = Files.newByteChannel(samplePath)) {
            ByteBuffer buffer = ByteBuffer.allocate(1024);
            // Note: this streams raw bytes rather than whole lines, and the
            // byte-to-char cast below is only correct for single-byte
            // encodings such as ASCII.
            while (sbc.read(buffer) > 0) {
                buffer.flip();
                while (buffer.hasRemaining()) {
                    System.out.print((char) buffer.get());
                }
                buffer.clear();
            }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
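Note that Files.newByteChannel streams raw bytes rather than whole lines; to read line by line on top of a SeekableByteChannel, the channel can be wrapped in a BufferedReader via Channels.newReader. A minimal sketch (the helper name readLines is ours):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.channels.Channels;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ChannelLineReader {

    // Wraps the channel in a BufferedReader so we read whole lines
    // with proper UTF-8 decoding instead of raw bytes.
    static List<String> readLines(Path file) throws IOException {
        List<String> result = new ArrayList<>();
        try (SeekableByteChannel sbc = Files.newByteChannel(file);
             BufferedReader reader = new BufferedReader(
                     Channels.newReader(sbc, StandardCharsets.UTF_8.name()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical path, as used throughout this post.
        Path samplePath = Paths.get("example/path/to-large-file.txt");
        readLines(samplePath).forEach(System.out::println);
    }
}
```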

6. Reading a Large File Line by Line Using FileUtils.lineIterator (Apache Commons IO)


import org.apache.commons.io.FileUtils;
import org.apache.commons.io.LineIterator;

import java.io.File;
import java.io.IOException;

public class ReadLargeFileUsingLineIterator {
    public static void main(String[] args) {
        String samplePath = "example/path/to-large-file.txt";

        try (LineIterator it = FileUtils.lineIterator(new File(samplePath), "UTF-8")) {
            while (it.hasNext()) {
                String line = it.nextLine();
                System.out.println(line);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Key Differences Between These Methods

  • BufferedReader is one of the most memory-efficient options for reading large files.
  • Files.lines() and Files.newBufferedReader() provide modern, concise APIs; Files.lines() reads lazily, but stream operations that buffer their results can still consume significant memory for large files.
  • Scanner is flexible but slower for very large files.
  • SeekableByteChannel allows random access to file data, useful when working with specific sections of a file.
  • FileUtils.lineIterator() is well suited to extremely large files while keeping memory usage low.

Interview Questions: File Reading in Java

If you're preparing for a Java interview, understanding file handling in Java is crucial. Below are some common interview questions related to reading large files in Java:

1. What is the difference between BufferedReader and Files.lines()?

  • BufferedReader reads the file line by line without loading the entire file into memory, making it more memory efficient for large files.
  • Files.lines() returns a lazy Stream of lines that integrates with the Java Streams API; lines are read on demand rather than all at once, but intermediate operations that buffer results (such as sorted() or collecting to a list) can pull large amounts of data into memory.
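To make this concrete, here is a small, self-contained sketch (the file contents and the helper name countNonBlank are ours) showing Files.lines() processed lazily through the Streams API:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

public class LazyLineCount {

    // Counts non-blank lines; the stream pulls lines on demand,
    // so the whole file is never held in memory at once.
    static long countNonBlank(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return lines.filter(line -> !line.isEmpty()).count();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".txt");
        Files.write(tmp, Arrays.asList("alpha", "", "beta"));
        System.out.println(countNonBlank(tmp)); // prints 2
        Files.delete(tmp);
    }
}
```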

2. When would you choose Scanner over BufferedReader?

  • Scanner is better suited for parsing input with custom delimiters and working with different types of data (e.g., integers, floats). However, for reading large files line by line efficiently, BufferedReader is a better option because of its lower memory consumption and faster performance.
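To illustrate the difference, here is a small sketch (the input string and the helper name sumCsv are ours) where Scanner's delimiter-based parsing does the tokenizing and numeric conversion that BufferedReader would leave to manual String.split and Integer.parseInt calls:

```java
import java.util.Scanner;

public class ScannerParsing {

    // Sums comma-separated integers; Scanner handles both the
    // tokenizing (via the delimiter) and the int conversion.
    static int sumCsv(String input) {
        int sum = 0;
        try (Scanner scanner = new Scanner(input)) {
            scanner.useDelimiter(",");
            while (scanner.hasNextInt()) {
                sum += scanner.nextInt();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumCsv("10,20,30")); // prints 60
    }
}
```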

3. What is the advantage of using SeekableByteChannel?

  • SeekableByteChannel allows random access to file data. You can jump to a specific position in the file to read or write, which is not possible with BufferedReader or Scanner.
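A minimal sketch of that random access (the helper name readAt and the sample data are ours): position() jumps straight to a byte offset before reading, something a purely sequential reader cannot do:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class RandomAccessDemo {

    // Reads `length` bytes starting at `offset`, skipping everything before it.
    static String readAt(Path file, long offset, int length) throws IOException {
        try (SeekableByteChannel sbc = Files.newByteChannel(file)) {
            sbc.position(offset); // jump directly to the byte offset
            ByteBuffer buffer = ByteBuffer.allocate(length);
            sbc.read(buffer);
            buffer.flip();
            return StandardCharsets.UTF_8.decode(buffer).toString();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("seek", ".txt");
        Files.write(tmp, "HelloWorld".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAt(tmp, 5, 5)); // prints World
        Files.delete(tmp);
    }
}
```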

4. How does FileUtils.lineIterator() handle very large files?

  • FileUtils.lineIterator() is part of Apache Commons IO and allows reading large files with very low memory usage. It loads and processes the file line by line without consuming excessive memory.

5. What are the advantages of using Files.newBufferedReader() in Java 8?

  • Files.newBufferedReader() is a modern API that simplifies file handling and integrates with the Path class, offering a concise and readable way to work with files. It's an efficient alternative to BufferedReader for reading files in Java 8 and above.
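One concrete advantage is the explicit charset parameter. A minimal sketch (the helper name readFirstLine and the sample file are ours), assuming a Latin-1 encoded file:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class CharsetReaderDemo {

    // Files.newBufferedReader takes a Path plus an explicit Charset, which
    // avoids decoding errors when the file is not in the default encoding.
    static String readFirstLine(Path file, Charset charset) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("charset", ".txt");
        Files.write(tmp, "héllo".getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(readFirstLine(tmp, StandardCharsets.ISO_8859_1)); // prints héllo
        Files.delete(tmp);
    }
}
```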

Learnings:

Java provides various ways to read a large text file line by line. You can choose the right approach based on your project’s requirements, file size, and memory considerations. For most cases, BufferedReader or Files.newBufferedReader() is the best option for efficiency and simplicity. If you need to work with very large files, FileUtils.lineIterator() or SeekableByteChannel may offer better performance with minimal memory usage.

Related Keywords:

  • Sample code for reading a large file using Java
  • Example code for reading large file in Java
  • Java BufferedReader example
  • Java 8 Files.lines() example
  • Apache Commons IO lineIterator example

Saturday, October 19, 2024

How to List S3 Buckets and Objects Using AWS CLI


Amazon Simple Storage Service (S3) is a scalable cloud storage solution provided by AWS, widely used for storing data of all kinds. Whether you are managing backups, application files, or large datasets, the AWS CLI (Command Line Interface) is an essential tool for quickly interacting with S3. One of the most frequent tasks is listing buckets and objects in your S3 storage.

In this article, we’ll guide you through various methods of listing your S3 buckets and their contents using AWS CLI. We will explain each command and provide examples to help you get started quickly.

Prerequisites

  • AWS CLI is installed: You can install it from the AWS CLI installation guide.
  • AWS CLI is configured: Run the command aws configure to set up your credentials (Access Key, Secret Access Key, Region, etc.).
  • Necessary permissions: Make sure your IAM user has the right permissions to access and list S3 buckets. The required permission is s3:ListBucket.

1. Listing All S3 Buckets

To list all the S3 buckets in your AWS account, use the following command:

aws s3 ls

This command will return a list of all S3 buckets with their creation dates.

Example Output:

2023-10-12 12:34:56 bucket-name-1
2023-09-10 08:21:33 bucket-name-2

2. Listing Contents of a Specific S3 Bucket

If you want to list all the objects in a specific bucket, you can append the bucket name to the command:

aws s3 ls s3://bucket-name

Replace bucket-name with the actual name of your S3 bucket.

Example Output:

2024-01-10 14:20:15    1024 file1.txt
2024-01-10 14:30:25    2048 file2.txt

3. Listing Objects in a Specific Folder

S3 buckets can contain virtual directories (folders). To list the contents of a specific folder within a bucket, specify the folder name:

aws s3 ls s3://bucket-name/folder-name/

Example Output:

2024-02-15 15:10:05    512  folder-name/file3.jpg
2024-02-16 10:12:45   1024  folder-name/file4.pdf

4. Listing Objects Recursively

To list all objects in a bucket, including those stored in subdirectories, use the --recursive option:

aws s3 ls s3://bucket-name --recursive

Example Output:

2024-01-10 14:20:15    1024 folder1/file1.txt
2024-01-10 14:30:25    2048 folder2/file2.txt
2024-01-11 09:15:10    512  folder2/subfolder/file3.jpg

5. Listing with Human-Readable File Sizes

To view file sizes in a human-readable format (e.g., KB, MB, GB), use the --human-readable option:

aws s3 ls s3://bucket-name --human-readable

Example Output:

2024-01-10 14:20:15   1.0 KiB folder1/file1.txt
2024-01-10 14:30:25   2.0 KiB folder2/file2.txt

6. Summarizing Total Files and Sizes

To get a summary of the total number of objects and their cumulative size in a bucket, use the --summarize option along with --recursive:

aws s3 ls s3://bucket-name --recursive --summarize

Example Output:

2024-01-10 14:20:15    1024 folder1/file1.txt
2024-01-10 14:30:25    2048 folder2/file2.txt

Total Objects: 2
Total Size: 3072

7. Filtering Results by File Name Pattern

The aws s3 ls command does not support the --exclude and --include filters; those options apply to transfer commands such as aws s3 cp and aws s3 sync. To filter a listing by file name pattern, pipe the recursive listing through a tool such as grep:

aws s3 ls s3://bucket-name --recursive | grep '\.txt$'

This command will only list .txt objects, excluding other file types.

Common Errors and How to Fix Them

  • Access Denied Error: Ensure that your IAM user has the necessary permissions to list the bucket contents. You need s3:ListBucket and possibly other permissions for more advanced actions.
  • No Such Bucket: Verify that the bucket name is correct and exists in the region you’re working in.
  • CLI Configuration Issues: Ensure the AWS CLI is properly configured using aws configure, and check if you’re using the correct AWS profile if necessary.

Using the AWS CLI to list S3 buckets and objects is a powerful way to interact with your storage without needing to navigate the AWS Management Console. Whether you're listing all buckets, viewing files in a folder, or summarizing the total size of a bucket, these commands provide flexibility and control over your cloud storage operations.

By mastering these CLI commands, you can streamline your cloud management processes and handle S3 tasks more efficiently, saving both time and effort.

Sunday, September 13, 2015

Java Function Code to create Reverse Linked List of Singly Linked List (In memory pointer changes)

Here is a sample Java function to reverse a singly linked list in place (by rewiring the next pointers in memory). Approach:
  • 1) Maintain a previous-node pointer.
  • 2) Maintain a current-node pointer.
  • 3) Store current.next in a temporary variable.
  • 4) Point current.next at the previous node.
  • 5) Repeat steps 3 and 4 until the current node is the last node.
  • 6) At the end, set head.next to null, since the old head is now the tail of the list.
  • 7) Return the previous node; it is the head of the reversed list.
Java function code to reverse a singly linked list:
package com.sample.linkedlist;

public class ReverseLinkedList {

    static class Node {
        int data;
        Node next;

        public Node(int data) {
            this.data = data;
        }

        @Override
        public String toString() {
            return "Node[data=" + data + "]";
        }
    }

    public static Node reverse(Node head) {
        if (head == null)
            return null;

        Node currentNode = head.next;
        Node previousNode = head;
        while (currentNode != null) {
            System.out.println("previous:" + previousNode + " current:" + currentNode);

            Node tempCurrent = currentNode.next;

            /* reverse the link */
            currentNode.next = previousNode;
            previousNode = currentNode;

            /* move to next node */
            currentNode = tempCurrent;
        }
        head.next = null;

        return previousNode;
    }

    static void print(Node head) {
        while (head != null) {
            System.out.print(head + " ");
            head = head.next;
        }
        System.out.println();
    }

    public static void main(String[] args) {
        Node head = new Node(1);
        head.next = new Node(2);
        head.next.next = new Node(3);
        head.next.next.next = new Node(4);

        /* print actual linked list */
        print(head);

        Node newHead = reverse(head);

        /* print reversed linked list */
        print(newHead);
    }
}

Output
Node[data=1] Node[data=2] Node[data=3] Node[data=4] 
previous:Node[data=1] current:Node[data=2]
previous:Node[data=2] current:Node[data=3]
previous:Node[data=3] current:Node[data=4]
Node[data=4] Node[data=3] Node[data=2] Node[data=1] 


Please share or ask about more data structure problems; we will try to cover them in coming posts. Also share this post with your friends and classmates.

Monday, December 17, 2012

Principles of Testing - Software Testing Principles

Testing is a process that exposes hidden defects: it detects errors and deviations from the specification, and verifies whether a system satisfies its specified requirements. This post discusses the general principles of testing.

Software Testing Principles

Seven General principles of Software testing

Principle 1. Exhaustive input testing is impossible
Exhaustive input testing means testing every possible input condition, both valid and invalid, as test cases. For any non-trivial system this is impossible, so testing must rely on a well-chosen subset of inputs.

Principle 2. Testing is creative and difficult
The second principle of testing is that testing is creative and difficult: it requires creativity, extensive domain knowledge, and a sound testing methodology.

Principle 3. Prevention of defects
Defects found in the early stages of development are cheap to fix; the cost of fixing rises sharply in later stages. It is therefore better to take a preventive approach and catch defects as early as possible.

Principle 4. Testing is risk based
Testing is a risk-based process. A risk is a potential loss associated with an event, and it is often economic.
Suppose I test one module of a system, find some defects, and fix them; the modifications may introduce new defects in the same module or in others. Testing is therefore a risky process, and this can increase its cost.

Principle 5. Testing must be planned
Test planning is essential and helps solve many problems in a system. A test plan covers points such as the testing requirements, test priorities, the cost of testing, the test team, the test strategy, and test tools. All of these factors affect testing.

Principle 6. Testing requires independence
Testing must be unbiased; unbiased testing is essential to objectively assess the quality of software. If a program is tested by the developer who wrote it, the testing may be biased: developers have an emotional attachment to their own work, tend to be too optimistic about it, and are often blind to their own errors. Testing should therefore be done by someone other than the developer.

Principle 7. Provide expected results
This is an important principle of testing: every test checks whether the system fulfills user requirements, so each test case must define its expected results. Testing verifies whether the system achieves its predefined specifications.