Saturday, September 3, 2016

Parsing CSV using Java

We could parse CSV in Java without using third party libraries because CSV file is nothing but a text file and we could use BufferedBufferedReader and split() method from java.lang.String class

The csv file 

Our CSV file - Employees.txt
NAME,Age,location
Binod,20,Hyderabad,INR
Samuel,20,Boston,SGD
Billy,19,China,YUAN


Duplicates in an Array

Question: All numbers in an array with length n+1 are in range from 1 to n, so there is at least one duplication in the array. How to find any a duplication? Please don't modify the input array.

Analysis: The simple solution is to utilize hash tables. When scanning the array, array elements are inserted into the hash table one by one. In this way, it's easy to find a duplication with the hash table, which costs O(n) space.

Let's try not to employ extra space. Why there are duplications in the array? If there are no duplications, the count of numbers ranging from 1 to n is n. Since the array contains more thann numbers, there should be duplications. It looks like it's important to count numbers in ranges.

Let's divide numbers from 1 to n into two ranges, split with the number in the middle (denoted as m), and then count the numbers of the two subranges. If the count of numbers from 1 to mis greater than m, the duplication is in the range from 1 to m. Otherwise, there should be at least one duplication in the range from m+1 to n. And then we continue the recursive process until we find the duplication.

The Java code is listed below:

static int countRange(int[] numbers, int start, int end)
{
    int count = 0;
    for(int i = 0; i < numbers.length; i++)
        if(numbers[i] >= start && numbers[i] <= end)
            ++count;
    return count;
}

static int getDuplication(int[] numbers)
{
    int start = 1;
    while(end >= start) {
        int end = numbers.length;
        int middle = ((end - start) >> 1) + start;
        int count = countRange(numbers, start, middle);
        if(end == start) {
            if(count > 1)
                return start;
            else
        break;
    }

    if(count > (middle - start + 1))
        end = middle;
    else
        start = middle + 1;
    }
    return -1;
}