techInterview Discussion

A part of techInterview.org: answers to technical interview questions.

Your host: Michael Pryor

substring of zero and one [Amazon]

Given and Array of A[1..n] bits, find the length of longest consecutive substring of 0 and 1 such that number of 0s and 1s are equal
rk Send private email
Saturday, December 26, 2009
 
 
Hard mode: don't write to the array and use O(1) words of extra space.
d Send private email
Sunday, December 27, 2009
 
 
Hi,

A linear sweep through the array would do it.

1. XOR neighbors, maintain two counters;
2. if (XOR result is 0 AND a[i] is 0)
        increment zeroCount;
    else if (XOR result is 0 AND a[i] is 1)
        increment oneCount;
3. return min(zeroCount, oneCount);

Thanks.
Rajendra Uppal Send private email
Sunday, December 27, 2009
 
 
So neither zeroCount nor oneCount gets incremented on (01)^n?
d Send private email
Sunday, December 27, 2009
 
 
guys...please post an example explaining the problem !!
Imran Shaikh Send private email
Sunday, December 27, 2009
 
 
Call a string /zero-one balanced/ if it has the same number of zeros and ones.

Examples of zero-one balanced strings:

"", "01", "10", "0011", "0101", "0110", "1001", "1010", "1100", ...

Examples of strings that are not zero-one balanced:

"0", "1", "00", "11", "000", "001", "010", "011", "100", "101", "110", "111", ...

A string y is a substring of a string w if there exist strings x and z such that w == x + y + z (where + denotes string concatenation). The problem is to find the length of the longest zero-one balanced substring of the input.

Examples:

"01" => 2
"001" => 2 ("01")
"0011" => 4
"10011" => 4 ("1001" or "0011")
"11111" => 0 ("")
d Send private email
Sunday, December 27, 2009
 
 
Have I misinterpreted the problem?

As per OP:
find the length of longest consecutive substring of 0 and 1 such that number of 0s and 1s are equal.

Didn't OP wanted to ask following?
example: "10010111" return 2 as length of longest consecutive substring is 2 where 0's and 1's are equal.

@d
Given your example and explanation, the problem seems different.

I guess in my above mentioned example, as d says, we need to find length of "01" which is also 2(coincidence), if yes, then ya I have misinterpreted the problem. Am I correct d?

will be back soon.
Rajendra Uppal Send private email
Sunday, December 27, 2009
 
 
Thank you very much d!

From the problem definition you posted, I could think of a O(n) algorithm.
Sweep through the input maintaining the zero-one balance information.
Balance is defined as follows:
balance = 0 (initialization)
If we find a '1' in the input increment by 1, else decrement by 1(we found a zero).

balance = 0;
max_length = 0;

for(int i=0; i<input(length); i++) {

if (input[i] == 1)
balance = balance + 1;
else
balance = balance - 1;

if(balance ==0) {
max_length++;
}

}

max_length has the answer.
Imran Shaikh Send private email
Monday, December 28, 2009
 
 
If d says it is hard, it must be really hard.

Think again people.
Viladge Idi@
Monday, December 28, 2009
 
 
@Imran

Let's go through your algorithm for "10000111", as per d's explanation answer should be 6, right?

if (input[i] == 1)
balance = balance + 1;
else
balance = balance - 1;

if(balance ==0) {
max_length++;
}

a[0]=1 balance = 1 max_length = 0
a[1]=0 balance = 0 max_length = 1
a[2]=0 balance = -1 max_length = 1
a[3]=0 balance = -2 max_length = 1
a[4]=0 balance = -3 max_length = 1
a[5]=1 balance = -2 max_length = 1
a[6]=1 balance = -1 max_length = 1
a[7]=1 balance = 0 max_length = 2

your algorithm returns 2 or 4, if you multiply the result by 2 considering balancing.

Shouldn't the answer be 3 or 6.

Please correct me, if I have assumed anything wrong here.

Thanks
Rajendra Uppal Send private email
Monday, December 28, 2009
 
 
I don't know about "really hard", but it's harder than bucket sorting the prefix sums of the array with -1s replacing 0s.
d Send private email
Monday, December 28, 2009
 
 
Now that I think about it some more, I see that I missed a case. "Really hard" might mean "impossible" here =P
d Send private email
Monday, December 28, 2009
 
 
sweep the string twice forward and backward, for example:
111010011110

this string has 8 1s and 4 0s, it means the max substring <= 4
it also means there are 4 more 1 than 0

if you can find a consecutive substring in which 1 is four more than 0  from left or from right or both side, then you'll get the max substring

e.g.
111010011110  ->  111  010011  110
"111" + "110" has 5 1s and 1 0s, then "010011" is the max substring

now the question became easer, and it can be solved in O(n) time, but i'm afraid O(1) extra space is not enough for the worst case :)
yao Send private email
Monday, December 28, 2009
 
 
Rajendra,

You are right! I should have multiplied the length by 2 at the end of the algorithm. Thanks for pointing it out.

d!
Can you explain why do you think it is impossible/hard??
Imran Shaikh Send private email
Monday, December 28, 2009
 
 
Thanks Rajendra,

Now I understand, my algorithm doesn't work. Would come up with another way.
Imran Shaikh Send private email
Monday, December 28, 2009
 
 
Imran, I don't have a good reason for thinking that O(1) space is hard/impossible. It just "feels like" extra space is needed to navigate the array efficiently, based on my experience with these kinds of problems.
d Send private email
Tuesday, December 29, 2009
 
 
Hello, everyone. I think I know the answer to this algorithm.

This problem is similar to another problem. Find a maximum sub-array in a given array whose sum is 0.

When given a 01 series, we can view 0 as -1. My algorithm works as follows, Suppose the array is

A: 0 0 1 1 0 1 0 1

Create another array B, each element of B contains two parts, idx and sum. idx is the index in the array i and sum is the sum of A[1..n].

Then do a sorting on B.sum. The last part is scan array B find two element there sum is the same and difference on idx is largest.

The whole time complexity is o(nlgn) and space is o(n)
Xinyao Send private email
Thursday, December 31, 2009
 
 
If you are using O(n) space, you can do it in O(n) time.
Town Dunce
Thursday, December 31, 2009
 
 
You can actually do it in O(n) time using O(n) bits (which would normally count as O(n/log n) space).
A.F.
Saturday, January 02, 2010
 
 
Care to post your algorithm, A.F?
Town Dunce
Saturday, January 02, 2010
 
 
I don't have time for a fully detailed description, so I'll give a somewhat sketchy one.

Let us call a substring with equal numbers of 0s and 1s "balanced".

Let's assume array indexing starts at 1.

Define the following function:
f(i) = [number of 1s in a[1..i] ] - [number of 0s in a[1..i] ].
For convenience, also define f(0)=0.
Then a substring a[i..j] is balanced iff f(i-1)=f(j).

A substring starting at position i is of one of three types:
Type 1:  f(i-1) = f(n).
Type 2:  f(i-1) < f(n).
Type 3:  f(i-1) > f(n).
The basic idea is to find the maximum length balanced substring of each type and take the maximum among them.

Finding  the maximum-length substring of Type 1 is straightforward in O(n) time and O(1) space (i.e., O(log n) bits).

Types 2 and 3 are symmetrical, so I'll only describe the algorithm for Type 2.

The key notion is "suffix minimum".  A number 1<=i<=n is said to be a "suffix minimum" if f(i)<f(k) for all i-1<=k<=n.  (Yes, k ranges from i-1 to n, not from i; "i-1" is not a typo.)  Note that if i is a suffix minimum then i<n and f(i)<f(n).  Also note that if i1 and i2 are suffix minima such that i1<i2, then f(i1)<f(i2).

Consider i>=1 such that f(i-1)<f(n).  Let m be maximum such that m>i, m is a suffix minimum, and f(m)<=f(i-1).  Assuming m exists, the longest balanced substring starting at position i is the one that ends at the unique position j>=m such that f(j)=f(i-1).  (If you think about it for a moment you will see that by the definition of local minimum and the fact that f(i-1)<f(n) there is indeed a unique such j.)  Moreover, f() strictly increases in the interval m..j.

What if m does not exist?  Then (because f(i-1)<f(n)) there is no nonempty balanced substring starting at i.

To the algorithm.  The first order of business is a preprocessing step to compute two things.
1.  The value f(n).  This takes linear time and O(1) space (i.e.,
    O(log n) bits).
2.  An array suf_min[1..n] of bits, such that suf_min[i]=1 iff i is
    a suffix minimum.  This can be done in linear time using
    O(1) additional space in a single pass over a[] from a[n]
    down to a[1].

Next we wish to find the minimum i such that there exists a balanced substring of Type 2 starting at i.  We also need to find the corresponding m and j for this i, and compute f(i), f(j), and f(m).  (Of course, If no such i exists we are done.)  This is not difficult to do in O(n) time and O(1) extra space.

In general, we will only be incrementing i, j, and m, so updating the corresponding f() values can be done in O(1) per increment.  I will therefore assume that the f() values are known whenever needed.

We now iterate.  Suppose we have found i, m, and j such that f(i-1)<f(n), m>i, m is the last suffix minimum such that f(m)<=f(i-1), j>=m, and f(j)=f(i-1).  We proceed to the next i such that f(i-1)<f(n).  It is easy to see that if the new f(i-1) is not greater than the previous one, then it cannot be the starting point of a balanced substring longer than the one found previously.  So in this case we just go on to the next candidate.
Otherwise, the new m is either the old m, or appears after it.  If it is the same as the old m, we just need to search forward with j.  Otherwise we find the correct new m with a forward search, and then the new j with a forward search from the new m.

I've glossed over some details (in particular what to do when the new m does not exist), but it should be clear that the whole thing can be done in linear time using O(1) extra space. 

So in total we use an n-bit array + a constant number of variables (i.e., additional O(log n) bits) and solve the problem in linear time.
A.F.
Sunday, January 03, 2010
 
 
Little mistake.  In the definition of suffix minimum, k ranges from i-1 to n, excluding i, of course.
A.F.
Sunday, January 03, 2010
 
 
Hi A.F,

the following:

"An array suf_min[1..n] of bits, such that suf_min[i]=1 iff i is
    a suffix minimum.  This can be done in linear time using
    O(1) additional space in a single pass over a[] from a[n]
    down to a[1].
"

Can you please explain how? Sorry if this is obvious.

Thanks.
Confused
Sunday, January 03, 2010
 
 
// C++ code
// Array indexing still starts at 1 (so the input is in A[1]..A[n]
// and the output is in suf_min[1]..suf_min[n]).

for(int i=n-1, prevF=0, minF=0; i >= 0; i--){
    int currF = prevF - (A[i+1] ? 1 : -1);
    if(currF > prevF && prevF < minF){
          suf_min[i+1] = 1;
          minF = currF;
    }
    else
          suf_min[i+1] = 0;

    prevF = currF;
}

Note that in computing the values of f() backwards it doesn't matter if we start with the true value f(n) or any arbitrarily chosen value (such as 0, which I used in the code above).  The suffix minima occur in the same places.
A.F.
Monday, January 04, 2010
 
 
*Smacks head*
Confused
Monday, January 04, 2010
 
 
I was able to get the memory consumption down to O(sqrt(n/log n)) (i.e., sqrt(n log n) bits) but it certainly doesn't look like the "right" answer.

The idea is this.  The reason my previous algorithm needed n bits was that it had to compute the suffix minima from right to left, but use them in the left-to-right direction.  The point is, however,  that it is not necessary to remember all of them explicitly.  Instead, we can compute them on demand, if we remember enough precomputed information to make this possible.

We partition A[] into blocks of length sqrt(n log n).  For each block we compute and store the minimum value that f() attains in the suffix of the array beginning right after the end of the block.  This can be done in one forward pass (to compute f(n)), followed by a single backward pass.  Since there are
n/sqrt(n log n) = sqrt(n / log n)
blocks, that is also the space we need to keep these values.

Following this preprocessing step, we proceed with the algorithm as before, except that every so often we need to compute a batch of suffix minima to be used next.  Specifically, each time the search for m enters a new block, we perform the following mini-preprocessing step, which finds and stores the suffix minima occurring in that block.  We first compute the value of f() at the block end by a forward scan of the block.  Then, using this value and the minimum value of f() in the suffix of A[] beginning right after the end of the block (which the algorithm's preprocessing stage has computed and stored), we find all the suffix minima occurring in the block by a backward scan of the block and store them in a sqrt(n log n) long array of bits.  We use this bit array to locate the suffix minima needed while the search proceeds within the block.  We re-use this same bit array over and over for all of the blocks, so the total memory consumption is O(sqrt(n log n)), as promised.  The time complexity is of course O(n).
A.F.
Tuesday, January 05, 2010
 
 
d Send private email
Tuesday, January 05, 2010
 
 
10000111

x-no. of 1's = 0 (Initialized)
y-no. of 0's = 0 (Initialized)
max = 0 (var maintains max balance)

scan each digit
1 ( x =1 , y= 0, max =0)
0 => x=1, y=1
0
0
0 => x=1, y=4
Since the next digit is 1 again (starting digit)
do the following
max = Max( max, Min(x,y))
        = Max ( 0, Min(1,4))
        = 1
Reset value of x. x=0
1 => x=1
1
1 => x=3
Since this is End of seq/ or start of 0
now max = Max(max, Min(x,y))
                  = Max(1,Min(4,3))
                = 3
Now the ans is 3*2 = 6.
Done in linear time with o(1) space.

For subsequent changing from 0 to 1, reset x
and from 1 to 0, reset y. So max holds the answer.
Iyyappa Send private email
Tuesday, January 12, 2010
 
 
Isn't the answer 8 for there are 4 ones and 4 zeros in your example?
D
Tuesday, January 19, 2010
 
 
I think this algorithm is looking for a substring of the form (0^n)(1^n), which is an easier problem.
d Send private email
Tuesday, January 19, 2010
 
 
I think of a way in O(n) with O(n) space

1. For each index i, save the sum(1)-sum(0) to another array B. Sum subarray 0 : i.
2. Go through B, get each distinct value's min and max index
3. The biggest max - min is the result.

Sample:
Input: 1 0  0  0  0  1  1  1  0  0  1  0  1
B:        1 0 -1 -2 -3 -2 -1 0 -1 -2 -1 -2 -1

1: Min = Max = 0
0: Min = 1; Max = 7
-1: Min = 2; Max = 12
-2: Min = 3; Max = 11
-3: Min = Max = 4

The result is 10 :  0  0  1  1  1  0  0  1  0  1
Calvin Send private email
Friday, February 19, 2010
 
 
One more thing:
In step 1, you don't need to do sum actually. Just +1 when '1' is encountered; -1 in the other case.
Calvin Send private email
Friday, February 19, 2010
 
 

This topic is archived. No further replies will be accepted.

Other recent topics Other recent topics
 
Powered by FogBugz