I have a hypothetical set of data with 3 columns that has monthly profit data for a set of widget machines. I am trying to figure out the maximum profit period within a 2-year span.

The 3 columns are as follows:

**name**: identifier of widget machine (there are maybe 100 of these)

**date**: month/year over a 2 year period

**profit**: dollars made from widgets that month (can be negative if costs exceed revenue)

The *maximum profit period* is a concurrent set of months at least 3 months long (and could encompass all of the data).

Obviously I could brute force this and simply test every combination: Jan-Mar, Jan-Apr, Jan-May, Feb-Apr, etc. but I am looking for a better solution than creating all of these by hand. It seems like the data is a bit too big to want to transpose across and turn months into columns so I would like to be able to operate on a stacked dataset as described.

I'd prefer a sas data step but an sql query that works in proc SQL would be fine as well (but the sets of subqueries that might be required are beyond my ability).

**Example Data:**

```
data max(drop=dt);
length name dt $50;
infile datalines delimiter=',';
input name $ dt profit;
date=input(dt,mmddyy10.);
format date mmddyy10.;
datalines;
Widget1,01/01/2011,1000
Widget1,02/01/2011,2000
Widget1,03/01/2011,500
Widget2,01/01/2011,100
Widget2,02/01/2011,200
Widget2,03/01/2011,-50
Widget2,04/01/2011,250
Widget2,05/01/2011,-150
Widget2,06/01/2011,-250
Widget2,07/01/2011,400
Widget2,08/01/2011,0
Widget2,03/01/2011,-200
;
```

Maybe a better phrasing of the question would be "How do I come up with all possible consecutive combinations of values?" From a query like that, I could then take the max of combinations where # of values >= 3.

The query would build up every combination of sequential rows in the table, drop those where there are less than 3 rows, and then return the max value (grouped by Widget# of course). I suppose it would also helpful to know the starting and ending row for each combination. I'm trying to work out how this would be done in an SQL query (doesn't sound like a sas datastep to my mind)

**Python Sample:**

Here is a sample with some made up data that I wrote in Python. It is not the most efficient thing but it gets the sort of result I am looking for--I just can't figure out how to replicate it in SQL or SAS:

```
from itertools import groupby
data = []
data.append(['Widget1','Jan',5])
data.append(['Widget1','Feb',1])
data.append(['Widget1','Mar',-2])
data.append(['Widget1','Apr',0])
data.append(['Widget1','May',-3])
data.append(['Widget1','Jun',8])
data.append(['Widget1','Jul',-2])
data.append(['Widget1','Aug',1])
data.append(['Widget2','Jan',-1])
data.append(['Widget2','Feb',1])
data.append(['Widget2','Mar',-3])
data.append(['Widget2','Apr',1])
data.append(['Widget2','May',-60])
data.append(['Widget2','Jun',9])
data.append(['Widget2','Jul',-2])
data.append(['Widget2','Aug',20])
results = []
for key, group in groupby(data, lambda g: g[0]):
max = -999999
for i,v in enumerate(data):
if key <> v[0]:
continue
runningtotal = 0
for j,w in enumerate(data):
if key <> w[0]:
continue
if i <= j:
runningtotal = runningtotal + w[2]
if i+2 <= j and runningtotal > max:
max = runningtotal
maxstart = v[1]
maxend = w[1]
results.append([key, maxstart, maxend, max])
print results
```

This gives me the result of [['Widget1', 'Jan', 'Jun', 9], ['Widget2', 'Jun', 'Aug', 27]] for the fake python data I made.

Your core problem appears to be that you see combinatorially many periods, but you want a solution that doesn't require a combinatorial amount of work.

Luckily for you, if you have N months, you can solve this problem in O(N^2) time with O(N) space.

The trick is that you don't actually need to save all the period's values; you only need the maximal one. So let's break the big problem down into smaller chunks.

First, create two arrays of length N and fill them with zeroes. Now, you read in the first month and (if it earned a profit) put it in the first cell of each - it's the "best run of length 1" and also the "current run of length 1". If it's negative, leave the "best" at zero, but fill the "current" cell anyhow.

Next you read in the second month. If the second month earned more profit than the first, you replace the first cell of each array with the value of the second month (otherwise just replace the "current" one but leave the "best" alone). Then, if the first month plus the second month nets positive, put that value in the second cell of each - that's the "longest run of length 2" and the "current run of length 2" - the current-run-of-two plus the most recent cell is the current-run-of-three.

When you read in the third month, things start to get interesting. First you check the first cell - if the third month is greater than the value currently there, replace it. Next you check the second cell. If adding the third month *and subtracting the first* would make that value greater, do it. Otherwise just put it in the "current" array, but not the "best" array. Finally, populate the third cells with the "current run of length 2" value plus the third cell.

Continue on in this fashion. When you reach row i, you have the current runs having length 1..i stored, along with the best of each length so far.

When you reach the end of the array, you can discard the "current" values and just take the max of the "best" array!

Because this requires 1+2+3+...+N operations, it's O(N^2). Only one pass through the input data is necessary, and the storage is 2N, which is O(N). If you wish to know *which* period was most profitable, just store the cell that begins the run as well as the run's sum.

The data step (and Proc SQL) will read the data sequentially, however to mimic some of the functionality of the array solution from other programming languages, you can use the LAG function to look at previous observations.

If you sort your data by NAME and DATE, then you can use the BY statement in your data step and have access to FIRST. and LAST. to know when the NAME has changed.

Once you work out the algorithm, which obviously is the hardest part and which I don't have yet, you can OUTPUT each profit total per date sequence, then sort the new data set by name and profit total which should put the highest total at the beginning of the BY group (First.Name).

Maybe this will spur some additional ideas from you or other SAS programmers.

I think I have a working method here. A combination of a proc SQL cross join and a quick data step seems to give everything I want (though it could probably be done in a single big SQL query). Here it is on the sample data from my python example.

```
data one;
length name $50;
infile datalines delimiter=',';
input name $ dt profit;
datalines;
Widget1,1,5
Widget1,2,1
Widget1,3,-2
Widget1,4,0
Widget1,5,-3
Widget1,6,8
Widget1,7,-2
Widget1,8,1
Widget2,1,-1
Widget2,2,1
Widget2,3,-3
Widget2,4,1
Widget2,5,-60
Widget2,6,9
Widget2,7,-2
Widget2,8,20
;
proc sql;
create table two as
select a.name, a.dt as start, b.dt as end, b.profit
from one as a cross join one as b
where start <= end and a.name = b.name
order by name, start, end;
quit;
run;
data two; set two;
by name start;
if first.start then sum=0;
sum+profit;
months = (end-start)+1;
run;
proc means noprint data=two(where=(months>=3));
by name;
output out=three(drop=_:) maxid(sum(start) sum(end))=start end max(sum)=;
run;
```

Making it operate on date constructs instead of numbered months would be trivial (just change the 'months' variable to be based on actual dates).

Similar Questions

I'm wanting to issue the find command in Perl and loop through the resulting file paths. I'm trying it like so (but not having any luck): my $cmd; open($cmd, '-|', 'find $input_dir -name *.fastq.gz

This question builds upon this question: R: Calculate moving maximum slope by week accounting for factors My question: The code pasted below calculates maximum slope over a 7-day period using length(H

I'm looking for a command in VI/VIM to search for particular text in the file and grab the whole line and output in either on the screen or separate file. e.g. This is some line with this _word_ and s

Is there a way to find the maximum number of promotes by a user in a stream , in Accurev? Some information like User - Count# Of Promotes

AppEngine shut down my application for going over its daily usage quota. However, I believe I had plenty of discounted instance hours left and, if they had been applied, I would not have been over my

I'm running passenger with REE on a Rackspace cloud server. Is there any way to find out what is the maximum concurrent processes Passenger can create/handle with the provided infrastructure/hardware?

I am curious whether it is possible to determine the maximum size that an array can have in C++. #include <iostream> using namespace std; #define MAX 2000000 int main() { long array[MAX]; cout

I have a large file with 22 lakhs rows. Value Label 4 1 6 1 2 2 6 2 3 2 5 3 8 3 7 3 1 4 5 4 2 5 4 5 1 5 I want to know the fastest way to get following output, where 'Max' stores the maximum value in

I'm going to put all decimal numbers in a text inside a span tag (<span>) but the numbers are not using period as decimal separator, they use slash (/) The sample text is something like this: Th

What command can I use to find and set plot window to maximum available, i.e. one that will produce a plot window as one gets after clicking on maximize button of top bar? I searched and found that w

I was looking for the maximum memory that Memcached can handle and how can I configure this property if there is such an option. So far, only thing I could find was the maximum size of the object that

$ find . -name some_file ./some_file ./dir1/some_file ./dir2/some_file ./some_file ./dir1/some_file ./dir2/some_file Why? Using CentOS release 5.7 (Final) Linux version 3.0.52

I'm trying to get the total amount of Milliseconds (not the millis field) from the Period object instance. I've tried multiple conversions, as I couldn't find any method easily giving it. Has anyone e

Hi I'm looking into the issues to expect if the Flash Player (version 10) is run over a long period of time, say 24+ hours. I know that the player has issues with not performing garbage collection pro

If I do this: void printfloat(float number) { printf(%f, number); } and void printdouble(double number) { printf(%f, number); } What is the maximum number of characters that can be output by eac

I just wrote a chunk to find out the largest spike in cross-correlation values, but when I ran the loop code there is no output at all. So I just want to ask if there's something wrong with my logic i

i have one list.i added the more then int value in my list.foreach using i get the all int value.but,i want maximum integer value in my list.

I need your help to solve a problem. This is part of a coding exercise, that I couldn't solve completely. Imagine we have the following graph : I need to build a class that calculates the maximum len

I need to find value in b_id which repeats for maximum time. for example query for the table below should return 40 (n.b., query should return a single value) | b_id | s_id | doi | dos | charge | +--

I am working on Ubuntu 11.04. How do I find out the maximum call stack size of a process and also the size of each frame of the stack?

I am writing a program that finds prime numbers up to a specified limit. I have tried: Sub Main() Console.WriteLine(Enter the maximum) Dim primes As List(Of Integer) = New List(Of Integer) Dim m As

This question already has an answer here: Get an OutputStream into a String 6 answers I think this has been answered but I can't seem to find it. I have an instance method which writes some con

I have a numpy.ndarray in which the maximum value will mostly occur more than once. EDIT: This is subtly different from numpy.argmax: how to get the index corresponding to the *last* occurrence, in ca

I have a list: a = [32, 37, 28, 30, 37, 25, 27, 24, 35, 55, 23, 31, 55, 21, 40, 18, 50, 35, 41, 49, 37, 19, 40, 41, 31] max element is 55 (two elements on position 9 and 12) I need to find on which p

I am a algorithm beginner.I just worked out a solution to recursively find the maximum elements in an array of integers: public static int findMax(int[]arr) { if(arr.length==1) return arr[0]; return

I need to find the maximum among values with same labels, in matlab, and I am trying to avoid using for loops. Specifically, I have an array L of labels and an array V of values, same size. I need to

In SQL Azure each database has a size limitation (adjustable). In order for my service to not suddenly come to a halt I'd like to be able to programmatically find the current maximum size and current

How to find maximum and minimum value from the below table using awk command. 20 90 60 30 55 75 80 85 10 15 99 95 55 95 70 20 9 35 85 75 I want the output like max value=99 and min=9

Not so long ago I found a way of sending a DOM representation of an XML document out over an HTTP connection from a Servlet. AFAIK the DOM 3 LS (LoadStore) is needed for this, but the thread on StackO

I am having a hard time figuring out how to strip the last period from a hostname ... current output: domain.com. suddomain.com. domain.com. subdomain.subdomain.com. subdomain.com. desired output:

In Python, I have a list of 10 numbers. I know that I can find the maximum value in the list by doing something like: max = numpy.max(list) I also have a list of indices that I don't want to include

Maximum, minimum and total numbers using python. For example: >>>maxmin() Enter integers, one per line, terminated by -10 : 2 1 3 7 8 -10 Output : total =5, minimum=1, maximum = 8 Here is my

I would like to show a maximum movement overlay in my hex map. For example: Center point is at 50,50 Maximum allowed movement is 5 hexes. This is the code I use for overlay: for (int height = lowHeigh

For a reservation system there is an inventory table and each item has a quantity (e.g. there are 20 chairs). Now the user can make a reservation for a specific period (e.g. 5 chairs for two hours 20

I'd like to store an scrypt-hashed password in a database. What is the maximum length I can expect?

I want to find top 2 customers with maximum orders. The table looks like: CustomerId OrderId ProductId 101 1 A 101 3 B 101 4 C 102 9 D 102 9 E 103 11 E 103 22 F This is the output that I need from SE

I have a short R script which plots a few histograms using ggplot2. How can I automatically set the ymax limit in the histogram based on the maximum frequency in the histogram (plus 10%) i.e scale_y_c

I'm trying to calculate maximum simultaneous calls. My query, which I believe to be accurate, takes way too long given ~250,000 rows. The cdrs table looks like this: +---------------+-----------------

Given a set of points in 2d-space P, where Pi = (Xi, Yi), I need to find a target point T such that the maximum distance to any Pi is minimized. T does not need to exist in P, and can be defined arbit

For a given binary tree, find the largest subtree which is also binary search tree? Example: Input: 10 / \ 50 150 / \ / \ 25 75 200 20 / \ / \ / \ / \ 15 35 65 30 120 135 155 250 Output: 50 / \ 25

I need to return the first value in an array that is greater than another particular value. I have: find(A > val, 1, 'first') According to this post: http://stackoverflow.com/a/9464886/1985603 fin

Almost the same as this: find maximum sum of elements in an array such that not more than k elements are adjacent except there is a limit of n elements we can choose. How to modify the DP algorithm to

So I've got a MySQL database that counts the number of servers with given usage levels on an hourly basis, as such: Date Status Population 2012-01-13 15:33:40 UP Standard 2012-01-13 15:33:40 UP Light

As the question title suggests, I want to find the table having maximum number of rows (entries) in a particular database. I have been able to extract the names of all the tables in a particular datab

How would I find the maximum value in a specific column in a Flex DataGrid?

I have a file with a long list of integers: 10 4 66 .... I want to find the maximum value using UNIX command line tools. I know I can use sort (and indeed there are solutions to this problem on SO th

As I am trying to connect the VLC Python Bindings with ffmpeg (see Exchange data between ffmpeg and video player) I thought that making ffmpeg to output the RTSP stream to STDOUT and catching it wit

In my app I want to count for how long the accelerometer is over a certain threshold, but i'm unsure of how to do this. The sensitivity of my accelerometer is set to 'SensorManager.SENSOR_DELAY_GAME'

How can I ascertain from getdate() in SQL Server 2005 if the current time is between certain time period. If it is then have to set flag as one else two. something like getdate() timeperiod between

How to find increasing subsequence of numbers with maximum sum. I find O(N^2) but I want to know O(N log N). Thanks!