Save The World!: 2010

Saturday, May 22, 2010

What I will write about

Optimal number of sleeping servers vs. icdf (inverse CDF) percentile.
Added-up distribution (how it's different... it's not).

Where will I put this new information? The part about adding up the distribution could go at the end of the paper I was writing. It seems to fit well there.

Sleeping servers could go in their own chapter, maybe?

Wednesday, May 19, 2010

What I'm going to do

One thing I will do for sure is use a percentile of a convolved distribution. This is very clear.

Another thing I can do is at least simulate what happens when I have sleeping servers.

I'm not sure if I'm going to be able to run something for real with sleeping servers, since doing so would require live migration, which I'm not going to be able to do.

Monday, May 17, 2010

Power

Suspend draws about 2.67 watts.
Idle: about 46 watts
1 CPU: 58 watts
2 CPUs: 67 watts
3 CPUs: 75 watts
4 CPUs: 83 watts

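Linear fit of the measurements above, with x = number of busy CPUs (idle is x = 0) and y = power in watts:
y = 9.1x + 47.6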
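
As a quick check, here is a minimal Python sketch of where that line comes from, assuming it is an ordinary least-squares fit through the five measurements above with idle counted as 0 busy CPUs. The `measurements` dict and `fit_line` helper are just illustrative names.

```python
# Power measurements from above: number of busy CPUs -> watts (idle = 0 CPUs).
measurements = {0: 46, 1: 58, 2: 67, 3: 75, 4: 83}

def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    xs = list(points)
    ys = [points[x] for x in xs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(measurements)
print(f"y = {slope:.1f}x + {intercept:.1f}")  # y = 9.1x + 47.6
```

Under that assumption the fit reproduces the line above. Note that it predicts about 47.6 watts at idle versus the 46 watts measured, so the idle point sits a little below the line.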

Friday, May 14, 2010

Before I forget

Utility of Sleep

Use the 75th percentile of the convolved distribution.

Sample from the lower (component) distributions, add the samples up, and use the 75th percentile of that sum.
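
A minimal sketch of that sampling idea in Python, assuming each VM's load can be sampled independently. The Normal(0.30, 0.05) VM loads (as a fraction of one server) and the `summed_percentile` helper are hypothetical, chosen only for illustration. Adding one draw from each VM's distribution gives one draw from the convolved distribution, so the 75th percentile of many such sums estimates the 75th percentile of the convolution.

```python
import math
import random
from statistics import NormalDist

def summed_percentile(samplers, pct=75.0, trials=10_000, seed=0):
    """Monte Carlo estimate of a percentile of the sum of independent loads."""
    rng = random.Random(seed)
    # Each total is one draw from the convolution of the per-VM distributions.
    totals = sorted(sum(draw(rng) for draw in samplers) for _ in range(trials))
    rank = max(0, math.ceil(pct / 100 * trials) - 1)  # nearest-rank percentile
    return totals[rank]

# Hypothetical VMs: each VM's load ~ Normal(mean=0.30, sd=0.05) of one server.
vm_loads = [NormalDist(0.30, 0.05) for _ in range(3)]
samplers = [lambda rng, d=d: rng.gauss(d.mean, d.stdev) for d in vm_loads]

combined = summed_percentile(samplers)
naive = sum(d.inv_cdf(0.75) for d in vm_loads)  # adding per-VM percentiles
print(f"75th percentile of summed load: {combined:.3f}")
print(f"sum of per-VM 75th percentiles: {naive:.3f}")
```

With these made-up numbers the summed estimate comes out near 0.96 of a server, while simply adding the per-VM 75th percentiles gives about 1.00. That gap is why the percentile should be taken after adding things up rather than before.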

Wednesday, April 21, 2010

No Optimization?

I was having a weird problem where my optimizer was not able to optimize. For high CDF values, it just couldn't improve the solution. After toying around with it today, I think I figured it out. When you increase the CDF value past a certain point, items that were generated to fit 3 per bin now only fit 2 per bin, because they're all too big for 3. So you reach a point where there is nothing left to optimize.
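
Here is a small Python sketch of that effect, under made-up assumptions: identical items whose size is Normal(0.30, 0.05), bins of capacity 1.0, and an item's packed size taken as the icdf of its distribution at the chosen CDF value. The `items_per_bin` helper is hypothetical.

```python
import math
from statistics import NormalDist

def items_per_bin(item, cdf_value, capacity=1.0):
    """How many identical items fit in one bin when each is sized at its icdf."""
    size = item.inv_cdf(cdf_value)
    return math.floor(capacity / size)

vm = NormalDist(mu=0.30, sigma=0.05)  # hypothetical item-size distribution
for cdf_value in (0.50, 0.70, 0.75, 0.90):
    per_bin = items_per_bin(vm, cdf_value)
    bins_for_12 = math.ceil(12 / per_bin)
    print(f"cdf={cdf_value:.2f}: {per_bin} per bin, {bins_for_12} bins for 12 items")
```

In this toy setup the jump from 3 to 2 items per bin happens just below a CDF value of 0.75, where the icdf crosses 1/3 of a bin. Past that point the bin count is forced up from 4 to 6, and no rearrangement of items can bring it back down.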

Tuesday, April 13, 2010

Pack Separately

The pack-separately idea did not do as well as we had hoped in one test that I ran.

Stuff I'm getting done today

I redid some of the RGGA paper. I don't feel like it's in full form, but I put content in there.

The only things that have changed are the introduction, the last part of the results, and the conclusions.

Friday, April 9, 2010

Things to do

1) GECCO Paper
a) Introduction
b) Return to power & show actual power used
c) Time to run
2) Pack tight things separately
3) Start thinking about adding distributions and integrating them into the fitness function.

Thursday, April 8, 2010

more merging

Here's one at a slightly higher quantile than the last graph.

Here's one at a slightly lower quantile than the last graph.

A merged graph

Here's a graph kind of in the middle.

Wednesday, April 7, 2010

A better graph.

I produced this.

I'd like to know why my graphs are all so different...

Monday, April 5, 2010

Yet another graph.

This is a graph of a two-resource problem in which most servers can fit three VMs.

The nice thing about this graph is that for very small percentages of servers over capacity, Evolve and RepackEvolve both do very well. RepackEvolve does not perform much better than Evolve.

Another graph

This is another data set. As we can see, repacking offers a few benefits, but it's not as nice as the other graph.

This graph is nice, but not as nice as the one posted the other day.

Friday, April 2, 2010

Hitting a High Fitness with My Evolutionary Programming Approach

So, today, after some twiddling and tweaking, my program reached an evolutionary fitness that I'm rather happy with.

This was done on a different data set with a different variance, and these results differ quite a bit from the results I published in the paper. However, if these results hold up, then there are some data sets for which Evolve does much better at some percentages of infeasible solutions.

Thursday, March 25, 2010

My findings today

So. I generated this graph: http://aml.cs.byu.edu/~davidw/over_v_real_num_servers.pdf

The graph shows that all of the algorithms I tried performed similarly. In fact, there's really no incentive to use one algorithm over another.

I'm at the stage of my research where I'd like to fix that.

I tried an idea that Kevin and I had: one stage of the algorithm finds the correct number of solutions, and another stage spreads out the solution. After finding the first preliminary solution, I ran a second, similar GA whose purpose was to spread out the solution. The second GA takes the solutions that the first GA found and just tries to spread out the items in the bins. I changed the bin fitness so that instead of preferring one really full bin and one not-so-full bin, it prefers two bins that are both semi-full.
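
A minimal Python sketch of the two kinds of bin fitness, with fills expressed as a fraction of bin capacity. The first-stage fitness is assumed here to be Falkenauer-style (mean of fill^k with k = 2), a common grouping-GA choice rather than necessarily what my code uses. The second-stage `spread_fitness` (negative variance of the fill levels) is a hypothetical stand-in for preferring two semi-full bins.

```python
from statistics import pvariance

def packing_fitness(fills, k=2):
    """Falkenauer-style fitness: mean of fill^k. With k > 1 it rewards very full bins."""
    return sum(f ** k for f in fills) / len(fills)

def spread_fitness(fills):
    """Hypothetical second-stage fitness: evenly filled bins score highest."""
    return -pvariance(fills)

lopsided = [0.9, 0.3]  # one really full bin, one not-so-full bin
even = [0.6, 0.6]      # two semi-full bins, same total load

print(packing_fitness(lopsided) > packing_fitness(even))  # True
print(spread_fitness(even) > spread_fitness(lopsided))    # True
```

Both candidates use the same two bins and the same total load, so the second stage only changes where items sit, not how many servers are used.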

My findings were interesting. I was getting a bit discouraged at first because the second stage was not improving solutions. However, I increased the problem size, and voilà, the second stage saw improvement: the larger the problem, the more the second stage gains. This means that what Kevin and I were discussing, that the problem is too simple to optimize, holds rather true for small problems. Now I just need to find some way to show this on real VMs. :)

Monday, March 1, 2010

What I got done Today

Today, I did quite a bit of reading of enterprise computing papers. It really seems that the exact topic we're addressing has not been addressed in any other paper. There are related topics in other papers (predicting the load of Xen applications given their load when not running on a hypervisor, building a table to predict virtual machine loads across different days, etc.).

I'd like to start writing an enterprise computing paper soon (tonight or tomorrow-ish?). I should take one of the papers I was reading as a baseline. Then I can start moving and changing things in that paper.

Goals for this week

Develop Energy Measurements

Be able to run experiment on potatoes, get good usage out of potatoes, and summarize usage

Create an implementation of Naive Bayes

Integrate MLP with submission

Make at least one more good idea with submission

Do reading for Data Mining

Prepare well for Matt's Wednesday Meeting

LOLP and LOLE continued

LOLE is the expected number of days per year on which available generating capacity is insufficient to serve the daily peak demand, or the hours per year during which capacity is insufficient to serve the hourly load.

LOLE is measured in days/year when it compares daily peak values with available generation.

LOLE is measured in hours/year when it compares hourly load with available generation.

LOLP is the proportion (probability, in %) of days per year, hours per year, or events per season on which available generating capacity/energy is insufficient to serve the daily peak or hourly demand.

LOLE and LOLP are methodologies that use probabilistic methods to capture the effect of uncertain parameters, such as forced outages, unusual load conditions, or hydro conditions, on the ability of deliverable generation to meet load. The other major approach is to use deterministic methods and perform scenario analyses.

LOLP and LOLE

LOLP (Loss of Load Probability) is the probability that generation will be insufficient to meet demand at some point over a specific time window. See http://www.nwcouncil.org/energy/powersupply/presentation1999_1208/sld013.htm.

LOLE (Loss of Load Expectation) is a measure of how long, on average, the available capacity is likely to fall short of the demand. LOLE is a statistical measure of the likelihood of failure and does not quantify the extent to which supply fails to meet demand. LOLE is the expected number of days in the year when the daily peak demand exceeds the available generating capacity. It is obtained by calculating the probability of daily peak demand exceeding the available capacity for each day and adding these probabilities for all the days in the year. The index is referred to as Hourly Loss-of-Load-Expectation if hourly demands are used in the calculations instead of daily peak demands. LOLE also is commonly referred to as Loss-of-Load-Probability.
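
The LOLE calculation described above (sum the daily probabilities of peak demand exceeding capacity) can be sketched in Python as follows. The 1000 MW capacity and the normally distributed daily peaks (260 weekdays at Normal(900, 50) MW, 105 weekend days at Normal(750, 50) MW) are made-up numbers for illustration, and capacity is assumed fixed rather than subject to forced outages. LOLP is computed here as the fraction of days with a shortfall, matching the days-per-year proportion definition.

```python
from statistics import NormalDist

def lole_days(daily_peaks, capacity):
    """LOLE in days/year: sum over days of P(daily peak demand > capacity)."""
    return sum(1.0 - peak.cdf(capacity) for peak in daily_peaks)

# Hypothetical year: 260 weekdays and 105 weekend days, demand in MW.
daily_peaks = [NormalDist(900, 50)] * 260 + [NormalDist(750, 50)] * 105
capacity_mw = 1000  # assumed fixed available capacity

lole = lole_days(daily_peaks, capacity_mw)
lolp = lole / len(daily_peaks)  # expected fraction of days with a shortfall
print(f"LOLE = {lole:.2f} days/year, LOLP = {lolp:.2%}")
```

With these numbers almost all of the risk comes from weekdays, which are two standard deviations below capacity, giving an LOLE of roughly 5.9 days per year. The same sum over hourly demands instead of daily peaks would give the hourly LOLE.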

Wednesday, February 24, 2010

goals for today

work more on my thesis presentation

work on making my experiment runner a bit more reliable

create a run summarizer that can summarize different runs for me.

last night

I worked on automating my experiments. It's still very finicky, but the experiments run and log information about the server while the VMs are running.

Tuesday, February 23, 2010

Got Done Today

Data Generator

Read the papers

Thesis Proposal

Right now, I will see about creating preliminary graphs of what I want to see in the VM metrics.

Goals

Today, I'd like to get done:

data generator for CS 676

read and understand "Modeling Workloads and Devices for IO Load Balancing in Virtualized Environments" and "Modeling Virtual Machine Performance: Challenges and Approaches"

graph VM experiment data.

Edit Thesis Presentation