Odysseus Benchmarking

Marco Grawunder
Posts: 272
Joined: Tue Jul 29, 2014 10:29 am
Location: Oldenburg, Germany

Re: Odysseus Benchmarking

Post by Marco Grawunder » Fri Jan 27, 2017 11:04 am

Now, the KeyValue-based Map and Select also allow reading of metadata ...

stefan
Posts: 85
Joined: Tue Jul 12, 2016 1:03 pm

Re: Odysseus Benchmarking

Post by stefan » Sat Jan 28, 2017 6:57 pm

Hi.

I am working on the benchmarking for my big query and noticed one issue that I don't fully understand:

My processing consists mainly of two streams: nodes and relations.
After reading the nodes I added a DATARATE operator, and likewise after reading the relations. Then I join them and do some further processing (incl. windows, joins, merges, aggregates, etc.). At the end I added another DATARATE operator. If I look at the sink, I can only see the last data rate. I assume that the data rate is ignored/deleted each time an operator changes the overall structure of the stream (not only some attributes), e.g. JOINs, aggregates, etc. Is that correct, or did I do something wrong?

If so, what is the best practice for getting different data rates? At the moment I just write the intermediate result to a CSV file directly after the DATARATE operator and go on. I think I will make the metadata explicit, i.e. I will add new attributes to the tuples that contain the metadata. This will save some time during evaluation.

greetings,
Stefan

Re: Odysseus Benchmarking

Post by Marco Grawunder » Sun Jan 29, 2017 1:42 pm

The DATARATE operator measures the data rate at its position in the stream and stores the current rate as metadata. To allow different measurements inside one stream, you can use different keys for the data rate, so each measurement point should get its own key:

drate = DATARATE({UPDATERATE = 1000, KEY='MEASUREMENT_1'}, previousOperator1)
...
drate2 = DATARATE({UPDATERATE = 1000, KEY='MEASUREMENT_2'}, previousOperator2)

Greetings,

Marco

Re: Odysseus Benchmarking

Post by stefan » Sun Jan 29, 2017 2:14 pm

Hi Marco,

Sure, I already did that in my smaller query and it worked fine. But in the bigger query I don't get all data rates in my output. I thought this might be because of the JOINs etc. Is that not the case?

Example of what I am doing:
- Reading Source1 (with parameter METAATTRIBUTE)
- Calculate the Datarate of Source 1

- Reading Source2 (with parameter METAATTRIBUTE)
- Calculate the Datarate of Source 2

- JOIN Source 1 and Source 2
- Calculate the Datarate of JOIN

If I now write the joined stream to a CSV sink (with the metadata), it includes only the last data rate.
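As a PQL sketch, my measurement points look roughly like this (source1, source2 and the join predicate '...' are only placeholders for my real operators; the DATARATE calls follow your example with distinct keys):

```
rate1 = DATARATE({UPDATERATE = 1000, KEY='SOURCE_1'}, source1)
rate2 = DATARATE({UPDATERATE = 1000, KEY='SOURCE_2'}, source2)
joined = JOIN({PREDICATE = '...'}, rate1, rate2)
rateJoin = DATARATE({UPDATERATE = 1000, KEY='JOIN'}, joined)
```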

And another question:
If I want to use a stream that already includes some data rates and the calculated latency, can I aggregate this stream and execute the CALCLATENCY operator to get the latency for a higher-level view? Or do I have to change the structure so that the aggregation takes place before the CALCLATENCY operator?

greetings,
Stefan

Re: Odysseus Benchmarking

Post by Marco Grawunder » Sun Jan 29, 2017 2:36 pm

Hi Stefan,

please pull the latest version ;-)

The join always merges the metadata from its inputs. Unfortunately, there was a bug in a special setter method of Datarate :-/

The measurements should now work as expected :-)

Greetings,

Marco

Re: Odysseus Benchmarking

Post by stefan » Sun Jan 29, 2017 2:54 pm

Hi Marco,

thanks for the fast response!
Indeed, now it seems to be fine. :)

Did you see my last edit? That one is a bit about best practice:
If I want to use a stream that already includes some data rates and the calculated latency, can I aggregate this stream and execute the CALCLATENCY operator to get the latency for a higher-level view? Or do I have to change the structure so that the aggregation takes place before the CALCLATENCY operator?

greetings,
Stefan

Re: Odysseus Benchmarking

Post by Marco Grawunder » Sun Jan 29, 2017 4:42 pm

Hmm, I am not sure what your intention is... The latency of an aggregation is calculated from the last element that participates in this aggregation. Additionally, we provide the lowest (oldest) value in the metadata.

If you want to calculate, e.g., the average latency of an aggregation, you need to "transfer" the metadata into the data (e.g. with a Map operator).
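As a rough, untested PQL sketch of that idea: a Map first copies the latency into a regular attribute, then a normal aggregation can compute, e.g., the average over it. Here 'latencyExpression' stands for whatever expression exposes the latency metadata in your Map, and 'aggregated' for the preceding operator; both are placeholders.

```
withLatency = MAP({EXPRESSIONS = [['latencyExpression', 'latency']]}, aggregated)
avgLatency = AGGREGATE({AGGREGATIONS = [['AVG', 'latency', 'avgLatency']]}, withLatency)
```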

Re: Odysseus Benchmarking

Post by stefan » Sun Jan 29, 2017 4:45 pm

Hmm, ok, thanks. That answers my question.
Maybe I even have a better idea.

Thanks.

Re: Odysseus Benchmarking

Post by stefan » Sat Feb 04, 2017 5:20 pm

Hi,

I found a good way to use the evaluation feature and wanted to test whether it fits my requirements. I created an evaluation job for one of my queries and configured it to measure the latency, the throughput, and the resources. I also create plots for everything.

To show my actual configuration, here is the created model.eval:

Code:

<?xml version="1.0" encoding="UTF-8"?>
<evaluation CREATE_CPU_PLOTS="true" CREATE_LATENCY_PLOTS="true" CREATE_MEMORY_PLOTS="true" CREATE_THROUGHPUT_PLOTS="true" MEASURE_THROUGHPUT_EACH="2" NUMBER_OF_RUNS="1" OUTPUT_HEIGHT="300" OUTPUT_TYPE="PDF" OUTPUT_WIDTH="1000" PLOT_FILES_PATH="C:\SIMPLe\55 --- Durchführung der Evaluation\MO-Evaluation-Results\Plots" PROCESSINGRESULTSPATH="C:\SIMPLe\55 --- Durchführung der Evaluation\MO-Evaluation-Results" QUERY_FILE="Odysseus/EPN_MobilityRequest_Evaluation/MobilityOffer_Evaluation.qry" WITH_LATENCY="true" WITH_RESOURCE="true" WITH_THROUGHPUT="true">
  <variables>
    <variable ACTIVE="true" IMemento.internal.id="numberMOs">
      <value>100</value>
      <value>1000</value>
      <value>2500</value>
    </variable>
    <variable ACTIVE="true" IMemento.internal.id="run">
      <value>1</value>
      <value>2</value>
      <value>3</value>
    </variable>
  </variables>
</evaluation>

I used it, and I have several questions. Maybe you can answer some of them:

1. I am missing some values.
In the results folder I can only find the throughput folder (besides the query and the model.eval), which contains the throughputs for each run. In the plots folder there are only the plots for the throughput. What about the latency and the system load? I can see in Odysseus Studio that sinks are created for them, but I cannot see any output. Do I have to add something to the query or anywhere else? I only added the metadata attributes for time interval, latency, data rate and system load at the beginning of the query. Do I have to add DATARATE or CALCLATENCY operators to get this data? As far as I understood the wiki, this should be done automatically by the pre-transformation handler.
Furthermore, I read/assume that the throughput is only measured at the source. Is there a way to see the throughput for the complete query, i.e. not only the "data rate" of the source but the number of elements that can be fully processed per second/millisecond?

2. Is there a way to influence the outputs?
I am interested in some additional values, e.g. the data rate for some operators or sinks, or the latency for a specific output.
An example: in another query I read tuples from RabbitMQ. One user request consists of 3..n of these tuples. The user request can be identified by an id. In the query, the tuples are read, processed, and aggregated into one user request at the end of the processing. This means that the latency for a user request could be determined as Latency.lend - Latency.maxlstart (from reading the first tuple until the output of the result). Is something like this possible? Is it possible with maxlstart?
Another example for the data rates: I want to measure the data rates at the beginning of the processing, at some operators during the processing, and at the end of the processing.

greetings,
Stefan

Re: Odysseus Benchmarking

Post by Marco Grawunder » Mon Feb 06, 2017 10:32 am

Hi Stefan,

stefan wrote:
> 1. I am missing some values.
> In the results folder I can only find the throughput folder (besides the query and the model.eval), which contains the throughputs for each run. In the plots folder there are only the plots for the throughput. What about the latency and the system load? I can see in Odysseus Studio that sinks are created for them, but I cannot see any output. Do I have to add something to the query or anywhere else? I only added the metadata attributes for time interval, latency, data rate and system load at the beginning of the query. Do I have to add DATARATE or CALCLATENCY operators to get this data? As far as I understood the wiki, this should be done automatically by the pre-transformation handler.
> Furthermore, I read/assume that the throughput is only measured at the source. Is there a way to see the throughput for the complete query, i.e. not only the "data rate" of the source but the number of elements that can be fully processed per second/millisecond?

I am not very familiar with this feature, so these are just ideas:

- What happens when you do NOT add Latency, Datarate, etc. to the query text?
- Systemload is currently not supported by the evaluation feature.
- The operators should be added automatically.
- Datarate measurements are only useful at the inputs; otherwise you would measure, e.g., the selectivity of the query. Together with latency, this should be all you need, I guess. As with precision and recall, it is important to use both values: the throughput can be raised by introducing buffers, but this will of course lead to higher latency, too.

stefan wrote:
> 2. Is there a way to influence the outputs?
> I am interested in some additional values, e.g. the data rate for some operators or sinks, or the latency for a specific output.
> An example: in another query I read tuples from RabbitMQ. One user request consists of 3..n of these tuples. The user request can be identified by an id. In the query, the tuples are read, processed, and aggregated into one user request at the end of the processing. This means that the latency for a user request could be determined as Latency.lend - Latency.maxlstart (from reading the first tuple until the output of the result). Is something like this possible? Is it possible with maxlstart?
> Another example for the data rates: I want to measure the data rates at the beginning of the processing, at some operators during the processing, and at the end of the processing.

No, this is not possible at the moment. If you want these measurements, you will have to do them by hand.
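Doing it by hand for the data rates could, for example, mean placing keyed DATARATE operators at each point of interest, as discussed earlier in this thread (sketch; inputOperator, someMiddleOperator and lastOperator are placeholders for the real operators in your query):

```
rateIn = DATARATE({UPDATERATE = 1000, KEY='PROCESSING_START'}, inputOperator)
rateMid = DATARATE({UPDATERATE = 1000, KEY='PROCESSING_MIDDLE'}, someMiddleOperator)
rateOut = DATARATE({UPDATERATE = 1000, KEY='PROCESSING_END'}, lastOperator)
```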

Greetings,

Marco
