Odysseus Benchmarking

stefan
Posts: 85
Joined: Tue Jul 12, 2016 1:03 pm

Re: Odysseus Benchmarking

Post by stefan » Mon Feb 06, 2017 1:27 pm

Hi Marco,

thanks for your answer!
- What happens when you do NOT add Latency, Datarate, etc. to the query text?
You mean neither adding the metadata nor the operators? Then I get exceptions:

Code:

de.uniol.inf.is.odysseus.core.server.planmanagement.QueryParseException: EvaluationPreTransformation throwed an exception. Message: Missing attribute 'no such attribute: latency'
...
Caused by: de.uniol.inf.is.odysseus.core.sdf.schema.NoSuchAttributeException: Missing attribute 'no such attribute: latency'
...
289202 ERROR StandardExecutor  - Could not add query '#PRETRANSFORM EvaluationPreTransformation
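
For completeness, this is roughly what adding the metadata and the measurement operators to the query text looks like. This is only a sketch: the source name is a placeholder and the DATARATE parameter name is assumed, so the exact syntax should be checked against the Odysseus wiki.

Code:

#PARSER PQL
#TRANSCFG Standard
/// register the metadata types so the 'latency' and 'datarate' attributes exist on the tuples
#METADATA Latency
#METADATA Datarate
#ADDQUERY
/// 'input' is a placeholder for the real source definition
rated    = DATARATE({updaterate = 100}, input)   /// measure the rate at the input
measured = CALCLATENCY(rated)                    /// typically placed right before the sink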
- Datarate measurements are only useful at the inputs; otherwise you would measure e.g. the selectivity of the query. Together with latency, this should be all you need, I guess. As with precision and recall, it is important to use both values: the throughput can be raised by introducing buffers, but this will of course also lead to higher latency.
Hmm, OK, I understand what you mean. But let's think about a stream whose data rate is measured at the beginning, "in the middle", and at the end. The tuples are not filtered etc.; they are only transformed. If the data rate "in the middle" is much lower than at the beginning, that would point to a bottleneck/some heavy processing.
Furthermore, the data rate at the end could indicate how many user requests Odysseus can process per second.
Or is this not really practicable?

Regarding the buffers: I tried to fill my RabbitMQ message queue with different numbers of routing results of user requests (100, 1000, 2500, ...) and then started the processing (1 user request = approx. 36 tuples to process). The idea was to see how many requests/tuples Odysseus can process per second. The reading rate varies a lot - I guess depending on the buffers. The final result output rate (data rate operator before the final sender) drops as more tuples are in the queue. But if I calculate the real throughput by hand, as number of tuples / (lend of the last tuple - lstart of the first tuple), I can see that the throughput rises and reaches its peak at 5000 user requests (about 180,000 tuples in the queue). I guess the reason for that are the buffers. I also guess the dropping data rate at the final sender is due to the fact that for fewer tuples the buffers are not completely filled and the overall processing can be faster.

Additionally, I tried to write a specific number of tuples per second into the message queue to see how the latency behaves for different numbers of requests per second. Based on my further results, I assumed that Odysseus could process about 600 tuples per second (which sounds like little, but considering the payload and the time-consuming processing, it's not really bad). But if I fill the queue with 1000 tuples per second, they are also read at that speed. It seems they are buffered and processed later (the latency will of course be higher). Eventually the buffers will be full, but it's really hard to find a good way to get reliable values for that (at least for me ;) ). The tuples in the queue cannot easily be imported; I have to generate them, which is a time-consuming process. This is automated, but the processing time is high.

What I am looking for is actually just a way to measure the maximum requests per second and how the latency changes between 1 and the maximum number of tuples per second. Or at least some reliable values that show how good/fast the processing is. Do you have an idea?

greetings,
Stefan

Marco Grawunder
Posts: 272
Joined: Tue Jul 29, 2014 10:29 am
Location: Oldenburg, Germany

Re: Odysseus Benchmarking

Post by Marco Grawunder » Mon Feb 06, 2017 2:18 pm

Hmm, OK, I understand what you mean. But let's think about a stream whose data rate is measured at the beginning, "in the middle", and at the end. The tuples are not filtered etc.; they are only transformed. If the data rate "in the middle" is much lower than at the beginning, that would point to a bottleneck/some heavy processing.
Furthermore, the data rate at the end could indicate how many user requests Odysseus can process per second.
Or is this not really practicable?
OK. The data rates can only be different if there are buffers inside the processing. Otherwise, the whole processing happens within a single thread call.
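
To make that concrete, here is a sketch (operator parameters assumed): only the explicit BUFFER between the two measuring points decouples the producer and consumer threads, so the two DATARATE operators can report different rates. Without the BUFFER, both run in the same thread call and must see the same rate.

Code:

#PARSER PQL
#ADDQUERY
inRate   = DATARATE({updaterate = 100}, input)      /// rate at the input
buffered = BUFFER(inRate)                           /// decouples producer/consumer threads
outRate  = DATARATE({updaterate = 100}, buffered)   /// rate behind the buffer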
What I am looking for is actually just a way to measure the maximum requests per second and how the latency changes between 1 and the maximum number of tuples per second. Or at least some reliable values that show how good/fast the processing is. Do you have an idea?
What you are looking for is some kind of profiler ... we are dreaming of this, too ;-) ... I have no idea how to solve your problem with the current state of Odysseus. Sorry.

stefan
Posts: 85
Joined: Tue Jul 12, 2016 1:03 pm

Re: Odysseus Benchmarking

Post by stefan » Mon Feb 06, 2017 3:04 pm

Hi,
OK. The data rates can only be different if there are buffers inside the processing. Otherwise, the whole processing happens within a single thread call.
OK, and how can I see whether there is a buffer or not? I didn't add one explicitly...
What you are looking for is some kind of profiler ... we are dreaming of this, too ;-) ... I have no idea how to solve your problem with the current state of Odysseus. Sorry.
Ok. :D I will have to discuss this. Thanks so far.

Marco Grawunder
Posts: 272
Joined: Tue Jul 29, 2014 10:29 am
Location: Oldenburg, Germany

Re: Odysseus Benchmarking

Post by Marco Grawunder » Mon Feb 06, 2017 4:12 pm

OK, and how can I see whether there is a buffer or not? I didn't add one explicitly...
In the current version, no additional buffer is added to the plan. So there is no other buffer...

stefan
Posts: 85
Joined: Tue Jul 12, 2016 1:03 pm

Re: Odysseus Benchmarking

Post by stefan » Tue Feb 07, 2017 2:01 am

Hmm, ok, thanks. I have to check this again.

stefan
Posts: 85
Joined: Tue Jul 12, 2016 1:03 pm

Re: Odysseus Benchmarking

Post by stefan » Wed Feb 08, 2017 12:15 am

Sorry, I missed something. What about the data rate and the latency? I only get the throughput in my current case.

Marco Grawunder
Posts: 272
Joined: Tue Jul 29, 2014 10:29 am
Location: Oldenburg, Germany

Re: Odysseus Benchmarking

Post by Marco Grawunder » Fri Feb 10, 2017 10:43 am

Sorry for the late response.

I am not sure. Are you missing the data or are you missing the plots? I saw an "ü" in your path. Maybe this is a problem?

stefan
Posts: 85
Joined: Tue Jul 12, 2016 1:03 pm

Re: Odysseus Benchmarking

Post by stefan » Fri Feb 10, 2017 1:19 pm

Thanks for your response.
I changed the folders for results and plots to C:\test now.
I am missing both. The eval folder only contains the model.evela file, the query, and a throughput folder for the different runs. That's it; there is no latency output. During the evaluation I can see a sink for the latency that points to the correct folder, but no folders and no files are created for it.

Marco Grawunder
Posts: 272
Joined: Tue Jul 29, 2014 10:29 am
Location: Oldenburg, Germany

Re: Odysseus Benchmarking

Post by Marco Grawunder » Fri Feb 10, 2017 1:51 pm

OK. So there aren't any folders for latency? This could be because of multiple sinks in your query. I tested it with only a single sink.

When you start the query by hand, I guess there is latency output?

stefan
Posts: 85
Joined: Tue Jul 12, 2016 1:03 pm

Re: Odysseus Benchmarking

Post by stefan » Fri Feb 10, 2017 2:49 pm

No, as I wrote some messages above, I used my regular query. I did not add calclatency or datarate operators because they should be added automatically.
I tried to add a calclatency operator before the final CSV sink. In this case I get the error "EvaluationPreTransformation throwed an exception. Message: Sink name already used".
This is my "small" query: it reads the CSV file with mobility offers, adds data from Neo4j (WSEnrich), and stores the result in a CSV file. This means there is just one sink. Strange...
