Sorting and Windows in Odysseus

Post Reply
stefan
Posts: 85
Joined: Tue Jul 12, 2016 1:03 pm

Sorting and Windows in Odysseus

Post by stefan » Fri Dec 09, 2016 5:20 pm

Hello,

I am thinking about some optimizations of my query.
I want to ensure that all incoming tuples of my RabbitMQ message queue are in the correct order. If I use more worker processes of my Python script it could be possible, that tuples of different uuids are mixed together. All tuples belonging to one uuid should be consecutively arranged and in increasing order, e. g.

uuid1, pathcnt1, pathpos1
uuid1, pathcnt1, pathpos0
uuid1, pathcnt0, pathpos2
uuid1, pathcnt0, pathpos1
uuid1, pathcnt0, pathpos0
uuid0, pathcnt1, pathpos1
uuid0, pathcnt1, pathpos0
uuid0, pathcnt0, pathpos2
uuid0, pathcnt0, pathpos1
uuid0, pathcnt0, pathpos0

(This list just represents the idea, pathcnt and pathpos are actually integers)

I tried the sort operator to achieve this:

Code: Select all

testwindow = TIMEWINDOW({SIZE = [5, 'seconds'], SLIDE = [5, 'seconds']}, mrNodeRelsNestedDep)
test = SORT({ATTRIBUTES = ['uuid', 'pathCnt', 'pathPos']}, testwindow)
...but there are two things:
1. I need always all tuples of one uuid. Lets assume the window has a size of 5 seconds, so we talk about the interval [0,5). What happens if a the tuples of uuid0 with pathcnt0 will be received at second 4 and the tuples of uuid0 and pathcnt1 will be received at second 5 or 6? I can not use element windows because each user request could result in a different number of tuples.

2. The above code is not working at all. The TIMEWINDOW is working and the tuples have the same start and end timestamp. The SORT is always empty, i. e. does not provide any tuples.

At the moment, I just use one python worker process. But it would be a "nice to have" feature if I could use more than one. :)

greetings,
Stefan

User avatar
Marco Grawunder
Posts: 272
Joined: Tue Jul 29, 2014 10:29 am
Location: Oldenburg, Germany
Contact:

Re: Sorting and Windows in Odysseus

Post by Marco Grawunder » Mon Dec 12, 2016 7:15 pm

Ok. This is a generell problem of stream processing systems. I think it is not a good idea to parallelize in a way that will "kill" the order and recombine it in Odysseus again.

Reordering does alway require to wait for some time just to be sure, that no further element can reach the system. So you will introduce some delay

Maybe the sort operator is not working at the moment. I will check this, you can use an SORT Aggreation (in the aggregat_ion_ operator). If you can garantue, that the uuids are not mixed, maybe the predicate window can help to combine the elements (but this is not tested very well)

I would suggest to partition the stream with an hash code on the uuid and use one query for each partition. With the #LOOP command you can create multiple queries with e.g. different topics (or tcp ports).

Greetings,

Marco

User avatar
Marco Grawunder
Posts: 272
Joined: Tue Jul 29, 2014 10:29 am
Location: Oldenburg, Germany
Contact:

Re: Sorting and Windows in Odysseus

Post by Marco Grawunder » Tue Dec 13, 2016 10:01 am

Remark: Sort works fine. What is important: Sort only creates an output, if a new element is read with a starttimestamp higher than all start timestamp of all read elements, else the system cannot determine the end of the set to sort.

So in the following example, output is created after reading the last "a"-value.

a [0,10] --> no output
c [0,10] --> no output
b [0,10] --> no output
a [10,20] --> Output of three tuples a [0,10], b [0,10], c [0,10]

Greetings,

Marco

stefan
Posts: 85
Joined: Tue Jul 12, 2016 1:03 pm

Re: Sorting and Windows in Odysseus

Post by stefan » Tue Dec 13, 2016 4:47 pm

Hi Marco,

thank you very much for your ideas!
Ok, actually I thought that I checked the timestamps before the sort - that was one reason why I used a sliding window - but I will check it again. But you are right, it is maybe not a good idea to mix the tuples up and sort them again in Odysseus. The time I would save through parallelizing the processes, I could maybe loose at the sorting task. I think for the first prototype this is good enough. I will give it some thought and would return to that topic later on. :)

greetings,
Stefan

Post Reply

Who is online

Users browsing this forum: No registered users and 1 guest