Rapid Prototyping to split tuple values in multiple streams

Marco Grawunder
Posts: 272
Joined: Tue Jul 29, 2014 10:29 am
Location: Oldenburg, Germany

Re: Rapid Prototyping to split tuple values in multiple streams

Post by Marco Grawunder » Mon Nov 07, 2016 1:21 pm

Hi Stefan,

RabbitMQ is not part of the Odysseus core. You need to install the RabbitMQ feature.

Greetings,

Marco

stefan
Posts: 85
Joined: Tue Jul 12, 2016 1:03 pm

Post by stefan » Mon Nov 07, 2016 3:04 pm

Thanks!

Post by stefan » Wed Nov 09, 2016 5:17 pm

Hi Marco,

Is it possible to add a unique identifier to a tuple in a PQL query?
I am reading my sources and want to add a unique identifier to these tuples before sending them to my Python processing script. This identifier should be used to match the results to one specific request.
I found the UUID() function, but it seems that I cannot use it in PQL.

greetings,
Stefan

Post by Marco Grawunder » Wed Nov 09, 2016 5:28 pm

You can use the MAP operator with the UUID function for this.

output = MAP({
              expressions = [
                                ['uuid()','id'],
                                'attribute1',
                                'attribute2',
                                'attribute3',
                                ....
                            ]
             }, input)
See https://wiki.odysseus.offis.uni-oldenbu ... p+operator
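To make the idea concrete outside of Odysseus, here is a minimal Python sketch of what such a MAP does to each tuple, and how a downstream script (such as the Python processing script mentioned in the question) might use the id to match results to requests. Python's `uuid.uuid4()` stands in for PQL's `uuid()`, and the tuple layout and the helper names `add_uuid`, `register_request` and `match_result` are hypothetical, chosen only for illustration.

```python
import uuid

def add_uuid(tuples):
    """Rough analogue of MAP with ['uuid()','id']: copy each tuple and
    attach a freshly generated random UUID under the key 'id'.
    (Hypothetical helper; uuid4() stands in for PQL's uuid().)"""
    return [{"id": str(uuid.uuid4()), **t} for t in tuples]

# Hypothetical bookkeeping in the downstream Python script:
# remember each request by its id, then look it up when a result arrives.
pending = {}

def register_request(request):
    pending[request["id"]] = request

def match_result(result):
    # Returns the original request (and forgets it), or None if the id is unknown.
    return pending.pop(result["id"], None)

requests = add_uuid([{"attribute1": "a"}, {"attribute1": "b"}])
for r in requests:
    register_request(r)

result = {"id": requests[1]["id"], "answer": 42}
print(match_result(result)["attribute1"])  # prints "b"
```

The id only has to survive the round trip through the processing script; the dictionary lookup is what ties a result back to its request.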

Greetings,

Marco

Post by Marco Grawunder » Fri Nov 11, 2016 2:25 pm

So, I added some new functionality:

#PARSER PQL
#RUNQUERY
json2 = ACCESS({
            source='json',
            wrapper='GenericPull',
            transport='File',
            protocol='JSON',
            datahandler='KeyValueObject',
            options=[['filename','${PROJECTPATH}/bamberg.json']]          
          }        
        )
        
tuple = KEYVALUETOTUPLE({
            schema = [['kv','KeyValueObject']]
          },
          json2
        )

map1 = MAP({EXPRESSIONS = [['path(kv,\'$.results[0].data[*].row[0]\')','pathes']]}, tuple)

unnested = UNNEST({ATTRIBUTE = 'pathes'}, map1)
I used the file you gave us. It is read into a key-value object and then transformed into a tuple containing a key-value object (to reach the same state as in your scenario).

The final MAP operator uses a JSONPath expression to extract substructures from the key-value object, and UNNEST takes each element of the resulting list and creates a separate tuple for it.

The output looks like this:

[Image: query output, not preserved]
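For comparison with the Python script that currently does this job, here is a minimal sketch of what `path(kv,'$.results[0].data[*].row[0]')` followed by UNNEST computes. The sample document is made up; it only assumes the JSON has the `results[0].data[*].row` shape that the JSONPath expression addresses. The helper names `jsonpath_rows` and `unnest` are hypothetical.

```python
def jsonpath_rows(kv):
    """Hand-written equivalent of $.results[0].data[*].row[0]:
    take the first result, visit every entry of its 'data' list,
    and collect element 0 of each entry's 'row'."""
    return [entry["row"][0] for entry in kv["results"][0]["data"]]

def unnest(tuples, attribute):
    """Analogue of UNNEST: emit one output tuple per element of the list
    stored in `attribute`; all other attributes are copied unchanged."""
    out = []
    for t in tuples:
        for value in t[attribute]:
            new_tuple = dict(t)
            new_tuple[attribute] = value
            out.append(new_tuple)
    return out

# Made-up input with the shape the JSONPath expression expects.
kv = {"results": [{"data": [{"row": [["A", "B"]]}, {"row": [["C"]]}]}]}

mapped = [{"kv": kv, "pathes": jsonpath_rows(kv)}]
for t in unnest(mapped, "pathes"):
    print(t["pathes"])
# prints:
# ['A', 'B']
# ['C']
```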

Post by stefan » Sun Nov 27, 2016 3:17 pm

Sounds interesting. At the moment this is done by my Python script, but I will have a look at it.

Another question about the UUID: I read a CSV file with my example user requests, use a MAP operator to add the UUID and apply a PROJECT after that.
During some tests I noticed the following:
The UUID assigned to a specific user request seems to be different after the PROJECT. I have checked this multiple times now. Example:

MAP Operator:
- User Request 1; UUID = 93cdd2fe...
- User Request 2; UUID = 3446b410...

PROJECT Operator:
- User Request 1; UUID = 03a77b11...
- User Request 2; UUID = de32e393...

All I am doing is:

srcRequests := MAP({
	expressions = [
	['uuid()', 'mr_uuid'], 
	'eventTypeIdentifier',
	'localRequestId',
.....

RMQRequest := PROJECT(
	{
		attributes = ['mr_uuid', 'localRequestId', ...]
	}, srcRequests 
)
Until now I thought that the UUID is a unique ID for this tuple and does not change once it has been assigned. But now I am not sure; maybe it is a unique ID for a specific state of the tuple and changes with every modification of the tuple. I could not find this information in the wiki. Can you answer this question? If it is the latter, is there a way to keep this ID from changing?

Thanks!
Stefan

Post by Marco Grawunder » Mon Nov 28, 2016 9:32 am

Hi Stefan,

we cannot reproduce this behaviour.

Greetings,

Marco

Post by stefan » Mon Nov 28, 2016 12:46 pm

Hi Marco,

I prepared a very simple and straightforward example. Maybe I am doing something wrong:

My CSV file:

idInCSV
MR0000000001
MR0000000002
MR0000000003
MR0000000004
MR0000000005
MR0000000006
MR0000000007
MR0000000008
MR0000000009
MR0000000010
MR0000000011
MR0000000012
MR0000000013
MR0000000014
MR0000000015
MR0000000016
MR0000000017
MR0000000018
MR0000000019
MR0000000020
MR0000000021
MR0000000022
MR0000000023
MR0000000024
MR0000000025
My code:

#PARSER PQL

#DROPALLQUERIES
#DROPALLSINKS
#DROPALLSOURCES

#RUNQUERY
/// Read Mobility Request CSV Data
csvInput = ACCESS({
	source='csvInput',
	wrapper='GenericPull',
	transport='File',
	protocol='csv',
	dataHandler='Tuple',
	options=[
		['delimiter',';'],
		['textDelimiter',"'"],
		['readfirstline','false'],
		['delay','3000'],
		['filename', 'C:\Users\Stefan\Desktop\Input.csv']
	],
	Schema=[
		['idInCSV','String']
	]
})

/// Add a global unique identifier to Mobility Request CSV Data
inputWithUuid := MAP({
	expressions = [
		['uuid()', 'mr_uuid'], 
		'idInCSV'
	]
}, csvInput
)

#RUNQUERY
/// Prepare Neo4j related attributes for RabbitMQ
projectedInput := PROJECT(
	{
		attributes = ['mr_uuid', 'idInCSV']
	}, inputWithUuid
)
When I look at the data of "inputWithUuid" and "projectedInput", I get different UUIDs for the same "idInCSV".
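As a reference for the expected behaviour, here is a minimal Python sketch of the same pipeline evaluated once: read the semicolon-delimited file, skip the header line, attach a UUID, then project. Because the projection only copies attributes of tuples that already carry a UUID, each `idInCSV` keeps the value it was given. The inline CSV text and the helper names are illustrative; reading `readfirstline='false'` as "skip the header line" is an assumption, and this does not model Odysseus internals.

```python
import csv
import io
import uuid

CSV_TEXT = "idInCSV\nMR0000000001\nMR0000000002\nMR0000000003\n"

def access(text):
    """Read ';'-delimited rows and skip the header line
    (our assumed reading of readfirstline='false')."""
    rows = list(csv.reader(io.StringIO(text), delimiter=";"))
    return [{"idInCSV": row[0]} for row in rows[1:]]

def map_uuid(tuples):
    # Analogue of MAP([['uuid()','mr_uuid'], 'idInCSV']): one fresh UUID per tuple.
    return [{"mr_uuid": str(uuid.uuid4()), "idInCSV": t["idInCSV"]} for t in tuples]

def project(tuples, attributes):
    # Analogue of PROJECT: keep only the listed attributes, values unchanged.
    return [{a: t[a] for a in attributes} for t in tuples]

input_with_uuid = map_uuid(access(CSV_TEXT))
projected_input = project(input_with_uuid, ["mr_uuid", "idInCSV"])

for before, after in zip(input_with_uuid, projected_input):
    # Same tuple, same UUID: PROJECT copies the value, it does not regenerate it.
    assert before["mr_uuid"] == after["mr_uuid"]
```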

greetings,
Stefan

Post by Marco Grawunder » Mon Nov 28, 2016 1:53 pm

Hi Stefan,

OK. It seems that you run the query multiple times? Every run would create a new, unique UUID (independent of the tuple).
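The effect can be illustrated with a small Python simulation: if the MAP with `uuid()` is evaluated twice over the same input, the two evaluations draw independent random UUIDs, so the same `idInCSV` ends up with different values. Stefan's script contains two `#RUNQUERY` blocks, with `projectedInput` referring to `inputWithUuid` from the first one; whether that actually causes a second instance of the MAP is an assumption we have not verified. The helper name `run_map` is hypothetical.

```python
import uuid

ids = ["MR0000000001", "MR0000000002", "MR0000000003"]

def run_map(id_values):
    """One independent evaluation of MAP([['uuid()','mr_uuid'], 'idInCSV']),
    returned as idInCSV -> mr_uuid. Each call draws fresh random UUIDs."""
    return {i: str(uuid.uuid4()) for i in id_values}

# Two separate evaluations over the same input (hypothetical scenario:
# one copy of the MAP per started query).
first_run = run_map(ids)
second_run = run_map(ids)

for i in ids:
    print(i, first_run[i], second_run[i])

# A single evaluation shared by both consumers gives them identical values.
shared = run_map(ids)
seen_by_map, seen_by_project = shared, dict(shared)
```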

Maybe you can use hash functions on an input string?

https://wiki.odysseus.offis.uni-oldenbu ... +Functions
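The advantage of a hash over `uuid()` is that it is a pure function of its input: the same string always yields the same value, no matter how often or in which query it is computed. A minimal Python sketch using the standard library's `hashlib.md5`; it stands in for the Odysseus hash functions on the linked wiki page, whose exact names are not assumed here, and `stable_id` is a hypothetical helper.

```python
import hashlib

def stable_id(value):
    """Deterministic identifier: hex MD5 digest of the UTF-8 encoded input."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()

a = stable_id("MR0000000001")
b = stable_id("MR0000000001")  # recomputed, e.g. in another query run
print(a == b)  # prints True
```

Identical inputs give identical IDs, so the hashed string itself has to be unique per request. MD5 is used here only as an identifier, not as a security measure.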

If this is not the case, please let Studio show the whole plan ("Call active graph editor" in the Query View) and upload it somewhere, e.g. in a bug report.

Greetings,

Marco

Post by stefan » Mon Nov 28, 2016 3:32 pm

Hi Marco,

How could I be running queries multiple times? Maybe I configured something wrong.
But as you can see, I only use this code; that is the full source code of the example.

The plan looks like this:
[Image: query plan, not preserved]

I added the attributes user and version to my CSV file, concatenated them into a string and applied the MD5 function. This, of course, creates the same value every time, so I can work with this. But it is really strange...
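One detail worth checking when the key is built from several attributes: plain concatenation can make different combinations produce the same string (user `ab` with version `1` and user `a` with version `b1` both give `ab1`), and therefore the same MD5. A minimal Python sketch that joins the fields with a separator before hashing; `user` and `version` come from the post, while the separator `|` and the helper names are assumptions (the separator must not occur inside the values).

```python
import hashlib

def request_key(user, version, sep="|"):
    """MD5 over 'user|version'. The separator keeps ('ab', '1') and
    ('a', 'b1') apart, which plain concatenation would not."""
    return hashlib.md5(sep.join([user, version]).encode("utf-8")).hexdigest()

def naive_key(user, version):
    # Plain concatenation without a separator, for comparison.
    return hashlib.md5((user + version).encode("utf-8")).hexdigest()

print(naive_key("ab", "1") == naive_key("a", "b1"))      # prints True (same input string)
print(request_key("ab", "1") == request_key("a", "b1"))  # prints False
```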

greetings,
Stefan
