Discussion:
How do I limit queries made to my Hadoop Cluster
Boudreau, Carl
2016-12-02 21:34:33 UTC
Dear Hadoop Expert,

This is my first post to this group, and I am new to Hadoop, so please excuse me if this is not the correct list. If you know of a better group, please let me know by replying directly to me.

I have a challenge before me. In my Hadoop system I have data from three companies called ABC, XYZ, and 123. Because of my business need, all the records from these three companies are in the same data store. The records are randomly mixed, so one record could be from ABC and the next could be from XYZ or 123. When I query my Hadoop system for all records that have the last name Boudreau, for data analytics work, I get all 3000 records with that last name, regardless of company.

However, I also have a contract with ABC that says I cannot aggregate their records, so I need a way to apply these contract rules when the data is queried. Please note: I have given 20 other developers access to my Hadoop system, but I am responsible for managing the contractual obligations to my customers.

What is the best way to go about this?

Can I, or should I, write a plug-in or modify YARN so that it checks my contract rules before returning a dataset? Or do I need to write a plug-in for each and every gateway application, such as Pig, Elasticsearch, MapR, etc. (about 10 applications have access to my Hadoop system)?
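
For illustration, the kind of contract check described above could be sketched roughly as follows. This is a minimal Python sketch and not a Hadoop or YARN API: the names `Record`, `NO_AGGREGATION`, and `apply_contract_rules` are hypothetical, and it assumes "cannot aggregate" means ABC's records must be left out of any result set built for aggregation.

```python
from dataclasses import dataclass

# Hypothetical rule set: companies whose contracts forbid aggregation.
NO_AGGREGATION = {"ABC"}


@dataclass(frozen=True)
class Record:
    company: str
    last_name: str


def apply_contract_rules(records, is_aggregate_query):
    """Return only the records the contract rules allow for this query."""
    if not is_aggregate_query:
        # Non-aggregate lookups are assumed to be unrestricted.
        return list(records)
    # Drop records from companies that may not be aggregated.
    return [r for r in records if r.company not in NO_AGGREGATION]


records = [
    Record("ABC", "Boudreau"),
    Record("XYZ", "Boudreau"),
    Record("123", "Boudreau"),
]
allowed = apply_contract_rules(records, is_aggregate_query=True)
```

Wherever such a check ends up living (one central layer or one plug-in per gateway application), the point is that it runs before any dataset is handed back to a developer.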

What are other options?

I have Hadoop installed, configured, and running on my local machine. I have also downloaded the source code, and I am able to dig into it and compile it.

Regards,
Carl

This e-mail, including attachments, may include confidential and/or
proprietary information, and may be used only by the person or entity
to which it is addressed. If the reader of this e-mail is not the intended
recipient or his or her authorized agent, the reader is hereby notified
that any dissemination, distribution or copying of this e-mail is
prohibited. If you have received this e-mail in error, please notify the
sender by replying to this message and delete this e-mail immediately.
Ravi Prakash
2016-12-05 22:26:01 UTC
Hi Carl!

user@ would be the right mailing list to ask such questions. Have you
looked at Apache Sentry / Ranger / Atlas?

HTH
Ravi