Client has deep pockets and a shed load (well, say 15 so far) of external data sources (think Google Analytics, SAP Marketing Cloud, Kentico) which they want me to

1/ pull into a data lake

2/ keep refreshed.
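For step 2, the usual pattern is incremental refresh with a high-water mark per source: record the timestamp of the last successful pull, and on each run only request records newer than that. A minimal sketch of the idea — all names are hypothetical, and state is kept in a local JSON file purely for illustration:

```python
import json
from pathlib import Path

STATE_FILE = Path("watermarks.json")  # hypothetical per-source high-water marks

EPOCH = "1970-01-01T00:00:00+00:00"   # watermark for a source never pulled before

def load_watermark(source: str) -> str:
    """Return the ISO timestamp of the last successful pull for a source."""
    if STATE_FILE.exists():
        return json.loads(STATE_FILE.read_text()).get(source, EPOCH)
    return EPOCH

def save_watermark(source: str, ts: str) -> None:
    """Persist the new high-water mark after a successful load."""
    state = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    state[source] = ts
    STATE_FILE.write_text(json.dumps(state, indent=2))

def refresh(source: str, fetch_since) -> int:
    """Pull only records newer than the stored watermark; return the count."""
    since = load_watermark(source)
    records = fetch_since(since)  # source-specific extractor, passed in
    if records:
        newest = max(r["updated_at"] for r in records)
        save_watermark(source, newest)
    return len(records)
```

In practice the watermark store would live somewhere durable (a database table or a file in the lake itself), and sources without a usable "updated since" filter fall back to full reloads.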

There are some I’m not even sure have an API.

Could be hundreds of websites around the world, plus Facebook pages, YouTube channels, etc. Why? So they can get some smart, responsive Power BI dashboards happening.

Now, I’ve extracted all of these sources in the past with Python. At this stage I’m thinking: set up a server, write some routines to extract the data, and save it to the data lake (flavour as yet unknown). There are likely some sources I haven’t had experience with, but I’m not too concerned about that.
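The hand-rolled version of that is usually one small connector per source plus a shared loader that lands raw responses in the lake, partitioned by source and run date. A rough sketch of the shape — paths and connector names are placeholders, and a local directory stands in here for whatever lake flavour gets chosen (ADLS, S3, etc.):

```python
import json
from datetime import date
from pathlib import Path

LAKE_ROOT = Path("lake/raw")  # stand-in for the real lake storage

def land(source: str, records: list, run_date: date = None) -> Path:
    """Write one extract as JSON Lines under raw/<source>/<yyyy-mm-dd>/."""
    run_date = run_date or date.today()
    out_dir = LAKE_ROOT / source / run_date.isoformat()
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / "part-000.jsonl"
    with out_file.open("w", encoding="utf-8") as f:
        for rec in records:
            f.write(json.dumps(rec) + "\n")
    return out_file

# One tiny "connector" per source; a real one would call the source's API.
def extract_kentico_pages() -> list:
    return [{"id": 1, "title": "Home"}]  # placeholder payload

path = land("kentico", extract_kentico_pages())
```

Landing raw JSON first, and transforming later, keeps each connector dumb and makes re-processing cheap when a source's schema changes.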

But I’m after a sanity check.

If you were going to buy SystemX to do all of this, or most of it, what would you be buying?

Would SystemX include some form of GUI so the client could (e.g.) roll their own Google Analytics query and store the results in the lake?
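Buy or build, the "roll their own query" piece usually comes down to the tool saving a declarative query spec rather than code, so a GUI can assemble it and a scheduler can replay it on each refresh. For GA4, such a spec is roughly dimensions + metrics + a date range (mirroring the shape of the Data API's runReport request); a hypothetical serialised form, with our own field names:

```python
import json

# Hypothetical saved-query schema; field names are ours, not the GA API's.
ga_query = {
    "source": "google_analytics",
    "property_id": "properties/123456",  # placeholder GA4 property
    "dimensions": ["date", "country"],
    "metrics": ["activeUsers", "sessions"],
    "date_range": {"start": "7daysAgo", "end": "today"},
    "destination": "raw/google_analytics/users_by_country",
}

def validate(spec: dict) -> bool:
    """Minimal sanity check before the scheduler runs a saved query."""
    required = {"source", "dimensions", "metrics", "date_range", "destination"}
    return required <= spec.keys() and bool(spec["metrics"])

serialised = json.dumps(ga_query, indent=2)  # what the GUI would persist
```

The point is that the stored artefact is data, not code — which is exactly what lets a non-developer build and save queries through a GUI.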

Thanks for thinking about this.

—=L

submitted by /u/Laurielounge

Title Of post: Recommendation for approach for populating and refreshing new data lake