Q1: What are the various presales activities that you are aware of as an architect?
Q2: Have you heard of any master data management platform?
Q3: We have a data collection platform (System X) that receives data from 100+ applications or other data sources.
Some Admin roles can configure validation rules, clustering rules, and data transformation rules on the data collected from the 100+ applications into the platform. As an example of clustering, suppose the same data arrives from different apps under an alias or in an altered form: app1 (a credit card app) sends the username as Himanshu Goel, app2 (a loan app) sends it as HGoel, and app3 (an insurance app) sends just HG. We know that Himanshu Goel, HGoel, and HG are the same person, and in the final output I want a single entity for this person instead of multiple portfolios. The final record must contain only golden records combined from all apps, together with all relevant information for that person: for example, the credit score (from app1), the number and type of insurance holdings (from app3), the salary or bank balance (from app4), and so on. Admins can develop the identification rules that determine that these records belong to one person.
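To make the clustering example concrete, here is a minimal sketch of one possible identification rule. The rule itself (matching on the uppercase initials of the username) and the names `initials_key` and `cluster_by_key` are assumptions for illustration only; they are not part of System X or System A. A rule this naive would over-merge unrelated people in practice, so a real rule would likely combine it with stronger attributes.

```python
import re
from collections import defaultdict


def initials_key(name: str) -> str:
    """Hypothetical clustering key: the uppercase letters found in the name."""
    return "".join(re.findall(r"[A-Z]", name))


def cluster_by_key(records):
    """Group records whose usernames produce the same clustering key."""
    clusters = defaultdict(list)
    for record in records:
        clusters[initials_key(record["username"])].append(record)
    return dict(clusters)


# The three aliases from the example above, one per source app.
records = [
    {"app": "app1", "username": "Himanshu Goel"},
    {"app": "app2", "username": "HGoel"},
    {"app": "app3", "username": "HG"},
]

clusters = cluster_by_key(records)
for key, members in clusters.items():
    print(key, [m["app"] for m in members])
```

All three aliases reduce to the same key, so they land in a single cluster that can then be merged into one entity.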
These golden records are chosen by survivorship rules: for example, the credit score and address coming from app1 (the credit card app) become golden records, the PAN card, loan amount, and mobile number coming from app2 become golden records, and so on.
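The survivorship idea can be sketched as a field-to-source mapping. The mapping below follows the example in the text (credit score and address from app1; PAN, loan amount, and mobile number from app2). The field names, the function `build_golden_record`, and the sample values are hypothetical placeholders.

```python
# Which source app "wins" for each field (taken from the example in the text).
SURVIVORSHIP = {
    "credit_score": "app1",
    "address": "app1",
    "pan": "app2",
    "loan_amount": "app2",
    "mobile": "app2",
}


def build_golden_record(records_by_app, survivorship=SURVIVORSHIP):
    """Pick each field from its designated source app, skipping missing values."""
    golden = {}
    for field, app in survivorship.items():
        value = records_by_app.get(app, {}).get(field)
        if value is not None:
            golden[field] = value
    return golden


# Made-up sample values for one clustered person.
records_by_app = {
    "app1": {"credit_score": 780, "address": "Address from app1", "mobile": "0000000000"},
    "app2": {"pan": "PAN-FROM-APP2", "loan_amount": 500000, "mobile": "1111111111"},
}

golden = build_golden_record(records_by_app)
print(golden)
```

Keeping the mapping as data rather than code is one way to let Admins change which source survives for a field without a redeploy.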
System X collects data from all of the sources at 7 pm IST. We are not allowed to connect to System X directly, but its owners are open to pushing data to open API endpoints if you provide them.
There is another system, System A, which is our system. We have to expose our API, take all the data from System X, and load it into BigQuery, Azure Databricks, or a similar platform. On this system we apply all the rules that Admins configure on the data, for example which records become golden records. It can cluster the data and then show all of it in a UI so the Admin can cross-verify and modify it. The newly saved data must be available to downstream applications within 1 hour of being received from System X. The UI must also show figures such as how many records have been processed, how many are left, and how many failed to process.
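The processed, remaining, and failed counts for the UI could be tracked with something as small as the sketch below. `BatchProgress`, `run_batch`, and `apply_rules` are hypothetical names, and the validation rule shown is only a stand-in for whatever rules the Admins configure.

```python
from dataclasses import dataclass


@dataclass
class BatchProgress:
    """Counters the UI could display for one load from System X."""
    received: int
    processed: int = 0
    failed: int = 0

    @property
    def remaining(self) -> int:
        return self.received - self.processed - self.failed


def run_batch(records, apply_rules):
    """Apply the rules to every record, counting successes and failures."""
    progress = BatchProgress(received=len(records))
    for record in records:
        try:
            apply_rules(record)
            progress.processed += 1
        except Exception:
            # A failed record is counted and skipped so the batch keeps going.
            progress.failed += 1
    return progress


def apply_rules(record):
    """Stand-in rule: a record without a username is invalid."""
    if not record.get("username"):
        raise ValueError("missing username")


progress = run_batch(
    [{"username": "Himanshu Goel"}, {"username": ""}, {"username": "HG"}],
    apply_rules,
)
print(progress.processed, progress.remaining, progress.failed)
```

In a distributed setup these counters would more likely live in a shared store that the UI polls, rather than in process memory.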
Tell me about your approach towards the solutioning of System A.
Points to remember
>> What will be your data transfer technique?
>> How and where will you write the Admin rules logic?
>> Suppose records arrive at System A at about 10,000 records per second. How would you handle this through the API, including scaling and resilience?
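One common way to approach the last point is to keep the API handler thin: it only accepts a batch and places it on a buffer, while separate workers drain the buffer and apply the rules. The sketch below illustrates that idea with Python's in-process `queue.Queue` standing in for a durable message broker. The function names, the status codes returned, and the tiny buffer size are assumptions chosen for the demonstration, not a recommended configuration.

```python
import queue


def accept(batch, buffer):
    """API-side step: enqueue the batch, or signal back-pressure if full."""
    try:
        buffer.put_nowait(batch)
        return 202  # accepted for asynchronous processing
    except queue.Full:
        return 429  # caller should retry later


def drain(buffer, max_batches):
    """Worker-side step: pull up to max_batches batches off the buffer."""
    drained = []
    while len(drained) < max_batches:
        try:
            drained.append(buffer.get_nowait())
        except queue.Empty:
            break
    return drained


# Deliberately small buffer so the back-pressure path is visible.
buffer = queue.Queue(maxsize=2)
statuses = [accept([{"username": f"user{i}"}], buffer) for i in range(3)]
print(statuses)

batches = drain(buffer, max_batches=10)
print(len(batches))
```

Decoupling acceptance from processing lets the API tier and the worker tier scale independently, and a rejected batch can be retried by the sender instead of being lost.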