Tuesday, January 28, 2020

Model Repository Overview

The Model repository is a relational database that stores the metadata for projects and folders.
Connect to the Model repository to create and edit physical data objects, mappings, profiles, and other objects. Include objects in an application, and then deploy the application to make the objects available for access by end users and third-party tools.
[Image: an open Model repository named mrs1 in the Object Explorer view.]

The Model Repository Service manages the Model repository. All client applications and application services that access the Model repository connect through the Model Repository Service. Client applications include the Developer tool and the Analyst tool. Informatica services that access the Model repository include the Model Repository Service, the Analyst Service, and the Data Integration Service.
When you set up the Developer tool, you must add a Model repository. Each time you open the Developer tool, you connect to the Model repository to access projects and folders.
When you edit an object, the Model repository locks the object for your exclusive editing. You can also integrate the Model repository with a third-party version control system. With version control system integration, you can check objects out and in, undo the checkout of objects, and view and retrieve historical versions of objects.
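The following minimal Python sketch (not Informatica's API; the class and method names are hypothetical) models the exclusive-edit and check-out behavior described above: only one user can hold the edit lock on an object at a time, and a check-in records a new historical version and releases the lock.

    from dataclasses import dataclass, field
    from typing import Optional


    @dataclass
    class RepositoryObject:
        name: str
        versions: list = field(default_factory=list)   # committed historical versions
        checked_out_by: Optional[str] = None            # current exclusive lock holder


    class ModelRepositorySketch:
        def __init__(self):
            self.objects = {}

        def check_out(self, name: str, user: str) -> RepositoryObject:
            obj = self.objects.setdefault(name, RepositoryObject(name))
            if obj.checked_out_by and obj.checked_out_by != user:
                raise RuntimeError(f"{name} is locked by {obj.checked_out_by}")
            obj.checked_out_by = user                   # take the exclusive edit lock
            return obj

        def check_in(self, name: str, user: str, content: str) -> int:
            obj = self.objects[name]
            if obj.checked_out_by != user:
                raise RuntimeError(f"{name} is not checked out by {user}")
            obj.versions.append(content)                # record a new historical version
            obj.checked_out_by = None                   # release the lock
            return len(obj.versions)                    # version number

        def undo_check_out(self, name: str) -> None:
            self.objects[name].checked_out_by = None    # discard the lock, keep history


    repo = ModelRepositorySketch()
    repo.check_out("m_customer_load", "dev_user")
    print(repo.check_in("m_customer_load", "dev_user", "mapping v1"))  # prints 1

Undoing a checkout simply releases the lock without recording a version, which mirrors the undo-checkout option mentioned above.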

Informatica Data Quality and Profiling

Use the data quality capabilities in the Developer tool to analyze the content and structure of your data and enhance the data in ways that meet your business needs.
Use the Developer tool to design and run processes to complete the following tasks:
  • Profile data. Profiling reveals the content and structure of data. Profiling is a key step in any data project, as it can identify strengths and weaknesses in data and help you define a project plan. (See the profiling sketch after this list.)
  • Create scorecards to review data quality. A scorecard is a graphical representation of the quality measurements in a profile.
  • Standardize data values. Standardize data to remove errors and inconsistencies that you find when you run a profile. You can standardize variations in punctuation, formatting, and spelling. For example, you can ensure that the city, state, and ZIP code values are consistent.
  • Parse data. Parsing reads a field composed of multiple values and creates a field for each value according to the type of information it contains. Parsing can also add information to records. For example, you can define a parsing operation to add units of measurement to product data.
  • Validate postal addresses. Address validation evaluates and enhances the accuracy and deliverability of postal address data. Address validation corrects errors in addresses and completes partial addresses by comparing address records against address reference data from national postal carriers. Address validation can also add postal information that speeds mail delivery and reduces mail costs.
  • Find duplicate records. Duplicate analysis calculates the degrees of similarity between records by comparing data from one or more fields in each record. You select the fields to be analyzed, and you select the comparison strategies to apply to the data. The Developer tool enables two types of duplicate analysis: field matching, which identifies similar or duplicate records, and identity matching, which identifies similar or duplicate identities in record data. (See the field-matching sketch after this list.)
  • Manage exceptions. An exception is a record that contains data quality issues that you correct by hand. You can run a mapping to capture any exception record that remains in a data set after you run other data quality processes. You review and edit exception records in the Analyst tool.
  • Create reference data tables. Informatica provides reference data that can enhance several data quality processes, including standardization and parsing. You can create reference tables using data from profile results.
  • Create and run data quality rules. Informatica provides rules that you can run or edit to meet your project objectives. You can create mapplets and validate them as rules in the Developer tool.
  • Collaborate with Informatica users. The Model repository stores reference data and rules, and this repository is available to users of the Developer tool and Analyst tool. Users can collaborate on projects, and different users can take ownership of objects at different stages of a project.
  • Export mappings to PowerCenter®. You can export and run mappings in PowerCenter. You can export mappings to PowerCenter to reuse the metadata for physical data integration or to create web services.
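A small, self-contained Python sketch can illustrate what the profiling step above reveals. This is not the Developer tool's profiling engine; the sample rows and the pattern rule (digits become 9, letters become X) are assumptions, but the statistics it prints, row count, null count, distinct values, and value patterns, are the kind of content and structure information a profile exposes.

    from collections import Counter
    import re


    def value_pattern(value: str) -> str:
        """Reduce a value to a pattern: digits -> 9, letters -> X."""
        return re.sub(r"[A-Za-z]", "X", re.sub(r"\d", "9", value))


    def profile(rows, columns):
        report = {}
        for col in columns:
            values = [row.get(col) for row in rows]
            non_null = [v for v in values if v not in (None, "")]
            report[col] = {
                "rows": len(values),
                "nulls": len(values) - len(non_null),
                "distinct": len(set(non_null)),
                "patterns": Counter(value_pattern(v) for v in non_null),
            }
        return report


    rows = [
        {"city": "Austin", "state": "TX", "zip": "78701"},
        {"city": "austin", "state": "Texas", "zip": "78701-4321"},
        {"city": "Boston", "state": "MA", "zip": ""},
    ]
    for col, stats in profile(rows, ["city", "state", "zip"]).items():
        print(col, stats)

Running it against the three sample rows shows, for example, that the zip column has one null and two distinct patterns (99999 and 99999-9999), the kind of inconsistency that a standardization step would then address.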

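The field-matching side of duplicate analysis can also be sketched in a few lines of Python. This is illustrative only; the similarity measure (difflib's SequenceMatcher) and the sample records are assumptions, not the matching algorithms shipped with the Developer tool. The idea is the one described above: pick the fields to compare, score each candidate pair, and flag pairs whose average similarity meets a threshold.

    from difflib import SequenceMatcher
    from itertools import combinations


    def field_score(a: str, b: str) -> float:
        """Similarity of two field values between 0.0 and 1.0."""
        return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


    def match_pairs(records, fields, threshold=0.85):
        """Return record pairs whose average field similarity meets the threshold."""
        matches = []
        for left, right in combinations(records, 2):
            score = sum(field_score(left[f], right[f]) for f in fields) / len(fields)
            if score >= threshold:
                matches.append((left["id"], right["id"], round(score, 2)))
        return matches


    records = [
        {"id": 1, "name": "John A. Smith", "city": "Austin"},
        {"id": 2, "name": "Jon Smith",     "city": "Austin"},
        {"id": 3, "name": "Mary Jones",    "city": "Boston"},
    ]
    print(match_pairs(records, fields=["name", "city"], threshold=0.80))

Identity matching extends the same pairwise idea with strategies that recognize when differently written values (for example, "Jon Smith" and "John A. Smith") refer to the same identity.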
Informatica Nodes/Core Services/Gateway Nodes


Service Manager:
The Service Manager is a service that manages all domain operations. It runs within Informatica Services; when you start Informatica Services, you start the Service Manager. On Windows, it runs as a service. The Service Manager runs on each node, and if the Service Manager is not running, the node is not available.
The Service Manager runs on all nodes in the domain to support the application services and the domain.
Application Services: Application services represent PowerCenter server-based functionality. The following are some of the application services in Informatica PowerCenter:
  • Integration Service
  • Repository Service
  • Reporting Service
  • Metadata Manager Service
  • SAP BW Service
  • Web Services Hub
  • Reference Table Manager Service

Required Services and DB Schemas for 10.1 IDQ Configuration



  • Required services:
    • Model Repository Service (MRS)
    • Data Integration Service
    • Analyst Service
    • Content Management Service
    • Data Director Service
  • Required database schemas:
    • Database schema for the Model repository (for the Model Repository Service)
    • Database schema for data profiling (for the Analyst Service)

IDQ Application Services and Schemas:

The following application services are part of the Data Quality Standard Edition:
  1. Analyst Service
  2. Content Management Service
  3. Data Integration Service
  4. Model Repository Service
  5. Search Service

Set up the following databases:
  1. Model repository for the Model Repository Service.
  2. Data object cache database to cache logical data objects and virtual tables.
  3. Profiling warehouse to perform data profiling and discovery.
  4. Workflow database to store run-time metadata for workflows.
  5. Reference data warehouse to store reference table data for the Content Management Service.

