BIRD 벤치마크 Databricks 시도

지니 모델 생성방법

  • GENIE 에 들어가서 SEMENTIC VIEW 생성하기 클릭
  • MODEL + SEMENTIC VIEW 선택

  • 생성된 Sementic 모델에서 각 테이블마다 realationship 설정을 추가해주기

 
  • 질의를 날려 SQL 쿼리 / 실행결과 답변 결과를 볼 수 있음
 

  • 생성된 답변에 설명을 추가 / 벤치마크에 추가 / 결과를 CSV로 카피 등의 추가작업도 가능

  • 벤치마크 목록을 추가하여 점수를 볼 수도 있습니다
    틀린답의 경우 아래처럼 비교도 가능

 


아래는 정답인 경우

 

지니 모델의 데이터 & SEMENTIC VIEW 구성

  • 1 테이블당 1개 sementic view 생성
     
  • relationship은 지니에서 직접 수기로 설정
 
  • PK/UNIQUE KEY는 YAML에서 설정하는 부분을 찾지못해서 생략

8개 SEMENTIC VIEW YAML파일

trans

더보기

version: 1.1

source: dsml_dev.pj_aiad_trgt_rcmd.bird_trans

comment: "Transaction metrics enriched with account, district, client, and card context."

dimensions:
  - name: trans_id
    expr: source.trans_id
    comment: transaction id
    display_name: trans_id
    synonyms:
      - transaction id
  - name: date
    expr: source.date
    comment: date of transaction
    display_name: date
    synonyms:
      - date of transaction
  - name: type
    expr: source.type
    comment: |-
      +/- transaction
      "PRIJEM" stands for credit
      "VYDAJ" stands for withdrawal
    display_name: type
    synonyms:
      - +/- transaction
  - name: operation
    expr: source.operation
    comment: |-
      mode of transaction
      "VYBER KARTOU": credit card withdrawal
      "VKLAD": credit in cash
      "PREVOD Z UCTU": collection from another bank
      "VYBER": withdrawal in cash
      "PREVOD NA UCET": remittance to another bank
    display_name: operation
    synonyms:
      - mode of transaction
  - name: k_symbol
    expr: source.k_symbol
    comment: |-
      "POJISTNE": insurance payment
      "SLUZBY": payment for statement
      "UROK": interest credited
      "SANKC. UROK": sanction interest if negative balance
      "SIPO": household
      "DUCHOD": old-age pension
      "UVER": loan payment
    display_name: k_symbol
    synonyms:
      - characterization of the transaction
  - name: bank
    expr: source.bank
    comment: each bank has unique two-letter code
    display_name: bank
  - name: account
    expr: source.account
    display_name: account
    synonyms:
      - account of the partner
  - name: account_id
    expr: source.account_id
    display_name: account_id
  - name: balance
    expr: source.balance
    comment: |-
      balance after transaction
      Unit: USD
    display_name: balance
    synonyms:
      - balance after transaction

measures:
  - name: amount
    expr: SUM(source.amount)
    comment: |-
      amount of money
      Unit: USD
    display_name: amount
    synonyms:
      - amount of money

district

더보기

version: 1.1

source: dsml_dev.pj_aiad_trgt_rcmd.bird_district

comment: "District-level demographic, economic, and social attributes."

dimensions:
  - name: district_id
    expr: source.district_id
    comment: location of branch
    synonyms:
      - location of branch
  - name: a2
    expr: source.a2
    comment: district_name
    synonyms:
      - district_name
  - name: a3
    expr: source.a3
    comment: region
    synonyms:
      - region
  - name: a4
    expr: source.a4
    comment: number of inhabitants
    synonyms:
      - number of inhabitants
  - name: a5
    expr: source.a5
    comment: municipality < district < region
    synonyms:
      - no. of municipalities with inhabitants < 499
  - name: a6
    expr: source.a6
    comment: municipality < district < region
    synonyms:
      - no. of municipalities with inhabitants 500-1999
  - name: a7
    expr: source.a7
    comment: municipality < district < region
    synonyms:
      - no. of municipalities with inhabitants 2000-9999
  - name: a8
    expr: source.a8
    comment: municipality < district < region
    synonyms:
      - no. of municipalities with inhabitants > 10000
  - name: a9
    expr: source.a9
    comment: not useful
  - name: a10
    expr: source.a10
    comment: ratio of urban inhabitants
    synonyms:
      - ratio of urban inhabitants
  - name: a11
    expr: source.a11
    comment: average salary
    synonyms:
      - average salary
  - name: a12
    expr: source.a12
    comment: unemployment rate 1995
    synonyms:
      - unemployment rate 1995
  - name: a13
    expr: source.a13
    comment: unemployment rate 1996
    synonyms:
      - unemployment rate 1996
  - name: a14
    expr: source.a14
    comment: no. of entrepreneurs per 1000 inhabitants
    synonyms:
      - no. of entrepreneurs per 1000 inhabitants
  - name: a15
    expr: source.a15
    comment: no. of committed crimes 1995
    synonyms:
      - no. of committed crimes 1995
  - name: a16
    expr: source.a16
    comment: no. of entrepreneurs per 1000 inhabitants
    synonyms:
      - no. of entrepreneurs per 1000 inhabitants

loan

더보기

version: 1.1

source: dsml_dev.pj_aiad_trgt_rcmd.bird_loan

comment: "Loan information including approval details, duration, status, and payment\
  \ amounts."

dimensions:
  - name: account_id
    expr: source.account_id
    comment: the id number identifying the account
  - name: amount
    expr: source.amount
    comment: |-
      approved amount
      unit: US dollar
  - name: date
    expr: source.date
    comment: the date when the loan is approved
  - name: duration
    expr: source.duration
    comment: |-
      loan duration
      unit: month
  - name: loan_id
    expr: source.loan_id
    comment: the id number identifying the loan data
  - name: status
    expr: source.status
    comment: |-
      repayment status
      "A" stands for contract finished, no problems
      "B" stands for contract finished, loan not paid
      "C" stands for running contract, OK so far
      "D" stands for running contract, client in debt

measures:
  - name: payments
    expr: sum(source.payments)
    comment: |-
      monthly payments
      unit: month
    synonyms:
      - monthly payments

client

더보기

version: 1.1

source: dsml_dev.pj_aiad_trgt_rcmd.bird_client

comment: Client information including demographic attributes and branch location.

dimensions:
  - name: client_id
    expr: source.client_id
    comment: the unique number
    display_name: client_id
  - name: birth_date
    expr: source.birth_date
    comment: birth date
    display_name: birth_date
  - name: district_id
    expr: source.district_id
    comment: location of branch
    display_name: district_id
    synonyms:
      - location of branch
  - name: gender
    expr: source.gender
    comment: |-
      "F": female
      "M": male
    display_name: gender

disp

더보기

version: 1.1

source: dsml_dev.pj_aiad_trgt_rcmd.bird_disp

comment: "Disposition information linking clients and accounts, including role type."

dimensions:
  - name: disp_id
    expr: source.disp_id
    comment: unique number of identifying this row of record
    display_name: disp_id
    synonyms:
      - disposition id
  - name: account_id
    expr: source.account_id
    comment: id number of account
    display_name: account_id
  - name: client_id
    expr: source.client_id
    comment: id number of client
    display_name: client_id
  - name: type
    expr: source.type
    comment: |-
      type of disposition
      "OWNER" : owner of the account
      "USER" : user of the account
      "DISPONENT" : disponent
      commonsense evidence:
      the account can only have the right to issue permanent orders or apply for loans
    display_name: type

account

더보기

version: 1.1

source: dsml_dev.pj_aiad_trgt_rcmd.bird_account

comment: "Account information including creation date, branch location, and issuance\
  \ frequency."

dimensions:
  - name: account_id
    expr: source.account_id
    comment: the id of the account
    synonyms:
      - account id
  - name: date
    expr: source.date
    comment: |-
      the creation date of the account
      in the form YYMMDD
    synonyms:
      - date
  - name: district_id
    expr: source.district_id
    comment: location of branch
    synonyms:
      - location of branch
  - name: frequency
    expr: source.frequency
    comment: |-
      frequency of the account
      "POPLATEK MESICNE" stands for monthly issuance
      "POPLATEK TYDNE" stands for weekly issuance
      "POPLATEK PO OBRATU" stands for issuance after transaction
    synonyms:
      - frequency

card

더보기

version: 1.1

source: dsml_dev.pj_aiad_trgt_rcmd.bird_card

comment: "Credit card information including card type, issuance date, and disposition\
  \ linkage."

dimensions:
  - name: card_id
    expr: source.card_id
    comment: id number of credit card
    display_name: card_id
    synonyms:
      - credit card id
  - name: disp_id
    expr: source.disp_id
    comment: disposition id
    display_name: disp_id
    synonyms:
      - disposition id
  - name: issued
    expr: source.issued
    comment: |-
      the date when the credit card was issued
      format: YYMMDD
    display_name: issued
  - name: type
    expr: source.type
    comment: |-
      type of credit card
      "junior": junior class
      "classic": standard class
      "gold": high-level class
    display_name: type

order

더보기

version: 1.1

source: dsml_dev.pj_aiad_trgt_rcmd.bird_order

comment: "Standing orders including recipient account, bank, purpose, and debited\
  \ amount."

dimensions:
  - name: order_id
    expr: source.order_id
    comment: identifying the unique order
  - name: account_id
    expr: source.account_id
    comment: id number of account
  - name: account_to
    expr: source.account_to
    comment: |-
      account of the recipient
      each bank has unique two-letter code
    synonyms:
      - account of the recipient
  - name: bank_to
    expr: source.bank_to
    comment: bank of the recipient
    synonyms:
      - bank of the recipient
  - name: k_symbol
    expr: source.k_symbol
    comment: |-
      purpose of the payment
      "POJISTNE" stands for insurance payment
      "SIPO" stands for household payment
      "LEASING" stands for leasing
      "UVER" stands for loan payment
    synonyms:
      - characterization of the payment

measures:
  - name: total_amount
    expr: SUM(source.amount)
    comment: total debited amount
    synonyms:
      - debited amount

 

 

EVALUATION (정답률 67/106)

  • simple 49/62
  • moderate 17/37
  • challenging 1/7
 

사용한 모델

 

GENIE ERROR

  • 같은질문인데 어떨때는 에러발생 어떨때는 정상답변
 

참조 https://community.databricks.com/t5/data-engineering/intermittent-permission-denied-error-in-genie/td-p/146848
(databricks community에서 3일전 똑같은문제발생했다는 문의내역)

 
반응형

'<개념> > 기타' 카테고리의 다른 글

SNOWFLAKE 기본사용법  (0) 2026.04.07
BIRD 벤치마크 snowflake 시도  (0) 2026.04.07
벡터 유사도 (vector similiarity)  (0) 2022.06.05
FPGA, NPU 초간단 개념  (0) 2022.02.28
푸리에 변환 초간단 정리  (0) 2022.01.30