범위 : BIRD BENCH 의 FIANANCIAL 데이터셋에 한정하여 수동 벤치마크테스트
1.데이터 업로드 : Financial 데이터셋만 SQlite → Parquet 변환후 8개 테이블 업로드
- 파일1위치 : https://bird-bench.github.io/ - DEV 데이터셋 다운로드 - Dev_20240627 - dev_databases - financial - financial_sqlite
- 파일2위치 : https://github.com/AlibabaResearch/DAMO-ConvAI/tree/main/bird
#sqlite -> parquet 변환 python
import sqlite3
import pandas as pd
from pathlib import Path
SQLITE_PATH = "./dev_20240627/dev_databases/financial/financial.sqlite"
OUT_DIR = Path("./parquet/financial")
OUT_DIR.mkdir(parents=True, exist_ok=True)
conn = sqlite3.connect(SQLITE_PATH)
# 테이블 목록
tables = pd.read_sql(
"SELECT name FROM sqlite_master WHERE type='table'",
conn
)["name"]
print("테이블 목록:", list(tables))
for table in tables:
query = f'SELECT * FROM "{table}"'
df = pd.read_sql(query, conn)
df.columns = [c.lower() for c in df.columns]
out_path = OUT_DIR / f"{table}.parquet"
df.to_parquet(out_path, index=False)
print(f"V {table} → {out_path}")
conn.close()
print("SQLite → Parquet 변환 완료")
2.Sementic View 설정 : Cortex Anaylst - [Create New Sementic View] 방식으로 설정
dev_20240627/dev_databases/financial/database_description , dev_20240627/dev_tables.json 참조
- 8개 테이블별로 PK/Unique key 설정
- Bird Bench에서 제공하는 컬럼별 description/별칭 등입력
- relationship 설정
YAML/SEMENTIC VIEW 생성코드는 아래에 [더보기]
더보기
name: E260128
tables:
- name: CARD
base_table:
database: SAMPLE_DATA
schema: PUBLIC
table: CARD
dimensions:
- name: CARD_ID
synonyms:
- credit card id
description: id number of credit card
expr: CARD_ID
data_type: NUMBER(38,0)
- name: DISP_ID
synonyms:
- disposition id
description: disposition id
expr: DISP_ID
data_type: NUMBER(38,0)
- name: ISSUED
description: |-
the date when the credit card issued
in the form YYMMDD
expr: ISSUED
data_type: VARCHAR(16777216)
- name: TYPE
description: |-
type of credit card
""junior": junior class of credit card;
"classic": standard class of credit card;
"gold": high-level credit card"
expr: '"TYPE"'
data_type: VARCHAR(16777216)
primary_key:
columns:
- CARD_ID
- name: CLIENT
base_table:
database: SAMPLE_DATA
schema: PUBLIC
table: CLIENT
dimensions:
- name: BIRTH_DATE
description: birth date
expr: BIRTH_DATE
data_type: VARCHAR(16777216)
- name: CLIENT_ID
description: the unique number
expr: CLIENT_ID
data_type: NUMBER(38,0)
- name: DISTRICT_ID
synonyms:
- location of branch
description: location of branch
expr: DISTRICT_ID
data_type: NUMBER(38,0)
- name: GENDER
description: |-
"F:female
M:male "
expr: GENDER
data_type: VARCHAR(16777216)
primary_key:
columns:
- CLIENT_ID
- name: DISP
base_table:
database: SAMPLE_DATA
schema: PUBLIC
table: DISP
dimensions:
- name: ACCOUNT_ID
description: id number of account
expr: ACCOUNT_ID
data_type: NUMBER(38,0)
- name: CLIENT_ID
description: id number of client
expr: CLIENT_ID
data_type: NUMBER(38,0)
- name: DISP_ID
synonyms:
- disposition id
description: unique number of identifying this row of record
expr: DISP_ID
data_type: NUMBER(38,0)
- name: TYPE
description: |-
type of disposition
"OWNER" : "USER" : "DISPONENT"
commonsense evidence:
the account can only have the right to issue permanent orders or apply for loans
expr: '"TYPE"'
data_type: VARCHAR(16777216)
primary_key:
columns:
- DISP_ID
- name: DISTRICT
base_table:
database: SAMPLE_DATA
schema: PUBLIC
table: DISTRICT
dimensions:
- name: A10
synonyms:
- ratio of urban inhabitants
description: ratio of urban inhabitants
expr: A10
data_type: FLOAT
- name: A11
synonyms:
- average salary
description: average salary
expr: A11
data_type: NUMBER(38,0)
- name: A12
synonyms:
- unemployment rate 1995
description: unemployment rate 1995
expr: A12
data_type: FLOAT
- name: A13
synonyms:
- unemployment rate 1996
description: unemployment rate 1996
expr: A13
data_type: FLOAT
- name: A14
synonyms:
- no. of entrepreneurs per 1000 inhabitants
description: no. of entrepreneurs per 1000 inhabitants
expr: A14
data_type: NUMBER(38,0)
- name: A15
synonyms:
- no. of committed crimes 1995
description: no. of committed crimes 1995
expr: A15
data_type: FLOAT
- name: A16
synonyms:
- no. of entrepreneurs per 1000 inhabitants
description: no. of entrepreneurs per 1000 inhabitants
expr: A16
data_type: NUMBER(38,0)
- name: A2
synonyms:
- district_name
description: district_name
expr: A2
data_type: VARCHAR(16777216)
- name: A3
synonyms:
- region
description: region
expr: A3
data_type: VARCHAR(16777216)
- name: A4
synonyms:
- number of inhabitants
expr: A4
data_type: VARCHAR(16777216)
- name: A5
synonyms:
- no. of municipalities with inhabitants < 499
description: municipality < district < region
expr: A5
data_type: VARCHAR(16777216)
- name: A6
synonyms:
- no. of municipalities with inhabitants 500-1999
description: municipality < district < region
expr: A6
data_type: VARCHAR(16777216)
- name: A7
synonyms:
- no. of municipalities with inhabitants 2000-9999
description: municipality < district < region
expr: A7
data_type: VARCHAR(16777216)
- name: A8
synonyms:
- no. of municipalities with inhabitants > 10000
description: municipality < district < region
expr: A8
data_type: NUMBER(38,0)
- name: A9
description: not useful
expr: A9
data_type: NUMBER(38,0)
- name: DISTRICT_ID
synonyms:
- location of branch
description: location of branch
expr: DISTRICT_ID
data_type: NUMBER(38,0)
facts: []
primary_key:
columns:
- DISTRICT_ID
- name: LOAN
base_table:
database: SAMPLE_DATA
schema: PUBLIC
table: LOAN
dimensions:
- name: ACCOUNT_ID
description: the id number identifying the account
expr: ACCOUNT_ID
data_type: NUMBER(38,0)
- name: AMOUNT
description: |-
approved amount
unit:US dollar
expr: AMOUNT
data_type: NUMBER(38,0)
- name: DATE
description: the date when the loan is approved
expr: DATE
data_type: VARCHAR(16777216)
- name: DURATION
description: |-
loan duration
unit:month
expr: DURATION
data_type: NUMBER(38,0)
- name: LOAN_ID
description: the id number identifying the loan data
expr: LOAN_ID
data_type: NUMBER(38,0)
- name: STATUS
description: |-
repayment status
"'A' stands for contract finished, no problems;
'B' stands for contract finished, loan not paid;
'C' stands for running contract, OK so far;
'D' stands for running contract, client in debt"
expr: STATUS
data_type: VARCHAR(16777216)
facts:
- name: PAYMENTS
synonyms:
- monthly payments
description: |-
monthly payments
unit:month
expr: PAYMENTS
data_type: FLOAT
access_modifier: public_access
primary_key:
columns:
- LOAN_ID
- name: TRANS
base_table:
database: SAMPLE_DATA
schema: PUBLIC
table: TRANS
dimensions:
- name: _ACCOUNT
synonyms:
- account of the partner
expr: '"ACCOUNT"'
data_type: FLOAT
- name: ACCOUNT_ID
expr: ACCOUNT_ID
data_type: NUMBER(38,0)
- name: BALANCE
synonyms:
- balance after transaction
description: |-
balance after transaction
Unit:USD
expr: BALANCE
data_type: NUMBER(38,0)
- name: BANK
synonyms:
- Unit:USD
description: each bank has unique two-letter code
expr: BANK
data_type: VARCHAR(16777216)
- name: DATE
synonyms:
- date of transaction
description: date of transaction
expr: DATE
data_type: VARCHAR(16777216)
- name: K_SYMBOL
synonyms:
- characterization of the transaction
description: |-
"POJISTNE": stands for insurrance payment
"SLUZBY": stands for payment for statement
"UROK": stands for interest credited
"SANKC. UROK": sanction interest if negative balance
"SIPO": stands for household
"DUCHOD": stands for old-age pension
"UVER": stands for loan payment
expr: K_SYMBOL
data_type: VARCHAR(16777216)
- name: OPERATION
synonyms:
- mode of transaction
description: |-
mode of transaction
""VYBER KARTOU": credit card withdrawal
"VKLAD": credit in cash
"PREVOD Z UCTU" :collection from another bank
"VYBER": withdrawal in cash
"PREVOD NA UCET": remittance to another bank"
expr: OPERATION
data_type: VARCHAR(16777216)
- name: TRANS_ID
synonyms:
- transaction id
description: transaction id
expr: TRANS_ID
data_type: NUMBER(38,0)
- name: TYPE
synonyms:
- +/- transaction
description: |-
+/- transaction
""PRIJEM" stands for credit
"VYDAJ" stands for withdrawal"
expr: '"TYPE"'
data_type: VARCHAR(16777216)
facts:
- name: AMOUNT
synonyms:
- amount of money
description: |-
amount of money
Unit:USD
expr: AMOUNT
data_type: NUMBER(38,0)
access_modifier: public_access
primary_key:
columns:
- TRANS_ID
- name: _ACCOUNT
base_table:
database: SAMPLE_DATA
schema: PUBLIC
table: ACCOUNT
dimensions:
- name: ACCOUNT_ID
synonyms:
- account id
description: the id of the account
expr: ACCOUNT_ID
data_type: NUMBER(38,0)
- name: DATE
synonyms:
- date
description: |-
the creation date of the account
in the form YYMMDD
expr: DATE
data_type: VARCHAR(16777216)
- name: DISTRICT_ID
synonyms:
- location of branch
description: location of branch
expr: DISTRICT_ID
data_type: NUMBER(38,0)
- name: FREQUENCY
synonyms:
- frequency
description: |-
frequency of the acount
""POPLATEK MESICNE" stands for monthly issuance
"POPLATEK TYDNE" stands for weekly issuance
"POPLATEK PO OBRATU" stands for issuance after transaction"
expr: FREQUENCY
data_type: VARCHAR(16777216)
primary_key:
columns:
- ACCOUNT_ID
- name: _ORDER
base_table:
database: SAMPLE_DATA
schema: PUBLIC
table: '"ORDER"'
dimensions:
- name: ACCOUNT_ID
description: id number of account
expr: ACCOUNT_ID
data_type: NUMBER(38,0)
- name: ACCOUNT_TO
synonyms:
- account of the recipient
description: |-
account of the recipient
each bank has unique two-letter code
expr: ACCOUNT_TO
data_type: NUMBER(38,0)
- name: BANK_TO
synonyms:
- bank of the recipient
description: bank of the recipient
expr: BANK_TO
data_type: VARCHAR(16777216)
- name: K_SYMBOL
synonyms:
- characterization of the payment
description: |-
purpose of the payment
""POJISTNE" stands for insurance payment
"SIPO" stands for household payment
"LEASING" stands for leasing
"UVER" stands for loan payment"
expr: K_SYMBOL
data_type: VARCHAR(16777216)
- name: ORDER_ID
description: identifying the unique order
expr: ORDER_ID
data_type: NUMBER(38,0)
facts:
- name: AMOUNT
synonyms:
- debited amount
description: debited amount
expr: AMOUNT
data_type: FLOAT
access_modifier: public_access
primary_key:
columns:
- ORDER_ID
relationships:
- name: R2
left_table: CARD
right_table: DISP
relationship_columns:
- left_column: DISP_ID
right_column: DISP_ID
- name: R3
left_table: CLIENT
right_table: DISTRICT
relationship_columns:
- left_column: DISTRICT_ID
right_column: DISTRICT_ID
- name: R4
left_table: DISP
right_table: CLIENT
relationship_columns:
- left_column: CLIENT_ID
right_column: CLIENT_ID
- name: R5
left_table: DISP
right_table: _ACCOUNT
relationship_columns:
- left_column: ACCOUNT_ID
right_column: ACCOUNT_ID
- name: R6
left_table: LOAN
right_table: _ACCOUNT
relationship_columns:
- left_column: ACCOUNT_ID
right_column: ACCOUNT_ID
- name: R8
left_table: TRANS
right_table: _ACCOUNT
relationship_columns:
- left_column: ACCOUNT_ID
right_column: ACCOUNT_ID
- name: R1
left_table: _ACCOUNT
right_table: DISTRICT
relationship_columns:
- left_column: DISTRICT_ID
right_column: DISTRICT_ID
- name: R7
left_table: _ORDER
right_table: _ACCOUNT
relationship_columns:
- left_column: ACCOUNT_ID
right_column: ACCOUNT_ID
create or replace semantic view SAMPLE_DATA.PUBLIC.E260128
tables (
SAMPLE_DATA.PUBLIC.CARD primary key (CARD_ID),
SAMPLE_DATA.PUBLIC.CLIENT primary key (CLIENT_ID),
SAMPLE_DATA.PUBLIC.DISP primary key (DISP_ID),
SAMPLE_DATA.PUBLIC.DISTRICT primary key (DISTRICT_ID),
SAMPLE_DATA.PUBLIC.LOAN primary key (LOAN_ID),
SAMPLE_DATA.PUBLIC.TRANS primary key (TRANS_ID),
_ACCOUNT as SAMPLE_DATA.PUBLIC.ACCOUNT primary key (ACCOUNT_ID),
_ORDER as SAMPLE_DATA.PUBLIC."ORDER" primary key (ORDER_ID)
)
relationships (
R2 as CARD(DISP_ID) references DISP(DISP_ID),
R3 as CLIENT(DISTRICT_ID) references DISTRICT(DISTRICT_ID),
R4 as DISP(CLIENT_ID) references CLIENT(CLIENT_ID),
R5 as DISP(ACCOUNT_ID) references _ACCOUNT(ACCOUNT_ID),
R6 as LOAN(ACCOUNT_ID) references _ACCOUNT(ACCOUNT_ID),
R8 as TRANS(ACCOUNT_ID) references _ACCOUNT(ACCOUNT_ID),
R1 as _ACCOUNT(DISTRICT_ID) references DISTRICT(DISTRICT_ID),
R7 as _ORDER(ACCOUNT_ID) references _ACCOUNT(ACCOUNT_ID)
)
facts (
DISTRICT.A10 as A10 with synonyms=('ratio of urban inhabitants') comment='ratio of urban inhabitants',
DISTRICT.A12 as A12 with synonyms=('unemployment rate 1995') comment='unemployment rate 1995',
DISTRICT.A13 as A13 with synonyms=('unemployment rate 1996') comment='unemployment rate 1996',
DISTRICT.A15 as A15 with synonyms=('no. of committed crimes 1995') comment='no. of committed crimes 1995',
LOAN.PAYMENTS as PAYMENTS with synonyms=('monthly payments') comment='monthly payments
unit:month',
TRANS.AMOUNT as AMOUNT with synonyms=('amount of money') comment='amount of money
Unit:USD',
_ORDER.AMOUNT as AMOUNT with synonyms=('debited amount') comment='debited amount'
)
dimensions (
CARD.CARD_ID as CARD_ID with synonyms=('credit card id') comment='id number of credit card',
CARD.DISP_ID as DISP_ID with synonyms=('disposition id') comment='disposition id',
CARD.ISSUED as ISSUED comment='the date when the credit card issued
in the form YYMMDD',
CARD.TYPE as "TYPE" comment='type of credit card
""junior": junior class of credit card;
"classic": standard class of credit card;
"gold": high-level credit card"',
CLIENT.BIRTH_DATE as BIRTH_DATE comment='birth date',
CLIENT.CLIENT_ID as CLIENT_ID comment='the unique number',
CLIENT.DISTRICT_ID as DISTRICT_ID with synonyms=('location of branch') comment='location of branch',
CLIENT.GENDER as GENDER comment='"F:female
M:male "',
DISP.ACCOUNT_ID as ACCOUNT_ID comment='id number of account',
DISP.CLIENT_ID as CLIENT_ID comment='id number of client',
DISP.DISP_ID as DISP_ID with synonyms=('disposition id') comment='unique number of identifying this row of record',
DISP.TYPE as "TYPE" comment='type of disposition
"OWNER" : "USER" : "DISPONENT"
commonsense evidence:
the account can only have the right to issue permanent orders or apply for loans',
DISTRICT.A11 as A11 with synonyms=('average salary') comment='average salary',
DISTRICT.A14 as A14 with synonyms=('no. of entrepreneurs per 1000 inhabitants') comment='no. of entrepreneurs per 1000 inhabitants',
DISTRICT.A16 as A16 with synonyms=('no. of entrepreneurs per 1000 inhabitants') comment='no. of entrepreneurs per 1000 inhabitants',
DISTRICT.A2 as A2 with synonyms=('district_name') comment='district_name',
DISTRICT.A3 as A3 with synonyms=('region') comment='region',
DISTRICT.A4 as A4 with synonyms=('number of inhabitants'),
DISTRICT.A5 as A5 with synonyms=('no. of municipalities with inhabitants < 499') comment='municipality < district < region',
DISTRICT.A6 as A6 with synonyms=('no. of municipalities with inhabitants 500-1999') comment='municipality < district < region',
DISTRICT.A7 as A7 with synonyms=('no. of municipalities with inhabitants 2000-9999') comment='municipality < district < region',
DISTRICT.A8 as A8 with synonyms=('no. of municipalities with inhabitants > 10000') comment='municipality < district < region',
DISTRICT.A9 as A9 comment='not useful',
DISTRICT.DISTRICT_ID as DISTRICT_ID with synonyms=('location of branch') comment='location of branch',
LOAN.ACCOUNT_ID as ACCOUNT_ID comment='the id number identifying the account',
LOAN.AMOUNT as AMOUNT comment='approved amount
unit:US dollar',
LOAN.DATE as DATE comment='the date when the loan is approved',
LOAN.DURATION as DURATION comment='loan duration
unit:month',
LOAN.LOAN_ID as LOAN_ID comment='the id number identifying the loan data',
LOAN.STATUS as STATUS comment='repayment status
"''A'' stands for contract finished, no problems;
''B'' stands for contract finished, loan not paid;
''C'' stands for running contract, OK so far;
''D'' stands for running contract, client in debt"',
TRANS._ACCOUNT as "ACCOUNT" with synonyms=('account of the partner'),
TRANS.ACCOUNT_ID as ACCOUNT_ID,
TRANS.BALANCE as BALANCE with synonyms=('balance after transaction') comment='balance after transaction
Unit:USD',
TRANS.BANK as BANK with synonyms=('Unit:USD') comment='each bank has unique two-letter code',
TRANS.DATE as DATE with synonyms=('date of transaction') comment='date of transaction',
TRANS.K_SYMBOL as K_SYMBOL with synonyms=('characterization of the transaction') comment='"POJISTNE": stands for insurrance payment
"SLUZBY": stands for payment for statement
"UROK": stands for interest credited
"SANKC. UROK": sanction interest if negative balance
"SIPO": stands for household
"DUCHOD": stands for old-age pension
"UVER": stands for loan payment',
TRANS.OPERATION as OPERATION with synonyms=('mode of transaction') comment='mode of transaction
""VYBER KARTOU": credit card withdrawal
"VKLAD": credit in cash
"PREVOD Z UCTU" :collection from another bank
"VYBER": withdrawal in cash
"PREVOD NA UCET": remittance to another bank"',
TRANS.TRANS_ID as TRANS_ID with synonyms=('transaction id') comment='transaction id',
TRANS.TYPE as "TYPE" with synonyms=('+/- transaction') comment='+/- transaction
""PRIJEM" stands for credit
"VYDAJ" stands for withdrawal"',
_ACCOUNT.ACCOUNT_ID as ACCOUNT_ID with synonyms=('account id') comment='the id of the account',
_ACCOUNT.DATE as DATE with synonyms=('date') comment='the creation date of the account
in the form YYMMDD',
_ACCOUNT.DISTRICT_ID as DISTRICT_ID with synonyms=('location of branch') comment='location of branch',
_ACCOUNT.FREQUENCY as FREQUENCY with synonyms=('frequency') comment='frequency of the acount
""POPLATEK MESICNE" stands for monthly issuance
"POPLATEK TYDNE" stands for weekly issuance
"POPLATEK PO OBRATU" stands for issuance after transaction"',
_ORDER.ACCOUNT_ID as ACCOUNT_ID comment='id number of account',
_ORDER.ACCOUNT_TO as ACCOUNT_TO with synonyms=('account of the recipient') comment='account of the recipient
each bank has unique two-letter code',
_ORDER.BANK_TO as BANK_TO with synonyms=('bank of the recipient') comment='bank of the recipient',
_ORDER.K_SYMBOL as K_SYMBOL with synonyms=('characterization of the payment') comment='purpose of the payment
""POJISTNE" stands for insurance payment
"SIPO" stands for household payment
"LEASING" stands for leasing
"UVER" stands for loan payment"',
_ORDER.ORDER_ID as ORDER_ID comment='identifying the unique order'
)
with extension (CA='{"tables":[{"name":"CARD","dimensions":[{"name":"CARD_ID"},{"name":"DISP_ID"},{"name":"ISSUED"},{"name":"TYPE"}]},{"name":"CLIENT","dimensions":[{"name":"BIRTH_DATE"},{"name":"CLIENT_ID"},{"name":"DISTRICT_ID"},{"name":"GENDER"}]},{"name":"DISP","dimensions":[{"name":"ACCOUNT_ID"},{"name":"CLIENT_ID"},{"name":"DISP_ID"},{"name":"TYPE"}]},{"name":"DISTRICT","dimensions":[{"name":"A11"},{"name":"A14"},{"name":"A16"},{"name":"A2"},{"name":"A3"},{"name":"A4"},{"name":"A5"},{"name":"A6"},{"name":"A7"},{"name":"A8"},{"name":"A9"},{"name":"DISTRICT_ID"}],"facts":[{"name":"A10"},{"name":"A12"},{"name":"A13"},{"name":"A15"}]},{"name":"LOAN","dimensions":[{"name":"ACCOUNT_ID"},{"name":"AMOUNT"},{"name":"DATE"},{"name":"DURATION"},{"name":"LOAN_ID"},{"name":"STATUS"}],"facts":[{"name":"PAYMENTS"}]},{"name":"TRANS","dimensions":[{"name":"_ACCOUNT"},{"name":"ACCOUNT_ID"},{"name":"BALANCE"},{"name":"BANK"},{"name":"DATE"},{"name":"K_SYMBOL"},{"name":"OPERATION"},{"name":"TRANS_ID"},{"name":"TYPE"}],"facts":[{"name":"AMOUNT"}]},{"name":"_ACCOUNT","dimensions":[{"name":"ACCOUNT_ID"},{"name":"DATE"},{"name":"DISTRICT_ID"},{"name":"FREQUENCY"}]},{"name":"_ORDER","dimensions":[{"name":"ACCOUNT_ID"},{"name":"ACCOUNT_TO"},{"name":"BANK_TO"},{"name":"K_SYMBOL"},{"name":"ORDER_ID"}],"facts":[{"name":"AMOUNT"}]}],"relationships":[{"name":"R2"},{"name":"R3"},{"name":"R4"},{"name":"R5"},{"name":"R6"},{"name":"R8"},{"name":"R1"},{"name":"R7"}]}');
2-1. (option) REST API 발급받기 전에 NETWORK POLICY 임시 설정해줘야 401 에러 안남
- US region 을 사용하면 디폴트로 NETWORK POLICY에 의해 허용된 IP 제외하고 모든 접근이 막혀있음. DEFAULT 설정 삭제가 불가능해서 임시로 사용하는 IP 설정해준다음 다 사용하면 해지해주어야함
(해지 깜박하면 설정해준 IP 를 제외한 다른 아이피로 로그인도 안됨)
-- 현재 사용하는 IP 허용
CREATE NETWORK POLICY TEMP_POLICY
ALLOWED_IP_LIST = ('147.6.93.99');
ALTER USER YOONSEO SET NETWORK_POLICY = TEMP_POLICY;
-- 위에서 해준 셋팅 해제
SHOW PARAMETERS LIKE 'NETWORK_POLICY' IN ACCOUNT;
ALTER USER YOONSEO UNSET NETWORK_POLICY;
DROP NETWORK POLICY TEMP_POLICY;
3.Cortex Analyst Rest API 가져와서 SQL 결과 나오는지 확인
- 토큰 발급받는법 : 프로필 클릭 - [setting] - [Authentication] - [Programmatic access tokens] - [Generate New Token] 후 복사
- 내account 확인 : 프로필 클릭 - [account] - [view account detail] - Account identifier 복사
Cortex Analyst Rest API 확인용
더보기
import requests
URL = "https://[내account].snowflakecomputing.com/api/v2/cortex/analyst/message"
TOKEN = "[발급받은내토큰]"
payload = {
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": (
"-- External Knowledge:\n"
"-- A3 contains the data of region\n"
"-- 'POPLATEK PO OBRATU' represents 'issuance after transaction'\n\n"
"-- Using valid SQLite and understading External Knowledge, answer the following questions for the tables provided above."
"-- Question:\n"
"-- How many accounts that choose issuance after transaction are staying in the East Bohemia region?"
)
}
]
}
],
"semantic_view": "SAMPLE_DATA.PUBLIC.E260128"
}
headers = {
"Authorization": f"Bearer {TOKEN}",
"Content-Type": "application/json"
}
res = requests.post(URL, headers=headers, json=payload)
print("status:", res.status_code)
print("text:", res.text)
- API 양식 참조 https://docs.snowflake.com/en/user-guide/snowflake-cortex/cortex-analyst/rest-api
- EX) REST API 예제
{
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "which company had the most revenue?"
}
]
}
],
"semantic_view": "MY_DB.MY_SCH.MY_SEMANTIC_VIEW"
}
4. Cortex Anaylst로 답변(predict_dev.json) 생성 :
4.1 파일 구성
- 필요한 데이터는 아래 3개로 dev.json / dev_gold.sql / predict_dev.json 이다.
bird/llm
├── src/
│ └── evaluation.py
├── data/
│ ├── financila_only.json (문제 JSON 파일,json 파일이기만 하면 됨)
│ ├── dev_gold.sql (정답만 모아져있는 SQL 파일)
├── exp_result/turbo_output/
└── predict_dev.json (Cortex Analyst로 얻은답변 SQL 파일)
- dev.json 파일에서 Financial 데이터만 필터링 (106개남음)
import json
with open("./dev_20240627/dev.json", "r", encoding="utf-8") as f:
data = json.load(f) # ← 여기서 이제 list of dict
filtered = [item for item in data if item.get("db_id") == "financial"]
with open("financial_only.json", "w", encoding="utf-8") as f:
json.dump(filtered, f, ensure_ascii=False, indent=2)
- snowflake 에 날려서 받은 답변 SQL은 predict_dev.json로 저장
Financial 만 sql 결과 json으로 저장
더보기
import requests
import json
import os
# =========================
# CONFIG
# =========================
URL = "https://LXMVPPL-KB09931.snowflakecomputing.com/api/v2/cortex/analyst/message"
TOKEN = "eyJraWQiOiIyOTc3ODcyNjI2MjM4MzgiLCJhbGciOiJFUzI1NiJ9.eyJwIjoiMTc3NDk1MDg6MTc3NDk2MzYiLCJpc3MiOiJTRjoyMDEyIiwiZXhwIjoxODAxMTUzNTAzfQ.4youa0W2rDUOF8ZrGp-qnrHV2481MO2llsTNwYj4WtiUTlEnF9s9wO_chSCuuEU85YRyOHti38G7FLeiSfYuAg"
SEMANTIC_VIEW = "SAMPLE_DATA.PUBLIC.E260128"
INPUT_FILE = "financial_only.json"
OUTPUT_FILE = "/Users/yoonseo/Downloads/output.json"
DB_NAME = "financial"
DELIMITER = "\t----- bird -----\t"
output_json = {}
headers = {
"Authorization": f"Bearer {TOKEN}",
"Content-Type": "application/json"
}
with open(INPUT_FILE, "r", encoding="utf-8") as f:
input_data = json.load(f)
for item in input_data:
qid = str(item["question_id"])
question = item["question"]
evidence = item["evidence"]
payload = {
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": (
"-- External Knowledge:\n"
f"-- {evidence}\n\n"
"-- Using valid Table and understanding External Knowledge,The SQL query will be executed locally using SQLite (not Snowflake); use only SQLite-compatible syntax and reference tables directly (e.g., FROM A).\n"
f"-- {question}"
)
}
]
}
],
"semantic_view": SEMANTIC_VIEW
}
# API CALL
res = requests.post(URL, headers=headers, json=payload)
sql_stmt = ""
if res.headers.get("Content-Type", "").startswith("application/json"):
data = res.json()
for c in data.get("message", {}).get("content", []):
if c.get("type") == "sql":
sql_stmt = c.get("statement", "").strip()
break
sql_stmt = f"{sql_stmt}{DELIMITER}{DB_NAME}"
output_json[qid] = sql_stmt
with open(OUTPUT_FILE, "w", encoding="utf-8") as f:
json.dump(output_json, f, indent=2, ensure_ascii=False)
print("✅ output.json 저장 완료")
print("📄 경로:", os.path.abspath(OUTPUT_FILE))
- financial 에 해당하는 답변만 필터링해서 dev_gold.sql 필터링해서 저장하기 (4.2 참조)
4.2 프롬프트 구성
- 원래 Bird Bench 에서 물어보는 프롬프트구성
- bird/llm/src/gpt_request.py 참조
def generate_comment_prompt(question, knowledge=None):
pattern_prompt_no_kg = "-- Using valid SQLite, answer the following questions for the tables provided above."
pattern_prompt_kg = "-- Using valid SQLite and understading External Knowledge, answer the following questions for the tables provided above."
# question_prompt = "-- {}".format(question) + '\n SELECT '
question_prompt = "-- {}".format(question)
knowledge_prompt = "-- External Knowledge: {}".format(knowledge)
if not knowledge_prompt:
result_prompt = pattern_prompt_no_kg + '\n' + question_prompt
else:
result_prompt = knowledge_prompt + '\n' + pattern_prompt_kg + '\n' + question_prompt
return result_prompt
| -- External Knowledge: {evidence} -- Using valid SQLite and understading External Knowledge, answer the following questions for the tables provided above. -- {question} |
5.Evaluation
import subprocess
cmd = [
"python3", "-u", "./DAMO-ConvAI-main/bird/llm/src/evaluation.py",
"--db_root_path", "./dev_20240627/dev_databases/financial", # 데이터 파일경로
"--predicted_sql_path", "./DAMO-ConvAI-main/bird/llm/exp_result/turbo_output/",
"--data_mode", "dev",
"--ground_truth_path", "./DAMO-ConvAI-main/bird/llm/data/",
"--num_cpus", "16",
"--mode_gt", "gt",
"--mode_predict", "gpt",
"--diff_json_path", "./financial_only.json", # financial만 뽑아낸 문제경로
"--meta_time_out", "30.0"
]
subprocess.run(cmd)
start calculate
simple moderate challenging total
count 62 37 7 106
====================================== ACCURACY =====================================
accuracy 0.00 0.00 0.00 0.00
===========================================================================================
Finished evaluation
5.1 프롬프트를 여러번 수정해도 SNOWFLAKE SEMENTIC VIEW 이름을 포함해서 결과를 뱉어내기 때문에, 벤치결과0.0으로 나옴 → 수기비교
5.1.1 프롬프트 수정 방식
- SQLite 아니고 Parquet 변환되어 테이블형태로 올라가있기 때문에 질문 수정 SQLite → Table
- snowflake 기준으로 sql문을 생성하면 bird bench 실행시 못읽어와서 로컬 테이블 기준으로 읽어내라라는 프롬프트 추가해서 실행
-- IMPORTANT RULES:
-- The SQL query must be written for local execution, NOT for Snowflake.
-- Use local table references only (e.g., FROM DISTRICT, NOT FROM SAMPLE_DATA.PUBLIC.DISTRICT).
-- Table names must NOT contain underscores (_).
| -- External Knowledge: {evidence} -- Using valid Table and understading External Knowledge, answer the following questions for the tables provided above. -- The SQL query will be executed locally using SQLite (not Snowflake); use only SQLite-compatible syntax and reference tables directly (e.g., FROM A). -- When generating the SQL query, table names after FROM must not contain underscores (_). -- {question} |
5.1.1 원인파악 및 수기확인
- 90번 문제 정답지
SELECT COUNT(T1.account_id)
FROM account AS T1
INNER JOIN loan AS T2 ON T1.account_id = T2.account_id
INNER JOIN district AS T3 ON T1.district_id = T3.district_id
WHERE T3.A3 = 'Prague'",
- 90번 문제 실제 SQL 생성결과
WITH unemployment_rates AS (
SELECT
AVG(a12) AS avg_unemployment_1995,
AVG(a13) AS avg_unemployment_1996
FROM SEMANTIC_VIEW( -- 이부분이 SNOWFLAKE 기준으로 작성됨
SAMPLE_DATA.PUBLIC.E260128
FACTS district.a12, district.a13
)
)
SELECT
avg_unemployment_1995,
avg_unemployment_1996,
CASE
WHEN avg_unemployment_1995 > avg_unemployment_1996 THEN '1995 has higher unemployment rate'
WHEN avg_unemployment_1996 > avg_unemployment_1995 THEN '1996 has higher unemployment rate'
ELSE 'Both years have equal unemployment rates'
END AS comparison_result
FROM unemployment_rates
- 벤치마크 비교를 위해서는 이렇게 결과가 나와야함
FROM account AS T1
- snowflake 에서 아래처럼 시멘틱뷰 이름이 꼭 들어가기 때문에 0.0점이 나오는것으로 파악
FROM SEMANTIC_VIEW(
SAMPLE_DATA.PUBLIC.E260128 – – 시멘틱뷰 이름
FACTS district.a12, district.a13)
5.2 수기로 결과비교 확인 정답률 (Finance에 해당하는 106개 질의)
benchmark용 답지 json 파일 생성
- 답변여부
- 답변X : 38/106 (SQL생성X)
- 답변0 : 68/106
- 정답 : 37/68
- 오답 : 31/68
- 정답률 37/106
- simple/moderate/challenging 난이도별 정답률
- simple 29/62
- moderate 8/37
- challenging 0/7
5.3 사용한 모델
- 모델은 랜덤배정
- 결과에서는 claude-4-sonnet 모델을 사용함
반응형
'<개념> > 기타' 카테고리의 다른 글
| SNOWFLAKE 기본사용법 (0) | 2026.04.07 |
|---|---|
| BIRD 벤치마크 Databricks 시도 (0) | 2026.04.07 |
| 벡터 유사도 (vector similiarity) (0) | 2022.06.05 |
| FPGA, NPU 초간단 개념 (0) | 2022.02.28 |
| 푸리에 변환 초간단 정리 (0) | 2022.01.30 |