Instructions for using cameltech/japanese-gpt-1b-PII-masking with libraries and local apps.

Libraries

How to use cameltech/japanese-gpt-1b-PII-masking with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="cameltech/japanese-gpt-1b-PII-masking")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("cameltech/japanese-gpt-1b-PII-masking")
model = AutoModelForCausalLM.from_pretrained("cameltech/japanese-gpt-1b-PII-masking")

Local Apps

vLLM

How to use cameltech/japanese-gpt-1b-PII-masking with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "cameltech/japanese-gpt-1b-PII-masking"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cameltech/japanese-gpt-1b-PII-masking",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
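The curl request above sends a generic placeholder prompt ("Once upon a time,"), which does not follow the input format used in the Usage section below. That format is the instruction, then the text, then a `<SEP>` marker, with every newline replaced by `<LB>`. The following is a minimal sketch that uses only the Python standard library to build such a prompt and post it to the OpenAI-compatible `/v1/completions` endpoint. It makes several assumptions. It uses greedy decoding (`temperature` 0) instead of the 3-beam search in Usage. It sets `max_tokens` to 256 to mirror `max_new_tokens`. It also sends `add_special_tokens: false`, a vLLM-specific extra field meant to mirror `add_special_tokens=False` in Usage. The helper names are hypothetical.

```python
import json
import urllib.request

# Instruction copied from the Usage section of this model card.
INSTRUCTION = "# タスク\n入力文中の個人情報をマスキングせよ\n\n# 入力文\n"


def build_prompt(text: str) -> str:
    """Format text like the Usage example: instruction + text + "<SEP>",
    with every newline replaced by the "<LB>" token."""
    return (INSTRUCTION + text + "<SEP>").replace("\n", "<LB>")


def build_payload(text: str, model: str = "cameltech/japanese-gpt-1b-PII-masking") -> dict:
    """Request body for an OpenAI-compatible /v1/completions endpoint."""
    return {
        "model": model,
        "prompt": build_prompt(text),
        "max_tokens": 256,
        # Assumption: greedy decoding; the Usage example uses beam search instead.
        "temperature": 0.0,
        # Assumption: vLLM-specific extra field mirroring add_special_tokens=False
        # in the Usage example. Delete it if your server rejects it.
        "add_special_tokens": False,
    }


def mask_via_server(text: str, base_url: str = "http://localhost:8000") -> str:
    """Send the formatted prompt to the server and turn "<LB>" back into newlines."""
    request = urllib.request.Request(
        base_url + "/v1/completions",
        data=json.dumps(build_payload(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        result = json.load(response)
    return result["choices"][0]["text"].replace("<LB>", "\n")
```

For example, call `mask_via_server(text)` against the vLLM server above, or `mask_via_server(text, base_url="http://localhost:30000")` against the SGLang server below. If a server does not accept `add_special_tokens`, remove that field from the payload.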

Use Docker

docker model run hf.co/cameltech/japanese-gpt-1b-PII-masking

SGLang

How to use cameltech/japanese-gpt-1b-PII-masking with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "cameltech/japanese-gpt-1b-PII-masking" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cameltech/japanese-gpt-1b-PII-masking",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "cameltech/japanese-gpt-1b-PII-masking" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "cameltech/japanese-gpt-1b-PII-masking",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use cameltech/japanese-gpt-1b-PII-masking with Docker Model Runner:
```
docker model run hf.co/cameltech/japanese-gpt-1b-PII-masking
```

japanese-gpt-1b-PII-masking

Model Description

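japanese-gpt-1b-PII-masking is built on a pre-trained Japanese 1B-parameter GPT model and fine-tuned to mask personal information (PII) in Japanese text.

Personal information is replaced with the following tags:

Tag	Category
<name>	Name
<birthday>	Date of birth
<phone-number>	Phone number
<mail-address>	Email address
<customer-id>	Membership number / ID
<address>	Address
<post-code>	Postal code
<company>	Company name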
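The model replaces each detected item with one of the fixed tags above, so its output can be post-processed with a plain lookup table. The following is a minimal sketch that assumes you only want to know which categories were masked and how often. `PII_TAGS` and `count_masked_categories` are hypothetical names and are not part of the model or any library.

```python
import re
from collections import Counter

# Tag-to-category mapping from the table above, with English labels.
PII_TAGS = {
    "<name>": "name",
    "<birthday>": "date of birth",
    "<phone-number>": "phone number",
    "<mail-address>": "email address",
    "<customer-id>": "membership number / ID",
    "<address>": "address",
    "<post-code>": "postal code",
    "<company>": "company name",
}

# Match only the eight known tags; other angle-bracketed text such as "<LB>" is ignored.
TAG_RE = re.compile("|".join(re.escape(tag) for tag in PII_TAGS))


def count_masked_categories(masked_text: str) -> Counter:
    """Count how many times each PII category appears as a tag in the model output."""
    return Counter(PII_TAGS[m.group(0)] for m in TAG_RE.finditer(masked_text))
```

If you apply it to the example output in the Usage section, it counts 3 addresses, 2 names, and 2 dates of birth.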

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

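# Prompt header (Japanese). In English: "# Task / Mask the personal information in the input text / # Input text"
instruction = "# タスク\n入力文中の個人情報をマスキングせよ\n\n# 入力文\n"
# Sample input: a customer-support phone transcript containing a name, a date of birth, and an address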
text = """オペレーター：ありがとうございます。カスタマーサポートセンターでございます。お名前と生年月日、ご住所を市区町村まで教えていただけますか？
顧客：あ、はい。西山...すみません、西山俊之です。生年月日は、えーっと、1983年1月23日です。東京都練馬区在住です。
オペレーター：西山俊之様、1983年1月23日生まれ、東京都練馬区にお住まいですね。確認いたしました。お電話の件につきまして、さらにご本人様確認をさせていただきます。"""
input_text = instruction + text

model_name = "cameltech/japanese-gpt-1b-PII-masking"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

if torch.cuda.is_available():
    model = model.to("cuda")

# The prompt encodes line breaks as "<LB>"; postprocess() restores them in the generated text
def preprocess(text):
    return text.replace("\n", "<LB>")

def postprocess(text):
    return text.replace("<LB>", "\n")

# Beam-search decoding settings used in this example
generation_config = {
    "max_new_tokens": 256,
    "num_beams": 3,
    "num_return_sequences": 1,
    "early_stopping": True,
    "eos_token_id": tokenizer.eos_token_id,
    "pad_token_id": tokenizer.pad_token_id,
    "repetition_penalty": 3.0
}

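# "<SEP>" separates the input from the masked text the model generates; then encode line breaks as "<LB>"
input_text += "<SEP>"
input_text = preprocess(input_text)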

with torch.no_grad():
    token_ids = tokenizer.encode(input_text, add_special_tokens=False, return_tensors="pt")

    output_ids = model.generate(
        token_ids.to(model.device),
        **generation_config
    )
# Decode only the newly generated tokens (everything after the prompt), then restore line breaks
output = tokenizer.decode(output_ids.tolist()[0][token_ids.size(1) :], skip_special_tokens=True)
output = postprocess(output)

print(output)
# Example output shown in the model card:
# オペレーター:ありがとうございます。カスタマーサポートセンターでございます。お名前と生年月日、ご住所を<address>まで教えていただけますか?
# 顧客:あ、はい。<name>です。生年月日は、えーっと、<birthday>です。<address>在住です。
# オペレーター:<name>様、<birthday>生まれ、<address>にお住まいですね。確認いたしました。お電話の件につきまして、さらにご本人様確認をさせていただきます。

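The generation settings above cap output at 256 new tokens (`max_new_tokens`), so the masked version of a long transcript may be cut off. One possible workaround, which the model card does not describe, is to mask the text a few lines at a time. The sketch below groups whole lines into chunks. The 200-character budget is an illustrative assumption, not a measured limit. `mask_fn` stands for a hypothetical wrapper around the `preprocess` / `model.generate` / `postprocess` steps above.

```python
from typing import Callable


def chunk_lines(text: str, max_chars: int = 200) -> list[str]:
    """Greedily group whole lines into chunks of at most max_chars characters.

    A single line longer than max_chars becomes its own oversized chunk
    rather than being split mid-sentence.
    """
    chunks: list[str] = []
    current: list[str] = []
    size = 0
    for line in text.split("\n"):
        extra = len(line) + (1 if current else 0)  # +1 for the joining newline
        if current and size + extra > max_chars:
            chunks.append("\n".join(current))
            current, size = [], 0
            extra = len(line)
        current.append(line)
        size += extra
    if current:
        chunks.append("\n".join(current))
    return chunks


def mask_long_text(text: str, mask_fn: Callable[[str], str], max_chars: int = 200) -> str:
    """Mask each chunk independently with the hypothetical mask_fn and rejoin with newlines."""
    return "\n".join(mask_fn(chunk) for chunk in chunk_lines(text, max_chars))
```

Because the text is split only at line breaks, each utterance stays intact. However, the model sees each chunk without its surrounding lines, and this may change what it masks.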
License

MIT License


Format: Safetensors
Model size: 1B params
Tensor type: F32
