Calling the GPT-3 API with Python: A Walkthrough
GPT-3 is a machine learning language model released by OpenAI in 2020. It has received widespread media attention for its ability to write essays, songs, and poems, and even to write code! The tool is free to use and only requires signing up with an email address.
GPT-3 is a type of machine learning model called a transformer. Specifically, it is a Generative Pre-trained Transformer, hence the name "GPT". The transformer architecture uses self-attention to model conversational text: it typically processes one word at a time and uses the preceding words to predict the next word in the sequence.
GPT-3 has a wide range of applications spanning science, the arts, and technology. It can answer basic questions about science and math, and it can even answer questions about graduate-level math and science concepts reasonably accurately. More surprisingly, when I asked it questions related to my PhD research in physical chemistry, it was able to give fairly good explanations. It has its limits, though: when I asked GPT-3 about more novel research methods in physical chemistry, it could not give clear answers. So GPT-3 should be used with caution as a search engine for education and research, because it has no fact-checking capability. As fact-checking improves, I can imagine GPT-3 being very useful at the graduate level and even in research.
Beyond my own experience, I have seen many other cool applications of the tool. For example, one developer used GPT-3 to orchestrate cloud services that carry out complex tasks. Other users have used GPT-3 to generate working Python and SQL scripts, as well as programs in other languages. In the arts, users have asked GPT-3 to write an essay comparing modern and contemporary art. The potential applications of GPT-3 are plentiful in almost any field.
GPT-3 does well at answering basic questions that have well-established answers. For example, it can give a fairly good explanation of photosynthesis. It does not handle cutting-edge research questions about photosynthesis nearly as well; for instance, it cannot describe the mechanism of photosynthesis or the quantum concepts involved. It can produce a decent response, but it is unlikely to supply the technical details of most research questions. Similarly, GPT-3 can write simple working code, but as the task gets more complex, the generated code becomes more error-prone. It also cannot generate content that normally comes from humans, such as political opinions, ethical judgments, investment advice, or accurate news reports.
Despite these limitations, GPT-3's broad applicability is impressive. I thought it would be interesting to come up with some data science and machine learning prompts to see whether they could complement parts of a data science workflow.
First, we will generate some data-science-related text from a few simple prompts. Once we have a feel for the tool, we can ask questions that might actually help with data science tasks. There are several interesting data science and machine learning questions we could put to GPT-3. For example, can GPT-3 point us to publicly available datasets? How much do we know about GPT-3's training data? Another interesting application is problem framing: can GPT-3 help a user frame good machine learning research questions? While it struggles to give specific technical answers, it might do a good job of framing open research questions.
Another cool application is using GPT-3 to decide which ML model to use for a specific task. This is nice because, for well-established techniques with plenty of literature online, it should be able to help a user choose a model and explain why the chosen model fits best. Finally, we will try using GPT-3 to write Python code for some data science tasks. For example, we will see whether it can write code that generates synthetic data for specific use cases.
Note: results from the GPT-3 API are non-deterministic, so the output you get may differ slightly from what is shown here. Also, since GPT-3 has no fact-checking mechanism, you should double-check any factual results you plan to use for work, school, or personal projects.
For this walkthrough I will write the code in Deepnote, a collaborative data science notebook that makes it easy to run reproducible experiments.
Installing GPT-3
First, head over to Deepnote and create a new project (you can sign up for free if you don't already have an account).
Create a project called "GPT3" and, inside it, a notebook called "GPT3_ds".
Next, install the OpenAI library with pip in the first cell (we also install CatBoost, which we will use later for a modeling example):
%pip install openai
%pip install catboost
Save your API key in the api_key attribute of the openai object:
import openai

openai.api_key = "your-key"
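Hard-coding the key is fine for a quick demo, but a slightly safer pattern is to read it from an environment variable so the key never gets saved in the notebook. A minimal sketch (the variable name OPENAI_API_KEY is just my convention here, not something the article prescribes):

import os
import openai

# Read the key from the environment instead of pasting it into the notebook
openai.api_key = os.environ["OPENAI_API_KEY"]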
Now we can start asking questions. For example, ask "What is the Pandas library?" and GPT-3 will respond:
completion = openai.Completion.create(engine="text-davinci-003", prompt="What is the pandas library?", max_tokens=1000)
print(completion.choices[0]['text'])

# output
Pandas is an open source software library written in Python for data manipulation and analysis. Pandas is widely used in data science, machine learning and many other fields. It provides high-level data structures and tools for handling and manipulating data, including data frames, series, plotting tools and more.
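As noted above, completions are non-deterministic. If you want responses that are more repeatable from run to run, you can lower the temperature parameter of Completion.create (the same parameter is used with temperature=0 in some of the later examples). A small sketch:

# temperature=0 makes the model pick the most likely tokens,
# so repeated calls with the same prompt tend to return very similar text
completion = openai.Completion.create(engine="text-davinci-003",
                                      prompt="What is the pandas library?",
                                      max_tokens=200,
                                      temperature=0)
print(completion.choices[0]['text'])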
We can also ask more specific questions, such as "What are some common Pandas use cases?". It gives a reasonable answer, listing things like data wrangling, data visualization, data aggregation, and time series analysis:
completion = openai.Completion.create(engine="text-davinci-003", prompt="what are some common Pandas use cases?", max_tokens=240)
print(completion.choices[0]['text'])

# output
1. Data Cleaning and Transformation
2. Data Analysis and Exploration
3. Time Series Analysis
4. Data Visualization
5. Statistical Modeling
6. Predictive Modeling
7. Machine Learning
8. Web Scraping
Ask "What are the most common deep learning libraries?":
#what are the most common deep learning libraries?
completion = openai.Completion.create(engine="text-davinci-003", prompt="what are the most common deep learning libraries?", max_tokens=240)
print(completion.choices[0]['text'])

# output
. TensorFlow
2. PyTorch
3. Keras
4. Caffe
5. CNTK
6. MXNet
7. Theano
8. Deeplearning4j
9. Gensim
10. LUNA
We can even ask "What is a deep neural network?":
completion = openai.Completion.create(engine="text-davinci-003", prompt="What is a deep neural network?", max_tokens=1000)
print(completion.choices[0]['text'])

# output
A deep neural network (DNN) is a type of artificial neural network (ANN) with multiple layers of neurons between the input and output layers. DNNs are designed to learn complex non-linear relationships from data, and have been successfully applied in a wide range of areas such as image recognition, natural language processing, and financial forecasting.
Fact-finding with GPT-3 should be done cautiously, because there is no fact-checking mechanism. In most cases, if you just want a surface-level understanding of concepts that are well covered across the many online data science and machine learning blogs and forums, GPT-3 should work well.
Feature Engineering
Another interesting use case for GPT-3 is as an assistive tool to guide data science work. For example, can GPT-3 suggest feature transformations that would improve model performance? Let's ask it:
completion = openai.Completion.create(engine="text-davinci-003", prompt="give some ideas on feature transformations that can improve model performance", max_tokens=1000)
print(completion.choices[0]['text'])

# output
1. Standardization/Normalization: A common feature transform used to ensure features are on the same scale, standardizing or normalizing variables can help limit the severity of outliers and improve the overall model performance.
2. Feature Binning: Binning is a process of transforming numerical variables into categorical ones. This can be useful when working with variables that have too many levels and can have a significant effect on the model performance.
3. Polynomial Expansion: When a nonlinear relationship is expected between features and the output variable, a polynomial expansion feature transformation can help improve model performance.
4. Feature Selection: Removing redundant or irrelevant features from the dataset can help improve the model performance as these features may lead to overfitting.
5. Ensemble: Combining different types of models (or different versions of the same model) can often improve performance due to their combined capabilities.
We can see that it gives some good feature-transformation suggestions, along with an explanation of each.
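Several of these suggestions map directly onto standard library calls. For example, the feature-binning idea can be tried with pandas' cut function; the following is a small illustrative sketch with made-up ages (my own example, not GPT-3 output):

import pandas as pd

ages = pd.Series([23, 35, 47, 59, 71])

# Bin a numeric column into three categorical levels
age_bins = pd.cut(ages, bins=[0, 30, 60, 100], labels=["young", "middle", "senior"])
print(age_bins)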
Let's see if we can take this a step further and have it write some Python code:
completion = openai.Completion.create(engine="text-davinci-003", prompt="Write example python code that performs data standardization", max_tokens=1000)
print(completion.choices[0]['text'])

# output
# Import the necessary libraries
import numpy as np

# Define the data
data = np.array([[-3, 9, 0, 8],
                 [ 4, 6, 5, 12],
                 [20, 2, 3, 15]])

# Calculate mean and standard deviation
mean = np.mean(data, axis=0)
std = np.std(data, axis=0)

# Perform data standardization
standardized_data = (data - mean) / std

# Print the results
print(standardized_data)
Copy and paste the code into a new cell and run it:
# Import the necessary libraries
import numpy as np

# Define the data
data = np.array([[-3, 9, 0, 8],
                 [ 4, 6, 5, 12],
                 [20, 2, 3, 15]])

# Calculate mean and standard deviation
mean = np.mean(data, axis=0)
std = np.std(data, axis=0)

# Perform data standardization
standardized_data = (data - mean) / std

# Print the results
print(standardized_data)

# output
[[-1.03881504  1.16247639 -1.29777137 -1.27872403]
 [-0.31164451  0.11624764  1.13554995  0.11624764]
 [ 1.35045955 -1.27872403  0.16222142  1.16247639]]
Next, let's do the same for a data normalization transform:
completion = openai.Completion.create(engine="text-davinci-003", prompt="Write example python code that performs data normalization on fake data", max_tokens=1000)
print(completion.choices[0]['text'])

# output
# Normalizing data will rescale features in the range [0,1]
data = [3, 7, 10, 13] # Sample data

# Calculate the maximum and minimum of the data
max_data = max(data)
min_data = min(data)

# Normalize the data
normalized_data = [(x-min_data)/(max_data-min_data) for x in data]

# Print first value to check
print(normalized_data[0]) # Prints 0.2
Run the returned code:
# Normalizing data will rescale features in the range [0,1]
data = [3, 7, 10, 13] # Sample data

# Calculate the maximum and minimum of the data
max_data = max(data)
min_data = min(data)

# Normalize the data
normalized_data = [(x-min_data)/(max_data-min_data) for x in data]

# Print first value to check
print(normalized_data)
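In practice you would usually not hand-roll these transforms: scikit-learn's StandardScaler and MinMaxScaler do the same thing as the two snippets above. A quick sketch (assuming scikit-learn is installed, as it is in most data science environments):

import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

data = np.array([[-3, 9, 0, 8],
                 [ 4, 6, 5, 12],
                 [20, 2, 3, 15]])

# Same standardization as the NumPy version above
standardized = StandardScaler().fit_transform(data)

# Same min-max normalization as the list comprehension above
normalized = MinMaxScaler().fit_transform(data)

print(standardized)
print(normalized)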
Model Selection
Another cool application is model selection. Let's see whether GPT-3 can guide us in choosing a machine learning model. Ask "How do I select a time series model?":
completion = openai.Completion.create(engine="text-davinci-003", prompt="How do i select a time series model?", max_tokens=1000)
print(completion.choices[0]['text'])

# output
The selection of a time series model depends on the type of data that is being analyzed. If there are long-term trends or cycles in the data, then an autoregressive integrated moving average (ARIMA) model or a exponential smoothing (ETS) model may be the best option. If the data are seasonal or there are outliers present, then a Holt-Winters model may be a better choice. If the data do not contain any long-term trends or seasonality, a Box-Jenkins model may be suitable. Regardless of the model selected, it is important to consider the type of data that is being modeled when selecting a time series model.
The answer is reasonable, covering checks for stationarity, seasonality, and trend, as well as evaluating model performance. Let's ask something more specific, such as "When should I use ARIMA vs. Prophet?":
completion = openai.Completion.create(engine="text-davinci-003", prompt="When to use ARIMA vs FB prophet?", max_tokens=1000)
print(completion.choices[0]['text'])

# output
ARIMA and FB Prophet are both used for time series forecasting. ARIMA is a linear model and is better for predicting long-term trends with stationary data. FB Prophet is a non-linear model and is better for predicting short-term trends with non-stationary data. So, you would use ARIMA when you are mainly concerned with predicting long-term trends, and you would use FB Prophet when you are mainly concerned with predicting short-term trends.
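If, based on advice like this, you settle on ARIMA, fitting one only takes a few lines with statsmodels. Here is a minimal sketch on a made-up series; the library choice and the order (1, 1, 1) are my own illustration, not something GPT-3 recommended:

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# A toy series: a noisy upward trend
series = np.cumsum(np.random.normal(loc=0.5, scale=1.0, size=100))

# Fit a simple ARIMA(1, 1, 1) and forecast the next 5 points
model = ARIMA(series, order=(1, 1, 1))
fitted = model.fit()
print(fitted.forecast(steps=5))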
Generating Synthetic Data
Can GPT-3 generate synthetic data? Let's find out! We will generate two well-known kinds of data: hospital readmission data and transaction data.
- Hospital readmission data
Ask GPT-3: "Write Python code that generates synthetic hospital readmission data."
completion = openai.Completion.create(engine="text-davinci-003", prompt="Write example python code that generates synthetic healthcare readmission data stored in a dataframe", max_tokens=1000)
print(completion.choices[0]['text'])

# output
import pandas as pd
import numpy as np

# Create Dataframe
df = pd.DataFrame(columns=['Patient_ID', 'Age', 'Admission_Type', 'Readmitted'])

# Generate Data
np.random.seed(0)
for i in range(10):
    admission_type = np.random.choice(['Urgent', 'Scheduled', 'Emergency'])
    patient_age = np.random.randint(18, 80)
    readmission = np.random.choice([0, 1])
    df.loc[i] = [i+1, patient_age, admission_type, readmission]

# Print Dataframe to Console
print(df)
Run this code:
import pandas as pd
import numpy as np

# Create Dataframe
df = pd.DataFrame(columns=['Patient_ID', 'Age', 'Admission_Type', 'Readmitted'])

# Generate Data
np.random.seed(0)
for i in range(10):
    admission_type = np.random.choice(['Urgent', 'Scheduled', 'Emergency'])
    patient_age = np.random.randint(18, 80)
    readmission = np.random.choice([0, 1])
    df.loc[i] = [i+1, patient_age, admission_type, readmission]

# Print Dataframe to Console
df
The output:
Let's see whether we can use this synthetic data to build a classification model that predicts who gets readmitted, and then evaluate its performance.
completion = openai.Completion.create(engine="text-davinci-003", prompt="Write example python code that generates synthetic healthcare readmission data stored in a dataframe. From this write code that builds a catboost model that predicts readmission outcomes. Also write code to calculate and print performance", max_tokens=3000)
print(completion.choices[0]['text'])

# output
metrics

## Generate Synthetic Healthcare Readmission Data
import pandas as pd
import numpy as np

# set the seed for reproducibility
np.random.seed(1)

# create dataframe
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 10)), columns=['age','gender','length_of_stay','diagnosis','NIV','laboratory','past_hospitalizations','medications','bmi','readmission'])

# add labels to data frame
df['age'] = np.random.randint(20, 80, size=(100))
df['gender'] = np.random.randint(1, 2, size=(100))
df['length_of_stay'] = np.random.randint(2, 14, size=(100))
df['diagnosis'] = np.random.randint(1, 5, size=(100))
df['NIV'] = np.random.randint(0, 2, size=(100))
df['laboratory'] = np.random.randint(1, 6, size=(100))
df['past_hospitalizations'] = np.random.randint(0, 10, size=(100))
df['medications'] = np.random.randint(1, 6, size=(100))
df['bmi'] = np.random.randint(18, 35, size=(100))
df['readmission'] = np.random.randint(0, 2, size=(100))

# print the dataframe
print(df)

## Build a CatBoost Model
from catboost import CatBoostClassifier
from sklearn.metrics import confusion_matrix

# separate X and y
X = df.iloc[:, 0:9]
y = df.iloc[:, 9]

# initialize catboost classifier
cat_clf = CatBoostClassifier(iterations=50, learning_rate=0.3, depth=8, eval_metric='Accuracy', random_seed=42)

# fit the model
cat_clf.fit(X, y)

# predict values
y_pred = cat_clf.predict(X)

# print confusion matrix
conf_mat = confusion_matrix(y, y_pred)
print(conf_mat)

## Calculate and Print Performance Metrics
from sklearn.metrics import accuracy_score, precision_score, recall_score

# calculate performance metrics
acc = accuracy_score(y, y_pred)
precision = precision_score(y, y_pred)
recall = recall_score(y, y_pred)

# print performance metrics
print("Accuracy: {}".format(acc))
print("Precision: {}".format(precision))
print("Recall: {}".format(recall))
Copy the code above and run it:
## Generate Synthetic Healthcare Readmission Data
import pandas as pd
import numpy as np

# set the seed for reproducibility
np.random.seed(1)

# create dataframe
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 10)), columns=['age','gender','length_of_stay','diagnosis','NIV','laboratory','past_hospitalizations','medications','bmi','readmission'])

# add labels to data frame
df['age'] = np.random.randint(20, 80, size=(100))
df['gender'] = np.random.randint(1, 2, size=(100))
df['length_of_stay'] = np.random.randint(2, 14, size=(100))
df['diagnosis'] = np.random.randint(1, 5, size=(100))
df['NIV'] = np.random.randint(0, 2, size=(100))
df['laboratory'] = np.random.randint(1, 6, size=(100))
df['past_hospitalizations'] = np.random.randint(0, 10, size=(100))
df['medications'] = np.random.randint(1, 6, size=(100))
df['bmi'] = np.random.randint(18, 35, size=(100))
df['readmission'] = np.random.randint(0, 2, size=(100))

# print the dataframe
print(df)

## Build a CatBoost Model
from catboost import CatBoostClassifier
from sklearn.metrics import confusion_matrix

# separate X and y
X = df.iloc[:, 0:9]
y = df.iloc[:, 9]

# initialize catboost classifier
cat_clf = CatBoostClassifier(iterations=50, learning_rate=0.3, depth=8, eval_metric='Accuracy', random_seed=42)

# fit the model
cat_clf.fit(X, y)

# predict values
y_pred = cat_clf.predict(X)

# print confusion matrix
conf_mat = confusion_matrix(y, y_pred)
print(conf_mat)

## Calculate and Print Performance Metrics
from sklearn.metrics import accuracy_score, precision_score, recall_score

# calculate performance metrics
acc = accuracy_score(y, y_pred)
precision = precision_score(y, y_pred)
recall = recall_score(y, y_pred)

# print performance metrics
print("Accuracy: {}".format(acc))
print("Precision: {}".format(precision))
print("Recall: {}".format(recall))

# output omitted
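One thing worth flagging in the generated code: it evaluates the model on the same data it was trained on, so the printed accuracy, precision, and recall are optimistic. A more honest check holds out a test set. Here is a small sketch of that adjustment (my own tweak, not GPT-3's output), reusing X and y from above:

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from catboost import CatBoostClassifier

# Hold out 20% of rows for evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

cat_clf = CatBoostClassifier(iterations=50, learning_rate=0.3, depth=8,
                             eval_metric='Accuracy', random_seed=42, verbose=False)
cat_clf.fit(X_train, y_train)

# Score on data the model has not seen;
# with purely random labels this should hover around 0.5
y_pred = cat_clf.predict(X_test)
print("Test accuracy: {}".format(accuracy_score(y_test, y_pred)))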
- Transaction data
Ask GPT-3: "Write Python code that generates synthetic transaction data."
completion = openai.Completion.create(engine="text-davinci-003", prompt="Write example python code that generates synthetic transaction data stored in a dataframe", max_tokens=1000)
print(completion.choices[0]['text'])

# output
import pandas as pd
import numpy as np

#create randomly generated customer data
customer_id = np.arange(1,101)
customer_names = [f'John Doe {x}' for x in range(1,101)]

#create randomly generated transaction data
transaction_id = np.arange(1,101)
dates = [f'2020-07-{x}' for x in range(1,101)]
amounts = np.random.randint(low=1, high=1000, size=(100,))

#create dataframe with randomly generated data
transaction_data = pd.DataFrame({'Customer ID': customer_id, 'Customer Name': customer_names, 'Transaction ID': transaction_id, 'Date': dates, 'Amount': amounts})
print(transaction_data)
Copy the code and run it:
import pandas as pd
import numpy as np

#create randomly generated customer data
customer_id = np.arange(1,101)
customer_names = [f'John Doe {x}' for x in range(1,101)]

#create randomly generated transaction data
transaction_id = np.arange(1,101)
dates = [f'2020-07-{x}' for x in range(1,101)]
amounts = np.random.randint(low=1, high=1000, size=(100,))

#create dataframe with randomly generated data
transaction_data = pd.DataFrame({'Customer ID': customer_id, 'Customer Name': customer_names, 'Transaction ID': transaction_id, 'Date': dates, 'Amount': amounts})
transaction_data
(partial output)
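One caveat about the generated code: dates = [f'2020-07-{x}' for x in range(1,101)] produces strings such as '2020-07-45' that are not valid calendar dates. If you need real dates, pandas' date_range is a simple replacement (my own tweak, not part of GPT-3's output):

import pandas as pd

# 100 consecutive, valid daily dates starting July 1, 2020
dates = pd.date_range(start="2020-07-01", periods=100, freq="D").strftime("%Y-%m-%d").tolist()
print(dates[:5])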
We now have customer IDs, transaction IDs, dates, and purchase amounts. Let's see if we can get more specific by also adding age, gender, item information, and zip code.
completion = openai.Completion.create(engine="text-davinci-003", prompt="Write example python code that generates synthetic transaction data stored in a dataframe. Include customer ID, transaction amount, item ID, item name, age, gender, and zipcode", max_tokens=2000)
print(completion.choices[0]['text'])

# output
import pandas as pd
import numpy as np

rows = ['customer_ID', 'transaction_amnt', 'item_ID', 'item_name', 'age', 'gender', 'zipcode']
data = pd.DataFrame(columns=rows)

for i in range(1,100):
    customer_ID = int( np.random.uniform(100,600-100))
    transaction_amnt = np.random.uniform(1.25, 10.00)
    item_ID = int( np.random.uniform(1,35))
    item_name = np.random.choice(["phone", "tablet", "laptop", "smartwatch"])
    age = int( np.random.uniform(17,75))
    gender = np.random.choice(["male", "female"])
    zipcode = np.random.choice(["98101", "98200", "98469", "98801"])
    data.loc[i] = [customer_ID, transaction_amnt, item_ID, item_name, age, gender, zipcode]

print (data)
Run the code:
import pandas as pd
import numpy as np

rows = ['customer_ID', 'transaction_amnt', 'item_ID', 'item_name', 'age', 'gender', 'zipcode']
data = pd.DataFrame(columns=rows)

for i in range(1,100):
    customer_ID = int( np.random.uniform(100,600-100))
    transaction_amnt = np.random.uniform(1.25, 10.00)
    item_ID = int( np.random.uniform(1,35))
    item_name = np.random.choice(["phone", "tablet", "laptop", "smartwatch"])
    age = int( np.random.uniform(17,75))
    gender = np.random.choice(["male", "female"])
    zipcode = np.random.choice(["98101", "98200", "98469", "98801"])
    data.loc[i] = [customer_ID, transaction_amnt, item_ID, item_name, age, gender, zipcode]

data
(partial output)
Asking About Public Datasets
Another application is asking GPT-3 about public datasets. Let's ask it to list a few:
completion = openai.Completion.create(engine="text-davinci-003", prompt=" list some good public datasets", max_tokens=1000)
print(completion.choices[0]['text'])

# output
1. US Census Data
2. Enron Email Dataset
3. Global Open Data Index
4. Air Quality Monitoring Data
5. New York City Taxi Trip Data
6. IMF Data
7. World Bank Open Data
8. Google Books Ngrams Dataset
9. Amazon Reviews Dataset
10. UCI Machine Learning Repository
Let's see whether it can find public data released under the Apache 2.0 license, and also ask for links to the sources:
completion = openai.Completion.create(engine="text-davinci-003", prompt=" list some good public datasets under apache 2.0 license. provide links to their source", max_tokens=1000, temperature=0)
print(completion.choices[0]['text'])

# output
1. OpenStreetMap: https://www.openstreetmap.org/
2. US Census Data: https://www.census.gov/data.html
3. Google Books Ngrams: https://aws.amazon.com/datasets/google-books-ngrams/
4. Wikipedia: https://dumps.wikimedia.org/enwiki/
5. US Government Spending Data: https://www.usaspending.gov/
6. World Bank Open Data: https://data.worldbank.org/
7. Common Crawl: http://commoncrawl.org/
8. Open Images: https://storage.googleapis.com/openimages/web/index.html
9. OpenFlights: https://openflights.org/data.html
10. GDELT: http://data.gdeltproject.org/
Although not all of these links are correct, it does a fairly good job of finding sources; the Google Ngrams, Common Crawl, and NASA results are quite good. Where it does not give the exact location of the data, in most cases it links to a web page where the data can be found.
Let's also ask it to describe the datasets. Note that although results may overlap, they differ slightly on every run; as far as I can tell, they are never exactly the same:
completion = openai.Completion.create(engine="text-davinci-003", prompt=" list some good public datasets under apache 2.0 license. provide links to their source and descriptions", max_tokens=1000, temperature=0)
print(completion.choices[0]['text'])

# output
1. OpenStreetMap: OpenStreetMap is a free, editable map of the world, created and maintained by volunteers and available for use under an open license. It contains millions of data points, including roads, buildings, and points of interest. Source: https://www.openstreetmap.org/
2. Google Books Ngrams: Google Books Ngrams is a dataset of over 5 million books from Google Books, spanning from 1500 to 2008. It contains word counts for each year, allowing researchers to track the usage of words over time. Source: https://aws.amazon.com/datasets/google-books-ngrams/
3. Wikipedia: Wikipedia is a free, open-source encyclopedia with millions of articles in hundreds of languages. It is available for use under the Creative Commons Attribution-ShareAlike license. Source: https://www.wikipedia.org/
4. Common Crawl: Common Crawl is a large-scale web crawl that collects data from over 5 billion webpages. It is available for use under the Apache 2.0 license. Source: https://commoncrawl.org/
5. Open Images Dataset: The Open Images Dataset is a collection of 9 million images annotated with labels spanning over 6000 categories. It is available for use under the Apache 2.0 license. Source: https://storage.googleapis.com/openimages/web/index.html
Framing Machine Learning Questions
As a final example, let's see whether GPT-3 can help us frame machine learning questions.
- Asking about use cases
Although GPT-3's training data only runs up to 2021, it can still help us frame ML use cases that are relevant today. Let's ask: "What are some emerging machine learning use-cases in social media?":
completion = openai.Completion.create(engine="text-davinci-003", prompt="What are some emerging machine learning use-cases in social media?", max_tokens=1000, temperature=0)
print(completion.choices[0]['text'])

# output
1. Automated Content Curation: Automatically curating content from social media platforms to create personalized content feeds for users.
2. Sentiment Analysis: Analyzing user sentiment from social media posts to gain insights into customer opinions and preferences.
3. Social Media Monitoring: Using machine learning algorithms to monitor social media conversations and detect potential issues or trends.
4. Social Media Advertising: Leveraging machine learning to optimize social media advertising campaigns and target the right audience.
5. Social Media Recommendations: Using machine learning to recommend content to users based on their interests and preferences.
We can also look at its response for healthcare:
completion = openai.Completion.create(engine="text-davinci-003", prompt="What are some emerging machine learning use-cases in healthcare?", max_tokens=1000, temperature=0)
print(completion.choices[0]['text'])

# output
1. Automated Diagnosis: Machine learning algorithms can be used to analyze patient data and medical images to detect and diagnose diseases.
2. Personalized Medicine: Machine learning algorithms can be used to analyze patient data and medical images to create personalized treatment plans for each patient.
3. Drug Discovery: Machine learning algorithms can be used to analyze large datasets of chemical compounds to identify potential new drugs.
4. Clinical Decision Support: Machine learning algorithms can be used to analyze patient data and medical images to provide clinicians with real-time decision support.
5. Predictive Analytics: Machine learning algorithms can be used to analyze patient data and medical images to predict future health outcomes.
- Asking about research questions
Now let's see whether it can come up with some research questions. Let's ask: "What are some good research questions on using deep learning for image detection?"
completion = openai.Completion.create(engine="text-davinci-003", prompt="What are some good research questions on using deep learning for image detection?", max_tokens=1000, temperature=0)
print(completion.choices[0]['text'])

# output
1. What are the most effective deep learning architectures for image detection?
2. How can deep learning be used to improve the accuracy of image detection?
3. What are the most effective methods for training deep learning models for image detection?
4. How can deep learning be used to detect objects in images with varying levels of complexity?
5. How can deep learning be used to detect objects in images with varying levels of illumination?
6. How can deep learning be used to detect objects in images with varying levels of noise?
7. How can deep learning be used to detect objects in images with varying levels of resolution?
8. How can deep learning be used to detect objects in images with varying levels of occlusion?
9. How can deep learning be used to detect objects in images with varying levels of background clutter?
10. How can deep learning be used to detect objects in images with varying levels of rotation?
Let's also ask about NLP:
completion = openai.Completion.create(engine="text-davinci-003", prompt="What are some good research questions related to NLP transformer models?", max_tokens=1000, temperature=0)
print(completion.choices[0]['text'])

# output
1. How can transformer models be used to improve the accuracy of natural language processing tasks?
2. What are the most effective methods for training transformer models for natural language processing tasks?
3. How can transformer models be used to improve the efficiency of natural language processing tasks?
4. What are the most effective methods for optimizing transformer models for natural language processing tasks?
5. How can transformer models be used to improve the interpretability of natural language processing tasks?
6. What are the most effective methods for deploying transformer models for natural language processing tasks?
7. How can transformer models be used to improve the scalability of natural language processing tasks?
8. What are the most effective methods for combining transformer models with other natural language processing techniques?
9. How can transformer models be used to improve the robustness of natural language processing tasks?
10. What are the most effective methods for evaluating transformer models for natural language processing tasks?
All of the code from this article is available on GitHub.
That concludes this walkthrough of calling the GPT-3 API from Python. For more on calling GPT-3 from Python, see the other related articles on 腳本之家.