Maintainability index
- Function of 4 other metrics:
  - Halstead volume (V)
  - Cyclomatic complexity (CC)
  - Source lines of code (L)
  - Percent of comment lines (converted to radians) (PC)
MI = max[0, 100 × (171 − 5.2 ln(V) − 0.23 CC − 16.2 ln(L) + 50 sin(√(2.4 PC))) / 171]
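A minimal sketch of this formula in Python, assuming PC has already been converted to radians as noted in the list above:

import math

def maintainability_index(V, CC, L, PC):
    # V: Halstead volume, CC: cyclomatic complexity,
    # L: source lines of code, PC: percent of comment lines (in radians)
    raw = 171 - 5.2 * math.log(V) - 0.23 * CC - 16.2 * math.log(L) + 50 * math.sin(math.sqrt(2.4 * PC))
    return max(0, 100 * raw / 171)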
Professor, Department of Psychology
Associate Director, Stanford Data Science
Stanford University
How many of you have used AI to help with coding?
https://arxiv.org/abs/2304.13187
https://jalammar.github.io/how-gpt3-works-visualizations-animations/
Vaswani et al., 2017
Bubeck et al., 2023
HumanEval: 164 coding problems with 8 tests each https://paperswithcode.com/sota/code-generation-on-humaneval
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
# 1. Generate synthetic data
np.random.seed(42)
mean = [0, 0]
cov = [[1, 0.5], [0.5, 1]]
train_data = np.random.multivariate_normal(mean, cov, 32)
test_data = np.random.multivariate_normal(mean, cov, 32)
X_train = train_data[:, 0].reshape(-1, 1)
y_train = train_data[:, 1]
X_test = test_data[:, 0].reshape(-1, 1)
y_test = test_data[:, 1]
# 2. Fit three models
linear_model = LinearRegression().fit(X_train, y_train)
second_order_model = make_pipeline(PolynomialFeatures(2), LinearRegression()).fit(X_train, y_train)
ninth_order_model = make_pipeline(PolynomialFeatures(9), LinearRegression()).fit(X_train, y_train)
# 3. Compute errors
models = [linear_model, second_order_model, ninth_order_model]
train_errors = [mean_squared_error(y_train, model.predict(X_train)) for model in models]
test_errors = [mean_squared_error(y_test, model.predict(X_test)) for model in models]
print("Training errors:", train_errors)
print("Test errors:", test_errors)
# 4. Plot fitted lines
plt.scatter(X_train, y_train, color='blue', label='Training data')
X_line = np.linspace(X_train.min(), X_train.max(), 100).reshape(-1, 1)
colors = ['green', 'red', 'purple']
labels = ['Linear', '2nd-order', '9th-order']
for i, model in enumerate(models):
    plt.plot(X_line, model.predict(X_line), color=colors[i], label=labels[i])
plt.legend()
plt.xlabel('Variable 1')
plt.ylabel('Variable 2')
plt.title('Fitted lines for different models')
plt.show()
Training errors: [0.5941676382175597, 0.5933518294151625, 0.5129192853729938]
Test errors: [0.7419748200389997, 0.7229125040492909, 0.9712589627709035]
GPT-4 is quite good at explaining the conceptual intent of code
self.continuous_model = LinearRegression()
ValueError: This solver needs samples of at least 2 classes in the data, but the data contains only one class: 1
$ python hurdle4.py
Mean squared error: 7.750558995547416
$ Rscript hurdle.R
[1] "Mean squared error: 108.965835640358"
$ python hurdle5.py
Mean squared error: 5.198941931090807
$ Rscript hurdle_insample.R
[1] "Mean squared error: 5.04963049849389"
The EZ-diffusion model for two-choice response time tasks takes mean response time, the variance of response time, and response accuracy as inputs. The model transforms these data via three simple equations to produce unique values for the quality of information, response conservativeness, and nondecision time. (Wagenmakers et al., 2007)
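For reference, a minimal sketch of those three closed-form equations (following Wagenmakers et al., 2007; the scaling parameter s = 0.1 and the variable names here are assumptions):

import numpy as np

def ez_diffusion_closed_form(rt_mean, rt_var, accuracy, s=0.1):
    # logit of accuracy (assumes accuracy is strictly between 0.5 and 1)
    L = np.log(accuracy / (1 - accuracy))
    # drift rate (quality of information)
    v = np.sign(accuracy - 0.5) * s * (L * (accuracy**2 * L - accuracy * L + accuracy - 0.5) / rt_var) ** 0.25
    # boundary separation (response conservativeness)
    a = s**2 * L / v
    # nondecision time = mean RT minus mean decision time
    y = -v * a / s**2
    mdt = (a / (2 * v)) * (1 - np.exp(y)) / (1 + np.exp(y))
    ter = rt_mean - mdt
    return v, a, ter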
result = minimize(ez_diffusion_loss, initial_guess, bounds=bounds)
def ez_diffusion(response_times, decisions):
    accuracy = np.mean(decisions > 0)
    rt_mean = np.mean(response_times)
    rt_var = np.var(response_times)
    v = np.sqrt(np.pi) * (4 * accuracy * (1 - accuracy) / (rt_var / rt_mean ** 2 - 1)) ** (1 / 4)
    a = rt_mean * (1 - 2 * accuracy) * v / (4 * accuracy * (1 - accuracy))
    z = a / 2
Correct equation
GPT-4 equation
mass_jupiter = 1.8982e27
radius_jupiter = 6.9911e7
result = escape_velocity(mass_jupiter, radius_jupiter)
assert pytest.approx(result, rel=1e-3) == 59564.97
E assert 60202.716344497014 ± 6.0e+01 == 59564.97
E comparison failed
E Obtained: 59564.97
E Expected: 60202.716344497014 ± 6.0e+01
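For context, the computed value matches the textbook escape-velocity formula v = √(2GM/r); a minimal sketch (the function name and argument order follow the test above, but the implementation itself is an assumption):

import math

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

def escape_velocity(mass, radius):
    # v = sqrt(2 * G * M / r), in m/s
    return math.sqrt(2 * G * mass / radius)

# escape_velocity(1.8982e27, 6.9911e7) gives roughly 60200 m/s,
# while the test above expects 59564.97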
from pandas import *
from numpy import *
from scipy.stats import *
maxD = 12
hc = ['Nervous', 'Hopeless', 'RestlessFidgety', 'Depressed', 'EverythingIsEffort', 'Worthless', ]
h=read_csv('https://raw.githubusercontent.com/poldrack/clean_coding/master/data/health.csv',index_col=0)[hc].dropna().mean(1)
data=read_csv('https://raw.githubusercontent.com/poldrack/clean_coding/master/data/meaningful_variables_clean.csv',index_col=0)
sc=[]
for i in range(data.shape[1]):
    if data.columns[i].split('.')[0][-7:] == '_survey':
        sc=sc+[data.columns[i]]
data=data[sc]
gs=[]
for i in range(data.shape[0]):
    if sum(isnan(data.iloc[i, :])) > 0:
        pass
    else:
        gs=gs+[i]
data=data.iloc[gs,:]
from sklearn.preprocessing import scale
data_sc = scale(data)
from sklearn.decomposition import FactorAnalysis
bicv=zeros(maxD)
for i in range(1,maxD+1):
    fa=FactorAnalysis(i)
    fa.fit(data_sc)
    bicv[i-1]=i*2 - 2*fa.score(data_sc)
npD=argmin(bicv)+1
fa=FactorAnalysis(npD)
f=fa.fit_transform(data_sc)
for i in range(npD):
    print(pearsonr(f[:,i],h[gs]))
    idx=argsort(abs(fa.components_[i, :]))[::-1]
    for j in range(3):
        print(data.columns[idx[j]], fa.components_[i, idx[j]])
https://github.com/poldrack/clean_coding
selector = SelectKBest(score_func=f_classif, k=k)
X_selected = selector.fit_transform(X, y)
# Initialize k-fold cross-validation
kf = KFold(n_splits=5, shuffle=True, random_state=42)
accuracies = []
# Cross-validation loop
for train_index, test_index in kf.split(X_selected):
    X_train, X_test = X_selected[train_index], X_selected[test_index]
    y_train, y_test = y[train_index], y[test_index]
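Fitting SelectKBest on the full dataset before splitting leaks information from the test folds into feature selection. A minimal sketch of one way to avoid that, refitting the selector inside each fold via a Pipeline (the classifier choice here is an assumption):

from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# X, y, and k as in the snippet above; selection is refit on each
# training fold, so the held-out fold never informs feature choice
pipe = make_pipeline(SelectKBest(score_func=f_classif, k=k), LogisticRegression())
scores = cross_val_score(pipe, X, y, cv=KFold(n_splits=5, shuffle=True, random_state=42))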
https://youtu.be/0O3dZUEcN4I
AI coding tools will have major implications for science and education in the coming years
client.chat.completions.create(
    model='gpt-4',
    seed=seed,
    messages=[
        {'role': 'system', 'content': f'You are a helpful assistant'},
        {'role': 'user', 'content': f'Output a random vegetable.'},],)
# Seed: None
Counter({'Broccoli': 73, 'Carrot': 13, 'Cabbage': 6, 'Cucumber': 4, 'Cauliflower': 3, 'Spinach': 1})
# Seed: 1234
Counter({'Cucumber': 100})
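The tallies above presumably come from repeating the call and counting the responses; a minimal sketch of such a loop (the client setup and the count of 100 calls are inferred from the totals above):

from collections import Counter
from openai import OpenAI

client = OpenAI()

def sample_vegetable(seed=None):
    response = client.chat.completions.create(
        model='gpt-4',
        seed=seed,
        messages=[
            {'role': 'system', 'content': 'You are a helpful assistant'},
            {'role': 'user', 'content': 'Output a random vegetable.'},],)
    return response.choices[0].message.content

print(Counter(sample_vegetable(seed=None) for _ in range(100)))   # Seed: None
print(Counter(sample_vegetable(seed=1234) for _ in range(100)))   # Seed: 1234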
Finnie-Ansley et al., 2022
https://martinfowler.com/articles/2023-chatgpt-xu-hao.html
The Poldrack Lab
Collaborators
Gašper Beguš and Thomas Lu, UC Berkeley
How good is the code generated by GPT-4?
“Clean code is simple and direct. Clean code reads like well-written prose.” - Grady Booch (from Martin, Clean Code)
With Gašper Beguš and Thomas Lu, UC Berkeley
Reduced # of logical code lines
Reduced cyclomatic complexity
Both FDR p < .01
MI = max[0, 100 × (171 − 5.2 ln(V) − 0.23 CC − 16.2 ln(L) + 50 sin(√(2.4 PC))) / 171]
https://poldrack.github.io/talks-AIAssistedCoding/