Utils
Utility functions used by other roboduck modules.
Functions
colored(text, color)
Add tags to color text and then reset color afterwards. Note that this does NOT actually print anything.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | Text that should be colored. | required |
color | str | Color name, e.g. "red". Must be available in the colorama lib. If None or empty str, just return the text unchanged. | required |
Returns:
Type | Description |
---|---|
str | Note that you need to print() the result for it to show up in the desired color. Otherwise it will just have some unintelligible characters appended and prepended. |
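The difference between building the colored string and printing it can be seen in a minimal sketch. This is not roboduck's implementation: `colored_sketch`, `ANSI_COLORS`, and `ANSI_RESET` are hypothetical names, and raw ANSI escape codes stand in for the colorama constants so the snippet has no dependencies.

```python
# Sketch only: raw ANSI escape codes stand in for colorama's color constants.
# All names here are hypothetical and are not part of roboduck.
ANSI_COLORS = {"red": "\x1b[31m", "green": "\x1b[32m", "blue": "\x1b[34m"}
ANSI_RESET = "\x1b[0m"


def colored_sketch(text, color):
    """Wrap text in a color code plus a reset code. Nothing is printed."""
    if not color:
        # None or empty str: return the text unchanged.
        return text
    return f"{ANSI_COLORS[color.lower()]}{text}{ANSI_RESET}"


message = colored_sketch("done", "green")
print(repr(message))  # the raw escape codes are visible in the repr
print(message)        # an ANSI-capable terminal renders this in green
```

Inspecting the string without printing it (for instance in a REPL) shows the escape codes as extra characters, which is the behavior the note above warns about.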
Source code in lib/roboduck/utils.py
colordiff_new_str(old, new, color='green')
Given two strings, return the new one with new parts in green. Note that deletions are ignored because we want to retain only characters in the new string. Remember colors are only displayed correctly when printing the resulting string - otherwise it just looks like we added extra junk characters.
Idea is that when displaying a revised code snippet from gpt, we want to draw attention to the new bits.
Adapted from this gist + variations in comments: https://gist.github.com/ines/04b47597eb9d011ade5e77a068389521
Parameters:
Name | Type | Description | Default |
---|---|---|---|
old | str | This is what | required |
new | str | Determines content of output str. | required |
color | str | Text color for new characters. | 'green' |
Returns:
Type | Description |
---|---|
str | Same content as |
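One way to get this behavior is with `difflib.SequenceMatcher` from the standard library, sketched below. This is an illustration of the idea, not the code in roboduck: `colordiff_sketch` is a hypothetical name and the green color is hardcoded as a raw ANSI code rather than looked up through colorama.

```python
import difflib

# Raw ANSI codes used for illustration (assumption: an ANSI-capable terminal).
GREEN, RESET = "\x1b[32m", "\x1b[0m"


def colordiff_sketch(old, new, start=GREEN, end=RESET):
    """Return `new` with inserted or replaced spans wrapped in color codes."""
    pieces = []
    matcher = difflib.SequenceMatcher(None, old, new)
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        chunk = new[j1:j2]
        if tag == "equal":
            pieces.append(chunk)
        elif tag in ("insert", "replace"):
            pieces.append(f"{start}{chunk}{end}")
        # "delete" spans exist only in `old`, so they add nothing to the output.
    return "".join(pieces)


result = colordiff_sketch("x = 1", "x = 12")
print(result)
```

Stripping the color codes from the result always gives back the new string, since only spans of the new string are ever appended.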
Source code in lib/roboduck/utils.py
type_annotated_dict_str(dict_, func=repr)
String representation (or repr) of a dict, where each line includes an inline comment showing the type of the value.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dict_ | dict | The dict to represent. | required |
func | function | The function to apply to each key and value in the dict to get some kind of str representation. Note that it is applied to each key/value as a whole, not to each item within that key/value. See examples. | repr |
Returns:
Type | Description |
---|---|
str | |
Examples:
Notice below how foo and cat are not in quotes but ('bar',) and ['x'] do contain quotes.
>>> d = {'foo': 'cat', ('bar',): ['x']}
>>> type_annotated_dict_str(d, str)
{
foo: cat, # type: str
('bar',): ['x'], # type: list
}
Source code in lib/roboduck/utils.py
is_array_like(obj)
Hackily check if obj is a numpy array/torch tensor/pd.Series or similar without requiring all those libraries as dependencies (notably, pd.DataFrame is not considered array_like - it has useful column names unlike these other types). Instead of checking for specific types here, we just check that the obj has certain attributes that those objects should have. If obj is the class itself rather than an instance, we return False.
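The attribute-based check described above might look roughly like the sketch below. The exact attributes roboduck inspects are not listed here, so the tuple `("shape", "dtype", "__getitem__")` is an assumption chosen for illustration, and `is_array_like_sketch` and `FakeArray` are hypothetical names.

```python
def is_array_like_sketch(obj):
    """Duck-typed check for array-like instances without importing numpy etc."""
    if isinstance(obj, type):
        # obj is a class rather than an instance.
        return False
    # Assumed attribute list: a DataFrame has `dtypes` rather than `dtype`,
    # so a check like this would not treat it as array-like.
    return all(hasattr(obj, attr) for attr in ("shape", "dtype", "__getitem__"))


class FakeArray:
    """Hypothetical stand-in for an ndarray/tensor/Series."""

    shape = (3,)
    dtype = "int64"

    def __getitem__(self, index):
        return index


print(is_array_like_sketch(FakeArray()))  # instance
print(is_array_like_sketch(FakeArray))    # class itself
print(is_array_like_sketch([1, 2, 3]))    # plain list has no shape/dtype
```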
Source code in lib/roboduck/utils.py
qualname(obj, with_brackets=True)
Similar to type(obj).__qualname__, but that attribute doesn't always include the module(s). E.g. pandas Index has qualname "Index", but this function returns the module-qualified name wrapped in angle brackets.
Set with_brackets=False to skip the leading/trailing angle brackets.
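A minimal sketch of the idea, assuming the module path is simply taken from the class's `__module__` attribute (roboduck may build it differently; `qualname_sketch` is a hypothetical name):

```python
from fractions import Fraction


def qualname_sketch(obj, with_brackets=True):
    """Return the module-qualified class name of obj."""
    cls = type(obj)
    name = f"{cls.__module__}.{cls.__qualname__}"
    return f"<{name}>" if with_brackets else name


value = Fraction(1, 2)
print(type(value).__qualname__)                     # Fraction
print(qualname_sketch(value))                       # <fractions.Fraction>
print(qualname_sketch(value, with_brackets=False))  # fractions.Fraction
```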
Source code in lib/roboduck/utils.py
format_listlike_with_metadata(array, truncated_data=None)
Format a list-like object with metadata.
This function creates a string representation of a list-like object, including its class name, truncated data (if provided), and additional metadata such as shape, dtype, or length.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
array | object | The list-like object to be formatted. | required |
truncated_data | object | A truncated version of the array's data. If provided, it will be included in the formatted string. | None |
Returns:
Type | Description |
---|---|
str | A formatted string representation of the array with metadata. |
Examples:
>>> import numpy as np
>>> arr = np.array([1, 2, 3, 4, 5])
>>> format_listlike_with_metadata(arr, arr[:3])
'<numpy.ndarray, truncated_data=[1, 2, 3, ...], shape=(5,), dtype=int64>'
>>> import pandas as pd
>>> series = pd.Series(['a', 'b', 'c', 'd', 'e'])
>>> format_listlike_with_metadata(series, series[:2])
"<pandas.core.series.Series, truncated_data=['a', 'b', ...], len=5>"
Source code in lib/roboduck/utils.py
fallback(*, default=None, default_func=None)
Decorator to provide a default value (or function that produces a value) to return when the decorated function's execution fails.
You must specify either default OR default_func, not both. If default_func is provided, it should accept the same args as the decorated function.
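A decorator with this interface could be sketched as follows. This is an illustration under stated assumptions, not the roboduck source: `fallback_sketch` is a hypothetical name, only `Exception` subclasses are caught, and passing both arguments is detected by checking that neither is None.

```python
import functools


def fallback_sketch(*, default=None, default_func=None):
    """Return `default` (or `default_func(*args, **kwargs)`) if the call fails."""
    if default is not None and default_func is not None:
        raise ValueError("Specify default OR default_func, not both.")

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except Exception:
                if default_func is not None:
                    # default_func receives the same args as the wrapped function.
                    return default_func(*args, **kwargs)
                return default

        return wrapper

    return decorator


@fallback_sketch(default=-1)
def parse_int(text):
    return int(text)


@fallback_sketch(default_func=lambda text: len(text))
def parse_int_or_len(text):
    return int(text)


print(parse_int("7"))           # 7
print(parse_int("seven"))       # -1
print(parse_int_or_len("abc"))  # 3
```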
Source code in lib/roboduck/utils.py
truncated_repr(obj, max_len=400)
Return an object's repr, truncated to ensure that it doesn't take up more characters than we want. This is used to reduce our chances of using up all our available tokens in a gpt prompt simply communicating that a giant data structure exists, e.g. list(range(1_000_000)). Our use case doesn't call for anything super precise, so max_len should be thought of as more of a guide than an exact max. I think it's enforced but I didn't put a whole lot of thought or effort into confirming that.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
obj | any | | required |
max_len | int | Max number of characters for resulting repr. Think of this more as an estimate than a hard guarantee - precision isn't important in our use case. The result will likely be shorter than this because we want to truncate in a readable place, e.g. taking the repr of the first k items of a list instead of taking the repr of all items and then slicing off the end of the repr. | 400 |
Returns:
Type | Description |
---|---|
str | Repr for obj, truncated to approximately max_len characters or fewer. When possible, we insert ellipses into the repr to show that truncation occurred. Technically there are some edge cases we don't handle (e.g. if obj is a class with an insanely long name) but that's not a big deal, at least at the moment. I can always revisit that later if necessary. |
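The "truncate in a readable place" idea can be illustrated with a much smaller sketch that only special-cases lists. The real function handles more types; `truncated_repr_sketch` is a hypothetical name and its fallback of slicing the repr is an assumption made for brevity.

```python
def truncated_repr_sketch(obj, max_len=400):
    """Repr of obj, cut at an item boundary for lists when it is too long."""
    full = repr(obj)
    if len(full) <= max_len:
        return full
    if isinstance(obj, list):
        kept = []
        length = len("[...]")  # brackets plus the ellipsis are always present
        for item in obj:
            piece = repr(item) + ", "
            if length + len(piece) > max_len:
                break
            kept.append(piece)
            length += len(piece)
        return "[" + "".join(kept) + "...]"
    # Fallback for other types: slice the repr and mark the truncation.
    return full[: max_len - 3] + "..."


print(truncated_repr_sketch(list(range(1_000_000)), max_len=20))
# [0, 1, 2, 3, 4, ...]
```

Taking the repr of the first few items keeps every displayed item intact, which is easier to read than a repr sliced mid-number.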
Source code in lib/roboduck/utils.py
load_yaml(path, section=None)
Load a yaml file. Useful for loading prompts.
Borrowed from jabberwocky.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str or Path | | required |
section | str or None | I vaguely recall yaml files can define different subsections. This lets you return a specific one if you want. Usually leave as None which returns the whole contents. | None |
Returns:
Type | Description |
---|---|
dict | |
Source code in lib/roboduck/utils.py
update_yaml(path, delete_if_none=True, **kwargs)
Update a yaml file with new values.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
path | str or Path | Path to yaml file to update. If it doesn't exist, it will be created. Any necessary intermediate directories will be created too. | required |
delete_if_none | bool | If True, any k-v pairs in kwargs where v is None will be treated as an instruction to delete key k from the yaml file. If False, we will actually set | True |
kwargs | any | Key-value pairs to update the yaml file with. | {} |
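The effect of delete_if_none can be shown on an in-memory dict, leaving out the file reading and writing. This sketch is not the roboduck code: `apply_updates_sketch` and the example keys are hypothetical, and it assumes that with delete_if_none=False a None value is stored under the key as-is.

```python
def apply_updates_sketch(data, delete_if_none=True, **kwargs):
    """Return a copy of data with kwargs applied, optionally deleting None keys."""
    updated = dict(data)
    for key, value in kwargs.items():
        if value is None and delete_if_none:
            # Treat None as an instruction to remove the key if present.
            updated.pop(key, None)
        else:
            updated[key] = value
    return updated


settings = {"model_name": "some-model", "temperature": 0.0}
print(apply_updates_sketch(settings, model_name=None))
# {'temperature': 0.0}
print(apply_updates_sketch(settings, delete_if_none=False, model_name=None))
# {'model_name': None, 'temperature': 0.0}
```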
Source code in lib/roboduck/utils.py
extract_code(text, join_multi=True, multi_prefix_template='\n\n# {i}\n')
Extract code snippet from a GPT response (e.g. from our debug chat prompt). See Examples for expected format.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | | required |
join_multi | bool | If multiple code snippets are found, we can either choose to join them into one string or return a list of strings. If the former, we prefix each snippet with | True |
multi_prefix_template | str | If join_multi=True and multiple code snippets are found, we prepend this to each code snippet before joining into a single string. It should accept a single parameter {i} which numbers each code snippet in the order they were found in | '\n\n# {i}\n' |
Returns:
Type | Description |
---|---|
str or list | Code snippet from |
Examples:
text = '''Appending to a tuple is not allowed because tuples are immutable.
However, in this code snippet, the tuple b contains two lists, and lists
are mutable. Therefore, appending to b[1] (which is a list) does not raise
an error. To fix this, you can either change b[1] to a tuple or create a
new tuple that contains the original elements of b and the new list.
```python
# Corrected code snippet
a = 3
b = ([0, 1], [2, 3])
b = (b[0], b[1] + [a])
```'''
print(extract_code(text))
# Extracted code snippet
'''
a = 3
b = ([0, 1], [2, 3])
b = (b[0], b[1] + [a])
'''
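How join_multi and multi_prefix_template interact is easier to see with more than one snippet. The sketch below uses a regular expression to find fenced blocks. It is an illustration rather than roboduck's implementation: `extract_code_sketch` is a hypothetical name, numbering from 1 is an assumption, and the fence marker is built with `"`" * 3` only to avoid writing literal fences inside the example.

```python
import re

FENCE = "`" * 3  # the markdown code fence marker
CODE_BLOCK = re.compile(FENCE + r"(?:\w+)?\n(.*?)" + FENCE, re.DOTALL)


def extract_code_sketch(text, join_multi=True,
                        multi_prefix_template="\n\n# {i}\n"):
    """Pull fenced code snippets out of a model response."""
    snippets = [match.strip() for match in CODE_BLOCK.findall(text)]
    if not join_multi:
        return snippets
    if len(snippets) <= 1:
        return snippets[0] if snippets else ""
    # Prefix each snippet with its (assumed 1-based) index, then join.
    return "".join(
        multi_prefix_template.format(i=i) + snippet
        for i, snippet in enumerate(snippets, 1)
    ).strip()


reply = (
    "First fix:\n" + FENCE + "python\nx = 1\n" + FENCE
    + "\nAlternative:\n" + FENCE + "\ny = 2\n" + FENCE
)
print(extract_code_sketch(reply))
print(extract_code_sketch(reply, join_multi=False))
```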
Source code in lib/roboduck/utils.py
parse_completion(text)
This function is called on the gpt completion text in roboduck.debug.DuckDB.ask_language_model (i.e. when the user asks a question during a debugging session, or when an error occurs when in auto-explain errors mode).
Users can define their own custom function as a replacement (mostly useful when defining custom prompts too). The only requirements are that the function must take 1 string input and return a dict containing the keys "explanation" and "code", with an optional key "extra" that can be used to store any additional information (probably in a dict). For example, if you wrote a prompt that asked gpt to return valid json, you could potentially use json.loads() as your drop-in replacement (ignoring validation/error handling, which you might prefer to handle via a langchain chain anyway).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | GPT completion. This should contain both a natural language explanation and code. | required |
Returns:
Type | Description |
---|---|
dict[str] | |
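A custom replacement along the lines described above might look like this sketch, which assumes a prompt that asks the model for a JSON object. `parse_json_completion` is a hypothetical name, and how a replacement function is registered with roboduck is not shown here.

```python
import json


def parse_json_completion(text):
    """Parse a JSON completion into the dict shape described above."""
    parsed = json.loads(text)
    return {
        "explanation": parsed.get("explanation", ""),
        "code": parsed.get("code", ""),
        # Anything else the model returned is kept under the optional key.
        "extra": {
            key: value for key, value in parsed.items()
            if key not in ("explanation", "code")
        },
    }


completion = '{"explanation": "Off-by-one error.", "code": "i += 1", "confidence": 0.9}'
print(parse_json_completion(completion))
```

As the description notes, this ignores validation and error handling: `json.loads` raises if the model returns anything other than valid JSON.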
Source code in lib/roboduck/utils.py
available_models()
Show the user the available values for the model_name parameter in the debug.DuckDB class, the debug.duck function, the errors.enable function, etc.
Returns:
Type | Description |
---|---|
dict[str, list[str]] | Maps provider name (e.g. 'openai') to list of valid model_name values. Provider name should correspond to a langchain or roboduck class named like ChatOpenai (i.e. Chat{provider.title()}). Eventually would like to support other providers like anthropic but never got off API waitlist. |
Source code in lib/roboduck/utils.py
make_import_statement(cls_name)
Given a class name like 'roboduck.debug.DuckDB', construct the import statement (str) that should likely be used to import that class (in this case 'from roboduck.debug import DuckDB').
Parameters:
Name | Type | Description | Default |
---|---|---|---|
cls_name | str | Class name including module (essentially qualname?), e.g. roboduck.DuckDB. (Note that this would need to be roboduck.debug.DuckDB if we didn't include DuckDB in roboduck's __init__.py.) | required |
Returns:
Type | Description |
---|---|
str | E.g. "from roboduck import DuckDB" |
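The transformation amounts to splitting on the last dot, as in the sketch below. `make_import_statement_sketch` is a hypothetical name, and the handling of a name with no module part is an assumption that the source does not specify.

```python
def make_import_statement_sketch(cls_name):
    """Build an import statement from a dotted class name."""
    module, _, name = cls_name.rpartition(".")
    if not module:
        # Assumed behavior for a bare name with no module prefix.
        return f"import {name}"
    return f"from {module} import {name}"


print(make_import_statement_sketch("roboduck.debug.DuckDB"))
# from roboduck.debug import DuckDB
print(make_import_statement_sketch("roboduck.DuckDB"))
# from roboduck import DuckDB
```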
Source code in lib/roboduck/utils.py