A Data-Driven AMPL Model

In this notebook, we’ll revisit the production planning example. This time, however, we’ll demonstrate how Python’s data structures combine with AMPL’s ability to separate model and data to create an optimization model that scales with the size of the data tables. This enables the model to adjust to new products, varying prices, or changing demand. We refer to this as “data-driven” modeling.

This notebook introduces two new AMPL model components that describe the data in a general way:

Sets
Parameters

These components enable the model to specify variables, constraints, and summations that are indexed over sets. The combination of sets and indices is essential to building scalable and maintainable models for more complex applications.

We will begin this analysis by examining the production planning data sets to identify the underlying problem structure. Then we will reformulate the mathematical model in a more general way that is valid for any data scenario. Finally, we show how the same formulation carries over naturally into AMPL, providing a clear, data-driven formulation of the production planning application.

# install dependencies (the solver is installed and selected in later cells)
%pip install -q amplpy pandas

Data representations

We begin by revisiting the data tables and mathematical model developed for the basic production planning problem presented in the previous notebook. The original data values were given as follows:

Product	Material required	Labor A required	Labor B required	Market Demand	Price
U	10 g	2 hr	1 hr	$\leq$ 40 units	$270
V	9 g	1 hr	1 hr	unlimited	$210

Resource	Amount Available	Cost
M	unlimited	$10 / g
A	80 hours	$50 / hour
B	100 hours	$40 / hour

Two distinct sets of objects are evident from these tables. The first is the set of products, comprising $U$ and $V$. The second is the set of resources used to produce those products, comprising raw material and two types of labor, which we have abbreviated as $M$, $A$, and $B$.

Having identified these sets, the data for this application can be factored into three simple tables. The first two tables list attributes of the products and attributes of the resources. The third table summarizes the processes used to create the products from the resources, which requires providing a value for each combination of product and resource:

Table: Products

Product	Demand	Price
U	$\leq$ 40 units	$270
V	unlimited	$210

Table: Resources

Resource	Available	Cost
M	unlimited	$10 / g
A	80 hours	$50 / hour
B	100 hours	$40 / hour

Table: Processes

Product	M	A	B
U	10 g	2 hr	1 hr
V	9 g	1 hr	1 hr

How does a Python-based AMPL application work with this data? We can think of the data as being handled in three steps:

Import the data into Python, in whatever form is convenient for the application.
Convert the data to the forms required by the optimization model.
Send the data to AMPL.

For our example, we implement step 1 using nested Python dictionaries that closely resemble the three tables above:

In the products data structure, the product abbreviations serve as keys for the outermost dictionary, and the product-related attribute names (demand and price) serve as keys for the inner dictionaries.
In the resources data structure, the resource abbreviations serve as keys for the outermost dictionary, and the resource-related attribute names (available and cost) serve as keys for the inner dictionaries.
In the processes data structure, there is a value for each combination of a product and a resource; the product abbreviations serve as keys for the outermost dictionary, and the resource abbreviations serve as keys for the inner dictionaries.

Where demand or availability is “unlimited”, we use float("inf"), which Python interprets as an infinite value.

You will see a variety of data representations in this book, chosen in each case to be the most efficient and convenient for the application at hand. Some will use Python packages, particularly numpy and pandas, that are designed for large-scale data handling.

Inf = float("inf")

products = {
    "U": {"demand": 40, "price": 270},
    "V": {"demand": Inf, "price": 210},
}

resources = {
    "M": {"available": Inf, "cost": 10},
    "A": {"available": 80, "cost": 50},
    "B": {"available": 100, "cost": 40},
}

processes = {
    "U": {"M": 10, "A": 2, "B": 1},
    "V": {"M": 9, "A": 1, "B": 1},
}
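Every later step is generated from these three dictionaries, so a quick consistency check can catch a mistyped key before it reaches AMPL. The cell below is a minimal sketch of such a check. check_structure is a hypothetical helper and is not part of amplpy. The data is repeated so the cell runs on its own.

```python
# check_structure is a hypothetical helper (not part of amplpy) that confirms the
# three nested dictionaries fit together before any data is sent to AMPL.
Inf = float("inf")

products = {
    "U": {"demand": 40, "price": 270},
    "V": {"demand": Inf, "price": 210},
}
resources = {
    "M": {"available": Inf, "cost": 10},
    "A": {"available": 80, "cost": 50},
    "B": {"available": 100, "cost": 40},
}
processes = {
    "U": {"M": 10, "A": 2, "B": 1},
    "V": {"M": 9, "A": 1, "B": 1},
}


def check_structure(products, resources, processes):
    """Return a list of problems; an empty list means the tables are consistent."""
    problems = []
    # every product needs both attributes (products) and a process row (processes)
    mismatch = set(products) ^ set(processes)
    if mismatch:
        problems.append(f"product keys differ: {sorted(mismatch)}")
    # every process row must give exactly one value per resource, and no others
    for p, row in processes.items():
        missing = set(resources) - set(row)
        extra = set(row) - set(resources)
        if missing or extra:
            problems.append(f"{p}: missing {sorted(missing)}, unexpected {sorted(extra)}")
    return problems


print(check_structure(products, resources, processes))  # []
```

Suppose a new product W is added to products but not to processes. The function then returns ["product keys differ: ['W']"] instead of an empty list.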

Mathematical model

Once the problem data is rearranged into tables like this, the structure of the production planning problem becomes evident. Along with the two sets, we have a variety of symbolic parameters that specify the model’s costs, limits, and processes in a general way. Compared with the model in the previous notebook, these abstractions allow us to create mathematical models that can adapt and scale with the supplied data.

Let $\cal{P}$ and $\cal{R}$ be the sets of products and resources, respectively, and let $p$ and $r$ be representative elements of those sets. We use indexed decision variables $x_r$ to denote the amount of resource $r$ that is consumed in production, and $y_p$ to denote the amount of product $p$ produced.

The model specifies lower and upper bounds on the values of the variables. We represent these as

\[\begin{split} \begin{aligned} 0 \leq x_r \leq b^x_r & & \forall r\in\cal{R} \\ 0 \leq y_p \leq b^y_p & & \forall p\in\cal{P} \\ \end{aligned} \end{split}\]

where the upper bounds, $b^x_r$ and $b^y_p$, are data taken from the tables of attributes.

The objective is given as before,

\[\begin{split} \begin{aligned} \text{profit} & = \text{revenue} - \text{cost} \\ \end{aligned} \end{split}\]

but now revenue and cost are written more generally as sums over the product and resource sets,

\[\begin{split} \begin{aligned} \text{revenue} & = \sum_{p\in\cal{P}} c^y_p y_p \\ \text{cost} & = \sum_{r\in\cal{R}} c^x_r x_r \\ \end{aligned} \end{split}\]

where parameters $c^y_p$ and $c^x_r$ represent the selling prices for products and the costs for resources, respectively. The limits on available resources can be written as

\[ \begin{aligned} \sum_{p\in\cal{P}} a_{rp} y_p & \leq x_r & \forall r\in\cal{R} \end{aligned} \]

where $a_{rp}$ is the amount of resource $r$ needed to make 1 unit of product $p$. Putting these pieces together, we have the following symbolic model for the production planning problem.

\[ \begin{aligned} {\rm maximize} \quad & \sum_{p\in\cal{P}} c^y_p y_p - \sum_{r\in\cal{R}} c^x_r x_r \\ \text{subject to} \quad & \sum_{p\in\cal{P}} a_{rp} y_p \leq x_r & \forall r\in\cal{R} \\ & 0 \leq x_r \leq b^x_r & \forall r\in\cal{R} \\ & 0 \leq y_p \leq b^y_p & \forall p\in\cal{P} \end{aligned} \]
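The sketch below shows how the symbols map onto the data. It evaluates the objective and bounds for one candidate plan $y_p$, working directly from the nested dictionaries. This is an illustration only, not a solver, and evaluate_plan is a hypothetical helper. The sketch assumes that resource use is set to exactly what the plan needs, $x_r = \sum_{p} a_{rp} y_p$. An optimal solution does this whenever every cost $c^x_r$ is positive. The plan tried here, 80 units of V and none of U, is the one the solver reports later in this notebook.

```python
# evaluate_plan is a hypothetical helper: it evaluates the symbolic model for a given
# production plan y_p using the nested dictionaries (repeated so the cell runs alone).
Inf = float("inf")
products = {"U": {"demand": 40, "price": 270}, "V": {"demand": Inf, "price": 210}}
resources = {
    "M": {"available": Inf, "cost": 10},
    "A": {"available": 80, "cost": 50},
    "B": {"available": 100, "cost": 40},
}
processes = {"U": {"M": 10, "A": 2, "B": 1}, "V": {"M": 9, "A": 1, "B": 1}}


def evaluate_plan(y):
    """Return (profit, x, feasible) for production amounts y[p].

    Assumes x_r = sum_p a_rp * y_p, i.e. exactly the resources the plan needs,
    so the resource constraints hold with equality and only the bounds are checked.
    """
    # resource use: x_r = sum over p of a_rp * y_p  (a_rp is processes[p][r])
    x = {r: sum(processes[p][r] * y[p] for p in products) for r in resources}
    revenue = sum(products[p]["price"] * y[p] for p in products)  # sum of c^y_p y_p
    cost = sum(resources[r]["cost"] * x[r] for r in resources)  # sum of c^x_r x_r
    # bounds 0 <= x_r <= b^x_r and 0 <= y_p <= b^y_p
    feasible = all(0 <= x[r] <= resources[r]["available"] for r in resources) and all(
        0 <= y[p] <= products[p]["demand"] for p in products
    )
    return revenue - cost, x, feasible


profit, x, feasible = evaluate_plan({"U": 0, "V": 80})
print(profit, x, feasible)  # 2400 {'M': 720, 'A': 80, 'B': 80} True
```

The function is written over the dictionary keys rather than over fixed names. It therefore works unchanged if products or resources gain more entries.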

When formulated this way, the model can be applied to any problem with the same structure, regardless of the number of products or resources. This flexibility is possible due to the use of sets to describe the products and resources for a particular problem instance, indices like $p$ and $r$ to refer to elements of those sets, and data tables that hold the relevant parameter values.

Generalizing mathematical models in this fashion is a feature of all large-scale optimization applications. Next we will see how this type of generalization carries over naturally into formulating and solving the model in AMPL.

The production model in AMPL

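As before, we begin by importing the needed components from amplpy and creating an AMPL object. The ampl_notebook call also installs the HiGHS solver module and registers the notebook magics (such as %%ampl_eval) used below.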

from amplpy import AMPL, ampl_notebook

ampl = ampl_notebook(
    modules=["highs"],  # modules to install
    license_uuid="default",  # license to use
)  # instantiate AMPL object and register magics

Next we use AMPL set statements to define the product and resource sets. Notice that at this point, we are only telling AMPL about the two sets that will be used in the model. The members of these sets will be sent from Python to AMPL later, as part of the problem data.

In mathematical formulations, it is customary to keep the names of all components short. But when writing the model in AMPL, we are free to use longer, more meaningful names that make the model statements easier to read. Thus, for example, here we use PRODUCTS and RESOURCES as the AMPL names of the sets that are called $\cal P$ and $\cal R$ in the mathematical model.

%%ampl_eval
# define sets

set PRODUCTS;
set RESOURCES;

The next step is to introduce parameters that will be used as data in the objective function and the constraints.

A statement that defines an AMPL parameter begins with the param keyword and a unique name. Then, between braces { and }, it specifies the index sets for the parameter. For example:

param demand {PRODUCTS} >= 0; states that there is a “product demanded” value for each member of the set PRODUCTS.
param need {RESOURCES,PRODUCTS} >= 0; states that there is a “resource needed” value for each combination of a resource and a product.

At the end of each param statement, we specify that the values for the parameter must be nonnegative or positive, as appropriate. These specifications will be used later to check that the actual data values are appropriate for the problem.

There are 5 different param statements in all, corresponding to the 5 different kinds of data in the tables, and to the 5 different symbolic parameters $b_p^y$, $c_p^y$, $b_r^x$, $c_r^x$, and $a_{rp}$ in the mathematical model.

%%ampl_eval
# define parameters

param demand {PRODUCTS} >= 0;
param price {PRODUCTS} > 0;

param available {RESOURCES} >= 0;
param cost {RESOURCES} > 0;

param need {RESOURCES,PRODUCTS} >= 0;
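As noted above, AMPL uses these conditions to check the data itself. You may prefer to catch bad values on the Python side, before anything is sent. A check like the sketch below mirrors the five declarations. It uses the per-parameter dictionaries that this notebook builds later (demand, price, available, cost, need), repeated here so the cell runs on its own. violations is a hypothetical helper.

```python
import operator

# violations is a hypothetical helper that mirrors the AMPL conditions
#   param demand {PRODUCTS} >= 0;     param price {PRODUCTS} > 0;
#   param available {RESOURCES} >= 0; param cost {RESOURCES} > 0;
#   param need {RESOURCES,PRODUCTS} >= 0;
Inf = float("inf")
demand = {"U": 40, "V": Inf}
price = {"U": 270, "V": 210}
available = {"M": Inf, "A": 80, "B": 100}
cost = {"M": 10, "A": 50, "B": 40}
need = {("M", "U"): 10, ("A", "U"): 2, ("B", "U"): 1,
        ("M", "V"): 9, ("A", "V"): 1, ("B", "V"): 1}

# (parameter name, values, comparison, bound), one entry per param declaration
CONDITIONS = [
    ("demand", demand, operator.ge, 0),
    ("price", price, operator.gt, 0),
    ("available", available, operator.ge, 0),
    ("cost", cost, operator.gt, 0),
    ("need", need, operator.ge, 0),
]


def violations(conditions):
    """Return every (name, index, value) whose value fails its declared condition."""
    return [
        (name, index, value)
        for name, values, compare, bound in conditions
        for index, value in values.items()
        if not compare(value, bound)
    ]


print(violations(CONDITIONS))  # []
```

Infinite values pass the >= 0 tests, so the "unlimited" entries need no special handling here.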

AMPL defines the decision variables in much the same way as the parameters, but with var as the keyword starting the statement. We name the variables Use for resource use, and Sell for product sales.

To express the bounds on the variables in the same way as the mathematical formulation, a more general form of the AMPL statement is needed. In the case of the Use variables, for example:

The indexing expression is written {r in RESOURCES} to say that there is a variable for each member of the resource set, and also to associate the index r with members of the set for the purpose of this statement. This is the AMPL equivalent of $\forall r\in\cal{R}$ in the mathematical statement.
The upper bound is written <= available[r] to say that for each member r of the resource set, the variable’s upper bound is given by the corresponding value from the availability table. This is the AMPL equivalent of $\leq b^x_r$ in the mathematical statement.

An expression in brackets [...] is called an AMPL subscript because it plays the same role as a mathematical subscript like $r$ in $\leq b^x_r$. Anywhere that the model refers to particular values of an indexed parameter or variable, you will see subscript expressions. For example,

need[r,p] will be the amount of resource r needed to make one unit of product p.
Use[r] will be the total amount of resource r used.

%%ampl_eval
# define variables

var Use {r in RESOURCES} >= 0, <= available[r];
var Sell {p in PRODUCTS} >= 0, <= demand[p];

Just as in the previous notebook, the AMPL statement for the objective function begins with maximize Profit. But now, as in the mathematical formulation, AMPL uses general summation expressions:

sum {p in PRODUCTS} price[p] * Sell[p] is the sum, over all products, of the price per unit times the number sold. It corresponds to $\sum_{p\in\cal{P}} c^y_p y_p$ in the mathematical formulation.
sum {r in RESOURCES} cost[r] * Use[r] is the sum, over all resources, of the cost per unit times the amount used. It corresponds to $\sum_{r\in\cal{R}} c^x_r x_r$ in the mathematical formulation.

The full expression for the objective function is simply the first of these expressions minus the second one.

%%ampl_eval
# define objective function

maximize Profit:
   sum {p in PRODUCTS} price[p] * Sell[p] -
   sum {r in RESOURCES} cost[r] * Use[r];

The previous AMPL model had 3 constraints, each defined by a separate subject to statement. But the data-driven mathematical formulation recognizes that there is only one kind of constraint, namely that resources needed must be less than or equal to resources used, repeated 3 times, once for each resource. The AMPL version combines expressions that have already appeared in earlier parts of the model:

subject to ResourceLimit {r in RESOURCES} says that the model will have one constraint corresponding to each member r of the resource set.
sum {p in PRODUCTS} need[r,p] * Sell[p] <= Use[r] says that the total of resource r needed, summed over all products sold, must be <= the total of resource r used. This corresponds to $\sum_{p\in\cal{P}} a_{rp} y_p \leq x_r$ in the mathematical formulation.

%%ampl_eval
# create indexed constraint

subject to ResourceLimit {r in RESOURCES}:
   sum {p in PRODUCTS} need[r,p] * Sell[p] <= Use[r];
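A single indexed constraint statement stands for one constraint per resource. The sketch below spells out that expansion in plain Python for the current data, to show what ResourceLimit amounts to. expand_resource_limit is a hypothetical helper, and its text format is made up for illustration. AMPL's own expand command shows the generated constraints in AMPL's format.

```python
# expand_resource_limit is a hypothetical helper that mimics, in plain text, how
# subject to ResourceLimit {r in RESOURCES}: ... becomes one constraint per resource.
PRODUCTS = ["U", "V"]
RESOURCES = ["M", "A", "B"]
need = {("M", "U"): 10, ("A", "U"): 2, ("B", "U"): 1,
        ("M", "V"): 9, ("A", "V"): 1, ("B", "V"): 1}


def expand_resource_limit(products, resources, need):
    """Return the text of ResourceLimit[r] for every r (illustrative format only)."""
    lines = []
    for r in resources:  # {r in RESOURCES}
        # sum {p in PRODUCTS} need[r,p] * Sell[p]
        lhs = " + ".join(f"{need[r, p]}*Sell[{p}]" for p in products)
        lines.append(f"ResourceLimit[{r}]: {lhs} <= Use[{r}]")
    return lines


for line in expand_resource_limit(PRODUCTS, RESOURCES, need):
    print(line)
# ResourceLimit[M]: 10*Sell[U] + 9*Sell[V] <= Use[M]
# ResourceLimit[A]: 2*Sell[U] + 1*Sell[V] <= Use[A]
# ResourceLimit[B]: 1*Sell[U] + 1*Sell[V] <= Use[B]
```

Adding a resource to RESOURCES and need adds a line without any change to the function. In the same way, the single AMPL statement covers any number of resources.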

The production data in AMPL

Now that the AMPL model is defined, we can carry out step 2 of data handling, which is to convert the data to the forms that the model requires:

For the two sets, Python lists of the set members.
For the two parameters indexed over products, Python dictionaries whose keys are the product names.
For the two parameters indexed over resources, Python dictionaries whose keys are the resource names.
For the parameter indexed over resource-product pairs, a Python dictionary whose keys are tuples consisting of a resource and a product.

Using dictionary methods and comprehensions, all of these lists and dictionaries are readily extracted from the nested dictionaries that our application set up in step 1. To avoid having too many different names, we assign each list and dictionary to a Python program variable that has the same name as the corresponding AMPL set or parameter:

# set data
PRODUCTS = list(products)
RESOURCES = list(resources)

# product data
demand = {k: v["demand"] for k, v in products.items()}
price = {k: v["price"] for k, v in products.items()}

# resource data
available = {k: v["available"] for k, v in resources.items()}
cost = {k: v["cost"] for k, v in resources.items()}

# process data, keyed by (resource, product) to match param need {RESOURCES,PRODUCTS}
need = {(r, p): value for p, row in processes.items() for r, value in row.items()}

print(PRODUCTS, RESOURCES)
print(demand, price)
print(available, cost)
print(need)

['U', 'V'] ['M', 'A', 'B']
{'U': 40, 'V': inf} {'U': 270, 'V': 210}
{'M': inf, 'A': 80, 'B': 100} {'M': 10, 'A': 50, 'B': 40}
{('M', 'U'): 10, ('A', 'U'): 2, ('B', 'U'): 1, ('M', 'V'): 9, ('A', 'V'): 1, ('B', 'V'): 1}
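The four attribute comprehensions above all follow one pattern. If the tables grow more attribute columns, a small helper can split any nested attribute dictionary into one dictionary per attribute. split_attributes below is a hypothetical sketch of that idea. The data is repeated so the cell runs on its own.

```python
# split_attributes is a hypothetical helper generalizing the per-attribute
# comprehensions: {key: {attr: value}} -> {attr: {key: value}}
Inf = float("inf")
products = {"U": {"demand": 40, "price": 270}, "V": {"demand": Inf, "price": 210}}
resources = {
    "M": {"available": Inf, "cost": 10},
    "A": {"available": 80, "cost": 50},
    "B": {"available": 100, "cost": 40},
}


def split_attributes(table):
    """Turn a table of rows keyed by set member into one dictionary per attribute."""
    columns = {}
    for key, attrs in table.items():
        for attr, value in attrs.items():
            # create the attribute's dictionary on first use, then fill in this row
            columns.setdefault(attr, {})[key] = value
    return columns


print(split_attributes(products))  # {'demand': {'U': 40, 'V': inf}, 'price': {'U': 270, 'V': 210}}
print(split_attributes(resources)["cost"])  # {'M': 10, 'A': 50, 'B': 40}
```

In this example, the attribute names happen to match the AMPL parameter names. Under that assumption, the parameter-loading step could become a loop that assigns ampl.param[name] = values for each name and dictionary returned by split_attributes. If the names differed, a mapping between them would be needed.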

Solving the production problem

Now the Python data can be sent to AMPL, and AMPL can invoke a solver. For this simple model, we can make the Python data correspond exactly to the AMPL data, and thus the statements for sending the data to AMPL are particularly easy to write.

The statements for selecting a solver and for initiating the solver process are the same as we used with the basic production planning example. When the solver is finished, it displays a few lines of output to confirm that a solution has been found.

# load set data
ampl.set["PRODUCTS"] = PRODUCTS
ampl.set["RESOURCES"] = RESOURCES

# load parameter data
ampl.param["price"] = price
ampl.param["demand"] = demand
ampl.param["cost"] = cost
ampl.param["available"] = available
ampl.param["need"] = need

# set solver and solve
ampl.option["solver"] = "highs"
ampl.solve()

HiGHS 1.5.1: HiGHS 1.5.1: optimal solution; objective 2400
2 simplex iterations
0 barrier iterations

Reporting the results

It remains to retrieve the solution from AMPL, after which Python’s extensive features and ecosystem can be used to present the results in any way desired. For this first example, we use one of the simplest Python features, the print function.

An AMPL entity is referenced in Python code via its name in the AMPL model. For example, the objective function Profit is ampl.obj['Profit'], and the collection of Sell variables is ampl.var['Sell'].

For an entity that is not indexed, the value() method returns the associated value. Thus the first print statement refers to ampl.obj['Profit'].value().

For an indexed entity, we use the to_dict() method to return the values in a Python dictionary, with the set members as keys. Then a for loop can use the items() method to iterate over the dictionary and print a line for each member.

# create a solution report
print(f"Profit = {ampl.obj['Profit'].value()}")

print("\nProduction Report")
for product, amount in ampl.var["Sell"].to_dict().items():
    print(f" {product} produced = {amount}")

print("\nResource Report")
for resource, amount in ampl.var["Use"].to_dict().items():
    print(f" {resource} consumed = {amount}")

Profit = 2400.0

Production Report
 U produced = 0
 V produced = 80

Resource Report
 A consumed = 80
 B consumed = 80
 M consumed = 720
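The results come back as ordinary Python dictionaries, so they are easy to combine with the input data. For example, the sketch below compares each resource's use with its availability. utilization_report is a hypothetical helper. The Use values are copied from the report above rather than read from AMPL, so the cell runs on its own.

```python
import math

# utilization_report is a hypothetical helper. The Use values are copied from the
# Resource Report above (ampl.var["Use"].to_dict()) so that this cell runs on its own.
use = {"A": 80, "B": 80, "M": 720}
available = {"M": float("inf"), "A": 80, "B": 100}


def utilization_report(use, available):
    """Return a header plus one line per resource: used, available, percent used."""
    lines = [f" {'res':<3}{'used':>8}{'avail':>8}{'used%':>7}"]
    for r, used in use.items():
        cap = available[r]
        # an unlimited resource has no meaningful utilization percentage
        share = "n/a" if math.isinf(cap) else f"{100 * used / cap:.0f}%"
        lines.append(f" {r:<3}{used:>8g}{cap:>8g}{share:>7}")
    return lines


for line in utilization_report(use, available):
    print(line)
```

With these values, labor A is used to its full availability of 80 hours, while labor B has 20 hours to spare.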

For Python experts: Creating subclasses of `AMPL`

Some readers of these notebooks may be more experienced Python developers who wish to apply AMPL in more specialized, data-driven applications. The following cells show how the AMPL class can be extended to create specialized model classes. Here we create a subclass called ProductionModel that accepts a particular representation of the problem data and produces a production model object. The production model object inherits all of the methods of the AMPL class, such as .display() and .read(). It can also override them, as .solve() is overridden below, and can be extended with additional methods.

%%writefile production_planning.mod

# define sets
set PRODUCTS;
set RESOURCES;

# define parameters
param demand {PRODUCTS} >= 0;
param price {PRODUCTS} > 0;
param available {RESOURCES} >= 0;
param cost {RESOURCES} > 0;
param need {RESOURCES,PRODUCTS} >= 0;

# define variables
var Use {r in RESOURCES} >= 0, <= available[r];
var Sell {p in PRODUCTS} >= 0, <= demand[p];

# define objective function
maximize Profit:
   sum {p in PRODUCTS} price[p] * Sell[p] -
   sum {r in RESOURCES} cost[r] * Use[r];

# create indexed constraint
subject to ResourceLimit {r in RESOURCES}:
   sum {p in PRODUCTS} need[r,p] * Sell[p] <= Use[r];

Overwriting production_planning.mod

import pandas as pd
from IPython.display import display  # display() is also predefined inside Jupyter


class ProductionModel(AMPL):
    """
    A class representing a production model using AMPL.
    """

    def __init__(self, products, resources, processes):
        """
        Initialize ProductionModel as an AMPL instance.

        :param products: A dictionary containing product information.
        :param resources: A dictionary containing resource information.
        :param processes: A dictionary containing process information.
        """
        super().__init__()

        # save data in the model instance
        self.products = products
        self.resources = resources
        self.processes = processes

        # flag to monitor solution status
        self.solved = False

    def load_data(self):
        """
        Prepare the data and pass the information to AMPL.
        """
        # convert the data dictionaries into pandas data frames
        products = pd.DataFrame(self.products).T
        resources = pd.DataFrame(self.resources).T
        processes = pd.DataFrame(self.processes).T

        # display the generated data frames
        display(products)
        display(resources)
        display(processes)

        # pass data to AMPL
        self.set_data(products, "PRODUCTS")
        self.set_data(resources, "RESOURCES")
        self.param["need"] = processes.T

    def solve(self, solver="highs"):
        """
        Read the model, load the data, set the solver and solve the optimization problem.
        """
        self.read("production_planning.mod")
        self.load_data()
        self.option["solver"] = solver
        super().solve()
        self.solved = True

    def report(self):
        """
        Solve, if necessary, then report the model solution.
        """
        if not self.solved:
            self.solve()

        print(f"Profit = {self.obj['Profit'].value()}")

        print("\nProduction Report")
        Sell = self.var["Sell"].to_pandas()
        Sell.rename(columns={Sell.columns[0]: "produced"}, inplace=True)
        Sell.index.rename("PRODUCTS", inplace=True)
        display(Sell)

        print("\nResource Report")
        Use = self.var["Use"].to_pandas()
        Use.rename(columns={Use.columns[0]: "consumed"}, inplace=True)
        Use.index.rename("RESOURCES", inplace=True)
        display(Use)


m = ProductionModel(products, resources, processes)
m.report()

	demand	price
U	40.0	270.0
V	inf	210.0

	available	cost
M	inf	10.0
A	80.0	50.0
B	100.0	40.0

	M	A	B
U	10	2	1
V	9	1	1

HiGHS 1.5.1: HiGHS 1.5.1: optimal solution; objective 2400
2 simplex iterations
0 barrier iterations
Profit = 2400.0

Production Report

	produced
PRODUCTS
U	0
V	80

Resource Report

	consumed
RESOURCES
A	80
B	80
M	720