Friday, November 02, 2012

An introduction to vectors in C++

For several years, your uncle Bob has been keeping track of his finances in the following format:
    4
    potatoes       1800
    carrots        1300
    beer           2500
    tomato_sauce    500
The first line always has the number of items, and every next line has a single-word product, some spaces, and then the amount it costs. However, being a little eccentric, uncle Bob decided that he'd like to sort all his files, and that's where you come in.

You've been given the task of writing a program that reads a file, sorts the contents, and then writes them back in a similar format. For simplicity, we'll use std::cin and std::cout for reading and writing, and we won't worry about how many spaces there are between the item name and the cost. We'll start by reading the number of items and printing it right back out:
#include <iostream>

int main() {
    int num_items;
    std::cin >> num_items;
    std::cout << num_items << '\n';
}
Nothing new here. We'd also like to have a data structure to store the item name and cost in:
struct Item {
    std::string name;
    int cost;
};
We now know our program will look roughly like:
#include <iostream>
#include <string>

struct Item {
    std::string name;
    int cost;
};

int main() {
    int num_items;
    std::cin >> num_items;
    // A: Where do we put the items?
    for (int i = 0; i < num_items; ++i) {
        Item item;
        std::cin >> item.name;
        std::cin >> item.cost;
        // B: How do we store the item?
    }
    // C: How do we sort the items?
}


While we now have a program that reads the input just fine, we still need to find something to store the items in. We can't use an array, because we don't know how many items there will be until we run the program. We want something that's almost an array, but with a variable size.

Fortunately for us, the standard library has something that fits exactly: std::vector. By using a vector, we can store as many of something as we want. However, we still have to specify what we want to store, so the following won't work:
std::vector items; // Error!
In order to fix this, we need to say what kind of vector this is: namely, a vector of Items. The fancy term is "providing template arguments", and it looks like this:
std::vector<Item> items; // Works!
This creates an empty vector of Items. We can then use the resize function of the vector to make it the size we want it to be. Our main function looks like:
int main() {
    int num_items;
    std::cin >> num_items;

    std::vector<Item> items;
    items.resize(num_items);

    for (int i = 0; i < num_items; ++i) {
        Item item;
        std::cin >> item.name;
        std::cin >> item.cost;
        // B: How do we store the item?
    }
    // C: How do we sort the items?
}
We can access the things in the vector just like we'd normally access things in an array, so all we have to do in the for loop is assign our new item to an element in the vector.
for (int i = 0; i < num_items; ++i) {
    Item item;
    std::cin >> item.name;
    std::cin >> item.cost;
    items[i] = item;
}
We now have working code for filling a vector with data! In order to sort it, we need to write a function that determines whether one item is "less than" another or not. We are sorting the items based on price, so we the cheapest item to be the "smallest". Let's write this function, which we will call compare_items:
bool compare_items(Item const& left, Item const& right) {
    return left.cost < right.cost;
}
If the Item const& notation is new, don't worry about it; in this case, it's almost the same thing as if we'd write Item, but it may run faster. Now we need to sort the vector. That's surprisingly easy in C++:
std::sort(items.begin(), items.end(), compare_items);
This is like saying "sort items from beginning to end, using compare_items to find out the order we want". We don't have to worry about how exactly the sorting is done; as long as compare_items is correct, items will sorted correctly. Now all we need to do is print out the number of items, which goes just like printing out an array. The code, from beginning to end, is:
#include <iostream>
#include <string>
#include <vector>
#include <algorithm> // needed for std::sort

struct Item {
    std::string name;
    int cost;
};

bool compare_items(Item const& left, Item const& right) {
    return left.cost < right.cost;
}

int main() {
    int num_items;
    std::cin >> num_items;

    std::vector<Item> items;
    items.resize(num_items);

    for (int i = 0; i < num_items; ++i) {
        Item item;
        std::cin >> item.name;
        std::cin >> item.cost;
        items[i] = item;
    }

    std::sort(items.begin(), items.end(), compare_items);

    std::cout << num_items << '\n';

    for (int i = 0; i < num_items; ++i) {
        std::cout << items[i].name << ' ' << items[i].cost << '\n';
    }
}
This program compiles and does what uncle Bob asked us to do. If you want a bit more of a challenge, you could try to do one of the following:


  1. How do you make the most expensive item come first?
  2. Could you make the output be neatly formatted, just like the input example was?
  3. Is the item object we create inside the for loop necessary? Can you rewrite the code without it?



While processing his records, uncle Bob discovered that some were missing the number of items! He wants us to write a second program that works without that number.

Fortunately for us, the standard library again does almost all of the work. If we have a vector, we can add a copy of something to the end of it using push_back. This doesn't overwrite any of the existing elements: instead, it makes the vector one larger, to fit whatever we added. Still, it's only a copy we add: the old value still exists independently.

However, we now don't know how many items we should read. Instead of using a for loop, we use "while (std::cin)" — roughly, this means "while reading from std::cin is working correctly". If we encounter bad input or reach the end of the file, we'll stop reading. A similar situation occurs with the read itself. The code used is
Item item;
if (std::cin >> item.name && std::cin >> item.cost) {
    items.push_back(item);
}
You should read the if statement as "if we successfully read item.name and item.cost". The rest of the program is very similar. We don't have a num_items variable any more, so we can't use that in the second for loop, but we're lucky again: we can find out the size of a vector using the size function. The resulting code is:
int main() {
    std::vector<Item> items;

    while (std::cin) {
        Item item;
        if (std::cin >> item.name && std::cin >> item.cost) {
            items.push_back(item);
        }
    }

    std::sort(items.begin(), items.end(), compare_items);

    for (std::size_t i = 0; i < items.size(); ++i) {
        std::cout << items[i].name << ' ' << items[i].cost << '\n';
    }
}
The usage of std::size_t instead of int may be confusing; size_t is an unsigned type that is often used as indices. In fact, it makes somewhat more sense semantically, so you could say that using int to loop over arrays is more laziness than anything.

No comments:

Post a Comment