Python Algorithms (pythonalgorithms.pdf)
This book uses Python to teach the analysis and design of algorithms, covering trees, graphs, counting problems, induction and recursion, traversal, divide and conquer, greedy algorithms, complex dependencies, Dijkstra's algorithm, matching and cut problems, and hard problems and how to deal with them.

Python Algorithms explains the Python approach to algorithm analysis and design. Written by Magnus Lie Hetland, author of Beginning Python, this book is sharply focused on classical algorithms.

APPENDIX A  PEDAL TO THE METAL: ACCELERATING PYTHON

[...] also lets you manipulate and efficiently run code objects, although it works at the assembly level. (Note that ctypes is part of the Python standard library.)

Weave, Cinpy, and PyInline. These three packages let you use C (or some other languages) directly in your Python code. This is done quite cleanly, by keeping the C code in multiline Python strings, which are then compiled on the fly. The resulting code object is then available to your Python code, using facilities such as ctypes for the interfacing.

Table A-1. URLs for Acceleration Tool Web Sites

Tool              Web Site
Boost.Python      http://boost.org
Cinpy             www.cs.tut.fi/ask/cinpy
CorePy            http://corepy.org
ctypes            http://docs.python.org/library/ctypes.html
Cython            http://cython.org
F2PY              http://cens.ioc.ee/projects/f2py2e
GPULib            http://txcorp.com/products/gpulib
llvm-py           http://mdevan.org/llvm-py
NumPy             http://numpy.scipy.org
Psyco             http://psyco.sf.net
PyInline          http://pyinline.sf.net
PyPy              http://codespeak.net/pypy
Pyrex             www.cosc.canterbury.ac.nz/greg.ewing/python/pyrex
PyStream          http://code.google.com/p/pystream
Sage              http://sagemath.org
SciPy             http://scipy.org
Shedskin          http://code.google.com/p/shedskin
SWIG              http://swig.org
Unladen Swallow   http://code.google.com/p/unladen-swallow
Weave             http://scipy.org/weave

APPENDIX B  LIST OF PROBLEMS AND ALGORITHMS

If you're having hull problems, I feel bad for you, son. I've got 99 problems, but a breach ain't one.
    - Anonymous*

* Facetiously attributed to Lt. Cdr. Geordi La Forge of Star Trek: The Next Generation.

This appendix does not list every problem and algorithm mentioned in the book, because some algorithms are only discussed to illustrate a principle and some problems serve only as examples for certain algorithms. The most important problems and algorithms, however, are sketched out here, with some references to the main text. If you're unable to find what you're looking for by consulting this appendix, take a look in the index.

In most descriptions in this appendix, n refers to the problem size (such as the number of elements in a sequence). For the special case of graphs, though, n refers to the number of nodes, and m refers to the number of edges.

Problems

Cliques and independent sets. A clique is a graph where there is an edge between every pair of nodes. The main problem of interest here is finding a clique in a larger graph (that is, identifying a clique as a subgraph). An independent set in a graph is a set of nodes where no pair is connected by an edge. In other words, finding an independent set is equivalent to taking the complement of the edge set and finding a clique. Finding a k-clique (a clique of k nodes) or finding the largest clique in a graph (the max-clique problem) is NP-hard. (For more information, see Chapter 11.)

Closest pair. Given a set of points in the Euclidean plane, find the two points that are closest to each other. This can be solved in loglinear time using the divide-and-conquer strategy (see Chapter 6).

Compression and optimal decision trees. A Huffman tree is a tree whose leaves have weights (frequencies), and the sum of their weights multiplied by their depth is as small as possible. This makes such trees useful for constructing compression codes and as decision trees when a probability distribution is known for the outcomes. Huffman trees can be built using Huffman's algorithm, described in Chapter 7 (Listing 7-1).
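To make the greedy idea concrete, here is a minimal sketch of Huffman tree construction using the standard heapq module. It is not the book's Listing 7-1; the frequency table, the nested-tuple tree representation, and the helper names are illustrative choices.

    from heapq import heapify, heappush, heappop
    from itertools import count

    def huffman(freqs):
        """Build a Huffman tree from {symbol: frequency}; returns nested tuples."""
        num = count()                      # tie-breaker so trees are never compared
        heap = [(f, next(num), sym) for sym, f in freqs.items()]
        heapify(heap)
        while len(heap) > 1:
            fa, _, a = heappop(heap)       # the two lightest trees...
            fb, _, b = heappop(heap)
            heappush(heap, (fa + fb, next(num), (a, b)))  # ...combined under a new root
        return heap[0][2]

    def codes(tree, prefix=""):
        """Read the prefix codes off the tree."""
        if not isinstance(tree, tuple):    # a leaf: a single symbol
            return {tree: prefix or "0"}
        left, right = tree
        table = codes(left, prefix + "0")
        table.update(codes(right, prefix + "1"))
        return table

    print(codes(huffman({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5})))

The counter in each heap entry is only there as a tie-breaker, so that two subtrees with equal weights never have to be compared directly.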
Connected and strongly connected components. An undirected graph is connected if there is a path from every node to every other. A directed graph is connected if its underlying undirected graph is connected. A connected component is a maximal subgraph that is connected. Connected components can be found using traversal algorithms such as DFS (Listing 5-5) or BFS (Listing 5-9), for example. If there is a (directed) path from every node to every other in a directed graph, it is called strongly connected. A strongly connected component (SCC) is a maximal subgraph that is strongly connected. SCCs can be found using Kosaraju's algorithm (Listing 5-10).

Convex hulls. A convex hull is the minimum convex region containing a set of points in the Euclidean plane. Convex hulls can be found in loglinear time using the divide-and-conquer strategy (see Chapter 6).

Finding the minimum/maximum/median. Finding the minimum and maximum of a sequence can be done in linear time by a simple scan. With linear-time preparation, a binary heap lets you repeatedly find the minimum or maximum in constant time and extract it in logarithmic time. It is also possible to find the kth smallest element of a sequence (the median for k = n/2) in linear (or expected linear) time, using the select or randomized select algorithms. (For more information, see Chapter 6.)

Flow and cut problems. How many units of flow can be pushed through a network with flow capacities on the edges? That is the max-flow problem. An equivalent problem is finding the set of edge capacities that most constrain the flow; this is the min-cut problem. There are several versions of these problems. For example, you could add costs to the edges and find the cheapest of the maximum flows. You could add a lower bound on each edge and look for a feasible flow. You could even add separate supplies and demands in each node. These problems are dealt with in detail in Chapter 10.

Graph coloring. Try to color the nodes of a graph so that no neighbors share a color. Now try to do this with a given number of colors, or even to find the lowest such number (the chromatic number of the graph). This is an NP-hard problem in general. If, however, you're asked to see whether a graph is two-colorable (or bipartite), the problem can be solved in linear time using simple traversal, as sketched below. The problem of finding a clique cover is equivalent to finding an independent set cover, which is an identical problem to graph coloring. (See Chapter 11 for more on graph coloring.)
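As a concrete illustration of the linear-time two-colorability check mentioned above, here is a small sketch (not taken from the book) that colors each connected component with a plain BFS; the adjacency-dict representation and the function name are assumptions made for the example.

    from collections import deque

    def two_color(G):
        """Try to 2-color an undirected graph given as {node: neighbors}.
        Returns a {node: 0 or 1} coloring, or None if the graph is not bipartite."""
        color = {}
        for start in G:                      # cover every connected component
            if start in color:
                continue
            color[start] = 0
            queue = deque([start])
            while queue:
                u = queue.popleft()
                for v in G[u]:
                    if v not in color:       # undiscovered: give it the other color
                        color[v] = 1 - color[u]
                        queue.append(v)
                    elif color[v] == color[u]:
                        return None          # an edge inside one color class
        return color

    square = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
    print(two_color(square))                 # a valid 2-coloring, e.g. {0: 0, 1: 1, 3: 1, 2: 0}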
Hamilton cycles/paths and TSP ... and Euler tours. Several path and subgraph problems can be solved efficiently. If, however, you want to visit every node exactly once, you're in trouble. Any problem involving this constraint is NP-hard, including finding a Hamilton cycle (visit every node once and return), a Hamilton path (visit every node once, without returning), or a shortest tour of a complete graph (the Traveling Salesman/Salesrep problem). The problems are NP-hard both for the directed and undirected case (see Chapter 11). The related problem of visiting every edge exactly once, though (finding a so-called Euler tour), is solvable in polynomial time (see Chapter 5). The TSP problem is NP-hard even for special cases such as using Euclidean distances in the plane, but it can be efficiently approximated to within a factor of 1.5 for this case, and for any other metric distance. Approximating the TSP problem in general, though, is NP-hard. (See Chapter 11 for more information.)

Longest increasing subsequence. Find the longest subsequence of a given sequence whose elements are in increasing order. This can be solved in loglinear time using dynamic programming (see Chapter 8).

Matching. There are many matching problems, all of which involve linking some objects to others. The problems discussed in this book are bipartite matching and min-cost bipartite matching (Chapter 10) and the stable marriage problem (Chapter 7). Bipartite matching (or maximum bipartite matching) involves finding the greatest subset of edges in a bipartite graph so that no two edges in the subset share a node. The min-cost version does the same but minimizes the sum of edge costs over this subset. The stable marriage problem is a bit different; there, all men and women have preference rankings of the members of the opposite sex. A stable set of marriages is characterized by the fact that you can't find a pair that would rather have each other than their current mates.

Minimum spanning trees. A spanning tree is a subgraph whose edges form a tree over all the nodes of the original graph. A minimum spanning tree is one that minimizes the sum of edge costs. Minimum spanning trees can be found using Kruskal's algorithm (Listing 7-4) or Prim's algorithm (Listing 7-5), for example. Because the number of edges is fixed, a maximum spanning tree can be found by simply negating the edge weights.

Partitioning and bin packing. Partitioning involves dividing a set of numbers into two sets with equal sums, while the bin packing problem involves packing a set of numbers into a set of "bins" so that the sum in each bin is below a certain limit and so that the number of bins is as small as possible. Both problems are NP-hard. (See Chapter 11.)

SAT, Circuit-SAT, k-CNF-SAT. These are all varieties of the satisfaction problem (SAT), which asks you to determine whether a given logical (Boolean) formula can ever be true, if you're allowed to set the variables to whatever truth values you want. The Circuit-SAT problem simply uses logical circuits rather than formulas, and k-CNF-SAT involves formulas in conjunctive normal form, where each clause consists of k literals. The latter can be solved in polynomial time for k = 2. The other problems, as well as k-CNF-SAT for k > 2, are NP-complete. (See Chapter 11.)

Searching. This is a very common and extremely important problem. You have a key and want to find an associated value. This is, for example, how variables work in dynamic languages such as Python. It's also how you find almost anything on the Internet these days. Two important solutions are hash tables (see Chapter 2) and binary search or search trees (see Chapter 6). Given a probability distribution for the objects in the data set, optimal search trees can be constructed using dynamic programming (see Chapter 8).

Sequence comparison. You may want to compare two sequences to know how similar (or dissimilar) they are. One way of doing this is to find the longest subsequence the two have in common (the longest common subsequence) or to find the minimum number of basic edit operations needed to go from one sequence to the other (so-called edit distance, or Levenshtein distance). These two problems are more or less equivalent; see Chapter 8 for more information.
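As an illustration of the dynamic-programming approach referred to above, here is a compact edit distance sketch (not the book's Chapter 8 code); it assumes unit costs for insertions, deletions, and substitutions and keeps only one row of the DP table at a time.

    def edit_distance(a, b):
        """Levenshtein distance between sequences a and b, with unit costs."""
        m, n = len(a), len(b)
        prev = list(range(n + 1))        # row i-1: distances from a[:i-1] to b[:j]
        for i in range(1, m + 1):
            cur = [i] + [0] * n
            for j in range(1, n + 1):
                cur[j] = min(prev[j] + 1,                           # delete a[i-1]
                             cur[j - 1] + 1,                        # insert b[j-1]
                             prev[j - 1] + (a[i - 1] != b[j - 1]))  # substitute or keep
            prev = cur
        return prev[n]

    print(edit_distance("kitten", "sitting"))   # 3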
Sequence modification. Inserting an element into the middle of a linked list is cheap (constant time), but finding a given location is costly (linear time); for an array, the opposite is true (constant lookup, linear insert, because all later elements must be shifted). Appending can be done cheaply for both structures, though (see the black box sidebar on list in Chapter 2).

Set and vertex covers. A vertex cover is a set of vertices that cover (that is, are adjacent to) all the edges of the graph. A set cover is a generalization of this idea, where the nodes are replaced with subsets, and you want to cover the entire set. The problem lies in constraining or minimizing the number of nodes/subsets. Both problems are NP-hard (see Chapter 11).

Shortest paths. This problem involves finding the shortest path from one node to another, from one node to all the others (or vice versa), or from all nodes to all others. The one-to-one, one-to-all, and all-to-one cases are solved the same way, normally using BFS for unweighted graphs, DAG shortest path for DAGs, Dijkstra's algorithm for nonnegative edge weights, and Bellman-Ford in the general case. To speed things up in practice (although without affecting the worst-case running time), one can also use bidirectional Dijkstra or the A* algorithm. For the all-pairs shortest paths problem, the algorithms of choice are probably Floyd-Warshall or (for sparse graphs) Johnson's algorithm. If the edges are nonnegative, Johnson's algorithm is (asymptotically) equivalent to running Dijkstra's algorithm from every node (which may be more effective). (For more information on shortest path algorithms, see Chapters 5 and 9.) Note that the longest path problem (for general graphs) can be used to find Hamilton paths, which means that it is NP-hard. This, in fact, means that the shortest path problem is also NP-hard in the general case. If we disallow negative cycles in the graph, however, our polynomial algorithms will work.

Sorting and element uniqueness. Sorting is an important operation and an essential subroutine for several other algorithms. In Python, you would normally sort by using the list.sort method or the sorted function, both of which use a highly efficient implementation of the timsort algorithm. Other algorithms include insertion sort, selection sort, and gnome sort (all of which have a quadratic running time), as well as heapsort, mergesort, and quicksort (which are loglinear, although this holds only in the average case for quicksort). For info on the quadratic sorting algorithms, see Chapters 3 and 4; for the loglinear (divide-and-conquer) algorithms, see Chapter 6. Deciding whether a set of real numbers contains duplicates cannot (in the worst case) be solved with a running time better than loglinear. By reduction, neither can sorting.

The halting problem. Determine whether a given algorithm will terminate with a given input. The problem is undecidable (that is, unsolvable) in the general case (see Chapter 11).

The knapsack problem and integer programming. The knapsack problem involves choosing a valuable subset of a set of items, under certain constraints. In the (bounded) fractional case, you have a certain amount of some substances, each of which has a unit value (value per unit of weight). You also have a knapsack that can carry a certain maximum weight. The (greedy) solution is to take as much as you can of each substance, starting with the one with the highest unit value. For the integral knapsack problem, you can only take entire items; fractions aren't allowed. Each item has a weight and a value. For the bounded case (also known as 0-1 knapsack), you have a limited number of objects of each type. (Another perspective would be that you have a fixed set of objects that you either take or not.) In the unbounded case, you can take as many as you want from each of a set of object types (still respecting your carrying capacity, of course). A special case known as the subset sum problem involves selecting a subset of a set of numbers so that the subset has a given sum. These problems are all NP-hard (see Chapter 11), but they admit pseudopolynomial solutions based on dynamic programming (see Chapter 8). The fractional knapsack case, as explained, can even be solved in polynomial time using a greedy strategy (see Chapter 7). Integer programming is, in some ways, a generalization of the knapsack problem (and is therefore obviously NP-hard). It is simply linear programming where the variables are constrained to be integers.
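Here is a minimal sketch of the pseudopolynomial dynamic-programming idea for the 0-1 knapsack mentioned above (it is not one of the book's listings); integer weights and a (value, weight) item representation are assumptions for the example.

    def knapsack_01(items, capacity):
        """Max total value from (value, weight) items with total weight <= capacity.
        Pseudopolynomial: O(len(items) * capacity); integer weights assumed."""
        best = [0] * (capacity + 1)              # best[c]: best value using capacity c
        for value, weight in items:
            # iterate capacities downward so each item is used at most once
            for c in range(capacity, weight - 1, -1):
                best[c] = max(best[c], best[c - weight] + value)
        return best[capacity]

    items = [(60, 10), (100, 20), (120, 30)]
    print(knapsack_01(items, 50))                # 220: take the last two items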
Topological sorting. Order the nodes of a DAG so that all the edges point in the same direction. If the edges represent dependencies, a topological sorting represents an ordering that respects the dependencies. This problem can be solved by a form of reference counting (see Chapter 4) or by using DFS (see Chapter 5).

Traversal. The problem here is to visit all the objects in some connected structure, usually represented as nodes in a graph or tree. The idea can be either to visit every node or to visit only those needed to solve some problem. The latter strategy of ignoring parts of the graph or tree is called pruning and is used (for example) in search trees and in the branch and bound strategy. For a lot on traversal, see Chapter 5.

Algorithms and Data Structures

2-3-trees. Balanced tree structure, allowing insertions, deletions, and search in worst-case Θ(lg n) time. Internal nodes can have two or three children, and the tree is balanced during insertion by splitting nodes, as needed. (See Chapter 6.)

A*. Heuristically guided single-source shortest path algorithm. Suitable for large search spaces. Instead of choosing the node with the lowest distance estimate (as in Dijkstra's), the node with the lowest heuristic value (sum of distance estimate and guess for remaining distance) is used. Worst-case running time identical to Dijkstra's algorithm. (See Listing 9-10.)

AA-tree. 2-3-trees simulated using node rotations in a binary tree with level-numbered nodes. Worst-case running times of Θ(lg n) for insertions, deletions, and search. (See Listing 6-6.)

Bellman-Ford. Shortest path from one node to all others in weighted graphs. Looks for a shortcut along every edge n times. Without negative cycles, the correct answer is guaranteed after n-1 iterations. If there's improvement in the last round, a negative cycle is detected, and the algorithm gives up. Running time Θ(nm). (See Listing 9-2; a rough sketch of the idea follows the next entry.)

Bidirectional Dijkstra. Dijkstra's algorithm run from the start and end nodes simultaneously, with alternating iterations going to each of the two algorithms. The shortest path is found when the two meet up in the middle (although some care must be taken at this point). The worst-case running time is just like for Dijkstra's algorithm. (See Listings 9-8 and 9-9.)
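The following is a rough sketch of the Bellman-Ford relaxation idea described above, not the book's Listing 9-2; the dict-of-dicts graph representation is an assumption, and the early exit when no estimate changes is an optional shortcut.

    def bellman_ford(G, s):
        """Shortest path estimates from s in a weighted digraph G = {u: {v: weight}}.
        Returns a distance dict, or None if a negative cycle is reachable from s."""
        dist = {u: float("inf") for u in G}
        dist[s] = 0
        for _ in range(len(G)):                  # n rounds; the n-th is only a check
            changed = False
            for u in G:
                for v, w in G[u].items():
                    if dist[u] + w < dist[v]:    # relax: found a shortcut via u
                        dist[v] = dist[u] + w
                        changed = True
            if not changed:                      # no improvement: we're done early
                return dist
        return None                              # still improving: negative cycle

    G = {"s": {"a": 4, "b": 2}, "a": {"c": 3}, "b": {"a": 1, "c": 5}, "c": {}}
    print(bellman_ford(G, "s"))                  # {'s': 0, 'a': 3, 'b': 2, 'c': 6}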
Binary search trees. A binary tree structure where each node has a key (and usually an associated value). Descendant keys are partitioned by the node key: smaller keys go in the left subtree, and greater keys go in the right. On average, the depth of any node is logarithmic, giving an expected insertion and search time of Θ(lg n). Without extra balancing, though (such as in the AA-tree), the tree can become unbalanced, giving linear running times. (See Listing 6-2.)

Bisection, binary search. A search procedure that works in a manner similar to search trees, by repeatedly halving the interval of interest in a sorted sequence. The halving is performed by inspecting the middle element and deciding whether the sought value must lie to the left or right. Running time Θ(lg n). A very efficient implementation can be found in the bisect module. (See Chapter 6.)

Branch and bound. A general algorithmic design approach. Searches a space of solutions in a depth-first or best-first order by building and evaluating partial solutions. A conservative estimate is kept for the optimal value, while an optimistic estimate is computed for a partial solution. If the optimistic estimate is worse than the conservative one, the partial solution is not extended, and the algorithm backtracks. Often used to solve NP-hard problems. (See Listing 11-2 for a branch-and-bound solution to the 0-1 knapsack problem.)

Breadth-First Search (BFS). Traversing a graph (possibly a tree) level by level, thereby also identifying (unweighted) shortest paths. Implemented by using a FIFO queue to keep track of discovered nodes. Running time Θ(n + m). (See Listing 5-9.)

Bucket sort. Sort numerical values that are evenly (uniformly) distributed in a given interval by dividing the interval into n equal-sized buckets and placing the values in them. The expected bucket size is constant, so the buckets can be sorted with (for example) insertion sort. Total running time Θ(n). (See Chapter 4.)

Busacker-Gowen. Finds the cheapest max-flow (or the cheapest flow with a given flow value) in a network by using the cheapest augmenting paths in the Ford-Fulkerson approach. These paths are found using Bellman-Ford or (with some weight adjustments) Dijkstra's algorithm. The running time in general depends on the maximum flow value and so is pseudopolynomial. For a maximum flow of k, the running time is (assuming Dijkstra's algorithm is used) O(km lg n). (See Listing 10-5.)

Christofides' algorithm. An approximation algorithm (with an approximation ratio bound of 1.5) for the metric TSP problem. Finds a minimum spanning tree and then a minimum matching among the odd-degree nodes of the tree, short-circuiting as needed to make a valid tour of the graph. (Note that finding matchings in general, possibly nonbipartite, graphs is not covered in this book.) (See Chapter 11.)

Counting sort. Sort integers with a small value range (with at most Θ(n) contiguous values) in Θ(n) time. Works by counting occurrences and using the cumulative counts to directly place the numbers in the result, updating the counts as it goes. (See Chapter 4.)

DAG Shortest Path. Finds the shortest path from one node to all others in a DAG. Works by finding a topological sorting of the nodes and then relaxing all out-edges (or, alternatively, all in-edges) at every node from left to right. Can (because of the lack of cycles) also be used to find longest paths. Running time Θ(n + m). (See Listing 8-4.)

Depth-First Search (DFS). Traversing a graph (possibly a tree) by going in depth and then backtracking. Implemented by using a LIFO queue to keep track of discovered nodes. By keeping track of discover- and finish-times, DFS can also be used as a subroutine in other algorithms (such as topological sorting or Kosaraju's algorithm). Running time Θ(n + m). (See Listings 5-4, 5-5, and 5-6.)
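A minimal iterative sketch of DFS with an explicit stack (the LIFO queue mentioned above); it is not one of the book's listings and omits the discover/finish-time bookkeeping.

    def dfs(G, s):
        """Depth-first traversal of G = {node: iterable of neighbors} from s.
        Returns the set of nodes reachable from s, using an explicit LIFO stack."""
        seen, stack = set(), [s]
        while stack:
            u = stack.pop()                  # LIFO: the most recently discovered node
            if u in seen:
                continue
            seen.add(u)
            stack.extend(G[u])               # schedule the neighbors
        return seen

    G = {1: [2, 3], 2: [4], 3: [4], 4: [], 5: [1]}
    print(sorted(dfs(G, 1)))                 # [1, 2, 3, 4]; node 5 is unreachable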
Dijkstra's algorithm. Finds the shortest paths from one node to all others in a weighted graph, as long as there are no negative edge weights. Traverses the graph, repeatedly selecting the next node using a priority queue (a heap). The priority is the current distance estimate of the node. These estimates are updated whenever a shortcut is found from a visited node. The running time is Θ((m + n) lg n), which is simply Θ(m lg n) if the graph is connected.

Double-ended queues. FIFO queues implemented using linked lists (or linked lists of arrays), so that inserting and extracting objects at either end can be done in constant time. An efficient implementation can be found in the collections.deque class. (See Chapter 5.)

Dynamic arrays, vectors. The idea of having extra capacity in an array, so appending is efficient. By relocating the contents to a bigger array, growing it by a constant factor, when it fills up, appends can be constant in average (amortized) time. (See Chapter 2.)

Edmonds-Karp. The concrete instantiation of the Ford-Fulkerson method where traversal is performed using BFS. Finds a maximum flow in O(nm²) time. (See Listing 10-4.)

Floyd-Warshall. Finds shortest paths from each node to all others. In iteration k, only the first k nodes (in some ordering) are allowed as intermediate nodes along the paths. Extending from k-1 involves checking whether the shortest paths to and from k via the first k-1 nodes are shorter than simply going directly via these nodes. (That is, node k is either used or not, for every shortest path.) Running time is Θ(n³). (See Listing 9-6.)

Ford-Fulkerson. A general approach to solving max-flow problems. The method involves repeatedly traversing the graph to find a so-called augmenting path, a path along which the flow can be increased (augmented). The flow can be increased along an edge if it has extra capacity, or it can be increased backward across an edge (that is, canceled) if there is flow along the edge. Thus, the traversal can move both forward and backward along the directed edges, depending on the flow across them. The running time depends on the traversal strategy used. (See Listing 10-4.)

Gale-Shapley. Finds a stable set of marriages given preference rankings for a set of men and women. Any unengaged man proposes to the most preferred woman he hasn't proposed to. Each woman will choose her favorite among her current suitors (possibly staying with her fiancé). Can be implemented with quadratic running time. (See Chapter 7.)

Gnome sort. A simple sorting algorithm with quadratic running time. Probably not an algorithm you'll use in practice. (See Listing 3-1.)

Hashing, hash tables. Look up a key to get the corresponding value, just like in a search tree. Entries are stored in an array, and their positions are found by computing a (pseudorandom, sort of) hash value of the key. Given a good hash function and enough room in the array, the expected running time of insertion, deletion, and lookup is Θ(1). (See Chapter 2.)

Heaps, heapsort. Heaps are efficient priority queues. With linear-time preprocessing, a min- (max-) heap will let you find the smallest (largest) element in constant time and extract or replace it in logarithmic time. Adding an element can also be done in logarithmic time. Conceptually, a heap is a full binary tree where each node is smaller (larger) than its children. When modifications are made, this property can be repaired with Θ(lg n) operations. In practice, heaps are usually implemented using arrays (with nodes encoded as array entries). A very efficient implementation can be found in the heapq module. Heapsort is like selection sort, except that the unsorted region is a heap, so finding the largest element n times gives a total running time of Θ(n lg n). (See Chapter 6.)
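The following short example (not from the book) shows the heapq module used both as a priority queue and for a simple heapsort; note that heapq implements a min-heap, so the sorted result comes out in ascending order.

    from heapq import heapify, heappush, heappop

    def heapsort(seq):
        """Selection sort with a heap as the unsorted region: Theta(n lg n) overall."""
        heap = list(seq)
        heapify(heap)                        # linear-time preprocessing
        return [heappop(heap) for _ in range(len(heap))]   # n logarithmic extractions

    print(heapsort([9, 4, 7, 1, 8, 2]))      # [1, 2, 4, 7, 8, 9]

    # heapq as a priority queue: push (priority, task) pairs, pop the smallest priority
    tasks = []
    heappush(tasks, (2, "write tests"))
    heappush(tasks, (1, "fix bug"))
    print(heappop(tasks))                    # (1, 'fix bug')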
Huffman's algorithm. Builds Huffman trees, which can be used for building optimal prefix codes, for example. Initially, each element (for example, a character in an alphabet) is made into a single-node tree, with a weight equal to its frequency. In each iteration, the two lightest trees are picked, combining them with a new root, giving the new tree a weight equal to the sum of the original two tree weights. This can be done in loglinear time (or, in fact, in linear time if the frequencies are presorted). (See Listing 7-1.)

Insertion sort. A simple sorting algorithm with quadratic running time. It works by repeatedly inserting the next unsorted element into an initial sorted segment of the array. For small data sets, it can actually be preferable to more advanced (and optimal) algorithms such as merge sort or quicksort. (In Python, though, you should use list.sort or sorted if at all possible.) (See Listing 4-3.)

Interpolation search. Similar to ordinary binary search, but linear interpolation between the interval endpoints is used to guess the correct position, rather than simply looking at the middle element. The worst-case running time is still Θ(lg n), but the average-case running time is Θ(lg lg n) for uniformly distributed data. (Mentioned in the "If You're Curious ..." section of Chapter 6.)

Iterative deepening DFS. Repeated runs of DFS, where each run has a limit to how far it can traverse. For structures with some fanout, the running time will be the same as for DFS or BFS (that is, Θ(n + m)). The point is that it has the advantages of BFS (it finds shortest paths and explores large state spaces conservatively), with the smaller memory footprint of DFS. (See Listing 5-8.)

Johnson's algorithm. Finds shortest paths from every node to all others. Basically runs Dijkstra's algorithm from every node. However, it uses a trick so that it also works with negative edge weights: it first runs Bellman-Ford from a new start node (with added edges to all existing nodes) and then uses the resulting distances to modify the edge weights of the graph. The modified weights are all nonnegative but are set so that the shortest paths in the original graph will also be the shortest paths in the modified graph. Running time Θ(mn lg n). (See Listing 9-4.)

Kosaraju's algorithm. Finds strongly connected components, using DFS. First, nodes are ordered by their finish times. Then the edges are reversed, and another DFS is run, selecting start nodes using the first ordering. Running time O(n + m). (See Listing 5-10.)
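To round off, here is a compact sketch of the two passes just described; it is not the book's Listing 5-10, and the recursive first pass and the dict-based graph representation are simplifications for illustration.

    def kosaraju(G):
        """Strongly connected components of a digraph G = {u: iterable of successors}."""
        order, seen = [], set()
        def visit(u):                        # first pass: record DFS finish order
            seen.add(u)
            for v in G[u]:
                if v not in seen:
                    visit(v)
            order.append(u)
        for u in G:
            if u not in seen:
                visit(u)

        rev = {u: [] for u in G}             # reverse all the edges
        for u in G:
            for v in G[u]:
                rev[v].append(u)

        comps, assigned = [], set()
        for u in reversed(order):            # latest finish time first
            if u in assigned:
                continue
            comp, stack = [], [u]
            assigned.add(u)
            while stack:                     # plain traversal of the reversed graph
                x = stack.pop()
                comp.append(x)
                for y in rev[x]:
                    if y not in assigned:
                        assigned.add(y)
                        stack.append(y)
            comps.append(comp)
        return comps

    G = {1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [4]}
    print(kosaraju(G))                       # e.g. [[1, 3, 2], [4, 5]]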