Python for Data Analysis, 2nd Edition (PDF)
Title: Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython, 2nd Edition
Author: Wes McKinney
Length: 550 pages
Edition: 2
Language: English
Publisher: O'Reilly Media
Publication Date: 2017-09-25
ISBN-10: 1491957662
ISBN-13: 9781491957660

Get complete instructions for manipulating, processing, cleaning, and crunching datasets in Python. Updated for Python 3.6, the second edition of this hands-on guide is packed with practical case studies that show you how to solve a broad set of data analysis problems effectively. You'll learn the latest versions of pandas, NumPy, IPython, and Jupyter in the process.

Written by Wes McKinney, the creator of the Python pandas project, this book is a practical, modern introduction to data science tools in Python. It's ideal for analysts new to Python and for Python programmers new to data science and scientific computing. Data files and related material are available on GitHub.

- Use the IPython shell and Jupyter Notebook for exploratory computing
- Learn basic and advanced features in NumPy (Numerical Python)
- Get started with data analysis tools in the pandas library
- Use flexible tools to load, clean, transform, merge, and reshape data
- Create informative visualizations with matplotlib
- Apply the pandas groupby facility to slice, dice, and summarize datasets
- Analyze and manipulate regular and irregular time series data
- Learn how to solve real-world data analysis problems with thorough, detailed examples

"Already a classic of the Python data ecosystem, this new edition is updated in key areas that enhance its unique value, from Python 3.6 to the latest features in pandas. By explaining the why and how of Python's data tools, this book helps the reader learn to use them effectively in new and creative ways. It is an essential part of any modern library of data-intensive computing."
- Fernando Perez, Assistant Professor of Statistics, Berkeley, IPython creator and cofounder of Project Jupyter

Wes McKinney is the creator of pandas, the popular open source Python library for data analysis. He is an active public speaker and open source Python and C++ developer in the Python data science community and the Apache Software Foundation. He works as a software architect in New York City.

Twitter: @oreillymedia
US $49.99, CAN $65.99
ISBN: 978-1-491-95766-0

SECOND EDITION
Python for Data Analysis
Data Wrangling with Pandas, NumPy, and IPython
Wes McKinney
O'Reilly: Beijing, Boston, Farnham, Sebastopol, Tokyo

Python for Data Analysis
by Wes McKinney

Copyright © 2018 William McKinney. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editor: Marie Beaugureau
Production Editor: Kristen Brown
Copyeditor: Jasmine Kwityn
Proofreader: Rachel Monaghan
Indexer: Lucie Haskins
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest

October 2012: First Edition
October 2017: Second Edition

Revision History for the Second Edition
2017-09-25: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781491957660 for release details.

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Python for Data Analysis, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.

While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

978-1-491-95766-0

Table of Contents

Preface

1. Preliminaries
  1.1 What Is This Book About?
    What Kinds of Data?
  1.2 Why Python for Data Analysis?
    Python as Glue
    Solving the Two-Language Problem
    Why Not Python?
  1.3 Essential Python Libraries
    pandas
    matplotlib
    IPython and Jupyter
    SciPy
    scikit-learn
    statsmodels
  1.4 Installation and Setup
    Windows
    Apple (OS X, macOS)
    GNU/Linux
    Installing or Updating Python Packages
    Python 2 and Python 3
    Integrated Development Environments (IDEs) and Text Editors
  1.5 Community and Conferences
  1.6 Navigating This Book
    Code Examples
    Data for Examples
    Import Conventions
    Jargon

2. Python Language Basics, IPython, and Jupyter Notebooks
  2.1 The Python Interpreter
  2.2 IPython Basics
    Running the IPython Shell
    Running the Jupyter Notebook
    Tab Completion
    Introspection
    The %run Command
    Executing Code from the Clipboard
    Terminal Keyboard Shortcuts
    About Magic Commands
    Matplotlib Integration
  2.3 Python Language Basics
    Language Semantics
    Scalar Types
    Control Flow

3. Built-in Data Structures, Functions, and Files
  3.1 Data Structures and Sequences
    Tuple
    Built-in Sequence Functions
    dict
    set
    List, Set, and Dict Comprehensions
  3.2 Functions
    Namespaces, Scope, and Local Functions
    Returning Multiple Values
    Functions Are Objects
    Anonymous (Lambda) Functions
    Currying: Partial Argument Application
    Generators
    Errors and Exception Handling
  3.3 Files and the Operating System
    Bytes and Unicode with Files
  3.4 Conclusion

4. NumPy Basics: Arrays and Vectorized Computation
  4.1 The NumPy ndarray: A Multidimensional Array Object
    Creating ndarrays
    Data Types for ndarrays
    Arithmetic with NumPy Arrays
    Basic Indexing and Slicing
    Boolean Indexing
    Fancy Indexing
    Transposing Arrays and Swapping Axes
  4.2 Universal Functions: Fast Element-Wise Array Functions
  4.3 Array-Oriented Programming with Arrays
    Expressing Conditional Logic as Array Operations
    Mathematical and Statistical Methods
    Methods for Boolean Arrays
    Sorting
    Unique and Other Set Logic
  4.4 File Input and Output with Arrays
  4.5 Linear Algebra
  4.6 Pseudorandom Number Generation
  4.7 Example: Random Walks
    Simulating Many Random Walks at Once
  4.8 Conclusion

5. Getting Started with pandas
  5.1 Introduction to pandas Data Structures
    Series
    DataFrame
    Index Objects
  5.2 Essential Functionality
    Reindexing
    Dropping Entries from an Axis
    Indexing, Selection, and Filtering
    Integer Indexes
    Arithmetic and Data Alignment
    Function Application and Mapping
    Sorting and Ranking
    Axis Indexes with Duplicate Labels
  5.3 Summarizing and Computing Descriptive Statistics
    Correlation and Covariance
    Unique Values, Value Counts, and Membership
  5.4 Conclusion

6. Data Loading, Storage, and File Formats
  6.1 Reading and Writing Data in Text Format
    Reading Text Files in Pieces
    Writing Data to Text Format
    Working with Delimited Formats
    JSON Data
    XML and HTML: Web Scraping
  6.2 Binary Data Formats
    Using HDF5 Format
    Reading Microsoft Excel Files
  6.3 Interacting with Web APIs
  6.4 Interacting with Databases
  6.5 Conclusion

7. Data Cleaning and Preparation
  7.1 Handling Missing Data
    Filtering Out Missing Data
    Filling In Missing Data
  7.2 Data Transformation
    Removing Duplicates
    Transforming Data Using a Function or Mapping
    Replacing Values
    Renaming Axis Indexes
    Discretization and Binning
    Detecting and Filtering Outliers
    Permutation and Random Sampling
    Computing Indicator/Dummy Variables
  7.3 String Manipulation
    String Object Methods
    Regular Expressions
    Vectorized String Functions in pandas
  7.4 Conclusion

8. Data Wrangling: Join, Combine, and Reshape
  8.1 Hierarchical Indexing
    Reordering and Sorting Levels
    Summary Statistics by Level
    Indexing with a DataFrame's columns
  8.2 Combining and Merging Datasets
    Database-Style DataFrame Joins
    Merging on Index
    Concatenating Along an Axis
    Combining Data with Overlap
  8.3 Reshaping and Pivoting
    Reshaping with Hierarchical Indexing
    Pivoting "Long" to "Wide" Format
    Pivoting "Wide" to "Long" Format
  8.4 Conclusion

9. Plotting and Visualization
  9.1 A Brief matplotlib API Primer
    Figures and Subplots
    Colors, Markers, and Line Styles
    Ticks, Labels, and Legends
    Annotations and Drawing on a Subplot
    Saving Plots to File
    matplotlib Configuration
  9.2 Plotting with pandas and seaborn
    Line Plots
    Bar Plots
    Histograms and Density Plots
    Scatter or Point Plots
    Facet Grids and Categorical Data
  9.3 Other Python Visualization Tools
  9.4 Conclusion

10. Data Aggregation and Group Operations
  10.1 GroupBy Mechanics
    Iterating Over Groups
    Selecting a Column or Subset of Columns
    Grouping with Dicts and Series
    Grouping with Functions
    Grouping by Index Levels
  10.2 Data Aggregation
    Column-Wise and Multiple Function Application
    Returning Aggregated Data Without Row Indexes
  10.3 Apply: General split-apply-combine
    Suppressing the Group Keys
    Quantile and Bucket Analysis
    Example: Filling Missing Values with Group-Specific Values
    Example: Random Sampling and Permutation
    Example: Group Weighted Average and Correlation
    Example: Group-Wise Linear Regression
  10.4 Pivot Tables and Cross-Tabulation
    Cross-Tabulations: Crosstab
  10.5 Conclusion
User comments
An important book.