NumPy Basics Every Data Science Beginner Should Know

JusPnf, 7 years ago
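<p>This article introduces the NumPy basics that data science beginners should learn.</p>
<p>NumPy (Numerical Python) is Python's core library for numerical computing and linear algebra. It is essential to nearly every data science and machine learning package in Python: SciPy (Scientific Python), Matplotlib (a plotting library), scikit-learn, and others all depend on NumPy to some degree.</p>
<p>NumPy is very useful for performing mathematical and logical operations on arrays, and it provides a large number of useful features for working with n-dimensional arrays and matrices in Python.</p>
<p>This tutorial covers the NumPy basics that data science beginners need: how to create NumPy arrays, how to use broadcasting, how to access values, and how to manipulate arrays. More importantly, it shows NumPy's advantages over Python lists: arrays are more compact, faster to read from and write to, more convenient, and more efficient.</p>
<p>This tutorial uses Jupyter Notebook as the editor.</p>
<p>Let's get started!</p>
<p>Installing NumPy</p>
<p>If you have Anaconda installed, you can install NumPy from a terminal or command prompt with:</p>
<pre><code class="language-python">conda install numpy</code></pre>
<p>If you do not have Anaconda, you can install NumPy from a terminal with:</p>
<pre><code class="language-python">pip install numpy</code></pre>
<p>Once NumPy is installed, you can start Jupyter Notebook and begin. We start with NumPy arrays.</p>
<p>NumPy arrays</p>
<p>A NumPy array is a grid of values that all share the same type. NumPy arrays come in two main forms: vectors and matrices. Strictly speaking, a vector is a one-dimensional array and a matrix is a two-dimensional array (arrays can also have more dimensions). A matrix may still have only one row or one column.</p>
<p>First, import NumPy in your Jupyter notebook:</p>
<pre><code class="language-python">import numpy as np</code></pre>
<p>Creating a NumPy array from a Python list</p>
<p>We first create a Python list:</p>
<pre><code class="language-python">my_list = [1, 2, 3, 4, 5]</code></pre>
<p>From this list we can easily create a NumPy array named my_numpy_list and display it:</p>
<pre><code class="language-python">my_numpy_list = np.array(my_list)
my_numpy_list  # displays the resulting array</code></pre>
<p>We have just converted a Python list into a one-dimensional array. To get a two-dimensional array, we create a list whose elements are themselves lists:</p>
<pre><code class="language-python">second_list = [[1,2,3], [5,4,1], [3,6,7]]
new_2d_arr = np.array(second_list)
new_2d_arr  # displays the resulting array</code></pre>
<p>We have now created a two-dimensional array with 3 rows and 3 columns.</p>
<p>Creating a NumPy array with the built-in arange() function</p>
<p>Similar to Python's built-in range() function, arange() creates a NumPy array.</p>
<pre><code class="language-python">my_list = np.arange(10)
# or, equivalently
my_list = np.arange(0,10)</code></pre>
<p>This produces the ten integers from 0 to 9; the stop value 10 is excluded.</p>
<p>Note that the arange()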
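function accepts three arguments; the third is the step size. For example, to get the even numbers from 0 to 10, set the step to 2:</p>
<pre><code class="language-python">my_list = np.arange(0,11,2)</code></pre>
<p>We can also create a one-dimensional array of 7 zeros:</p>
<pre><code class="language-python">my_zeros = np.zeros(7)</code></pre>
<p>Or a one-dimensional array of 5 ones:</p>
<pre><code class="language-python">my_ones = np.ones(5)</code></pre>
<p>Similarly, we can generate a two-dimensional array of zeros with 3 rows and 5 columns:</p>
<pre><code class="language-python">two_d = np.zeros((3,5))</code></pre>
<p>Creating a NumPy array with the built-in linspace() function</p>
<p>linspace() returns evenly spaced numbers over a specified interval. For example, to get 15 evenly spaced points from 1 to 3, with both endpoints included, use:</p>
<pre><code class="language-python">lin_arr = np.linspace(1, 3, 15)</code></pre>
<p>This command produces a one-dimensional vector.</p>
<p>Unlike arange(), the third argument of linspace() is the number of data points to create, not a step size.</p>
<p>Creating an identity matrix in NumPy</p>
<p>Identity matrices are very useful in linear algebra. An identity matrix is a two-dimensional square matrix, meaning that its number of columns equals its number of rows. Its main diagonal is all 1s and every other entry is 0. np.eye() is usually called with a single argument, the size of the matrix:</p>
<pre><code class="language-python">my_matrx = np.eye(6)    # 6 is the number of rows/columns you want</code></pre>
<p>Creating an array of random numbers with NumPy</p>
<p>We can use rand(), randn(), or randint() to generate an array of random numbers.</p>
<ul>
<li> <p>With random.rand(), we can generate an array of random numbers drawn uniformly from the interval 0 to 1.</p> </li>
</ul>
<p>For example, to get a one-dimensional array of 4 values uniformly distributed between 0 and 1:</p>
<pre><code class="language-python">my_rand = np.random.rand(4)</code></pre>
<p>For a two-dimensional array with 5 rows and 4 columns:</p>
<pre><code class="language-python">my_rand = np.random.rand(5, 4)
my_rand</code></pre>
<ul>
<li> <p>With randn(), we can draw random samples from the standard normal (Gaussian) distribution centered at 0. For example, to generate 7 random numbers:</p> </li>
</ul>
<pre><code class="language-python">my_randn = np.random.randn(7)
my_randn</code></pre>
<p>With a large enough sample, a histogram of the results approximates the bell curve of the normal distribution.</p>
<p>Likewise, to create a two-dimensional array with 3 rows and 5 columns:</p>
<pre><code class="language-python">np.random.randn(3,5)</code></pre>
<ul>
<li> <p>Finally, randint() generates arrays of integers. It is typically called with up to three arguments: the lower bound (inclusive), the upper bound (exclusive), and the size of the array.</p> </li>
</ul>
<pre><code class="language-python">np.random.randint(20)  # generates a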
# random integer from 0 up to, but excluding, 20
np.random.randint(2, 20)     # a random integer from 2 (inclusive) to 20 (exclusive)
np.random.randint(2, 20, 7)  # 7 random integers from 2 (inclusive) to 20 (exclusive)</code></pre>
<p>Converting a one-dimensional array into a two-dimensional array</p>
<p>First create a one-dimensional array of 25 random floats between 0 and 1:</p>
<pre><code class="language-python">arr = np.random.rand(25)</code></pre>
<p>Then use reshape() to convert it into a two-dimensional array:</p>
<pre><code class="language-python">arr.reshape(5,5)</code></pre>
<p>Note: reshape() requires that the product of the new dimensions equal the number of elements. The arr above has 25 elements, so it can be reshaped to 5*5 (or, for example, to 1*25 or 25*1), but not to 4*5. reshape() returns the reshaped array and leaves arr itself unchanged.</p>
<p>Finding the maximum and minimum values in a NumPy array</p>
<p>With max() and min(), we can get the largest or smallest value in an array:</p>
<pre><code class="language-python">arr_2 = np.random.randint(0, 20, 10)
arr_2.max()  # the highest value in the array
arr_2.min()  # the lowest value in the array</code></pre>
<p>With argmax() and argmin(), we can find the indices of the maximum and minimum values:</p>
<pre><code class="language-python">arr_2.argmax()  # the index of the highest value in the array
arr_2.argmin()  # the index of the lowest value in the array</code></pre>
<p>Suppose you are working with many arrays and need to know the shape of one of them, for instance whether it is one-dimensional or two-dimensional. Just use the shape attribute:</p>
<pre><code class="language-python">arr.shape</code></pre>
<p>Indexing and selecting elements (or groups of elements) from a NumPy array</p>
<p>Indexing a NumPy array works like indexing a Python list: just supply the index you want:</p>
<pre><code class="language-python">my_array = np.arange(0,11)
my_array[8]  # the value of the element at index 8</code></pre>
<p>To get a range of values from an array, we can use the slice operator ":", just as in Python:</p>
<pre><code class="language-python">my_array[2:6]  # everything from index 2 up to, but excluding, index 6
my_array[:6]   # everything from index 0 up to, but excluding, index 6
my_array[5:]   # everything from index 5 to the end of the array</code></pre>
<p>Similarly, we can select elements in a two-dimensional array using either [ ][ ] or [,].</p>
<p>Using [ ][ ] to grab the value 60 from the two-dimensional array below:</p>
<pre><code class="language-python">two_d_arr = np.array([[10,20,30], [40,50,60], [70,80,90]])
two_d_arr[1][2]  # the value 60 is at row index 1,
# and column index 2</code></pre>
<p>Using [,] to grab the value 20 from the same two-dimensional array:</p>
<pre><code class="language-python">two_d_arr[0,1]</code></pre>
<p>We can also use slices to grab sub-sections of a two-dimensional array:</p>
<pre><code class="language-python">two_d_arr[:1, :2]  # returns [[10, 20]]
two_d_arr[:2, 1:]  # returns [[20, 30], [50, 60]]
two_d_arr[:2, :2]  # returns [[10, 20], [40, 50]]</code></pre>
<p>We can also index whole rows. Use a single index to grab any row, or a slice to grab several:</p>
<pre><code class="language-python">two_d_arr[0]   # grabs row 0 of the array: [10, 20, 30]
two_d_arr[:2]  # grabs every row before row 2: [[10, 20, 30], [40, 50, 60]]</code></pre>
<p>We can also use the &, |, <, > and == operators for conditional and logical selection, comparing the values in an array against a given value:</p>
<pre><code class="language-python">new_arr = np.arange(5,15)
new_arr > 10  # True where the element is greater than 10:
# [False, False, False, False, False, False, True, True, True, True]</code></pre>
<p>Now we can select the elements that satisfy this condition:</p>
<pre><code class="language-python">bool_arr = new_arr > 10
new_arr[bool_arr]    # returns the elements greater than 10: [11, 12, 13, 14]
new_arr[new_arr>10]  # a shorter way to do the same thing</code></pre>
<p>Combining comparison and logical operators, we can get the elements greater than 6 and less than 10:</p>
<pre><code class="language-python">new_arr[(new_arr>6) & (new_arr<10)]</code></pre>
<p>The expected result is [7, 8, 9].</p>
<p>Broadcasting</p>
<p>Broadcasting is a quick way to change values in a NumPy array: a single value assigned to a slice is spread across every element of that slice.</p>
<pre><code class="language-python">my_array[0:3] = 50
# my_array is now [50, 50, 50, 3, 4, 5, 6, 7, 8, 9, 10]</code></pre>
<p>In this example, the elements at indices 0 through 2 are changed to 50; the slice 0:3 excludes index 3.</p>
<p>Performing mathematical operations on NumPy arrays</p>
<pre><code class="language-python">arr = np.arange(1,11)
arr * arr  # multiplies each element by itself
arr - arr  # subtracts each element from itself
arr + arr  # adds each element to itself
arr / arr  # divides each element by itself</code></pre>
<p>We can also perform scalar operations on arrays; NumPy makes this possible through broadcasting:</p>
<pre><code
class="language-python">arr + 50  # adds 50 to every element in the array</code></pre>
<p>NumPy also lets you apply universal functions to arrays, such as the square root, exponential, and trigonometric functions, along with aggregations such as the sum and standard deviation.</p>
<pre><code class="language-python">np.sqrt(arr)  # the square root of each element
np.exp(arr)   # the exponential (e**x) of each element
np.sin(arr)   # the sine of each element
np.cos(arr)   # the cosine of each element
np.log(arr)   # the natural logarithm of each element
np.sum(arr)   # the sum of all elements in the array
np.std(arr)   # the standard deviation of the elements in the array</code></pre>
<p>We can also compute row or column sums of a two-dimensional array:</p>
<pre><code class="language-python">mat = np.arange(1,26).reshape(5,5)
mat.sum()        # the sum of all the values in mat
mat.sum(axis=0)  # sums down the rows: one total per column
mat.sum(axis=1)  # sums across the columns: one total per row</code></pre>
<p>This brings our NumPy tutorial to a close. We hope you found it helpful.</p>
<p>Source: https://www.jiqizhixin.com/articles/2018-04-21-7</p>
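As a closing recap, here is a minimal sketch of two points from the tutorial: reshape() accepts any shape whose dimensions multiply to the number of elements, and broadcasting can spread a scalar or a one-dimensional array across a larger array. The variable names (values, grid, tall, row_offsets, and so on) are made up for this illustration and do not appear in the original article.

```python
import numpy as np

# 12 elements can be arranged in any shape whose sizes multiply to 12.
values = np.arange(12)         # 0, 1, ..., 11
grid = values.reshape(3, 4)    # 3 rows, 4 columns
tall = values.reshape(-1, 2)   # -1 asks NumPy to infer the row count (6 here)

# Broadcasting a scalar: 10 is added to every element of grid.
shifted = grid + 10

# Broadcasting a 1-D array: row_offsets (length 4) is added to each row of grid.
row_offsets = np.array([100, 200, 300, 400])
offset_grid = grid + row_offsets

# A shape whose sizes do not multiply to 12 is rejected with ValueError.
try:
    values.reshape(5, 5)
    reshape_failed = False
except ValueError:
    reshape_failed = True

print(grid.shape, tall.shape)
print(offset_grid[0])
print(reshape_failed)
```

The -1 placeholder is handy when only one dimension is known in advance; NumPy works out the other from the element count.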