Indexing, Selecting, and Filtering Series and DataFrame

2022-05-05

Series

    >>> from pandas import Series
    >>> import numpy as np
    >>> obj = Series(np.arange(4), index=['a', 'b', 'c', 'd'])
    >>> obj
    a    0
    b    1
    c    2
    d    3
    dtype: int64
    >>> obj['b']
    1
    >>> obj[1]
    1
    >>> obj[2:4]
    c    2
    d    3
    dtype: int64
    >>> obj[['b', 'a', 'd']]
    b    1
    a    0
    d    3
    dtype: int64
    >>> obj[[1, 3]]
    b    1
    d    3
    dtype: int64
    >>> obj[obj < 2]
    a    0
    b    1
    dtype: int64
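The integer lookups above (obj[1], obj[[1, 3]]) work because pandas falls back to positions when the index holds strings. The sketch below assumes a recent pandas release, where that fallback is deprecated, and spells out label-based access with .loc and position-based access with .iloc. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

obj = pd.Series(np.arange(4), index=['a', 'b', 'c', 'd'])

by_label = obj.loc['b']          # label lookup, equivalent to obj['b']
by_position = obj.iloc[1]        # explicit positional lookup
pos_slice = obj.iloc[2:4]        # positions 2 and 3; the end position is excluded
label_slice = obj.loc['b':'c']   # label slice; the end label is included
picked = obj.iloc[[1, 3]]        # positions 1 and 3
small = obj[obj < 2]             # boolean filtering keeps values below 2
```

One difference worth keeping in mind: a positional slice excludes its end point, while a label slice with .loc includes it.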

DataFrame

Indexing columns

    >>> from pandas import DataFrame
    >>> data = DataFrame(np.arange(16).reshape((4, 4)),
    ...                  index=['Ohio', 'Colorado', 'Utah', 'New York'],
    ...                  columns=['one', 'two', 'three', 'four'])
    >>> data
              one  two  three  four
    Ohio        0    1      2     3
    Colorado    4    5      6     7
    Utah        8    9     10    11
    New York   12   13     14    15
    >>> data['two']
    Ohio         1
    Colorado     5
    Utah         9
    New York    13
    Name: two, dtype: int64
    >>> data[['three', 'one']]
              three  one
    Ohio          2    0
    Colorado      6    4
    Utah         10    8
    New York     14   12
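A single column name and a list of column names return different types, which is easy to miss in the output above. This small sketch rebuilds the same frame and checks the result types; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

one_column = data['two']             # a single name returns a Series
two_columns = data[['three', 'one']] # a list returns a DataFrame, in the order given
still_a_frame = data[['two']]        # a one-element list still returns a DataFrame
```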

Indexing rows

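    >>> data[:2]  # returns rows
              one  two  three  four
    Ohio        0    1      2     3
    Colorado    4    5      6     7
    >>> data['three'] > 5
    Ohio        False
    Colorado     True
    Utah         True
    New York     True
    Name: three, dtype: bool
    >>> data[data['three'] > 5]
              one  two  three  four
    Colorado    4    5      6     7
    Utah        8    9     10    11
    New York   12   13     14    15

Type                          Description
obj[val]                      Selects a single column or a group of columns from a DataFrame.
                              Convenient in a few special cases: a boolean array (filters rows),
                              a slice (slices rows), a boolean DataFrame (sets values by condition)
obj.ix[val]                   Selects a single row or a group of rows from a DataFrame
obj.ix[:, val]                Selects a single column or a subset of columns
obj.ix[val1, val2]            Selects rows and columns at the same time
reindex method                Conforms one or more axes to a new index
xs method                     Selects a single row or column by label and returns a Series
icol, irow methods            Select a single column or row by integer position and return a Series
get_value, set_value methods  Select a single value by row label and column label

    >>> data.ix['Colorado', ['two', 'three']]
    two      5
    three    6
    Name: Colorado, dtype: int64
    >>> data.ix[['Colorado', 'Utah'], [3, 0, 1]]
              four  one  two
    Colorado     7    4    5
    Utah        11    8    9
    >>> data.ix[2]  # returns a row
    one       8
    two       9
    three    10
    four     11
    Name: Utah, dtype: int64
    >>> data.ix[:'Utah', 'two']
    Ohio        1
    Colorado    5
    Utah        9
    Name: two, dtype: int64
    >>> data.ix[data.three > 5, :3]
              one  two  three
    Colorado    4    5      6
    Utah        8    9     10
    New York   12   13     14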
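The .ix calls above mix labels and integer positions in one indexer. As a rough sketch, assuming pandas 1.0 or later (where .ix is no longer available), the same selections can be written with .loc for labels, .iloc for positions, and .at for a single value. Where the original mixes both kinds, the positions are first translated to labels through data.columns. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

# data.ix['Colorado', ['two', 'three']]
row_by_label = data.loc['Colorado', ['two', 'three']]

# data.ix[['Colorado', 'Utah'], [3, 0, 1]]: row labels plus column positions
mixed = data.loc[['Colorado', 'Utah'], data.columns[[3, 0, 1]]]

# data.ix[2]: the third row, by position
third_row = data.iloc[2]

# data.ix[:'Utah', 'two']: label slice, end label included
up_to_utah = data.loc[:'Utah', 'two']

# data.ix[data.three > 5, :3]: boolean row filter plus the first three columns
filtered = data.loc[data['three'] > 5, data.columns[:3]]

# Single value by row label and column label (the role get_value plays in the table)
single_value = data.at['Utah', 'two']
```

Keeping labels and positions in separate accessors removes the ambiguity that arises when an index itself contains integers.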

Assignment

    >>> data < 5
                one    two  three   four
    Ohio       True   True   True   True
    Colorado   True  False  False  False
    Utah      False  False  False  False
    New York  False  False  False  False
    >>> data[data < 5] = 0
    >>> data
              one  two  three  four
    Ohio        0    0      0     0
    Colorado    0    5      6     7
    Utah        8    9     10    11
    New York   12   13     14    15
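Assigning through a boolean DataFrame changes data in place. The sketch below rebuilds the same frame and compares that in-place assignment with DataFrame.mask, which returns a new frame and leaves the source untouched. Using mask here is an alternative chosen for illustration, not something the original post does, and the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(16).reshape((4, 4)),
                    index=['Ohio', 'Colorado', 'Utah', 'New York'],
                    columns=['one', 'two', 'three', 'four'])

original = data.copy()           # kept only to show what the frame looked like before
condition = data < 5             # boolean DataFrame with the same shape as data

zeroed = data.mask(condition, 0) # new frame: values where condition is True become 0
data[condition] = 0              # in-place assignment, as in the post
```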

Axis indexes with duplicate values

    >>> from pandas import Series, DataFrame
    >>> import numpy as np
    >>> obj = Series(range(5), index=['a', 'a', 'b', 'b', 'c'])
    >>> obj
    a    0
    a    1
    b    2
    b    3
    c    4
    dtype: int64

The index.is_unique attribute tells you whether the index labels are unique.

    >>> obj.index.is_unique
    False
    >>> obj['a']
    a    0
    a    1
    dtype: int64
    >>> df = DataFrame(np.random.randn(4, 3), index=['a', 'a', 'b', 'b'])
    >>> df
              0         1         2
    a -0.988105  0.662467  1.778395
    a -1.021417  0.470186  0.754296
    b  0.035519  0.598257 -1.034743
    b  0.119780  2.094730  0.799680
    >>> df.ix['b']
              0         1         2
    b  0.035519  0.598257 -1.034743
    b  0.119780  2.094730  0.799680
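With duplicate labels, the return type of a lookup depends on how many entries match: a repeated label gives a Series (or a DataFrame of rows), while a unique label gives a scalar. A minimal sketch of that behaviour follows, using .loc in place of .ix on the assumption of pandas 1.0 or later. The seeded random generator and the variable names are illustrative only, so the random values differ from those printed above.

```python
import numpy as np
import pandas as pd

obj = pd.Series(range(5), index=['a', 'a', 'b', 'b', 'c'])

unique = obj.index.is_unique                  # False: 'a' and 'b' repeat
repeated = obj.loc['a']                       # two matches, so a Series comes back
single = obj.loc['c']                         # one match, so a scalar comes back
dup_flags = obj.index.duplicated(keep=False)  # True for every label that occurs more than once

rng = np.random.default_rng(0)                # seeded only to make the sketch repeatable
df = pd.DataFrame(rng.standard_normal((4, 3)), index=['a', 'a', 'b', 'b'])
rows_b = df.loc['b']                          # both rows labelled 'b', as a DataFrame
```

Code that must handle both cases can check index.is_unique first, or test whether the result is a Series before using it.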
