Indices of dataframe columns in Pandas

The main problem is to create a list of the indices of the categorical columns.

I have a DataFrame with many columns whose types were specified before importing the file with pd.read_csv():

dtypes = {
    ...
    'Format_type': 'category',
    'Geo_new': 'category',
    'Age_min': 'int16',
    'Age_max': 'int16',
    'Sex': 'category',
    ...}
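For reference, a mapping like this is handed to pd.read_csv through its dtype parameter. A minimal sketch, assuming hypothetical sample data read from an in-memory buffer (io.StringIO) instead of a real file, and keeping only the five columns visible above:

```python
import io

import pandas as pd

# Hypothetical CSV content; only the columns shown in the question are used.
csv_text = """Format_type,Geo_new,Age_min,Age_max,Sex
banner,north,18,24,F
video,south,25,34,M
banner,south,35,44,F
"""

# Column name -> dtype, applied while the file is parsed.
dtypes = {
    'Format_type': 'category',
    'Geo_new': 'category',
    'Age_min': 'int16',
    'Age_max': 'int16',
    'Sex': 'category',
}

df = pd.read_csv(io.StringIO(csv_text), dtype=dtypes)
print(df.dtypes)
```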

So I made a table of column names and their indices, and then picked out the categorical columns by hand:

# Index.get_values() was deprecated in pandas 0.25 and removed in 1.0; .values returns the same array
col_list = [i for i in df.columns.values]
idx_list = [i for i in range(len(df.columns.values))]
column_num = pd.DataFrame(data = {'column_name': col_list,
                                  'idx_list': idx_list})
column_num

Then I get a table of column names (column_name) and their indices (idx_list):

column_name idx_list
...
Format_type 5
Geo_new     6
Age_min     7
Age_max     8
Sex         9
...

and copy the indices of the categorical columns into a list:

categorical_features = [...5, 6, 9...]

So I fill the list by hand. Is there a way to create the list of columns whose dtype is category automatically?

I believe you need DataFrame.select_dtypes, combined with Index.get_indexer to get the positions:

import pandas as pd

df = pd.DataFrame({
    'A': list('abcdef'),
    'B': pd.Categorical([4, 5, 4, 5, 5, 4]),
    'C': [7, 8, 9, 4, 2, 3],
    'D': pd.Categorical([1, 3, 5, 7, 1, 0]),
    'E': [5, 3, 6, 9, 2, 4],
    'F': list('aaabbb'),
})

c = df.select_dtypes('category').columns
print (c)
Index(['B', 'D'], dtype='object')

i = df.columns.get_indexer(df.select_dtypes('category').columns)
print (i)
[1 3]
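Applied to the question's columns, the same two calls replace the hand-written list. A minimal sketch, assuming a small hypothetical frame built with DataFrame.astype in place of the real import; because it holds only five columns, the positions come out as 0, 1 and 4 rather than 5, 6 and 9. The extra .tolist() turns the NumPy array returned by get_indexer into a plain Python list.

```python
import pandas as pd

# Hypothetical stand-in for the imported data, using the question's column names.
df = pd.DataFrame({
    'Format_type': ['banner', 'video', 'banner'],
    'Geo_new': ['north', 'south', 'south'],
    'Age_min': [18, 25, 35],
    'Age_max': [24, 34, 44],
    'Sex': ['F', 'M', 'F'],
}).astype({'Format_type': 'category', 'Geo_new': 'category',
           'Age_min': 'int16', 'Age_max': 'int16', 'Sex': 'category'})

# Names of every column whose dtype is category.
cat_cols = df.select_dtypes('category').columns

# Integer positions of those names, as plain Python ints.
categorical_features = df.columns.get_indexer(cat_cols).tolist()

print(dict(zip(cat_cols, categorical_features)))
```

select_dtypes keeps the frame's original column order, so the positions come out in ascending order.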

Also, your code can be simplified:

col_list = df.columns.tolist()
idx_list = range(len(col_list))
column_num = pd.DataFrame(data = {'column_name': col_list, 'idx_list': idx_list})
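If the lookup table is still wanted, it can also carry each column's dtype, so the categorical rows are filtered rather than read off by eye. A sketch assuming the first four columns of the example frame above; dtype_name is a hypothetical column name chosen for illustration.

```python
import pandas as pd

# First four columns of the example frame; B and D are categorical.
df = pd.DataFrame({
    'A': list('abcdef'),
    'B': pd.Categorical([4, 5, 4, 5, 5, 4]),
    'C': [7, 8, 9, 4, 2, 3],
    'D': pd.Categorical([1, 3, 5, 7, 1, 0]),
})

col_list = df.columns.tolist()
column_num = pd.DataFrame({
    'column_name': col_list,
    'idx_list': range(len(col_list)),
    # Each dtype stored as its string name ('category', 'int64', ...).
    'dtype_name': df.dtypes.astype(str).tolist(),
})

# Keep only the rows describing categorical columns, then take their indices.
categorical_features = column_num.loc[
    column_num['dtype_name'] == 'category', 'idx_list'
].tolist()

print(column_num)
print(categorical_features)
```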

There is another way!

import numpy as np

categorical_list = list(np.where(df.dtypes == 'category')[0])
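A self-contained version of this one-liner, using the first four columns of the example frame above. Calling .tolist() on the np.where result yields plain Python ints, whereas list(...) yields NumPy integer scalars. For comparison, the second list checks each dtype with isinstance against pd.CategoricalDtype instead of comparing it to the string 'category'; both should give the same positions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': list('abcdef'),
    'B': pd.Categorical([4, 5, 4, 5, 5, 4]),
    'C': [7, 8, 9, 4, 2, 3],
    'D': pd.Categorical([1, 3, 5, 7, 1, 0]),
})

# np.where on a 1-D boolean mask returns a one-element tuple of position arrays.
categorical_list = np.where(df.dtypes == 'category')[0].tolist()

# Same positions, found by checking each column's dtype class.
same = [i for i, dt in enumerate(df.dtypes) if isinstance(dt, pd.CategoricalDtype)]

print(categorical_list, same)
```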

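Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.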

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM