
Pandas read_csv dtype specify all columns but one

I have a CSV file. I want to read most of its values as strings, but I want to read one column as bool if a column with the given title exists.

Because the CSV file has a lot of columns, I don't want to specify the data type of each column directly, like this:

data = read_csv('sample.csv', dtype={'A': str, 'B': str, ..., 'X': bool})

Is it possible to set the string type on every column but one, and at the same time read an optional column as bool?

My current solution is the following (but it is very inefficient and slow):

data = read_csv('sample.csv', dtype=str)  # reads all columns as strings
if 'X' in data.columns:
    l = lambda row: True if row['X'] == 'True' else False if row['X'] == 'False' else None
    data['X'] = data.apply(l, axis=1)

UPDATE: Sample CSV:

A;B;C;X
a1;b1;c1;True
a2;b2;c2;False
a3;b3;c3;True

Or the same file can come without the 'X' column (because the column is optional):

A;B;C
a1;b1;c1
a2;b2;c2
a3;b3;c3

You can first select the columns whose names contain X with boolean indexing and then use replace:

cols = df.columns[df.columns.str.contains('X')]
df[cols] = df[cols].replace({'True': True, 'False': False})

Or, if you need to select only the column named exactly X:

cols = df.columns[df.columns == 'X']
df[cols] = df[cols].replace({'True': True, 'False': False})

Sample:

import pandas as pd

df = pd.DataFrame({'A':['a1','a2','a3'],
                   'B':['b1','b2','b3'],
                   'C':['c1','c2','c3'],
                   'X':['True','False','True']})

print (df)
    A   B   C      X
0  a1  b1  c1   True
1  a2  b2  c2  False
2  a3  b3  c3   True
print (df.dtypes)
A    object
B    object
C    object
X    object
dtype: object

cols = df.columns[df.columns.str.contains('X')]
print (cols)

Index(['X'], dtype='object')

df[cols] = df[cols].replace({'True': True, 'False': False})

print (df.dtypes)
A    object
B    object
C    object
X      bool
dtype: object
print (df)

    A   B   C      X
0  a1  b1  c1   True
1  a2  b2  c2  False
2  a3  b3  c3   True
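
As an alternative sketch (not part of the answer above), Series.map with a dictionary can do the same conversion. Values that are missing from the dictionary become NaN, which is close to the None fallback in the question's lambda. The helper name to_bool_column and the constant BOOL_MAP are hypothetical and exist only for this illustration.

```python
import pandas as pd

# Mapping from the textual values in the CSV to real booleans.
BOOL_MAP = {'True': True, 'False': False}

def to_bool_column(df, name='X'):
    """Convert column `name` from 'True'/'False' strings to booleans, if present.

    Hypothetical helper for illustration; values outside BOOL_MAP become NaN.
    """
    if name in df.columns:
        df[name] = df[name].map(BOOL_MAP)
    return df

df = pd.DataFrame({'A': ['a1', 'a2', 'a3'],
                   'X': ['True', 'False', 'True']})
df = to_bool_column(df)
print(df.dtypes)

# A frame without the optional column passes through unchanged.
no_x = to_bool_column(pd.DataFrame({'A': ['a1', 'a2']}))
print(list(no_x.columns))
```

Because the check for the column is inside the helper, the same call works for files with and without X.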

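Why not use the bool() data type? bool() evaluates to True if a parameter is passed and the parameter is not False, None, '', or 0. Since the string 'False' is none of these, it has to be turned into 0 first. bool() also works on a single value rather than on a whole column, so it has to be mapped over the values. Note that every other non-empty value, including a missing value (NaN), becomes True this way.

if 'X' in data.columns:
    # Turn 'False' into 0 so that bool() yields False, then apply bool() to each value.
    data['X'] = data['X'].map(lambda v: bool(0 if v == 'False' else v))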
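
The reason 'False' needs special treatment before bool() is called can be checked in plain Python: bool() tests truthiness and does not parse text, so any non-empty string is True. A small illustration (the variable names are only for this example):

```python
# bool() tests truthiness; it does not parse text.
values = ['True', 'False', '', 0, None]
results = [bool(v) for v in values]
print(results)  # [True, True, False, False, False]
```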

Actually, you don't need any special handling when using read_csv from pandas (tested on version 0.17). Using your example file with X:

import pandas as pd

df = pd.read_csv("file.csv", delimiter=";")
print(df.dtypes)

A    object
B    object
C    object
X      bool
dtype: object
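
To try this without a file on disk, the sample from the question can be fed through io.StringIO. The sketch below only illustrates the inference behaviour. Inference applies to every column, so a column holding values such as 007 is parsed as numbers rather than kept as text, which may be why dtype=str was wanted in the first place. The column N is made up for the illustration and is not in the original sample.

```python
import io
import pandas as pd

# Sample data from the question, plus a hypothetical numeric-looking column N.
csv_text = "A;B;C;N;X\na1;b1;c1;007;True\na2;b2;c2;008;False\na3;b3;c3;009;True\n"

# Default inference: X becomes bool, but N loses its leading zeros.
inferred = pd.read_csv(io.StringIO(csv_text), sep=";")
print(inferred.dtypes)

# Forcing strings keeps N intact, at the cost of X staying text.
as_text = pd.read_csv(io.StringIO(csv_text), sep=";", dtype=str)
print(as_text["N"].tolist())
```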

For those looking for an answer to the question in the title (in this case, reading every column as a string except the index, which is read as int), you can do something like this if you know how many columns you have:

import numpy as np
import pandas as pd
dtype = dict(zip(range(9), [np.int16] + [str] * 8))  # column 0 as int16, the other 8 as str
dframe = pd.read_csv('../files/file.csv', dtype=dtype)

Credit to Anton vBR in this question.
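
When the number of columns is not known in advance, one possible approach (a sketch, not taken from the answers above) is to read only the header row with nrows=0, build the dtype dictionary from the column names, and then read the file for real. The function name build_dtype and its parameters are hypothetical. io.StringIO stands in for the file so the example is self-contained.

```python
import io
import pandas as pd

def build_dtype(columns, overrides, default=str):
    """Map every column to `default`, except those listed in `overrides`.

    Overrides for columns that are absent from the file are dropped, so an
    optional column does not have to exist.
    """
    return {col: overrides.get(col, default) for col in columns}

csv_text = "A;B;C;X\na1;b1;c1;True\na2;b2;c2;False\na3;b3;c3;True\n"

# First pass: nrows=0 parses only the header line.
header = pd.read_csv(io.StringIO(csv_text), sep=";", nrows=0)
dtype = build_dtype(header.columns, {"X": bool})

# Second pass: read the data with the generated dtype mapping.
df = pd.read_csv(io.StringIO(csv_text), sep=";", dtype=dtype)
print(df.dtypes)
```

For a real file, the same path would be passed to both read_csv calls. If X can contain missing values, plain bool is likely to fail during parsing; the nullable 'boolean' dtype may be a better fit in that case, though that is an assumption to verify against the pandas version in use.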
