DataFrame and read_csv function - Python
I'm using the pandas library to make a simple program.
First of all, I have a .csv file called small.csv with the following contents:
```
1,4.0,?,?,none,?
2,2.0,3.0,?,none,38
2,2.5,2.5,?,tc,39
```
In my main function I have the following code:
```python
def main():
    # my code here; `da` is the module where load_data is defined
    fname = "/home/sergio/PycharmProjects/practica2/small.csv"
    sep = ","
    vars = ["x1", "x2", "x3", "x4", "x5", "x6"]
    na_values = ["?", "none"]
    prefix = "col_"
    df = da.load_data(fname, delimiter=sep, nan=na_values,
                      header=False, pref=prefix)
    print(df)
```
Depending on the parameters I pass to load_data, the data in my .csv file has to be loaded in one way or another.
These are the possible arguments and what each one does:
My load_data function:
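```python
import pandas as pd
from pandas import DataFrame


def load_data(inputFile, delimiter=",", nan=None, header=True,
              varNames=None, pref="var_"):
    data = DataFrame()
    if header == False:
        if not varNames:
            print("header=false and varNames not defined")
            # prefix=pref + header=None labels columns pref0, pref1, ... (pandas < 2.0)
            data = pd.read_csv(inputFile, sep=delimiter, na_values=nan,
                               prefix=pref, header=None)
            # NOTE: these two lines overwrite the prefixed labels with 0, 1, 2, ...
            listaNum = list(range(len(data.columns)))
            data.columns = listaNum
        else:  # varNames defined
            # no header row in the file: use varNames as the column names
            data = pd.read_csv(inputFile, sep=delimiter, na_values=nan,
                               header=None, names=varNames)
    else:
        pass  # header=True branch not shown in the question
    return data
```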
This function loads the data according to the parameters passed in, so the resulting DataFrame varies from case to case.
One of the cases that I have to handle is the following.
If header=False and varNames (the column names) is not passed to the function (i.e. it is None), I have to name the columns with consecutive integers starting at 0, one per column: 0, 1, 2, ... up to the last column.
In this case I also have to put the prefix passed to the function in front of each of those numbers; here that prefix is "col_".
The expected result is:
```
   col_0  col_1  col_2  col_3  col_4  col_5
0      1    4.0    NaN    NaN    NaN    NaN
1      2    2.0    3.0    NaN    NaN   38.0
2      2    2.5    2.5    NaN     tc   39.0
```
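The naming rule described above boils down to "prefix + column position" for positions 0 to n-1. A minimal sketch of that rule, assuming the data is read with header=None (which gives integer labels 0..n-1); `make_column_names` is a hypothetical helper, not part of pandas:

```python
import io

import pandas as pd


def make_column_names(n_cols, pref):
    # Hypothetical helper: [pref + "0", pref + "1", ..., pref + str(n_cols - 1)]
    return [f"{pref}{i}" for i in range(n_cols)]


# Same contents as small.csv, kept in memory so the example is self-contained
text = "1,4.0,?,?,none,?\n2,2.0,3.0,?,none,38\n2,2.5,2.5,?,tc,39\n"
df = pd.read_csv(io.StringIO(text), na_values=["?", "none"], header=None)

# header=None gives integer labels 0..n-1; replace them with the prefixed names
df.columns = make_column_names(df.shape[1], "col_")
print(df)
```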
Here is my problem. In the case above, each numbered column needs the prefix stored in the prefix variable. I could add it by hand, i.e. prepend the string "col_" to each element of my list of column names.
However, I think that is the wrong approach, because it does not use the prefix option that read_csv accepts. I have tried that option anyway, and it does not work.
This is my result; as you can see, although I pass the prefix argument to read_csv, it seems to be ignored:
```
   0    1    2    3    4     5
0  1  4.0  NaN  NaN  NaN   NaN
1  2  2.0  3.0  NaN  NaN  38.0
2  2  2.5  2.5  NaN   tc  39.0
```
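In load_data as posted, read_csv is not what drops the prefix: the two lines right after the read_csv call (`listaNum = list(range(len(data.columns)))` and `data.columns = listaNum`) replace whatever labels read_csv produced with plain integers. The sketch below reproduces that sequence. To also run on pandas 2.x, where read_csv no longer accepts prefix, it starts from already-prefixed labels passed through names= as a stand-in:

```python
import io

import pandas as pd

text = "1,4.0,?,?,none,?\n2,2.0,3.0,?,none,38\n2,2.5,2.5,?,tc,39\n"
pref = "col_"

# Stand-in for read_csv(..., prefix=pref, header=None): labels start as col_0..col_5
data = pd.read_csv(io.StringIO(text), na_values=["?", "none"], header=None,
                   names=[f"{pref}{i}" for i in range(6)])
before = list(data.columns)

# The two lines from load_data: they discard the prefixed labels
listaNum = list(range(len(data.columns)))
data.columns = listaNum
after = list(data.columns)

print(before)  # ['col_0', 'col_1', 'col_2', 'col_3', 'col_4', 'col_5']
print(after)   # [0, 1, 2, 3, 4, 5]
```

Dropping those two lines from the `if not varNames:` branch should leave the prefixed labels in place on pandas versions that still support prefix.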
Another doubt: because I compute the numeric column names myself, I do it by modifying the DataFrame after it has already been created, and I suspect that is not the best way to do it.
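Relabeling the columns right after reading (by assigning `.columns` or chaining a method such as `add_prefix`) is an ordinary pandas pattern, so the post-processing step is not a problem in itself. A possible restructuring of load_data along those lines is sketched below. It is not the asker's code: the header=True behavior (first row of the file holds the names) is an assumption, since that branch is not shown, and it avoids prefix so it also runs on pandas 2.x:

```python
import io

import pandas as pd


def load_data(inputFile, delimiter=",", nan=None, header=True,
              varNames=None, pref="var_"):
    """Sketch of load_data; the header=True behavior is an assumption."""
    if header:
        # Assumed: the file's first row holds the column names
        return pd.read_csv(inputFile, sep=delimiter, na_values=nan)
    if varNames:
        # No header row in the file: use the supplied names
        return pd.read_csv(inputFile, sep=delimiter, na_values=nan,
                           header=None, names=varNames)
    # No header row and no names: read_csv labels the columns 0..n-1,
    # and add_prefix turns them into pref + "0", pref + "1", ...
    return pd.read_csv(inputFile, sep=delimiter, na_values=nan,
                       header=None).add_prefix(pref)


text = "1,4.0,?,?,none,?\n2,2.0,3.0,?,none,38\n2,2.5,2.5,?,tc,39\n"
df = load_data(io.StringIO(text), nan=["?", "none"], header=False, pref="col_")
print(df)
```

Each branch returns directly, so there is no need for the placeholder `data = DataFrame()`.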
This works well for me on pandas v0.21:
```python
import io

import pandas as pd

text = '''1,4.0,?,?,none,?
2,2.0,3.0,?,none,38
2,2.5,2.5,?,tc,39'''

buf = io.StringIO(text)
# prefix is only applied when header=None
df = pd.read_csv(buf, na_values=['?', 'none'], header=None, prefix='col_')
df
```
```
   col_0  col_1  col_2  col_3  col_4  col_5
0      1    4.0    NaN    NaN    NaN    NaN
1      2    2.0    3.0    NaN    NaN   38.0
2      2    2.5    2.5    NaN     tc   39.0
```
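Note that the prefix argument of read_csv was deprecated in pandas 1.4 and removed in pandas 2.0, so the call above raises a TypeError on current versions. One version-independent way to get the same labels at read time is to count the columns from the first row and pass the prefixed names through names=. This is a sketch; the `nrows=1` peek assumes every row has the same number of fields:

```python
import io

import pandas as pd

text = "1,4.0,?,?,none,?\n2,2.0,3.0,?,none,38\n2,2.5,2.5,?,tc,39\n"
buf = io.StringIO(text)

# Peek at the first row only to learn how many columns there are
n_cols = pd.read_csv(buf, header=None, nrows=1).shape[1]
buf.seek(0)  # rewind so the full read starts at the beginning

df = pd.read_csv(buf, na_values=["?", "none"], header=None,
                 names=[f"col_{i}" for i in range(n_cols)])
print(df)
```

With a file path instead of an in-memory buffer, the `seek(0)` is unnecessary because each read_csv call opens the file again.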
Another trick (if this still doesn't work) is to use add_prefix:
```python
df
```

```
   0    1    2    3    4     5
0  1  4.0  NaN  NaN  NaN   NaN
1  2  2.0  3.0  NaN  NaN  38.0
2  2  2.5  2.5  NaN   tc  39.0
```

```python
df = df.add_prefix('col_')
df
```

```
   col_0  col_1  col_2  col_3  col_4  col_5
0      1    4.0    NaN    NaN    NaN    NaN
1      2    2.0    3.0    NaN    NaN   38.0
2      2    2.5    2.5    NaN     tc   39.0
```
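If the labels need more than a fixed prefix, `DataFrame.rename` with a callable for `columns` gives the same result as add_prefix and also lets you reformat each label. A small sketch (the zero-padded `col_00` style is only an illustration, not something the question asks for):

```python
import io

import pandas as pd

text = "1,4.0,?,?,none,?\n2,2.0,3.0,?,none,38\n2,2.5,2.5,?,tc,39\n"
df = pd.read_csv(io.StringIO(text), na_values=["?", "none"], header=None)

# Same labels as df.add_prefix('col_')
prefixed = df.rename(columns=lambda c: f"col_{c}")
# A callable can also reformat the integer label, e.g. zero-padding it
padded = df.rename(columns=lambda c: f"col_{c:02d}")

print(list(prefixed.columns))  # ['col_0', 'col_1', 'col_2', 'col_3', 'col_4', 'col_5']
print(list(padded.columns))    # ['col_00', 'col_01', 'col_02', 'col_03', 'col_04', 'col_05']
```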