I'm using the pandas library to make a simple program.
First, I have a .csv file called small.csv with the following contents:
1,4.0,?,?,none,?
2,2.0,3.0,?,none,38
2,2.5,2.5,?,tc,39
In my main function I have the following code:
def main():
    # my code here
    fname = "/home/sergio/PycharmProjects/practica2/small.csv"
    sep = ","
    vars = ["x1", "x2", "x3", "x4", "x5", "x6"]
    na_values = ["?", "none"]
    prefix = "col_"
    df = da.load_data(fname, delimiter=sep, nan=na_values,
                      header=False, pref=prefix)
    print(df)
Depending on the parameters I pass to load_data, the data in my .csv file has to be loaded in one way or another.
The possible arguments are the parameters of my load_data function:
import pandas as pd
from pandas import DataFrame

def load_data(inputFile, delimiter=",", nan=None, header=True,
              varNames=None, pref="var_"):
    data = DataFrame()
    if header == False:
        if not varNames:
            print("header=false and varNames not defined")
            data = pd.read_csv(inputFile, sep=delimiter, na_values=nan, prefix=pref, header=None)
            listaNum = list(range(len(data.columns)))
            data.columns = listaNum
        else:  # varNames defined
            data = pd.read_csv(inputFile, sep=delimiter, na_values=nan, prefix=pref)
    else:
        pass  # header=True case not shown in the question
    return data
This function loads and returns the data according to the parameters passed in, so the output varies from case to case.
One of the cases I have to handle is the following.
If header=False and varNames, which holds the column names, is not passed to the function (None), I have to number the columns from 0 up to the number of columns minus one, that is, 0 1 2 ... up to the last column.
In this case I also have to prepend the prefix passed to the function to each column number; here that prefix is "col_".
The result would be the following:
col_0 col_1 col_2 col_3 col_4 col_5
0 1 4.0 NaN NaN NaN NaN
1 2 2.0 3.0 NaN NaN 38.0
2 2 2.5 2.5 NaN tc 39.0
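The naming rule itself (the prefix followed by the column number) fits in one list comprehension. A minimal sketch, assuming the number of columns is known before reading (here 6, matching the six entries of vars in main); the helper name numbered_names is hypothetical. Passing the names to read_csv through names= labels the columns while the file is read, so nothing has to be renamed afterwards.

```python
import io

import pandas as pd


def numbered_names(n_cols, prefix):
    """Hypothetical helper: return [prefix + "0", prefix + "1", ...]."""
    return [f"{prefix}{i}" for i in range(n_cols)]


# Contents of small.csv, inlined so the sketch runs without the file.
text = "1,4.0,?,?,none,?\n2,2.0,3.0,?,none,38\n2,2.5,2.5,?,tc,39\n"

# header=None: the file has no header row; names= supplies the labels up front.
df = pd.read_csv(
    io.StringIO(text),
    header=None,
    names=numbered_names(6, "col_"),
    na_values=["?", "none"],
)
print(df.columns.tolist())
```

The drawback is that the column count must be known in advance; if it is not, the labels have to be set after reading.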
Here is my problem: in this case each numeric column name has to get the prefix stored in the prefix variable. I could do it by hand, that is, prepend the string "col_" to each element of my list of column names.
However, I think that is the wrong approach, since it does not use the prefix option that read_csv accepts. I have tried that option anyway, and it does not work correctly.
This is my result; as you can see, even though I pass the prefix argument to read_csv, it is ignored:
0 1 2 3 4 5
0 1 4.0 NaN NaN NaN NaN
1 2 2.0 3.0 NaN NaN 38.0
2 2 2.5 2.5 NaN tc 39.0
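One likely reason the prefix seems to be ignored: in load_data, the line data.columns = listaNum runs after read_csv and replaces whatever labels read_csv produced, including prefixed ones. A small sketch reproducing this. Because read_csv's prefix argument was deprecated in pandas 1.4 and removed in 2.0, the sketch creates the prefixed labels with DataFrame.add_prefix instead; the overwrite behaves the same either way.

```python
import io

import pandas as pd

text = "1,4.0,?,?,none,?\n2,2.0,3.0,?,none,38\n2,2.5,2.5,?,tc,39\n"

# Read without a header row, then prefix the default integer labels.
data = pd.read_csv(io.StringIO(text), header=None, na_values=["?", "none"])
data = data.add_prefix("col_")
print(data.columns.tolist())  # labels are now col_0 ... col_5

# Same steps as load_data: assigning new labels discards the prefixed ones.
listaNum = list(range(len(data.columns)))
data.columns = listaNum
print(data.columns.tolist())  # labels are back to 0 ... 5
```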
Another doubt: since I compute the numeric column labels myself, by modifying the DataFrame after it has already been created, I suspect this is not the most efficient way to do it.
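On the second doubt: with header=None and no names, read_csv already numbers the columns 0, 1, 2, ... by itself, so building listaNum by hand is redundant and only the prefix is missing. A sketch of that, which relabels the columns in a single step without changing any cell values; the variable default_labels exists only for illustration.

```python
import io

import pandas as pd

text = "1,4.0,?,?,none,?\n2,2.0,3.0,?,none,38\n2,2.5,2.5,?,tc,39\n"

df = pd.read_csv(io.StringIO(text), header=None, na_values=["?", "none"])

# pandas has already labelled the columns 0 .. n-1.
default_labels = list(df.columns)
print(default_labels)

# Only the prefix needs to be added; this touches the labels, not the data.
prefix = "col_"
df.columns = [f"{prefix}{c}" for c in df.columns]
print(df.columns.tolist())
```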
This works well for me on pandas v0.21:
import io
import pandas as pd
text = \
'''1,4.0,?,?,none,?
2,2.0,3.0,?,none,38
2,2.5,2.5,?,tc,39'''
buf = io.StringIO(text)
df = pd.read_csv(buf, na_values=['?', 'none'], header=None, prefix='col_')
df
col_0 col_1 col_2 col_3 col_4 col_5
0 1 4.0 NaN NaN NaN NaN
1 2 2.0 3.0 NaN NaN 38.0
2 2 2.5 2.5 NaN tc 39.0
Another trick (if this still doesn't work) is to use add_prefix:
df
0 1 2 3 4 5
0 1 4.0 NaN NaN NaN NaN
1 2 2.0 3.0 NaN NaN 38.0
2 2 2.5 2.5 NaN tc 39.0
df = df.add_prefix('col_')
df
col_0 col_1 col_2 col_3 col_4 col_5
0 1 4.0 NaN NaN NaN NaN
1 2 2.0 3.0 NaN NaN 38.0
2 2 2.5 2.5 NaN tc 39.0
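Putting this together, load_data could apply the prefix with add_prefix instead of renumbering the columns. A minimal sketch, not the original function: it assumes the header=True branch (not shown in the question) should simply read the file's own header row, and that a given varNames list should be passed to read_csv as names=. read_csv also accepts file-like objects, so the sketch can be tried without small.csv.

```python
import io

import pandas as pd


def load_data(inputFile, delimiter=",", nan=None, header=True,
              varNames=None, pref="var_"):
    """Sketch of load_data; the header=True branch is an assumption."""
    if header:
        # Assumption: the first row of the file holds the column names.
        return pd.read_csv(inputFile, sep=delimiter, na_values=nan)
    if varNames:
        # No header row in the file; the caller supplies the names.
        return pd.read_csv(inputFile, sep=delimiter, na_values=nan,
                           header=None, names=varNames)
    # No header and no names: pandas numbers the columns 0 .. n-1,
    # and add_prefix turns them into pref + "0", pref + "1", ...
    data = pd.read_csv(inputFile, sep=delimiter, na_values=nan, header=None)
    return data.add_prefix(pref)


text = "1,4.0,?,?,none,?\n2,2.0,3.0,?,none,38\n2,2.5,2.5,?,tc,39\n"
df = load_data(io.StringIO(text), delimiter=",", nan=["?", "none"],
               header=False, pref="col_")
print(df)
```

Since add_prefix works on whatever labels read_csv produced, this avoids the deprecated prefix argument and the separate listaNum step.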