I'm writing a Python program to replace some values in a data frame. I have a file called file.txt that looks like this:
A:s:Y:0.1:0.1:0.1:0.2:0.1
B:r:D:0.3:0.5:0.1:0.2:0.2
C:f:C:0.3:0.4:0.2:-0.1:0.4
D:f:C:0.1:0.2:0.1:0.1:0.1
F:f:C:0.1:-0.1:-0.1:0.1:0.1
G:f:C:0.0:-0.1:0.1:0.3:0.4
H:M:D:0.1:0.4:0.1:0.0:0.4
I want to use ':' as the separator and replace the values in the numeric columns (columns 3 to 7 after parsing) with strings, following these rules:
All values that belong to range1 are replaced with 'N':
range1=[-0.2,-0.1,0,0.1,0.2] -> 'N'
All values that belong to range2 are replaced with 'L':
range2=[-0.5,-0.4,-0.3] -> 'L'
All values that belong to range3 are replaced with 'H':
range3=[0.3,0.4,0.5] -> 'H'
In order to achieve this I tried the following:
import pandas as pd
df = pd.read_csv('file.txt', sep=':', header=None)
labels = df[3]
range1 = [-0.2, -0.1, 0, 0.1, 0.2]
range2 = [-0.5, -0.4, -0.3]
range3 = [0.3, 0.4, 0.5]
lookup = {'N': range1, 'L': range2, 'H': range3}
for k, v in lookup.items():
    df.loc[df[3].isin(v), 3] = k
for k, v in lookup.items():
    df.loc[df[4].isin(v), 4] = k
for k, v in lookup.items():
    df.loc[df[5].isin(v), 5] = k
for k, v in lookup.items():
    df.loc[df[6].isin(v), 6] = k
for k, v in lookup.items():
    df.loc[df[7].isin(v), 7] = k
print(df)
It works well, but I want to avoid using so many for loops. I would appreciate any suggestions on how to achieve this.
You can use where instead:
for k, v in lookup.items():
    df = df.where(~df.isin(v), k)
This says to retain the values of df when those values are not contained in v. Otherwise, replace them with the value k. The assignment overwrites df at each iteration to accumulate the categorical labels.
This method works on all columns in one operation, so it only works if you want to replace every instance of a given numeric value with its categorical coded letter.
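A minimal, self-contained sketch of the loop above, for anyone who wants to try it without creating file.txt. It assumes the sample rows from the question are read from an in-memory string (the raw variable and the use of io.StringIO are stand-ins for the file, not part of the original code), and it writes the 0 in range1 as 0.0 so every lookup value is a float. The match relies on the parsed floats comparing exactly equal to the listed values.

```python
import io

import pandas as pd

# Sample rows copied from the question; `raw` stands in for file.txt.
raw = """A:s:Y:0.1:0.1:0.1:0.2:0.1
B:r:D:0.3:0.5:0.1:0.2:0.2
C:f:C:0.3:0.4:0.2:-0.1:0.4
D:f:C:0.1:0.2:0.1:0.1:0.1
F:f:C:0.1:-0.1:-0.1:0.1:0.1
G:f:C:0.0:-0.1:0.1:0.3:0.4
H:M:D:0.1:0.4:0.1:0.0:0.4
"""
df = pd.read_csv(io.StringIO(raw), sep=":", header=None)

lookup = {
    "N": [-0.2, -0.1, 0.0, 0.1, 0.2],
    "L": [-0.5, -0.4, -0.3],
    "H": [0.3, 0.4, 0.5],
}

for label, values in lookup.items():
    # Keep cells whose value is NOT in `values`; replace the others with `label`.
    # `where` returns a new frame, so rebinding `df` accumulates the labels.
    df = df.where(~df.isin(values), label)

print(df)
```

With this sample, columns 0 to 2 should come through unchanged, because none of their strings match a numeric lookup value, while every float in columns 3 to 7 is replaced by its letter.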
There is another option for where that specifies in-place modification, but unfortunately it cannot be used with DataFrames that have mixed column types. In your example, columns 0, 1, and 2 have type object while the rest have type float. Thus, pandas conservatively (and inefficiently) assumes it would have to convert everything to object to do the in-place overwrite, and raises a TypeError rather than checking further to see if only same-typed columns are actually affected by the mutation.
For example, this:
df.where(~df.isin(v), k, inplace=True)
will raise TypeError.
This limitation in pandas is fairly frustrating. You cannot use regular pandas assignment to work around it either, as the following gives the same TypeError:
for k, v in lookup.items():
    df.where(~df.isin(v), inplace=True)
    df[df.isnull()] = k  # <-- same TypeError
Amazingly, setting the try_cast keyword argument to True and/or setting the raise_on_error keyword argument to False does not affect whether the TypeError is raised, so you cannot disable this type safety check when using where.
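One possible way around the mixed-type problem is to touch only the numeric columns and assign the result back, rather than modifying the whole frame in place. The sketch below is a different technique from the where loop: it inverts lookup into a value-to-label dictionary and applies Series.map column by column. The names raw, value_to_label, and numeric_cols are made up for illustration, and the approach assumes every numeric value appears in one of the three lists (anything missing from the dictionary becomes NaN).

```python
import io

import pandas as pd

# A few sample rows from the question; `raw` stands in for file.txt.
raw = """A:s:Y:0.1:0.1:0.1:0.2:0.1
B:r:D:0.3:0.5:0.1:0.2:0.2
C:f:C:0.3:0.4:0.2:-0.1:0.4
"""
df = pd.read_csv(io.StringIO(raw), sep=":", header=None)

lookup = {
    "N": [-0.2, -0.1, 0.0, 0.1, 0.2],
    "L": [-0.5, -0.4, -0.3],
    "H": [0.3, 0.4, 0.5],
}

# Invert the lookup: one entry per numeric value, pointing at its label.
value_to_label = {
    value: label for label, values in lookup.items() for value in values
}

# Restrict the recoding to numeric columns so the string columns are never touched.
numeric_cols = df.select_dtypes(include="number").columns

# Map each numeric column through the dictionary, then replace those columns.
df[numeric_cols] = df[numeric_cols].apply(lambda col: col.map(value_to_label))

print(df)
```

Because only the selected columns are replaced, this also covers the case where some columns should keep their numeric values: pass an explicit list of column labels instead of numeric_cols.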