I'm creating an initial df from a CSV file like the following:

```python
knobs_df = pd.read_csv(knobs_container)
```

```
        name     type                               values
0  algorithm   string                        one;two;three
1    threads  int32_t  1;2;3;4;5;6;7;8;9;10;11;12;13;14;15
```
For every row, I extract the values column and the type column into the dictionaries k_values and k_types:

```python
k_values = {}
k_types = {}
for row in knobs_df.itertuples(index=False):
    # row[0] = name, row[1] = type, row[2] = ';'-separated values
    k_values[row[0]] = row[2].split(';')
    k_types[row[0]] = row[1]
```

```
{'algorithm': ['one', 'two', 'three'], 'threads': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15']}
{'algorithm': 'string', 'threads': 'int32_t'}
```
From the k_values dictionary I generate a full grid containing all the possible combinations:

```
   algorithm threads
0        one       1
1        two       1
2      three       1
3        one       2
4        two       2
..       ...     ...
88       two      14
89     three      14
90       one      15
91       two      15
92     three      15
```
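The grid-building code isn't shown in the question. As a minimal sketch, assuming k_values is the dictionary above, itertools.product yields every combination; full_grid is a hypothetical helper name, not the asker's code. Row order may differ from the listing (here the last knob varies fastest), and every column is still text at this point, which is exactly why a numeric constraint fails later.

```python
import itertools

import pandas as pd

# Sample input, copied from the k_values dictionary printed above
k_values = {
    "algorithm": ["one", "two", "three"],
    "threads": [str(i) for i in range(1, 16)],
}


def full_grid(k_values: dict) -> pd.DataFrame:
    """Cartesian product of all knob values, one column per knob."""
    names = list(k_values)
    rows = itertools.product(*(k_values[n] for n in names))
    return pd.DataFrame(list(rows), columns=names)


grid = full_grid(k_values)
print(len(grid))       # number of rows = product of the list lengths
print(grid.dtypes)     # both columns are still strings
```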
Given a list of constraints (Python expressions) like the following:

```python
['threads < 20', 'algorithm != "two"']
```

I'd like to filter the full-grid dataframe using the query method of pandas.DataFrame. Is there a way to assign each column its corresponding dtype based on the k_types dictionary? I need this because every column can have its own type and, for instance, query fails to filter the 'threads' column, since all columns are created as strings. The problem is that the types are originally C++ data types, so I don't know whether there's a way to achieve this.
Possible k_types are:
[string, short int, int8_t, int16_t, int32_t, int64_t, uint8_t, uint16_t, uint32_t, uint64_t, char, int, long int, long long int, int_fast8_t, int_fast16_t, int_fast32_t, int_fast64_t, int_least8_t, int_least16_t, int_least32_t, int_least64_t, unsigned short int, unsigned char, unsigned int, unsigned long int, unsigned long long int, uint_fast8_t, uint_fast16_t, uint_fast32_t, uint_fast64_t, uint_least8_t, uint_least16_t, uint_least32_t, uint_least64_t, intmax_t, intptr_t, uintmax_t, uintptr_t, float, double, long double]
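One possible approach, sketched here rather than given as a definitive answer: a hypothetical lookup table CPP_TO_NUMPY from each C++ type name to a NumPy dtype, passed to DataFrame.astype as a column-to-dtype dict. C++ integer widths are implementation-defined, so the table assumes a typical 64-bit Linux (LP64/glibc) target and treats char knobs as single-character text; both are assumptions to adjust for other platforms. apply_cpp_types is also a hypothetical name.

```python
import numpy as np
import pandas as pd

# Hypothetical C++ type name -> NumPy dtype table.
# Assumption: 64-bit Linux (LP64, glibc), where 'long' and the int_fastN_t
# types with N >= 16 are 64 bits wide. Other platforms may differ.
CPP_TO_NUMPY = {
    "string": object,
    "char": object,  # assumption: char knobs hold single characters, kept as text
    "short int": np.int16,
    "int": np.int32,
    "long int": np.int64,
    "long long int": np.int64,
    "unsigned char": np.uint8,
    "unsigned short int": np.uint16,
    "unsigned int": np.uint32,
    "unsigned long int": np.uint64,
    "unsigned long long int": np.uint64,
    "intmax_t": np.int64,
    "intptr_t": np.int64,
    "uintmax_t": np.uint64,
    "uintptr_t": np.uint64,
    "float": np.float32,
    "double": np.float64,
    "long double": np.longdouble,
}
# Exact-width, least-width and fast-width families for 8/16/32/64 bits
for bits in (8, 16, 32, 64):
    fast_bits = 8 if bits == 8 else 64  # glibc x86-64 widens fast types to 64 bits
    for sign in ("int", "uint"):
        CPP_TO_NUMPY[f"{sign}{bits}_t"] = np.dtype(f"{sign}{bits}")
        CPP_TO_NUMPY[f"{sign}_least{bits}_t"] = np.dtype(f"{sign}{bits}")
        CPP_TO_NUMPY[f"{sign}_fast{bits}_t"] = np.dtype(f"{sign}{fast_bits}")


def apply_cpp_types(grid: pd.DataFrame, k_types: dict) -> pd.DataFrame:
    """Cast each grid column to the NumPy dtype matching its C++ type."""
    missing = [t for t in k_types.values() if t not in CPP_TO_NUMPY]
    if missing:
        raise KeyError(f"no dtype mapping for: {missing}")
    return grid.astype({col: CPP_TO_NUMPY[cpp] for col, cpp in k_types.items()})


grid = pd.DataFrame({"algorithm": ["one", "two"], "threads": ["1", "15"]})
typed = apply_cpp_types(grid, {"algorithm": "string", "threads": "int32_t"})
print(typed.dtypes)
```

Casting the whole grid once keeps the type logic in one table, so supporting a new C++ type means adding one entry rather than touching the grid code.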
I managed to find a partial solution (I may have misunderstood part of the question). Please let me know how to adapt it to your needs:
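```python
import pandas as pd

# Transpose so each knob becomes a column; the rows are now 'name', 'type', 'values'
t_df = knobs_df.T
names = t_df.loc['name']
dtypes = t_df.loc['type']
t_df.columns = names
t_df = t_df.iloc[2:].copy()  # keep only the 'values' row

# Only the two types from the example are mapped here
dtype_conv = {'string': str, 'int32_t': int}
for dtype, name in zip(dtypes, names):
    # Split 'a;b;c' into a list, then explode it into one row per value;
    # exploding each column in turn yields the full Cartesian product
    t_df[name] = t_df[name].str.split(';')
    t_df = t_df.explode(name)
    t_df[name] = t_df[name].astype(dtype_conv[dtype])

t_df = t_df.sort_values('threads', kind='stable').reset_index(drop=True)
```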
Output:

```
  algorithm  threads
0       one        1
1       two        1
2     three        1
3       one        2
4       two        2
5     three        2
6       one        3
7       two        3
...
```
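Once the columns carry real dtypes (as t_df does after the astype calls), the constraint list from the question can be applied by joining the expressions with `and` and passing the result to DataFrame.query. This is a sketch; apply_constraints is a hypothetical helper name. query parses a subset of Python expression syntax, so each constraint has to stay within what it supports (column names, comparisons, and/or/not).

```python
import pandas as pd


def apply_constraints(grid: pd.DataFrame, constraints: list) -> pd.DataFrame:
    """Keep only the rows of grid that satisfy every constraint expression."""
    if not constraints:
        return grid.reset_index(drop=True)
    # Parenthesise each expression so operator precedence can't mix them up
    expr = " and ".join(f"({c})" for c in constraints)
    return grid.query(expr).reset_index(drop=True)


# Small typed grid: 'threads' is numeric, so 'threads < 20' compares numbers
grid = pd.DataFrame({
    "algorithm": ["one", "two", "three"] * 2,
    "threads": [1, 1, 1, 25, 25, 25],
})
filtered = apply_constraints(grid, ['threads < 20', 'algorithm != "two"'])
print(filtered)
```

Calling query once per constraint in a loop gives the same rows; building a single expression just evaluates the filter in one pass.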