I have a dataframe that I imported from a csv. It goes something like this:
df
A.1 B.1 A.2 B.2
1 1 1 1
2 2 2 2
My question is, what would be an efficient way to turn this into separate data frames, one made up of just the A's and one of just the B's?
df_a
A.1 A.2
1 1
2 2
df_b
B.1 B.2
1 1
2 2
I am not picky about the column names. I would be fine with having them stripped down to just 1 and 2, etc., but I haven't been able to find a good way to do this. I am also open to other/better ways to accomplish what I am trying to do, in case this doesn't make sense to someone more knowledgeable. Thanks!
You could use df.filter with regex patterns:
df_a, df_B = df.filter(regex=r'^A'), df.filter(regex=r'^B')
or
df_a, df_B = df.filter(like='A'), df.filter(like='B')
Note that if you use like='A' then all columns whose name contains 'A' will be selected. If you use regex=r'^A' then only those columns whose name begins with an A will be selected.
In [7]: df
Out[7]:
A.1 B.1 A.2 B.2
0 1 1 1 1
1 2 2 2 2
In [8]: df_a, df_B = df.filter(regex=r'^A'), df.filter(regex=r'^B')
In [9]: df_a
Out[9]:
A.1 A.2
0 1 1
1 2 2
In [10]: df_B
Out[10]:
B.1 B.2
0 1 1
1 2 2
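To make the difference between like and regex concrete, here is a small sketch. The extra column B.A is hypothetical (it is not in the question's data) and is only there because its name contains an A without starting with one.

```python
import pandas as pd

# Hypothetical frame: "B.A" contains "A" but does not start with it.
df = pd.DataFrame({"A.1": [1, 2], "B.1": [1, 2], "A.2": [1, 2], "B.A": [1, 2]})

by_like = df.filter(like="A")       # substring match on the column name
by_regex = df.filter(regex=r"^A")   # match anchored at the start of the name

print(list(by_like.columns))    # ['A.1', 'A.2', 'B.A']
print(list(by_regex.columns))   # ['A.1', 'A.2']
```

With the question's original columns both calls give the same result, so the choice only matters once column names get less regular.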
OK, if I understand correctly, you just need N new dataframes, split according to their column names.
dfa = df[[col for col in df.columns if col.startswith("A")]].copy()
# same for dfb, dfc...
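Note that copy() is advisable if you wish to later apply changes to the new dataframe dfa. Selecting a list of columns already gives you new data rather than a pointer into df, but, if I remember correctly, pandas may still treat dfa as derived from df and emit a SettingWithCopyWarning when you assign to it. The explicit copy makes dfa clearly independent.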
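If there are many prefixes, repeating that line for dfb, dfc and so on gets tedious. Below is a rough sketch that builds one dataframe per prefix in a dict. The helper name split_by_prefix and the assumption that every column is named prefix.suffix are mine, not part of the question.

```python
import pandas as pd

df = pd.DataFrame({"A.1": [1, 2], "B.1": [1, 2], "A.2": [1, 2], "B.2": [1, 2]})


def split_by_prefix(frame, sep="."):
    """Return {prefix: sub-dataframe}, assuming columns look like '<prefix><sep><suffix>'."""
    groups = {}
    for name in frame.columns:
        prefix = name.split(sep, 1)[0]          # text before the first separator
        groups.setdefault(prefix, []).append(name)
    # .copy() keeps each piece independent of the original dataframe
    return {prefix: frame[names].copy() for prefix, names in groups.items()}


frames = split_by_prefix(df)
df_a, df_b = frames["A"], frames["B"]
print(list(df_a.columns))   # ['A.1', 'A.2']
print(list(df_b.columns))   # ['B.1', 'B.2']
```

Keeping the pieces in a dict avoids creating a separate variable for each prefix, which is handy when the number of prefixes is not known in advance.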
To select the columns:
dfa = df[['A.1', 'A.2']]
To change the name of the columns:
dfa.columns = ["a1", "a2"]
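Since the question mentions stripping the names down to 1 and 2, here is a minimal sketch of doing that with rename instead of typing the new names by hand. It assumes every column name contains a "." and that only the part after the first one should be kept.

```python
import pandas as pd

df = pd.DataFrame({"A.1": [1, 2], "B.1": [1, 2], "A.2": [1, 2], "B.2": [1, 2]})

dfa = df[["A.1", "A.2"]].copy()

# rename returns a new dataframe; the function is applied to each column name
dfa = dfa.rename(columns=lambda name: name.split(".", 1)[1])

print(list(dfa.columns))   # ['1', '2']
```

Note that the resulting names are the strings '1' and '2', not integers.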