I've been working on this for a while. I have a dataframe that looks like this:
tables columns
tab1 col001
tab1 col002
tab1 col003
tab2 col01
tab2 col02
tab2 col03
The real one has 1,500 tables in total and is 80,000 rows by 2 columns; some column names are duplicated. I am trying to get it formatted like this:
tables columns
tab1 col001,col002,col003
tab2 col01,col02,col03
I tried a crosstab like so
cross_table = pd.crosstab(df['tables'],
df['columns']).fillna('n/a')
But that's not what I am going for: it ends up with every column name as its own column of 1s and 0s, which is a large sparse matrix.
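Here is a quick sketch of what the crosstab is doing, using the sample frame from this post. `pd.crosstab` counts how often each (table, column) pair occurs. It returns one row per table and one column per distinct column name, with 0 for pairs that never occur. That also means there are no NaNs for `fillna('n/a')` to replace.

```python
import pandas as pd

# Sample data from the post
df = pd.DataFrame({
    "tables": ["tab1", "tab1", "tab1", "tab2", "tab2", "tab2"],
    "columns": ["col001", "col002", "col003", "col01", "col02", "col03"],
})

# crosstab counts occurrences of each (table, column) pair:
# one row per table, one column per distinct column name, 0 where a pair is absent
cross_table = pd.crosstab(df["tables"], df["columns"])
print(cross_table)
print(cross_table.shape)  # (number of tables, number of distinct column names)
```

On the real data, that would be 1,500 rows by one column per distinct column name, which explains the large sparse matrix.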
I also tried this, but it fails with an error about allocating 2 GiB, which makes me think it's the wrong approach:
df.pivot(columns=['tables', 'columns'], values=['columns'])
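The pivot attempt grows for a related reason. Below is a small sketch on the sample data. It passes `values='columns'` as a plain string rather than a list, which is a simplifying assumption. With no `index=`, `pivot` keeps one output row per input row and creates one output column per distinct (tables, columns) pair, so nearly every cell is NaN. The full grid is roughly (rows) × (distinct pairs) in size. On 80,000 rows that is consistent with the allocation error.

```python
import pandas as pd

# Sample data from the post
df = pd.DataFrame({
    "tables": ["tab1", "tab1", "tab1", "tab2", "tab2", "tab2"],
    "columns": ["col001", "col002", "col003", "col01", "col02", "col03"],
})

# No index= given: pivot keeps the original row index (one row per input row)
# and makes one output column per distinct (tables, columns) pair
wide = df.pivot(columns=["tables", "columns"], values="columns")

print(wide.shape)                      # (input rows, distinct pairs)
print(int(wide.notna().sum().sum()))   # only one filled cell per input row
```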
I also tried pandas melt, but that doesn't seem right either.
Then I tried to cast the columns to a list like so:
cols = list(df['columns'].unique())
df['cols'] = df['columns'].str.findall(f'({"|".join(cols)})')
I tried that because it worked for extracting text before, in a different context, but as written it just splits each column name into individual characters.
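For the list attempt, a Python list per table may be the goal rather than a comma-separated string. In that case, a minimal sketch (assuming the sample frame from this post) is to group by table and aggregate with the builtin `list` instead of extracting names with a regex:

```python
import pandas as pd

# Sample data from the post
df = pd.DataFrame({
    "tables": ["tab1", "tab1", "tab1", "tab2", "tab2", "tab2"],
    "columns": ["col001", "col002", "col003", "col01", "col02", "col03"],
})

# Collect each table's column names into a list, in their original row order
lists = df.groupby("tables")["columns"].agg(list)
print(lists)
```

`lists` is a Series indexed by table name; calling `.reset_index()` on it turns it back into a two-column frame.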
import pandas as pd

df = pd.DataFrame({'tables': {0: 'tab1', 1: 'tab1', 2: 'tab1', 3: 'tab2', 4: 'tab2', 5: 'tab2'},
                   'columns': {0: 'col001', 1: 'col002', 2: 'col003',
                               3: 'col01', 4: 'col02', 5: 'col03'}})
groupby:
df = df.groupby('tables').agg(', '.join).reset_index()  # almost the same as the answer in the post's comment section via @Psidom
pivot_table:
df = df.pivot_table(index='tables', values='columns', aggfunc=', '.join).reset_index()
list comprehension:
df = pd.DataFrame([(i, ', '.join(df[df['tables'] == i]['columns']))
                   for i in df['tables'].unique()], columns=df.columns)
set_index/unstack option:
df = df.set_index('tables', append=True).unstack(0).apply(lambda x: ', '.join(x.dropna()), axis=1).reset_index(name='columns')
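pd.get_dummies (the strip step assumes each table's rows are contiguous, as in the sample):
df = pd.get_dummies(df.tables, dtype=int).mul(df['columns'], axis=0).agg(', '.join).str.strip(
    ', ').reset_index(name='columns').rename(columns={'index': 'tables'})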
tables columns
0 tab1 col001, col002, col003
1 tab2 col01, col02, col03
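Since the real data has some duplicated column names, here is a sketch of the groupby option that drops repeated (tables, columns) rows before joining. It assumes "duplicated" means the same name is listed more than once for the same table. The data below is hypothetical, made up to include such a repeat. `drop_duplicates` keeps the first occurrence, so each table's names stay in their original order.

```python
import pandas as pd

# Hypothetical data: "col002" is listed twice under tab1
df = pd.DataFrame({
    "tables": ["tab1", "tab1", "tab1", "tab1", "tab2", "tab2"],
    "columns": ["col001", "col002", "col002", "col003", "col01", "col02"],
})

# Drop repeated (tables, columns) rows, keeping the first occurrence,
# then join each table's remaining names in their original order
out = (
    df.drop_duplicates()
    .groupby("tables")["columns"]
    .agg(", ".join)
    .reset_index()
)
print(out)
```

If the duplicates are instead the same name appearing under different tables (for example an `id` column in many tables), no extra step is needed, because each table is grouped separately.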