I am creating a pandas DataFrame to learn about handling missing data. What I want is to set the names of the row index and the column index when creating the DataFrame, instead of assigning them afterwards with 'df.index.name =' and 'df.columns.name ='. How can I do this?
# Program to generate an m x n DataFrame with random NaN values scattered in:
import random

import numpy as np
import pandas as pd

def df_maker(m, n):
    df = pd.DataFrame(np.random.randint(1, 100, (m*n)).reshape(m, n), index = [f'Row {i+1}' for i in range(m)], columns = [f'Col {j+1}' for j in range(n)] )
    # Put one NaN in a randomly chosen column of every row
    for i in range(m):
        df.iloc[[i],[random.randrange(n)]] = np.nan
    return df

df = df_maker(10, 10)
df.index.name = 'Rows'
df.columns.name = 'Columns'
df
I tried looking through the documentation for pandas.DataFrame, pandas.DataFrame.rename_axis and some other methods, but couldn't find what I am looking for. So how can I create the above DataFrame in one line of code, without using df.index.name = 'Rows' and df.columns.name = 'Columns'? Thanks.
Create the Index objects representing the rows and columns separately:
def df_maker(m, n):
    index = pd.Index([f'Row {i + 1}' for i in range(m)], name='Rows')
    columns = pd.Index([f'Col {i + 1}' for i in range(n)], name='Columns')
    df = pd.DataFrame(np.random.randint(1, 100, size=(m, n)), index=index, columns=columns)
    # rest of your code here
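
For reference, a complete runnable version of this approach might look like the sketch below. The seed parameter, the use of NumPy's Generator, and the up-front cast to float are assumptions added for illustration and are not part of the original answer. The last statement shows rename_axis as a possible alternative that names both axes in a single chained call.

```python
import numpy as np
import pandas as pd


def df_maker(m, n, seed=None):
    """Build an m x n DataFrame with one NaN per row and named axes."""
    rng = np.random.default_rng(seed)  # `seed` is a hypothetical extra parameter
    # Cast to float up front: NaN is a float value, so integer data cannot hold it.
    data = rng.integers(1, 100, size=(m, n)).astype(float)
    # Blank out one randomly chosen column in every row.
    data[np.arange(m), rng.integers(0, n, size=m)] = np.nan
    # Index objects carry their own name, so no later assignment is needed.
    index = pd.Index([f'Row {i + 1}' for i in range(m)], name='Rows')
    columns = pd.Index([f'Col {j + 1}' for j in range(n)], name='Columns')
    return pd.DataFrame(data, index=index, columns=columns)


df = df_maker(10, 10, seed=0)

# Alternative: name both axes in one chained call with rename_axis.
df_alt = pd.DataFrame(np.ones((2, 3))).rename_axis(index='Rows', columns='Columns')
```

Both df and df_alt end up with df.index.name == 'Rows' and df.columns.name == 'Columns' without any separate assignment.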