Pandas dataframe narrow to wide with pivot table no aggregation

I have a pandas dataframe that contains the iris dataset. I want to subset this dataframe to only include sepal_length and species, and then reshape it so that the columns are the unique values of species and each column holds the sepal_length values for that species.

# load data into a dataframe
import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')

df.head()
+----+---------------+--------------+---------------+--------------+---------+
|    | sepal_length  | sepal_width  | petal_length  | petal_width  | species |
+----+---------------+--------------+---------------+--------------+---------+
| 0  |          5.1  |         3.5  |          1.4  |         0.2  | setosa  |
| 1  |          4.9  |         3.0  |          1.4  |         0.2  | setosa  |
| 2  |          4.7  |         3.2  |          1.3  |         0.2  | setosa  |
| 3  |          4.6  |         3.1  |          1.5  |         0.2  | setosa  |
| 4  |          5.0  |         3.6  |          1.4  |         0.2  | setosa  |
+----+---------------+--------------+---------------+--------------+---------+

I can do this if I take the data out of pandas and use a dictionary to reshape it, but I can't figure out how to do it within pandas.

data = df.to_dict('records')

e = {}
for line in data:
    e[line['species']] = []

for line in data:
    e[line['species']].append(line['sepal_length'])

new = pd.DataFrame(e)

This is what I want to end up with:

+----+---------+-------------+-----------+
|    | setosa  | versicolor  | virginica |
+----+---------+-------------+-----------+
| 0  |    5.1  |        7.0  |       6.3 |
| 1  |    4.9  |        6.4  |       5.8 |
| 2  |    4.7  |        6.9  |       7.1 |
| 3  |    4.6  |        5.5  |       6.3 |
| 4  |    5.0  |        6.5  |       6.5 |
+----+---------+-------------+-----------+

I've tried using pd.crosstab(df['sepal_length'], df['species']) but that doesn't get me what I want. I've also tried using df.pivot_table('sepal_length', columns='species') and that also isn't it.

What am I missing here?

IIUC, you can use groupby.cumcount on the species column and set the result as the index, then use pivot instead of pivot_table. Unlike pivot_table, pivot does not require an aggregation function.

df1 = df.set_index(df.groupby('species').cumcount())

df1 = df1.pivot(columns='species', values='sepal_length').rename_axis(None,axis=1)

print(df1)

   setosa  versicolor  virginica
0     5.1         7.0        6.3
1     4.9         6.4        5.8
2     4.7         6.9        7.1
3     4.6         5.5        6.3
4     5.0         6.5        6.5
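As a minimal, self-contained sketch of why this works: cumcount numbers the rows within each species, so every (position, species) pair is unique and pivot has nothing to aggregate. The small frame below is only a stand-in built from the first two rows per species in the tables above, and the names position and wide are illustrative.

```python
import pandas as pd

# Stand-in for the iris data: first two sepal_length values per species,
# copied from the tables above (not the full dataset).
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# Number the rows within each species: 0, 1, 0, 1, 0, 1
position = df.groupby("species").cumcount()

# Each (position, species) pair is now unique, so pivot can reshape
# without an aggregation function.
wide = (
    df.set_index(position)
      .pivot(columns="species", values="sepal_length")
      .rename_axis(None, axis=1)  # drop the "species" label on the columns
)
print(wide)
```

If the species had different numbers of rows, the shorter columns would be padded with NaN at the bottom.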

What you're trying to do will take a few steps. (The code below assumes use of the standard "Iris dataset".)

  1. First, let's subset your DataFrame by only the columns we need.

     df_subset = df[['sepal_length','species']] 
  2. Next, use pandas.pivot (instead of pandas.pivot_table) to convert your DataFrame from "long" to "wide".

     df_pivot = df_subset.pivot(columns='species',values='sepal_length') 
  3. Now, we're close to what you wanted, but because the three species columns share the original row index, the pivoted DataFrame holds NaNs in two of the three columns for any given row. We can work around this by taking each species column, dropping its NaNs, resetting its index, and concatenating the results column-wise. (Essentially creating three DataFrames, one for each species, and joining them along a new index.) We can do this one of two ways:

    • The compact solution:

       names = ['setosa','versicolor','virginica']
       df_final = pd.concat(map(lambda name: df_pivot[name].dropna().reset_index().drop('index',axis=1), names), axis=1)
    • Which is equivalent to:

       df_final = pd.concat([
           df_pivot['setosa'].dropna().reset_index().drop('index',axis=1),
           df_pivot['versicolor'].dropna().reset_index().drop('index',axis=1),
           df_pivot['virginica'].dropna().reset_index().drop('index',axis=1)],
           axis=1)
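The same idea can be sketched without hard-coding the species names, by looping over the pivoted columns. This is one possible variant rather than the code above: it uses reset_index(drop=True) in place of reset_index().drop('index', axis=1), and the small frame is a stand-in built from the first two rows per species shown in the question.

```python
import pandas as pd

# Stand-in for the iris data (first two rows per species from the question).
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# Pivot on the original row index: each row has one value and two NaNs.
df_pivot = df.pivot(columns="species", values="sepal_length")

# For each species column, drop the NaN padding and renumber from 0,
# then join the columns side by side.
df_final = pd.concat(
    [df_pivot[name].dropna().reset_index(drop=True) for name in df_pivot.columns],
    axis=1,
)
print(df_final)
```

Taking the names from df_pivot.columns keeps the code working if the set of species changes.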

