
DataFrame transformation in Python Pandas

I am trying to transform a Pandas DataFrame into a new one in which every item from a certain column gets its own row. For example:

Before:

   ID             Name        Date   Location
0   0       John, Dave  01/01/1992     Mexico
1   1              Tim  07/07/1997  Australia
2   2       Mike, John  12/24/2012     Zambia
3   3  Bob, Rick, Tony  05/17/2007       Cuba
4   4            Roger  04/05/2000    Iceland
5   5           Carlos  05/24/1995       Guam
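
To reproduce the examples in this thread, the "Before" table can be rebuilt roughly as follows. This is only a sketch: it assumes `ID` holds integers and that `Date` is stored as plain strings, neither of which the question states.

```python
import pandas as pd

# Sample data copied from the "Before" table.
# Assumption: ID is an integer column and Date is kept as text.
df = pd.DataFrame({
    'ID': [0, 1, 2, 3, 4, 5],
    'Name': ['John, Dave', 'Tim', 'Mike, John',
             'Bob, Rick, Tony', 'Roger', 'Carlos'],
    'Date': ['01/01/1992', '07/07/1997', '12/24/2012',
             '05/17/2007', '04/05/2000', '05/24/1995'],
    'Location': ['Mexico', 'Australia', 'Zambia',
                 'Cuba', 'Iceland', 'Guam'],
})

print(df)
```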

Current solution:

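import pandas as pd

new_df = pd.DataFrame(columns=df.columns)
for index, row in df.iterrows():
    # Copy the current row into a one-row DataFrame.
    new_row = pd.DataFrame(df.loc[index]).transpose()
    target_info = df.loc[index, 'Name']
    if len(target_info.split(',')) > 1:
        for item in target_info.split(','):
            # Strip the space left over after each comma.
            new_row.loc[index, 'Name'] = item.strip()
            # DataFrame.append was removed in pandas 2.0; pd.concat replaces it.
            new_df = pd.concat([new_df, new_row])
    else:
        new_df = pd.concat([new_df, new_row])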

After:

  ID    Name        Date   Location
0  0    John  01/01/1992     Mexico
0  0    Dave  01/01/1992     Mexico
1  1     Tim  07/07/1997  Australia
2  2    Mike  12/24/2012     Zambia
2  2    John  12/24/2012     Zambia
3  3     Bob  05/17/2007       Cuba
3  3    Rick  05/17/2007       Cuba
3  3    Tony  05/17/2007       Cuba
4  4   Roger  04/05/2000    Iceland
5  5  Carlos  05/24/1995       Guam

Surely there is something more elegant?

You could get the split names as a Series, drop your existing Name column, then join the split names.

# Split the 'Name' column as a Series, setting the appropriate name and index.
split_names = df['Name'].str.split(', ', expand=True).stack()
split_names.name = 'Name'
split_names.index = split_names.index.get_level_values(0)

# Drop the existing 'Name' column, and join the split names.
df.drop('Name', axis=1, inplace=True)
df = df.join(split_names)

The resulting output is the same as in your example, but with the Name column last. You can reorder the columns if you want the original order.

   ID        Date   Location    Name
0   0  01/01/1992     Mexico    John
0   0  01/01/1992     Mexico    Dave
1   1  07/07/1997  Australia     Tim
2   2  12/24/2012     Zambia    Mike
2   2  12/24/2012     Zambia    John
3   3  05/17/2007       Cuba     Bob
3   3  05/17/2007       Cuba    Rick
3   3  05/17/2007       Cuba    Tony
4   4  04/05/2000    Iceland   Roger
5   5  05/24/1995       Guam  Carlos
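
Restoring the original column order could look something like the sketch below. It uses a shortened three-row frame made up for illustration, and `original_columns` and `result` are names introduced here, not part of the answer. The extra `.dropna()` is a defensive assumption in case `stack()` keeps missing values in the installed pandas version.

```python
import pandas as pd

# Shortened illustrative data (not the full table from the question).
df = pd.DataFrame({
    'ID': [0, 1, 2],
    'Name': ['John, Dave', 'Tim', 'Bob, Rick, Tony'],
    'Location': ['Mexico', 'Australia', 'Cuba'],
})

# Remember the column order before 'Name' is dropped.
original_columns = list(df.columns)

# Same split/stack idea as in the answer; dropna() discards padding values.
split_names = df['Name'].str.split(', ', expand=True).stack().dropna()
split_names.name = 'Name'
split_names.index = split_names.index.get_level_values(0)

# Join the split names back, then select columns in the original order.
result = df.drop('Name', axis=1).join(split_names)
result = result[original_columns]

print(result)
```

Selecting with a list of column names returns a new DataFrame with the columns in that order, so `Name` ends up back between `ID` and `Location`.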

You can do it this way:

nm = df.Name.str.split(r',\s*', expand=True)
cols=list(set(df.columns) - set(['Name']))

pd.melt(df[cols].join(nm),
        id_vars=cols,
        value_vars=nm.columns.tolist(),
        value_name='Name') \
  .dropna() \
  .drop(['variable'], axis=1) \
  .sort_values('ID')

Step by step:

In [128]: nm = df.Name.str.split(r',\s*', expand=True)

In [129]: nm
Out[129]:
        0     1     2
0    John  Dave  None
1     Tim  None  None
2    Mike  John  None
3     Bob  Rick  Tony
4   Roger  None  None
5  Carlos  None  None

In [130]: cols=list(set(df.columns) - set(['Name']))

In [131]: cols
Out[131]: ['Date', 'ID', 'Location']

In [133]: pd.melt(df[cols].join(nm),
   .....:         id_vars=cols,
   .....:         value_vars=nm.columns.tolist(),
   .....:         value_name='Name') \
   .....:   .dropna() \
   .....:   .drop(['variable'], axis=1) \
   .....:   .sort_values('ID')
Out[133]:
          Date  ID   Location    Name
0   01/01/1992   0     Mexico    John
6   01/01/1992   0     Mexico    Dave
1   07/07/1997   1  Australia     Tim
2   12/24/2012   2     Zambia    Mike
8   12/24/2012   2     Zambia    John
3   05/17/2007   3       Cuba     Bob
9   05/17/2007   3       Cuba    Rick
15  05/17/2007   3       Cuba    Tony
4   04/05/2000   4    Iceland   Roger
5   05/24/1995   5       Guam  Carlos
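
Neither answer above uses it, but on pandas 0.25 or later the same reshaping can probably be done more directly with `DataFrame.explode`. The sketch below is an alternative technique, not the method from either answer; it uses a shortened three-row frame made up for illustration, and `result` is a name introduced here.

```python
import pandas as pd

# Shortened illustrative data (not the full table from the question).
df = pd.DataFrame({
    'ID': [0, 1, 2],
    'Name': ['John, Dave', 'Tim', 'Bob, Rick, Tony'],
    'Location': ['Mexico', 'Australia', 'Cuba'],
})

# Turn each 'Name' string into a list of names,
# then give every list element its own row.
result = df.assign(Name=df['Name'].str.split(r',\s*')).explode('Name')

print(result)
```

This keeps the original column order and repeats the original index for each split name, much like the "After" table in the question.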
