
Transform a pandas dataframe in Python

I need to apply a custom transformation to a dataframe, like this:

import pandas as pd

df = pd.DataFrame({
    'value': ['a'],
    'measure': [['b', 'c']]
})

transformed_df = pd.DataFrame({
    'measure': ['b', 'c'],
    'value': ['a', 'a']
})

What is an efficient way to transform df into transformed_df?

Try pd.DataFrame.explode:

df.explode('measure').reset_index(drop=True)

Output:

  value measure
0     a       b
1     a       c
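The output above has the columns in the order value, measure, while the question asks for measure, value. A minimal sketch of the same explode call with the columns reordered; it assumes pandas 1.1 or later for the ignore_index parameter, and the names exploded and result are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    'value': ['a'],
    'measure': [['b', 'c']],
})

# explode() emits one row per list element and repeats the other columns.
# ignore_index=True (assumes pandas >= 1.1) renumbers the rows, which
# replaces the separate reset_index(drop=True) call.
exploded = df.explode('measure', ignore_index=True)

# Reorder the columns to match the layout asked for in the question.
result = exploded[['measure', 'value']]
print(result)
```

On older pandas versions, df.explode('measure').reset_index(drop=True) as shown in the answer should give the same rows.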

One way to approach the problem is to treat it as building a MultiIndex:

value = ['a']
measure = ['b', 'c']
idx = pd.MultiIndex.from_product([value, measure], names=['value', 'measure'])

df = pd.DataFrame(index=idx).reset_index()

where df is:

  value measure
0     a       b
1     a       c
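The snippet above hard-codes value and measure rather than reading them from the question's dataframe. A rough sketch of the same MultiIndex idea starting from a dataframe whose rows may each carry a different list; the second input row and the names pairs and result are assumptions added for illustration, not part of the original answer:

```python
import pandas as pd

# Hypothetical input: the question's row plus one extra row with its own list.
df = pd.DataFrame({
    'value': ['a', 'x'],
    'measure': [['b', 'c'], ['y']],
})

# Pair every value with each element of its own list.
pairs = [
    (value, item)
    for value, items in zip(df['value'], df['measure'])
    for item in items
]

# Build the index from explicit tuples instead of a full Cartesian product,
# so a value is only combined with the measures from its own row.
idx = pd.MultiIndex.from_tuples(pairs, names=['value', 'measure'])
result = pd.DataFrame(index=idx).reset_index()
print(result)
```

from_product fits only when every value should be combined with every measure; with per-row lists, from_tuples keeps the pairing row by row.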

I hadn't seen the explode method before, so out of curiosity I ran some timing tests:

def test_multi(value, measure):
    idx = pd.MultiIndex.from_product([value, measure], names=['value', 'measure'])

    df = pd.DataFrame(index=idx).reset_index()

    return df

def test_explode(df):
    return df.explode('measure').reset_index(drop=True)


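value =  ['a']*10000
measure = ['b','c']*10000

# Note: from_product builds the full Cartesian product here,
# 10,000 x 20,000 = 200,000,000 rows, whereas the explode test below
# produces only 20,000 rows, so the two timings are not like-for-like.
%timeit test_multi(value, measure)
#13 s ± 116 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)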

value =  ['a']*10000
measure = [['b','c']]*10000


df = pd.DataFrame({
    'value': value,
    'measure':measure
})

%timeit test_explode(df)
#16.9 ms ± 199 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
