Transform a pandas dataframe in Python
I need to apply a custom transformation to a dataframe, like this:
import pandas as pd
df = pd.DataFrame({
    'value': ['a'],
    'measure': [['b', 'c']]
})
transformed_df = pd.DataFrame({
    'measure': ['b', 'c'],
    'value': ['a', 'a']
})
What is an efficient way to get from df to transformed_df?
df.explode('measure').reset_index(drop=True)
Output:
  value measure
0     a       b
1     a       c
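As a quick check, the sketch below runs the same call on the df from the question and compares the result with transformed_df. The column reordering and the value-level comparison are additions made here, not part of the answer: explode keeps the original column order (value, measure), while transformed_df lists measure first.

```python
import pandas as pd

df = pd.DataFrame({
    'value': ['a'],
    'measure': [['b', 'c']],
})
transformed_df = pd.DataFrame({
    'measure': ['b', 'c'],
    'value': ['a', 'a'],
})

# explode emits one row per list element and repeats the other columns
result = df.explode('measure').reset_index(drop=True)

# explode keeps the original column order, so reorder to match the target
result = result[['measure', 'value']]

# compare plain Python values to sidestep dtype differences between versions
assert result.to_dict('list') == transformed_df.to_dict('list')
print(result)
```

Without reset_index(drop=True), both output rows would keep the index label 0 of the row they came from.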
One way to approach this problem is to treat it as building a MultiIndex:
value = ['a']
measure = ['b','c']
idx = pd.MultiIndex.from_product([value,measure], names = ['value','measure'])
df = pd.DataFrame(index=idx).reset_index()
where df is:
  value measure
0     a       b
1     a       c
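Note that from_product pairs every value with every measure, which matches the desired output only when there is a single value. The sketch below uses a hypothetical two-row frame (not from the question) to show the difference, and builds the row-wise pairs with MultiIndex.from_tuples as one possible alternative.

```python
import pandas as pd

# Hypothetical two-row input, used only to illustrate the difference
df = pd.DataFrame({
    'value': ['a', 'x'],
    'measure': [['b', 'c'], ['y']],
})

# Pair each value only with the measures from its own row
pairs = [(v, m) for v, ms in zip(df['value'], df['measure']) for m in ms]
idx = pd.MultiIndex.from_tuples(pairs, names=['value', 'measure'])
by_rows = pd.DataFrame(index=idx).reset_index()

# from_product over the flattened lists crosses every value with every measure
flat_measures = [m for ms in df['measure'] for m in ms]
crossed = pd.MultiIndex.from_product(
    [df['value'].tolist(), flat_measures], names=['value', 'measure']
)

print(len(by_rows))  # 3 rows: a-b, a-c, x-y
print(len(crossed))  # 6 entries: 2 values x 3 measures
```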
I hadn't seen the explode method before, so out of curiosity I ran some timing tests:
def test_multi(value, measure):
    idx = pd.MultiIndex.from_product([value,measure], names = ['value','measure'])
    df = pd.DataFrame(index=idx).reset_index()
    return df

def test_explode(df):
    return df.explode('measure').reset_index(drop=True)

value = ['a']*10000
measure = ['b','c']*10000
# from_product builds the full Cartesian product: 10,000 x 20,000 = 200,000,000 rows
%timeit test_multi(value, measure)
# 13 s ± 116 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

value = ['a']*10000
measure = [['b','c']]*10000
df = pd.DataFrame({
    'value': value,
    'measure': measure
})
# explode yields 2 rows per input row: 20,000 rows in total
%timeit test_explode(df)
# 16.9 ms ± 199 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)