Trouble working with pandas DataFrames

Question

I occasionally receive a dataframe with many N/A values.

In these cases, there are redundant rows. For every X value there is only one Y value. Therefore, I would like to merge the two "example1" rows into one row (as shown in the image) by combining the "context" column with the measurement column names (M1, M2, ..., Mn).

How might one do this with pandas dataframe functions?

Thanks.

Answer 1

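import numpy as np
import pandas as pd

# Sample data: both rows share the index label 'example1' but differ in 'context'.
df = pd.DataFrame(
    [
        ['a', .1, np.nan, np.nan, .5],
        ['b', np.nan, .2, .3, .5],
    ],
    index=['example1', 'example1'],
    columns=['context', 'M1', 'M2', 'M3', 'Y']
)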

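# Move 'context' into the index, stack the remaining columns into a long Series,
# then unstack the context and column-name levels back into columns.
d1 = df.set_index('context', append=True).stack().unstack([1, 2])

# Flatten the (context, column) MultiIndex into names such as 'a_M1'.
d1.columns = d1.columns.map('_'.join)

d1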
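For comparison, here is a minimal sketch of the same reshape written with melt instead of the stack/unstack chain above. It is an alternative, not the answerer's method, and it assumes that X is an ordinary column rather than the index and that Y should stay a single key column instead of being split into a_Y and b_Y. The names long_df, measure, name and wide are made up for illustration.

```python
import numpy as np
import pandas as pd

# Same sample data, with X as a regular column (an assumption of this sketch).
df = pd.DataFrame({
    "X": ["example1", "example1"],
    "context": ["a", "b"],
    "M1": [0.1, np.nan],
    "M2": [np.nan, 0.2],
    "M3": [np.nan, 0.3],
    "Y": [0.5, 0.5],
})

# Long format: one row per (X, Y, context, measurement); NaN measurements are dropped.
long_df = df.melt(id_vars=["X", "Y", "context"], var_name="measure")
long_df = long_df.dropna(subset=["value"])

# Combine context and measurement name, e.g. "a_M1".
long_df["name"] = long_df["context"] + "_" + long_df["measure"]

# Back to wide format: one row per (X, Y) pair.
wide = long_df.set_index(["X", "Y", "name"])["value"].unstack("name").reset_index()
wide.columns.name = None

print(wide)
```

Because NaN values are dropped explicitly before reshaping, this version does not depend on how stack() treats missing values.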

Answer 2

You could use a join. It takes rsuffix and lsuffix parameters, so it is easiest to use those; if you need a prefix instead, you can rename the columns manually.

Create your DataFrame

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        'X': ['example1', 'example1'],
        'context': ['a', 'b'],
        'M1': [0.1, np.nan],
        'M2': [np.nan, 0.2],
        'M3': [np.nan, 0.3],
        'Y': [0.5, 0.5],
    },
    columns=['X', 'context', 'M1', 'M2', 'M3', 'Y']
)

Solution

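# Split the rows by context and index each part by the shared keys X and Y.
dfa = df[df['context'] == 'a'].set_index(['X', 'Y']).drop('context', axis=1)
dfb = df[df['context'] == 'b'].set_index(['X', 'Y']).drop('context', axis=1)

# Join on (X, Y), tag the overlapping column names with suffixes,
# and drop every column that contains a NaN.
dfa.join(dfb, how='left', lsuffix='_a', rsuffix='_b').dropna(axis=1)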
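The answer notes that a prefix has to be applied by hand. One possible way to do that, sketched here as an assumption rather than as what the answerer had in mind, is to rename the columns with DataFrame.add_prefix before joining, so that no suffix arguments are needed. The names joined and result are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "X": ["example1", "example1"],
        "context": ["a", "b"],
        "M1": [0.1, np.nan],
        "M2": [np.nan, 0.2],
        "M3": [np.nan, 0.3],
        "Y": [0.5, 0.5],
    }
)

# Split by context and index each part by the shared keys X and Y.
dfa = df[df["context"] == "a"].set_index(["X", "Y"]).drop("context", axis=1)
dfb = df[df["context"] == "b"].set_index(["X", "Y"]).drop("context", axis=1)

# Prefix the column names first; the two frames then have no overlapping
# columns, so join needs neither lsuffix nor rsuffix.
joined = dfa.add_prefix("a_").join(dfb.add_prefix("b_"), how="left")

# Drop only the columns that are empty in every row.
result = joined.dropna(axis=1, how="all")

print(result)
```

Using how="all" in dropna is a deliberate choice in this sketch: with several X groups, a column that is filled for one group and missing for another would be kept instead of discarded.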

Answer 1
1 2016-11-15 20:17:49

Answer 2
1 2016-11-15 20:32:10
