Python Pandas DataFrame: creating new columns from the values in a column

Question

I've searched several books and sites and I can't find anything that quite matches what I'm trying to do. I would like to create itemized lists from a dataframe and reconfigure the data like so:

      A     B                A     B     C     D  
0     1     aa          0    1     aa  
1     2     bb          1    2     bb  
2     3     bb          2    3     bb    aa  
3     3     aa     --\  3    4     aa    bb    dd  
4     4     aa     --/  4    5     cc  
5     4     bb  
6     4     dd  
7     5     cc

I've experimented with grouping, stacking, unstacking, etc., but nothing I've attempted has produced the desired result. If it's not obvious, I'm very new to Python. A solution would be great, but an understanding of the process I need to follow would be perfect.

Thanks in advance

Answer 1

Using pandas you can query all results, e.g. the rows where A = 4.
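As a minimal sketch of that kind of query, assuming the sample data from the question (the names `rows_a4` and `b_values` are made up for illustration), a boolean mask selects the matching rows:

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'A': [1, 2, 3, 3, 4, 4, 4, 5],
    'B': ['aa', 'bb', 'bb', 'aa', 'aa', 'bb', 'dd', 'cc'],
})

# Boolean mask: keep only the rows where column A equals 4
rows_a4 = df[df['A'] == 4]

# The B values for that group, as a plain Python list
b_values = rows_a4['B'].tolist()
print(b_values)
```

The same mask works for any other value of A, which is what makes it possible to collect the B values one group at a time.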

A crude but working method would be to iterate through the distinct values of A, gather all 'like' results into a list for each value, and convert the collected lists into a new dataframe.

A rough sketch to demonstrate the idea (adapt it to your data):

import pandas as pd

# one list of B values per value of A, from 1 up to the largest A
l = []
for item in range(1, df['A'].max() + 1):
    l.append(df.loc[df['A'] == item, 'B'].tolist())

df = pd.DataFrame(l)  # shorter rows are padded with missing values
# or something of the sort

I hope that helps.

Update from comments:

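animal_list = []

# '...' is a placeholder: list the remaining values of column A here
for animal in ['cat', 'dog', ...]:
    # rows whose A value matches the current animal
    newdf = df[df['A'] == animal]

    # one output row: the animal followed by all of its B values
    body = [animal]
    for item in newdf['B']:
        body.append(item)

    animal_list.append(body)

df = pd.DataFrame(animal_list)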

Answer 2

A quick and dirty method that will work with strings. Customize the column naming as needed.

import pandas as pd

data = {'A': [1, 2, 3, 3, 4, 4, 4, 5],
        'B': ['aa', 'bb', 'bb', 'aa', 'aa', 'bb', 'dd', 'cc']}
df = pd.DataFrame(data)

# size of the largest group; value_counts() sorts counts in descending order,
# so the first entry is the maximum. This helps with creating lists of same size
maxlen = df.A.value_counts().values[0]

newdata = {}
for n, gdf in df.groupby('A'):
    # pad each group's B values with empty strings up to maxlen
    newdata[n] = list(gdf.B.values) + [''] * (maxlen - len(gdf.B))

# recreate DF with Col 'A' as index; experiment with other orientations
newdf = pd.DataFrame.from_dict(newdata, orient='index')

# customize this section
newdf.columns = list('BCD')
newdf['A'] = newdf.index
newdf.index = range(len(newdf))
newdf = newdf[list('ABCD')]  # to set the desired order

print(newdf)

The result is:

   A   B   C   D
0  1  aa
1  2  bb
2  3  bb  aa
3  4  aa  bb  dd
4  5  cc
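For comparison, here is an alternative sketch that is not taken from either answer. It assumes the same sample data and numbers each row within its group, then pivots on that number. The names `position`, `pos`, and `wide` are made up for illustration, and the hard-coded `list('BCD')` assumes the largest group has exactly three rows, as it does here.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'A': [1, 2, 3, 3, 4, 4, 4, 5],
    'B': ['aa', 'bb', 'bb', 'aa', 'aa', 'bb', 'dd', 'cc'],
})

# Number each row within its A group: 0 for the first B value, 1 for the second, ...
position = df.groupby('A').cumcount()

# Spread the B values across columns, one column per position in the group,
# and replace the missing cells of shorter groups with empty strings
wide = (
    df.assign(pos=position)
      .pivot(index='A', columns='pos', values='B')
      .fillna('')
)

# Rename the positional columns (0, 1, 2) to B, C, D and turn A back into a column
wide.columns = list('BCD')
wide = wide.reset_index()
print(wide)
```

The process is the same as in the loop-based answers: identify the groups, decide where each value sits within its group, and lay those positions out as columns. Here `cumcount` and `pivot` do that bookkeeping.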

Answer 1
0 2015-02-05 15:33:32

Answer 2
0 2015-02-06 18:29:53