How do I convert the "rows" of a pandas Series into columns of a DataFrame?

Question

I have the following pandas Series, ser1, of shape (100,).

import pandas as pd
ser1 = pd.Series(...)
print(ser1.shape)
##  prints (100,)

Each ndarray in this Series has length 150000, and each of its elements is a single character.

print(len(ser1[0]))
##  prints 150000

ser1.head()
sample1       xhtrcuviuvjhgfsrexvuvhfgshgckgvghfsgfdsdsg...
sample2       jhkjhgkjvkjgfjyqerwqrbxcvmkoshfkhgjknlkdfk...
sample3       sdfgfdxcvybnjbvtcyuikjhbgfdftgyhujhghjkhjn...
sample4       bbbbbbadfashdwkjhhguhoadfopnpbfjhsaqeqjtyi...
sample5       gfjyqedxcvrexvuvcvmkoshdftgyhujhgcvmkoshfk...
dtype: object

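I would like to convert this pandas Series into a pandas DataFrame such that each element of a Series "row" becomes a DataFrame column. That is, every element of each Series array is its own column. In this case, the resulting DataFrame would have 150000 columns.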

print(type(df_ser1)) # DataFrame of ser1
## outputs <class 'pandas.core.frame.DataFrame'>
df_ser1.head()
     samples    char1    char2    char3    char4    char5    char6
0    sample1    x        h        t        r        c        u
1    sample2    j        h        k        j        h        g
2    sample3    s        d        f        g        f        d
3    sample4    b        b        b        b        b        b
........

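How can I convert a pandas Series into a DataFrame in this way?

The most obvious idea is

df_ser = ser1.to_frame()

but this does not split the elements into separate DataFrame columns:

df_ser = ser1.to_frame()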
df_ser.head()
                                                       0
sample1       xhtrcuviuvjhgfsrexvuvhfgshgckgvghfsgfdsdsg...
sample2       jhkjhgkjvkjgfjyqerwqrbxcvmkoshfkhgjknlkdfk...
sample3       sdfgfdxcvybnjbvtcyuikjhbgfdftgyhujhghjkhjn...
......
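To make the behaviour above easy to reproduce, here is a minimal sketch on a made-up three-row Series. The name ser_demo and its contents are assumptions for illustration only; the real ser1 is elided in the question.

```python
import pandas as pd

# Hypothetical stand-in for ser1 (the real data is not shown in the question).
ser_demo = pd.Series(
    ['xhtr', 'jhkj', 'sdfg'],
    index=['sample1', 'sample2', 'sample3'])

# to_frame() wraps the whole Series as one column; the strings stay unsplit.
df_demo = ser_demo.to_frame()
print(df_demo.shape)    # one row per sample, a single column
print(df_demo.columns)  # the column is labelled 0 because the Series has no name
```

The result has as many rows as the Series and exactly one column, which is why to_frame() on its own does not give the wide layout.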

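Otherwise, one would somehow iterate over every element of each Series "row" and create a column from it, though I am not sure how computationally feasible that is. (Not very pythonic.)

How would one do this?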

Answer 1

Consider the sample Series ser1

ser1 = pd.Series(
    'abc def ghi'.split(),
    'sample1 sample2 sample3'.split())

Turn each string into a list of characters, then use apply with pd.Series.

ser1.apply(lambda x: pd.Series(list(x))) \
    .rename(columns=lambda x: 'char{}'.format(x + 1))

        char1 char2 char3
sample1     a     b     c
sample2     d     e     f
sample3     g     h     i
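The layout asked for in the question also has the sample labels in a regular column called samples rather than in the index. One possible way to get there from the result above is sketched below; the name wide is an assumption for illustration, and the approach relies on rename_axis and reset_index.

```python
import pandas as pd

ser1 = pd.Series(
    'abc def ghi'.split(),
    'sample1 sample2 sample3'.split())

wide = (
    ser1.apply(lambda x: pd.Series(list(x)))
        .rename(columns=lambda x: 'char{}'.format(x + 1))
        .rename_axis('samples')  # name the index so reset_index labels the new column
        .reset_index()           # move the sample labels into a regular column
)
print(wide)
```

After this, wide has a default integer index and the columns samples, char1, char2, char3.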

Answer 2

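My approach is to work with the data as a numpy array and then store the final product in a pandas DataFrame. Overall, though, creating 100k+ columns in a DataFrame seems to be slow.

Compared with piRSquared's solution, mine is not really any better, but I figured I would post it anyway since it is a different approach.

Sample data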

import pandas as pd
from timeit import default_timer as timer

# setup some sample data
a = ["c"]
a = a*100
a = [x*10**5 for x in a]
a = pd.Series(a)
print("shape of the series = %s" % a.shape)
print("length of each string in the series = %s" % len(a[0]))

Output:

shape of the series = 100
length of each string in the series = 100000

Solution

# get a numpy array representation of the pandas Series
b = a.values
# split each string in the series into a list of individual characters
c = [list(x) for x in b]
# save it as a dataframe
df = pd.DataFrame(c)
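A variation on the same idea, sketched here under the assumption that every string has the same length, is to let numpy split the characters by reinterpreting a fixed-width unicode array as single characters. The names ser, fixed and chars are made up for this example, and no timing is claimed for it.

```python
import numpy as np
import pandas as pd

# Small hypothetical Series; all strings must have equal length for the reshape.
ser = pd.Series(['abc', 'def', 'ghi'],
                index=['sample1', 'sample2', 'sample3'])

# Fixed-width unicode array: one '<U3' item per string.
fixed = np.array(ser.tolist(), dtype=str)

# View the same buffer as 1-character items, then restore one row per string.
chars = fixed.view('U1').reshape(len(fixed), -1)

# Passing the original index keeps the sample labels on the rows.
df_chars = pd.DataFrame(chars, index=ser.index)
print(df_chars)
```

If the strings differ in length, numpy pads the shorter ones, so the padded positions come out as empty strings rather than raising an error.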

Runtime

Since piRSquared has already posted a solution, I should include a runtime analysis.

execTime=[]
start = timer()
# get a numpy array representation of the pandas Series
b = a.values
end = timer()
execTime.append(end-start)

start = timer()
# split each string in the series into a list of individual characters
c = [list(x) for x in b]
end = timer()
execTime.append(end-start)

start = timer()
# save it as a dataframe
df = pd.DataFrame(c)
end = timer()
execTime.append(end-start)

start = timer()
a.apply(lambda x: pd.Series(list(x))).rename(columns=lambda x: 'char{}'.format(x + 1))
end = timer()
execTime.append(end-start)
print("get numpy array                      = %s" % execTime[0])
print("Split each string into chars runtime = %s" % execTime[1])
print("Save 2D list as Dataframe runtime    = %s" % execTime[2])
print("piRSquared's solution runtime        = %s" % execTime[3])

Output:

get numpy array                      = 7.788003131281585e-06
Split each string into chars runtime = 0.17509693499960122
Save 2D list as Dataframe runtime    = 12.092364584001189
piRSquared's solution runtime        = 13.954442440001003
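Single start/end measurements like the ones above can be noisy. As a rough sketch, the same comparison can be repeated with timeit.repeat on a much smaller, made-up Series; the helper names via_list and via_apply are assumptions for illustration, and the numbers will differ by machine and pandas version.

```python
import timeit
import pandas as pd

# Smaller hypothetical data so that the sketch runs quickly.
a = pd.Series(["c" * 1000] * 20)

def via_list(s):
    # Split each string into characters, then build the DataFrame in one go.
    return pd.DataFrame([list(x) for x in s.values])

def via_apply(s):
    # Build one pd.Series per row through apply.
    return s.apply(lambda x: pd.Series(list(x)))

for name, func in [("list of lists", via_list), ("apply + pd.Series", via_apply)]:
    # Run each approach three times and report the fastest run.
    best = min(timeit.repeat(lambda: func(a), number=1, repeat=3))
    print("%-20s best of 3 = %.4f s" % (name, best))
```

Taking the minimum of several runs reduces the influence of one-off delays on the comparison.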

Answer 1
Score 2, accepted, 2017-02-25 06:54:48

Answer 2
Score 2, 2017-02-25 07:48:00