
Split each cell in a DataFrame (pandas/Python)

I have a large pandas DataFrame with many rows and columns. Every cell holds a pair of binary values such as '0|1', '0|0', '1|1' or '1|0'. I would like to split it into two DataFrames and/or expand it so that each row becomes two rows (both forms are useful to me). That is, this:

        a   b   c   d
rowa    1|0 0|1 0|1 1|0
rowb    0|1 0|0 0|0 0|1
rowc    0|1 1|0 1|0 0|1

becomes

        a   b   c   d
rowa1   1   0   0   1
rowa2   0   1   1   0
rowb1   0   0   0   0
rowb2   1   0   0   1
rowc1   0   1   1   0
rowc2   1   0   0   1

and/or

    df1:    a   b   c   d
    rowa    1   0   0   1
    rowb    0   0   0   0
    rowc    0   1   1   0


    df2:    a   b   c   d
    rowa    0   1   1   0
    rowb    1   0   0   1
    rowc    1   0   0   1

Currently I'm trying something like the following, but I don't think it is very efficient. Any guidance would be helpful.

from collections import defaultdict

# df is the original DataFrame of 'x|y' strings
Atmp_dict = defaultdict(list)
Btmp_dict = defaultdict(list)

for index, row in df.iterrows():
    for columnname in df.columns:
        # split once and keep both halves
        left, right = row[columnname].split('|')
        Atmp_dict[columnname].append(left)
        Btmp_dict[columnname].append(right)

user2734178 is close, but their answer has some issues. Here is a slight variation that works:

import pandas as pd

df1 = pd.DataFrame()
df2 = pd.DataFrame()

# df is your original DataFrame
for col in df.columns:
    df1[col] = df[col].apply(lambda x: x.split('|')[0])
    df2[col] = df[col].apply(lambda x: x.split('|')[1])
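For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample table from the question and applies the same per-column split. Giving the empty frames the original index up front and using `.str.split` instead of `apply` are my own choices, not part of the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "a": ["1|0", "0|1", "0|1"],
        "b": ["0|1", "0|0", "1|0"],
        "c": ["0|1", "0|0", "1|0"],
        "d": ["1|0", "0|1", "0|1"],
    },
    index=["rowa", "rowb", "rowc"],
)

# Start from empty frames that share the original row labels.
df1 = pd.DataFrame(index=df.index)
df2 = pd.DataFrame(index=df.index)

for col in df.columns:
    parts = df[col].str.split("|")  # Series of two-element lists
    df1[col] = parts.str[0]         # value before the pipe
    df2[col] = parts.str[1]         # value after the pipe

print(df1)
print(df2)
```

The values in `df1` and `df2` are still strings at this point.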

Here is another option that is slightly more elegant. Replace the loop with:

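for col in df.columns:
    # raw strings keep the regex escapes intact; expand=False returns a Series
    df1[col] = df[col].str.extract(r"(\d)\|", expand=False)
    df2[col] = df[col].str.extract(r"\|(\d)", expand=False)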

This is pretty compact, but it seems like there should be an even easier and more compact way.
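One way to tighten the regex version a little, offered only as a sketch: a single pattern with two capture groups lets `str.extract` return both halves in one call, as a two-column frame labelled 0 and 1. The two-column sample data and the `pair` name are just for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": ["1|0", "0|1", "0|1"], "b": ["0|1", "0|0", "1|0"]},
    index=["rowa", "rowb", "rowc"],
)

df1 = pd.DataFrame(index=df.index)
df2 = pd.DataFrame(index=df.index)

for col in df.columns:
    # Two capture groups -> DataFrame with columns 0 (left) and 1 (right).
    pair = df[col].str.extract(r"(\d)\|(\d)", expand=True)
    df1[col] = pair[0]
    df2[col] = pair[1]

print(df1)
print(df2)
```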

# on pandas 2.1+ use df.map instead; applymap is deprecated there
df1 = df.applymap(lambda x: str(x)[0])
df2 = df.applymap(lambda x: str(x)[2])

Or loop over the columns as in the other answers; I don't think it matters. Note that because the question specified binary data, it is OK (and simpler) to just take str[0] and str[2] rather than using split or extract.
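To make the caveat concrete, here is a small sketch of why positional indexing depends on every value being a single character. The '10|1' entry is hypothetical and does not appear in the question's data.

```python
import pandas as pd

# Second entry is a made-up multi-digit value, used only to show the difference.
s = pd.Series(["1|0", "10|1"])

by_position = s.str[2]              # third character of each string
by_split = s.str.split("|").str[1]  # everything after the pipe

print(by_position.tolist())  # ['0', '|']
print(by_split.tolist())     # ['0', '1']
```

For strictly binary cells the two give the same result, so the shorter form is fine for the question as asked.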

Or you could do this, which seems almost silly, but there's nothing actually wrong with it and it is fairly compact:

df1 = df.stack().str[0].unstack()
df2 = df.stack().str[2].unstack()

stack just converts the DataFrame to a Series so you can use the str accessor, and then unstack converts it back to a DataFrame.
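A short walk-through of the intermediate steps may help here. This is a sketch on a two-by-two slice of the question's data, with variable names (`stacked`, `first`) of my own choosing.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": ["1|0", "0|1"], "b": ["0|1", "0|0"]},
    index=["rowa", "rowb"],
)

stacked = df.stack()    # Series indexed by (row label, column label)
first = stacked.str[0]  # string methods now apply to every cell at once
df1 = first.unstack()   # pivot the column labels back out into columns

print(stacked)
print(df1)
```

The stacked Series carries a two-level index, which is what lets unstack rebuild the original rows and columns.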

Since it looks like all of your values are strings, you can use the .str accessor to pull out the characters on either side of the pipe, like so:

import pandas as pd

df1 = pd.DataFrame()
df2 = pd.DataFrame()

#df is defined as in your first example
for col in df.columns:
    df1[col] = df[col].str[0]
    df2[col] = df[col].str[-1]

You'll then probably want to recast the columns of df1 and df2 as int using astype(int).
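The question also asked for a single frame with two rows per original row (rowa1, rowa2, ...). A minimal sketch of the cast plus that layout, assuming the row labels are unique strings; the `expanded` and `order` names and the use of `DataFrame.apply` with `pd.concat` are my own choices rather than anything from the answers:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "a": ["1|0", "0|1", "0|1"],
        "b": ["0|1", "0|0", "1|0"],
        "c": ["0|1", "0|0", "1|0"],
        "d": ["1|0", "0|1", "0|1"],
    },
    index=["rowa", "rowb", "rowc"],
)

# Take the first and last character of every cell, then cast to int.
df1 = df.apply(lambda col: col.str[0]).astype(int)
df2 = df.apply(lambda col: col.str[-1]).astype(int)

# Relabel the halves as rowa1/rowa2, ... so they can share one frame.
df1.index = [f"{label}1" for label in df.index]
df2.index = [f"{label}2" for label in df.index]

# Interleave the halves while keeping the original row order.
order = [f"{label}{n}" for label in df.index for n in (1, 2)]
expanded = pd.concat([df1, df2]).loc[order]

print(expanded)
```

Building `order` explicitly avoids relying on the labels sorting alphabetically into the right sequence.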
