Split each cell in dataframe (pandas/python)
I have a large pandas DataFrame with many rows and columns containing binary data like '0|1', '0|0', '1|1', '1|0', which I would like to split into 2 DataFrames and/or expand, so that this (both are useful to me):
      a    b    c    d
rowa  1|0  0|1  0|1  1|0
rowb  0|1  0|0  0|0  0|1
rowc  0|1  1|0  1|0  0|1
becomes
       a  b  c  d
rowa1  1  0  0  1
rowa2  0  1  1  0
rowb1  0  0  0  0
rowb2  1  0  0  1
rowc1  0  1  1  0
rowc2  1  0  0  1
and/or
df1:
      a  b  c  d
rowa  1  0  0  1
rowb  0  0  0  0
rowc  0  1  1  0
df2:
      a  b  c  d
rowa  0  1  1  0
rowb  1  0  0  1
rowc  1  0  0  1
Currently I'm trying to do something like the following, but I believe it is not very efficient. Any guidance would be helpful.
from collections import defaultdict

Atmp_dict = defaultdict(list)
Btmp_dict = defaultdict(list)
for index, row in df.iterrows():
    for columnname in list(df.columns.values):
        # part before the '|' goes to A, part after it goes to B
        Atmp_dict[columnname].append(row[columnname].split('|')[0])
        Btmp_dict[columnname].append(row[columnname].split('|')[1])
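The loop above only fills two dicts of lists; it never turns them back into DataFrames. A minimal sketch of finishing that approach, assuming the sample data from the tables above (the variable names `left` and `right` are illustrative): each dict maps a column name to its values in row order, so passing it to `pd.DataFrame` with the original index rebuilds df1 and df2. Splitting each cell once instead of twice also halves the string work.

```python
from collections import defaultdict

import pandas as pd

# Sample data from the question (every cell is an "x|y" string).
df = pd.DataFrame(
    {"a": ["1|0", "0|1", "0|1"], "b": ["0|1", "0|0", "1|0"],
     "c": ["0|1", "0|0", "1|0"], "d": ["1|0", "0|1", "0|1"]},
    index=["rowa", "rowb", "rowc"],
)

Atmp_dict = defaultdict(list)
Btmp_dict = defaultdict(list)
for index, row in df.iterrows():
    for columnname in df.columns:
        left, right = row[columnname].split("|")  # split each cell only once
        Atmp_dict[columnname].append(left)
        Btmp_dict[columnname].append(right)

# Column name -> list of values in row order, so reuse the original index.
df1 = pd.DataFrame(Atmp_dict, index=df.index)
df2 = pd.DataFrame(Btmp_dict, index=df.index)
```

This still iterates row by row in Python, so it remains the slow path that the answers below try to avoid.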
user2734178 is close, but his or her answer has some issues. Here is a slight variation that works:
import pandas as pd

df1 = pd.DataFrame()
df2 = pd.DataFrame()
# df is your original DataFrame
for col in df.columns:
    df1[col] = df[col].apply(lambda x: x.split('|')[0])
    df2[col] = df[col].apply(lambda x: x.split('|')[1])
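The `apply` loop above calls `split` twice per cell. A possible variant, swapping in the vectorized `Series.str.split` with `expand=True` (not what the answer itself uses), splits each column once and returns both halves as a two-column DataFrame. This sketch assumes the sample data from the question; `parts` is just an illustrative name.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {"a": ["1|0", "0|1", "0|1"], "b": ["0|1", "0|0", "1|0"],
     "c": ["0|1", "0|0", "1|0"], "d": ["1|0", "0|1", "0|1"]},
    index=["rowa", "rowb", "rowc"],
)

df1 = pd.DataFrame(index=df.index)
df2 = pd.DataFrame(index=df.index)
for col in df.columns:
    # A single-character pattern like "|" is treated as a literal separator;
    # expand=True yields a DataFrame with columns 0 (left) and 1 (right).
    parts = df[col].str.split("|", expand=True)
    df1[col] = parts[0]
    df2[col] = parts[1]
```

Unlike indexing fixed character positions, splitting on the separator also copes with values longer than one character.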
Here is another option that is slightly more elegant. Replace the loop with:
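for col in df.columns:
    # raw strings avoid invalid-escape warnings for \d and \|;
    # expand=False returns a Series rather than a one-column DataFrame
    df1[col] = df[col].str.extract(r"(\d)\|", expand=False)
    df2[col] = df[col].str.extract(r"\|(\d)", expand=False)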
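Since `str.extract` accepts several capture groups, one call per column can return both sides of the pipe at once. A sketch under the assumption of the question's sample data: with two unnamed groups, the result is a DataFrame whose columns are labelled 0 and 1. The `\d+` pattern is a small generalization (an assumption, not from the answer) so multi-digit values would still match.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {"a": ["1|0", "0|1", "0|1"], "b": ["0|1", "0|0", "1|0"],
     "c": ["0|1", "0|0", "1|0"], "d": ["1|0", "0|1", "0|1"]},
    index=["rowa", "rowb", "rowc"],
)

df1 = pd.DataFrame(index=df.index)
df2 = pd.DataFrame(index=df.index)
for col in df.columns:
    # Two capture groups -> DataFrame with column 0 (left) and column 1 (right)
    parts = df[col].str.extract(r"(\d+)\|(\d+)")
    df1[col] = parts[0]
    df2[col] = parts[1]
```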
This is pretty compact, but it seems like there should be an even easier and more compact way.
# applymap is deprecated since pandas 2.1 in favour of DataFrame.map
df1 = df.applymap(lambda x: str(x)[0])
df2 = df.applymap(lambda x: str(x)[2])
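On pandas 2.1 and later, `DataFrame.applymap` emits a deprecation warning and `DataFrame.map` does the same element-wise job. A version-tolerant sketch, assuming the question's sample data (the helper name `elementwise` is hypothetical): it picks `map` when the DataFrame class has it and falls back to `applymap` otherwise.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {"a": ["1|0", "0|1", "0|1"], "b": ["0|1", "0|0", "1|0"],
     "c": ["0|1", "0|0", "1|0"], "d": ["1|0", "0|1", "0|1"]},
    index=["rowa", "rowb", "rowc"],
)

# DataFrame.map exists from pandas 2.1; older versions only have applymap.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

df1 = elementwise(lambda x: str(x)[0])  # character before the "|"
df2 = elementwise(lambda x: str(x)[2])  # character after the "|"
```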
Or loop over the columns as in the other answers; I don't think it matters. Note that because the question specified binary data, it is OK (and simpler) to just do str[0] and str[2] rather than using split or extract.
Or you could do this, which seems almost silly, but there's nothing actually wrong with it and it is fairly compact.
df1 = df.stack().str[0].unstack()
df2 = df.stack().str[2].unstack()
stack just converts the DataFrame to a Series so you can use str, and then unstack converts it back to a DataFrame.
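To make the stack/unstack round trip concrete, here is a sketch that stacks once and reuses the result for both halves, assuming the question's sample data. The stacked Series is indexed by (row, column) pairs; `unstack` pivots the column level back out. The trailing `reindex` is a precaution (my addition, not part of the answer) in case `unstack` returns the labels in sorted rather than original order.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {"a": ["1|0", "0|1", "0|1"], "b": ["0|1", "0|0", "1|0"],
     "c": ["0|1", "0|0", "1|0"], "d": ["1|0", "0|1", "0|1"]},
    index=["rowa", "rowb", "rowc"],
)

stacked = df.stack()  # Series with a (row, column) MultiIndex, 12 entries here

# Take one character from each cell, then pivot the column level back out.
df1 = stacked.str[0].unstack().reindex(index=df.index, columns=df.columns)
df2 = stacked.str[2].unstack().reindex(index=df.index, columns=df.columns)
```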
Since it looks like all of your values are strings, you can use the .str accessor to pull out the characters on either side of the pipe delimiter, like so:
import pandas as pd

df1 = pd.DataFrame()
df2 = pd.DataFrame()
# df is defined as in your first example
for col in df.columns:
    df1[col] = df[col].str[0]
    df2[col] = df[col].str[-1]
You'll then probably want to recast df1 and df2 as int columns using astype(int).
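None of the answers builds the interleaved rowa1/rowa2/... layout the question also asked for. A possible sketch, assuming the question's sample data, string row labels that can take a "1"/"2" suffix, and single-character values (as `str[0]`/`str[-1]` require). The names `pos` and `expanded` are hypothetical; the per-column `apply` replaces the explicit loop but does the same `.str` indexing. Row i of df1 is given position 2*i and row i of df2 position 2*i + 1, so sorting by position interleaves them without relying on how the labels sort.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {"a": ["1|0", "0|1", "0|1"], "b": ["0|1", "0|0", "1|0"],
     "c": ["0|1", "0|0", "1|0"], "d": ["1|0", "0|1", "0|1"]},
    index=["rowa", "rowb", "rowc"],
)

# First/last character of every cell, recast to int.
df1 = df.apply(lambda s: s.str[0]).astype(int)
df2 = df.apply(lambda s: s.str[-1]).astype(int)

# Interleave: df1 rows at even positions, df2 rows at odd positions.
pos = np.arange(len(df))
expanded = pd.concat(
    [df1.set_axis(2 * pos, axis=0), df2.set_axis(2 * pos + 1, axis=0)]
).sort_index()

# Relabel positions 0, 1, 2, ... as rowa1, rowa2, rowb1, ...
expanded.index = [f"{row}{part}" for row in df.index for part in (1, 2)]
```

Positional keys are used instead of sorting the suffixed labels because a plain string sort would misorder labels such as row2 and row10.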