How to concatenate pandas.DataFrames columns
I have a DataFrame called raw_df:
import pandas as pd

columns = ['force0', 'distance0', 'force1', 'distance1']
raw_data = [{'force0': 1.2, 'distance0': 0.0, 'force1': 0.5, 'distance1': 0.0},
            {'force0': 1.3, 'distance0': 0.1, 'force1': 0.6, 'distance1': 0.0},
            {'force0': 1.4, 'distance0': 0.2, 'force1': 0.7, 'distance1': 0.3},
            {'force0': 1.5, 'distance0': 0.5, 'force1': 0.8, 'distance1': 0.6}]
raw_df = pd.DataFrame(raw_data, columns=columns)
raw_df looks like this:
force0 distance0 force1 distance1
0 1.2 0.0 0.5 0.0
1 1.3 0.1 0.6 0.0
2 1.4 0.2 0.7 0.3
3 1.5 0.5 0.8 0.6
At the moment there is no index, but I would like the distance columns to be combined into one index so that the columns are then:
force0 force1
distance
0.0 1.2 0.5
0.0 NaN 0.6
0.1 1.3 NaN
0.2 1.4 NaN
0.3 NaN 0.7
0.5 1.5 NaN
0.6 NaN 0.8
Note that there were 2 entries in force1 for distance1 = 0.0.
The index (distances) should NOT be sorted: they increase then decrease variably and the original order for each test is important.
Stefan posted an amazing answer to my poorly-described question, but it seemed to fill in any missing forces with other numbers (which would be misleading because there were no force measurements for those distances in those tests). I have used np.nan for missing values as I think this is what pandas does.
I think that merge or join might do what I need, but I couldn't understand the docs.
Perhaps pandas.DataFrame was not designed for such data, and I should use numpy.genfromtxt instead and just select the columns I need on the fly: I don't see any advantage to using a pandas.DataFrame if I'm selecting columns on the fly (because I'm not using an index in that case).
Thanks for any help.
If I'm understanding correctly, you are starting from a situation similar to this:
import numpy as np
import pandas as pd
columns = [name.format(i) for i in range(4) for name in ('Forces{}', 'Distances{}')]  # Forces0, Distances0, Forces1, ...
df = pd.DataFrame(np.random.randint(1, 11, size=(100, 8)), columns=columns)
Forces0 Distances0 Forces1 Distances1 Forces2 Distances2 Forces3 \
0 3 5 8 3 7 4 2
1 1 4 10 9 9 3 6
2 10 3 1 3 3 7 8
3 2 1 3 6 10 10 10
4 4 2 9 1 3 10 8
Distances3
0 8
1 5
2 3
3 8
4 8
and you are aiming to have the various Distance columns form a single index while the respective Force columns remain in place. You could stack the frame like so:
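# Move every Force column into the index so that only the Distance columns remain as data
df.set_index([c for c in df.columns if c.startswith('Force')], inplace=True)
# Stack the Distance columns into one column, then turn the Force index levels back into columns
df = df.stack().reset_index(level=-1, drop=True).reset_index().rename(columns={0: 'Distance'})
# Use the stacked distances as the index
df.set_index(['Distance'], inplace=True)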
to get:
Forces0 Forces1 Forces2 Forces3
Distance
9 7 4 6 7
9 7 4 6 7
1 7 4 6 7
6 7 4 6 7
5 1 2 3 1
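Note that stacking this way repeats each row's forces next to every one of that row's distances, which may be the "filled in" values the question objects to. A minimal sketch of one alternative for the question's raw_df, assuming each forceN column pairs with the distanceN column of the same number: give each test its own distance index, then stack the pieces with pd.concat so that forces which were never measured stay NaN. The names pieces and combined are illustrative and not from the original posts.

```python
import pandas as pd

columns = ['force0', 'distance0', 'force1', 'distance1']
raw_data = [{'force0': 1.2, 'distance0': 0.0, 'force1': 0.5, 'distance1': 0.0},
            {'force0': 1.3, 'distance0': 0.1, 'force1': 0.6, 'distance1': 0.0},
            {'force0': 1.4, 'distance0': 0.2, 'force1': 0.7, 'distance1': 0.3},
            {'force0': 1.5, 'distance0': 0.5, 'force1': 0.8, 'distance1': 0.6}]
raw_df = pd.DataFrame(raw_data, columns=columns)

pieces = []
for i in range(2):  # assumption: tests are numbered 0 and 1, as in the question
    # Keep only this test's two columns and index its forces by its own distances
    piece = (raw_df[[f'distance{i}', f'force{i}']]
             .rename(columns={f'distance{i}': 'distance'})
             .set_index('distance'))
    pieces.append(piece)

# Row-wise concat takes the union of the columns; a test with no measurement
# in a given column gets NaN there. Nothing is sorted or interpolated.
combined = pd.concat(pieces)
print(combined)
```

The rows keep each test's original order (all of test 0, then all of test 1) rather than being interleaved by distance as in the question's desired output. Sorting is deliberately left out because the question asks for the order within each test to be preserved.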
I solved the problem using a MultiIndex DataFrame:
Read each test into a separate DataFrame using pd.read_csv()
Combine the DataFrames into one using df = pd.concat(frame_list, keys=test_names)
Rather than write a long description here, I wrote a Jupyter notebook on the subject comparing the MultiIndex method against just keeping a standard Python list of DataFrames.
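The two steps might look roughly like the sketch below. It is not taken from the notebook: the CSV contents, the column names distance and force, and the test names test0 and test1 are made up for illustration, and in-memory strings stand in for the real files so the example is self-contained.

```python
import io

import pandas as pd

# Hypothetical per-test CSV contents; real code would pass file paths to pd.read_csv
csv_texts = {
    'test0': "distance,force\n0.0,1.2\n0.1,1.3\n0.2,1.4\n0.5,1.5\n",
    'test1': "distance,force\n0.0,0.5\n0.0,0.6\n0.3,0.7\n0.6,0.8\n",
}

test_names = list(csv_texts)
# Step 1: read each test into its own DataFrame
frame_list = [pd.read_csv(io.StringIO(csv_texts[name])) for name in test_names]

# Step 2: combine them; keys= adds the test name as the outer level of a MultiIndex
df = pd.concat(frame_list, keys=test_names)

# Selecting on the outer level returns one test with its rows in their original order
test1 = df.loc['test1']
print(test1)
```

Because each test keeps its own rows, a repeated distance (such as the two 0.0 entries in test1) and the original measurement order both survive, and no force values need to be filled in for distances that a test never visited.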