Concatenating strings from multiple csv files along the x and y axes into one dataframe (in pandas)

Question

I have a folder with many csv files. They all look similar: they have the same names for columns and rows, and they all have strings as values in their cells. I want to concatenate them along columns AND rows so that all the strings are concatenated into their respective cells.

Example:

file1.csv

0	1	2	3	4
b1	peter	house	ash	plane
b2	carl	horse	paul	knife
b3	mary	apple	linda	carrot
b4	hank	car	herb	beer

file2.csv

0	1	2	3	4
b1	mark	green	hello	band
b2	no	phone	spoon	goodbye
b3	red	cherry	charly	hammer
b4	good	yes	ok	simon

What I want is this result with no delimiter between the string values:

concatenated.csv

0	1	2	3	4
b1	peter mark	house green	ash hello	plane band
b2	carl no	horse phone	paul spoon	knife goodbye
b3	mary red	apple cherry	linda charly	carrot hammer
b4	hank good	car yes	herb ok	beer simon

I don't know how to do this in pandas in a Jupyter notebook.

I have tried a couple of things, but all of them kept either a separate set of rows or a separate set of columns.

Answer 1

If these are your dataframes:

import pandas as pd

df1_data = {
    1 : ['peter', 'carl', 'mary', 'hank'],
    2 : ['house', 'horse','apple', 'car']
}
df1 = pd.DataFrame(df1_data)

df2_data = {
    1 : ['mark', 'no',   'red',   'good'],
    2 : ['green','phone','cherry','yes' ]
}
df2 = pd.DataFrame(df2_data)

df1:

       1      2
0  peter  house
1   carl  horse
2   mary  apple
3   hank    car

df2:

      1       2
0  mark   green
1    no   phone
2   red  cherry
3  good     yes

You can build the requested dataframe like this:

df = pd.DataFrame()
df[1] = df1[1] + ' ' + df2[1]
df[2] = df1[2] + ' ' + df2[2]
print(df)

Output:

            1             2
0  peter mark   house green
1     carl no   horse phone
2    mary red  apple cherry
3   hank good       car yes
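The column-by-column assignment above can also be written as a single expression. This is a minimal sketch, assuming both dataframes share the same index and column labels and contain only strings. Addition between dataframes aligns on labels, and `+` on strings concatenates:

```python
import pandas as pd

df1 = pd.DataFrame({1: ['peter', 'carl', 'mary', 'hank'],
                    2: ['house', 'horse', 'apple', 'car']})
df2 = pd.DataFrame({1: ['mark', 'no', 'red', 'good'],
                    2: ['green', 'phone', 'cherry', 'yes']})

# Evaluated as (df1 + ' ') + df2: first append a space to every cell,
# then add the matching cell of df2 (matched by index and column label).
df = df1 + ' ' + df2
print(df)
```

Dropping the `' '` term gives the variant with no delimiter at all.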

Loop for csv files:

Now, if you have a lot of csv files with names like file1.csv, file2.csv and so on, you can load them all into a dictionary d like this:

d = {}
# N is the number of csv files. The files are numbered from 1,
# so the range has to end at N + 1.
for i in range(1, N + 1):
    d[i] = pd.read_csv('.../file' + str(i) + '.csv')

And build the dataframe you want like this:

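concatenated_df = pd.DataFrame()

# Column 0 holds the row labels, so start from column 1.
for i in range(1, d[1].shape[1]):
    column = d[1].iloc[:, i]
    # Append the same column from every other file (N is the number of csv files).
    for k in range(2, N + 1):
        column = column + ' ' + d[k].iloc[:, i]
    concatenated_df[i] = column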
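The same idea can be expressed as a fold over a list of dataframes, which avoids writing out one term per file. The sketch below is only an illustration under stated assumptions: the two in-memory strings stand in for file1.csv and file2.csv, the files are assumed to be comma-separated, and the first column is assumed to hold the row labels.

```python
import io
from functools import reduce

import pandas as pd

# Hypothetical stand-ins for file1.csv and file2.csv; real code would
# pass file paths to pd.read_csv instead of io.StringIO objects.
csv1 = "0,1,2\nb1,peter,house\nb2,carl,horse\n"
csv2 = "0,1,2\nb1,mark,green\nb2,no,phone\n"

# index_col=0 moves the label column into the index so it is not concatenated;
# dtype=str keeps every cell a string.
frames = [pd.read_csv(io.StringIO(text), index_col=0, dtype=str)
          for text in (csv1, csv2)]

# Fold the list pairwise: ((f1 + ' ' + f2) + ' ' + f3) and so on.
concatenated_df = reduce(lambda left, right: left + ' ' + right, frames)
print(concatenated_df)
```

Because the addition aligns on the index, rows are matched by their label (b1, b2, ...) rather than by position.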

Answer 2

If performance is not an issue, you can use pandas.DataFrame.applymap with pandas.Series.add:

out = df1[[0]].join(df1.iloc[:, 1:].applymap(lambda v: f"{v} ").add(df2.iloc[:, 1:]))

Or, for a large dataset, you can use pandas.concat with a list comprehension:

out = (
    df1[[0]]
    .join(pd.concat([df1.merge(df2, on=0)
                        .filter(regex=rf"{p}_\w").agg(" ".join, axis=1)
                        .rename(idx)
                     for idx, p in enumerate(range(1, len(df1.columns)), start=1)],
                    axis=1))
)

Output:

print(out)

    0           1             2             3              4
0  b1  peter mark   house green     ash hello     plane band
1  b2     carl no   horse phone    paul spoon  knife goodbye
2  b3    mary red  apple cherry  linda charly  carrot hammer
3  b4   hank good       car yes       herb ok     beer simon
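One caveat for newer pandas: DataFrame.applymap was deprecated in pandas 2.1 in favour of DataFrame.map. The sketch below is a version-tolerant variant of the first one-liner, shown on a reduced two-row sample. The helper name `elementwise` is introduced here for illustration only.

```python
import pandas as pd

df1 = pd.DataFrame({0: ['b1', 'b2'], 1: ['peter', 'carl'], 2: ['house', 'horse']})
df2 = pd.DataFrame({0: ['b1', 'b2'], 1: ['mark', 'no'], 2: ['green', 'phone']})

values = df1.iloc[:, 1:]
# Use DataFrame.map where this pandas version provides it,
# otherwise fall back to the older applymap.
elementwise = values.map if hasattr(values, "map") else values.applymap

# Append a space to every cell of df1, add the matching df2 cell,
# then put the label column back in front.
out = df1[[0]].join(elementwise(lambda v: f"{v} ").add(df2.iloc[:, 1:]))
print(out)
```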

Answer 3

Reading many csv files into a single DF is a pretty common question, and is the first part of yours. You can find a suitable answer here.

After that, to perform this on all of the files at the same time, you can melt and pivot with a custom agg function like so:

import glob
import pandas as pd

# See the linked answer if you need help finding csv files in a different directory
all_files = glob.glob('*.csv')
df = pd.concat((pd.read_csv(f) for f in all_files))

# The chain is wrapped in parentheses so it can span several lines
output = (
    df.melt(id_vars='0')
      .pivot_table(index='0',
                   columns='variable',
                   values='value',
                   aggfunc=lambda x: ' '.join(x))
)
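To see what the melt and pivot steps do, here is a small self-contained run. It is a sketch under assumptions: the two in-memory strings are hypothetical stand-ins for the csv files, the files are assumed to be comma-separated, and the label column is assumed to be named '0' as in the question.

```python
import io

import pandas as pd

# Hypothetical stand-ins for two of the csv files in the folder.
csv1 = "0,1,2\nb1,peter,house\nb2,carl,horse\n"
csv2 = "0,1,2\nb1,mark,green\nb2,no,phone\n"

# Stack all files on top of each other; dtype=str keeps every cell a string.
df = pd.concat([pd.read_csv(io.StringIO(text), dtype=str)
                for text in (csv1, csv2)])

# melt turns each cell into a (label, column name, value) row;
# pivot_table then groups by label and column name and joins the values.
output = (
    df.melt(id_vars='0')
      .pivot_table(index='0',
                   columns='variable',
                   values='value',
                   aggfunc=lambda x: ' '.join(x))
)
print(output)
```

The values within each cell are joined in the order in which the files were concatenated, so the file order determines the word order.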
