Concatenate strings in multiple csv files into one dataframe along x and y axis (in pandas)

I have a folder with many csv files. They all look similar: they have the same column and row names, and every cell holds a string. I want to concatenate them along columns AND rows so that the strings from all files end up joined in their respective cells.

Example:

file1.csv

0   1      2      3      4
b1  peter  house  ash    plane
b2  carl   horse  paul   knife
b3  mary   apple  linda  carrot
b4  hank   car    herb   beer

file2.csv

0   1     2       3       4
b1  mark  green   hello   band
b2  no    phone   spoon   goodbye
b3  red   cherry  charly  hammer
b4  good  yes     ok      simon

What I want is this result with no delimiter between the string values:

concatenated.csv

0   1           2             3             4
b1  peter mark  house green   ash hello     plane band
b2  carl no     horse phone   paul spoon    knife goodbye
b3  mary red    apple cherry  linda charly  carrot hammer
b4  hank good   car yes       herb ok       beer simon

And I don't know how to do this in pandas in a Jupyter notebook.

I have tried a couple of things, but all of them kept either a separate set of rows or a separate set of columns.

If these are your dataframes:

import pandas as pd

df1_data = {
    1 : ['peter', 'carl', 'mary', 'hank'],
    2 : ['house', 'horse','apple', 'car']
}
df1 = pd.DataFrame(df1_data)

df2_data = {
    1 : ['mark', 'no',   'red',   'good'],
    2 : ['green','phone','cherry','yes' ]
}
df2 = pd.DataFrame(df2_data)

df1:

       1      2
0  peter  house
1   carl  horse
2   mary  apple
3   hank    car

df2:

      1       2
0  mark   green
1    no   phone
2   red  cherry
3  good     yes

You can build the requested dataframe like this:

df = pd.DataFrame()
df[1] = df1[1] + ' ' + df2[1]
df[2] = df1[2] + ' ' + df2[2]
print(df)

Output:

            1             2
0  peter mark   house green
1     carl no   horse phone
2    mary red  apple cherry
3   hank good       car yes
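The same idea extends to every column at once, because `+` between two DataFrames works cell by cell when their row and column labels match. A minimal sketch, assuming both frames hold only strings and share the same labels (the `sep` variable is a name introduced here for illustration, not part of the answer above):

```python
import pandas as pd

df1 = pd.DataFrame({1: ['peter', 'carl', 'mary', 'hank'],
                    2: ['house', 'horse', 'apple', 'car']})
df2 = pd.DataFrame({1: ['mark', 'no', 'red', 'good'],
                    2: ['green', 'phone', 'cherry', 'yes']})

sep = ' '  # assumption: a space, as in the output above; use '' for no delimiter

# DataFrame + str appends to every cell; DataFrame + DataFrame aligns on labels
df = df1 + sep + df2
print(df)
```

This avoids naming each column by hand, at the cost of requiring identical labels in both frames.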

Loop for csv files:

Now, if you have a lot of csv files with names like file1.csv, file2.csv and so on, you can store them all in a dict d like this:

N = 2  # number of csv files (example value; set it to your own count)
d = {}
for i in range(1, N + 1):  # files are numbered from 1, so the range ends at N + 1
  d[i] = pd.read_csv('.../file'+str(i)+'.csv')
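If the files do not follow a strict file1, file2, ... numbering, they can be collected with glob instead of a counter. The sketch below is only an illustration: it writes two tiny files into a temporary folder so that it runs on its own, and the folder, the `file*.csv` pattern, and the variable names are assumptions rather than anything from the question.

```python
import glob
import os
import tempfile

import pandas as pd

# Write two small example files so the sketch is self-contained (hypothetical location)
folder = tempfile.mkdtemp()
pd.DataFrame({'0': ['b1', 'b2'], '1': ['peter', 'carl']}).to_csv(
    os.path.join(folder, 'file1.csv'), index=False)
pd.DataFrame({'0': ['b1', 'b2'], '1': ['mark', 'no']}).to_csv(
    os.path.join(folder, 'file2.csv'), index=False)

# sorted() makes the order predictable; note that it sorts file names as text
paths = sorted(glob.glob(os.path.join(folder, 'file*.csv')))

# index_col=0 turns the b1, b2, ... column into the row index;
# dtype=str and keep_default_na=False keep cells such as "NA" as plain strings
frames = [pd.read_csv(p, index_col=0, dtype=str, keep_default_na=False)
          for p in paths]
print(len(frames), frames[0].index.tolist())
```

Using the label column as the index means later string concatenation touches only the value cells and leaves b1, b2, ... alone.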

And build the dataframe you want like this:

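concatenated_df = pd.DataFrame()

n_cols = d[1].shape[1]  # number of columns, including the label column 0
for i in range(1, n_cols):
  col = d[1].iloc[:, i]
  for k in range(2, len(d) + 1):  # append column i of every remaining file
    col = col + ' ' + d[k].iloc[:, i]
  concatenated_df[i] = col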
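The column-by-column loop can also be folded into a single expression with functools.reduce, which applies the same pairwise `+` across a list of frames of any length. This is a sketch under assumptions: the frames are built inline as stand-ins for the files, the label column is already the index, and the names `frames` and `sep` are introduced here for illustration.

```python
from functools import reduce

import pandas as pd

# Stand-ins for the frames read from the csv files, with the label column as index
frames = [
    pd.DataFrame({'1': ['peter', 'carl'], '2': ['house', 'horse']},
                 index=['b1', 'b2']),
    pd.DataFrame({'1': ['mark', 'no'], '2': ['green', 'phone']},
                 index=['b1', 'b2']),
]

sep = ' '  # assumption: a space between values; use '' for no delimiter

# Combine the frames pairwise, left to right: ((f1 + sep + f2) + sep + f3) ...
concatenated_df = reduce(lambda left, right: left + sep + right, frames)
print(concatenated_df)
```

Because the addition aligns on labels, a file with a missing row or column would produce NaN in those cells rather than raise an error, so it is worth checking the result for missing values.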

If performance is not an issue, you can use pandas.DataFrame.applymap (renamed DataFrame.map in pandas 2.1) with pandas.Series.add:

out = df1[[0]].join(df1.iloc[:, 1:].applymap(lambda v: f"{v} ").add(df2.iloc[:, 1:]))

Or, for a large dataset, you can use pandas.concat with a list comprehension:

out = (
        df1[[0]]
            .join(pd.concat([df1.merge(df2, on=0)
                                 .filter(regex=rf"{p}_\w").agg(" ".join, axis=1)
                                 .rename(idx) for idx, p in enumerate(range(1, len(df1.columns)), start=1)],
                            axis=1))
     )

Output:

print(out)

    0           1             2             3              4
0  b1  peter mark   house green     ash hello     plane band
1  b2     carl no   horse phone    paul spoon  knife goodbye
2  b3    mary red  apple cherry  linda charly  carrot hammer
3  b4   hank good       car yes       herb ok     beer simon

Reading many csv files into a single DataFrame is a common question with well-known answers, and it is the first part of your question. You can find a suitable answer here.

After that, to handle all of the files at once, you can melt and pivot with a custom aggregation function, like so:

import glob
import pandas as pd

# See the linked answer if you need help finding csv files in a different directory
all_files = glob.glob('*.csv')
df = pd.concat((pd.read_csv(f) for f in all_files))


output = (df.melt(id_vars='0')
            .pivot_table(index='0',
                         columns='variable',
                         values='value',
                         aggfunc=lambda x: ' '.join(x)))
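Since every file shares the same columns, the melt and pivot steps can arguably be skipped by grouping the stacked rows on the label column directly. This is a swapped-in alternative rather than the approach above, sketched with two inline frames standing in for the csv files; `f1`, `f2`, and `stacked` are illustrative names.

```python
import pandas as pd

# Stand-ins for two csv files read with pd.read_csv (column '0' holds the row labels)
f1 = pd.DataFrame({'0': ['b1', 'b2'], '1': ['peter', 'carl'], '2': ['house', 'horse']})
f2 = pd.DataFrame({'0': ['b1', 'b2'], '1': ['mark', 'no'], '2': ['green', 'phone']})

# Stack the files on top of each other, as pd.concat does in the snippet above
stacked = pd.concat([f1, f2], ignore_index=True)

# Group rows that share a label and join the strings of each column;
# sort=False keeps the labels in first-seen order
output = stacked.groupby('0', sort=False).agg(' '.join)
print(output)
```

Within each group the rows keep their stacked order, so the order of the joined strings follows the order in which the files were read.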


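Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.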
