Concatenate strings in multiple csv files into one dataframe along x and y axis (in pandas)
I have a folder with many csv files. They all look similar: they all have the same names for columns and rows, and every cell holds a string value. I want to concatenate them along columns AND rows, so that the strings from all files end up concatenated in their respective cells.
Example:
file1.csv

| 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| b1 | peter | house | ash | plane |
| b2 | carl | horse | paul | knife |
| b3 | mary | apple | linda | carrot |
| b4 | hank | car | herb | beer |
file2.csv

| 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| b1 | mark | green | hello | band |
| b2 | no | phone | spoon | goodbye |
| b3 | red | cherry | charly | hammer |
| b4 | good | yes | ok | simon |
What I want is this result, with no delimiter between the string values:
concatenated.csv

| 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| b1 | peter mark | house green | ash hello | plane band |
| b2 | carl no | horse phone | paul spoon | knife goodbye |
| b3 | mary red | apple cherry | linda charly | carrot hammer |
| b4 | hank good | car yes | herb ok | beer simon |
I don't know how to do this with pandas in a Jupyter notebook.
I have tried a couple of things, but each attempt either kept a separate set of rows or a separate set of columns.
If these are your dataframes:
import pandas as pd

df1_data = {
1 : ['peter', 'carl', 'mary', 'hank'],
2 : ['house', 'horse','apple', 'car']
}
df1 = pd.DataFrame(df1_data)
df2_data = {
1 : ['mark', 'no', 'red', 'good'],
2 : ['green','phone','cherry','yes' ]
}
df2 = pd.DataFrame(df2_data)
df1:
1 2
0 peter house
1 carl horse
2 mary apple
3 hank car
df2:
1 2
0 mark green
1 no phone
2 red cherry
3 good yes
You can build the requested dataframe like this:
df = pd.DataFrame()
df[1] = df1[1] + ' ' + df2[1]
df[2] = df1[2] + ' ' + df2[2]
print(df)
Output:
1 2
0 peter mark house green
1 carl no horse phone
2 mary red apple cherry
3 hank good car yes
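The same element-wise `+` extends to any number of dataframes. A minimal sketch, assuming all frames share the same index and column labels; the helper name `concat_cells` is hypothetical, and `functools.reduce` simply chains `acc + sep + df` over the list:

```python
from functools import reduce

import pandas as pd


def concat_cells(frames, sep=" "):
    """Join corresponding cells of equally labelled string DataFrames with sep."""
    # DataFrame + str and DataFrame + DataFrame both concatenate element-wise on string cells
    return reduce(lambda acc, df: acc + sep + df, frames)


df1 = pd.DataFrame({1: ["peter", "carl", "mary", "hank"], 2: ["house", "horse", "apple", "car"]})
df2 = pd.DataFrame({1: ["mark", "no", "red", "good"], 2: ["green", "phone", "cherry", "yes"]})

print(concat_cells([df1, df2]))
```

Because `+` aligns on labels, frames with matching labels but a different row order would still combine correctly.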
Loop for csv files:
Now, if you have a lot of csv files with names like file1.csv, file2.csv and so on, you can store them all in a dict d like this:
import pandas as pd

N = 2  # number of csv files: set this to your own count
d = {}
for i in range(1, N + 1):  # range() stops before its end value, so go up to N + 1
    d[i] = pd.read_csv('.../file' + str(i) + '.csv')
And build the dataframe you want like this:
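concatenated_df = pd.DataFrame()
n_cols = len(d[1].columns)  # number of columns in each file
for i in range(1, n_cols):  # start at 1 to skip column 0 (the b1, b2, ... labels)
    # join column i of every file in d, in file order, separated by a space
    concatenated_df[i] = d[1].iloc[:, i]
    for k in sorted(d)[1:]:
        concatenated_df[i] = concatenated_df[i] + ' ' + d[k].iloc[:, i]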
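A sketch of the same loop without a hard-coded file count, assuming every file has the header row `0,1,2,3,4` with the row labels in column `0`. The helper name `concat_csv_cells` is hypothetical, and `io.StringIO` objects stand in for the files; with real files you could pass something like `sorted(glob.glob("file*.csv"))`, keeping in mind that plain string sorting puts `file10.csv` before `file2.csv`.

```python
import io

import pandas as pd


def concat_csv_cells(sources, sep=" "):
    """Read each CSV in the given order and join corresponding cells with sep."""
    # index_col=0 turns column "0" (b1, b2, ...) into row labels, so it is not concatenated
    frames = [pd.read_csv(src, index_col=0) for src in sources]
    result = frames[0]
    for df in frames[1:]:
        # element-wise string concatenation, aligned on row and column labels
        result = result + sep + df
    return result


# In-memory stand-ins for file1.csv and file2.csv
file1 = io.StringIO("0,1,2,3,4\nb1,peter,house,ash,plane\nb2,carl,horse,paul,knife\n"
                    "b3,mary,apple,linda,carrot\nb4,hank,car,herb,beer\n")
file2 = io.StringIO("0,1,2,3,4\nb1,mark,green,hello,band\nb2,no,phone,spoon,goodbye\n"
                    "b3,red,cherry,charly,hammer\nb4,good,yes,ok,simon\n")

result = concat_csv_cells([file1, file2])
print(result)
```

`result.to_csv("concatenated.csv")` would then write the combined table, with the row labels as the first column.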
If performance is not an issue, you can use pandas.DataFrame.applymap with pandas.DataFrame.add:
out = df1[[0]].join(df1.iloc[:, 1:].applymap(lambda v: f"{v} ").add(df2.iloc[:, 1:]))
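`DataFrame.applymap` was deprecated in pandas 2.1 in favour of `DataFrame.map`. A sketch that sidesteps the per-cell Python call altogether with vectorized `+`, assuming (as the line above does) that both frames have integer column labels 0 to 4 with the row labels in column 0:

```python
import pandas as pd

# The question's data with integer column labels 0-4
df1 = pd.DataFrame([["b1", "peter", "house", "ash", "plane"],
                    ["b2", "carl", "horse", "paul", "knife"],
                    ["b3", "mary", "apple", "linda", "carrot"],
                    ["b4", "hank", "car", "herb", "beer"]])
df2 = pd.DataFrame([["b1", "mark", "green", "hello", "band"],
                    ["b2", "no", "phone", "spoon", "goodbye"],
                    ["b3", "red", "cherry", "charly", "hammer"],
                    ["b4", "good", "yes", "ok", "simon"]])

# Vectorized equivalent of applymap(lambda v: f"{v} ").add(...):
# astype(str) guards against non-string cells, then + works element-wise
body = df1.iloc[:, 1:].astype(str) + " " + df2.iloc[:, 1:].astype(str)
out = df1[[0]].join(body)
print(out)
```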
Or, for a large dataset, you can use pandas.concat with a list comprehension:
out = (
    df1[[0]]
    .join(pd.concat([df1.merge(df2, on=0)
                        .filter(regex=rf"{p}_\w")  # e.g. columns "1_x" and "1_y"
                        .agg(" ".join, axis=1)
                        .rename(p)
                     for p in range(1, len(df1.columns))],
                    axis=1))
)
Output:
print(out)
0 1 2 3 4
0 b1 peter mark house green ash hello plane band
1 b2 carl no horse phone paul spoon knife goodbye
2 b3 mary red apple cherry linda charly carrot hammer
3 b4 hank good car yes herb ok beer simon
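The `merge` version relies on the `_x`/`_y` suffixes, so it combines exactly two frames. A different technique (not the one above) that works for any number of frames: stack them vertically and join each column's strings per row label. A sketch, assuming integer column labels with the row labels in column 0:

```python
import pandas as pd

df1 = pd.DataFrame([["b1", "peter", "house", "ash", "plane"],
                    ["b2", "carl", "horse", "paul", "knife"],
                    ["b3", "mary", "apple", "linda", "carrot"],
                    ["b4", "hank", "car", "herb", "beer"]])
df2 = pd.DataFrame([["b1", "mark", "green", "hello", "band"],
                    ["b2", "no", "phone", "spoon", "goodbye"],
                    ["b3", "red", "cherry", "charly", "hammer"],
                    ["b4", "good", "yes", "ok", "simon"]])

# Stack the frames in order, group rows by the label column and join each
# column's strings; sort=False keeps b1..b4 in first-seen order
out = (pd.concat([df1, df2], ignore_index=True)
         .groupby(0, sort=False)
         .agg(" ".join)
         .reset_index())
print(out)
```

groupby keeps the original row order inside each group, so the strings are joined in the order the frames were passed to `pd.concat`.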
Reading many csv files into a single DF is a common task, and it is the first part of your question. You can find a suitable answer here.
After that, to process all of the files at the same time, you can melt and then pivot with a custom aggregation function, like so:
import glob
import pandas as pd

# See the linked answer if you need help finding csv files in a different directory
# sorted() makes the file order, and therefore the concatenation order, predictable
all_files = sorted(glob.glob('*.csv'))
df = pd.concat((pd.read_csv(f) for f in all_files))
output = (df.melt(id_vars='0')
            .pivot_table(index='0',
                         columns='variable',
                         values='value',
                         aggfunc=lambda x: ' '.join(x)))
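To see what `melt` and `pivot_table` do at each step, here is a small self-contained run of the same chain. The two CSV strings are a trimmed version of the question's data, and `io.StringIO` stands in for real files:

```python
import io

import pandas as pd

# In-memory stand-ins for two csv files (header row 0,1,2; labels in column "0")
sources = [
    "0,1,2\nb1,peter,house\nb2,carl,horse\n",
    "0,1,2\nb1,mark,green\nb2,no,phone\n",
]
df = pd.concat((pd.read_csv(io.StringIO(s)) for s in sources), ignore_index=True)

# melt: one row per (label, original column name, cell value), in file order
long = df.melt(id_vars="0")
# pivot_table: regroup by label and column name, joining the values with a space
output = long.pivot_table(index="0", columns="variable", values="value",
                          aggfunc=" ".join)
print(output)
```

Note that `pivot_table` sorts the row labels and column names, so the result's order may differ from the files' order for labels that do not sort naturally.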